These notes are intended for use by students in cs1501 at the University of Pittsburgh and no one else



Download 186.06 Kb.
Page4/24
Date03.05.2017
Size186.06 Kb.
1   2   3   4   5   6   7   8   9   ...   24

Digital Search Trees

    • Similar problem with keys of different lengths
      • What if a key is a prefix of another key that is already present?
    • Data is not sorted
      • If we want sorted data, we would need to extract all of the data from the tree and sort it
    • May do b comparisons (of entire key) to find a key
      • If a key is long and comparisons are costly, this can be inefficient

Radix Search Tries

  • Let's address the last problem
    • How to reduce the number of comparisons (of the entire key)?
    • We'll modify our tree slightly
      • All keys will be in exterior nodes at leaves in the tree
      • Interior nodes will not contain keys, but will just direct us down the tree toward the leaves
    • This gives us a Radix Search Trie
      • Trie is from reTRIEval (see text)

Radix Search Tries

  • Benefit of simple Radix Search Tries
    • Fewer comparisons of entire key than DSTs
  • Drawbacks
    • The tree will have more overall nodes than a DST
      • Each external node with a key needs a unique bit-path to it
    • Internal and External nodes are of different types
    • Insert is somewhat more complicated
      • Some insert situations require new internal as well as external nodes to be created
        • We need to create new internal nodes to ensure that each object has a unique path to it
        • See example

Radix Search Tries

  • Run-time is similar to DST
    • Since tree is binary, average tree height for N keys is O(log2N)
    • Worst case path length is again b
      • However, now at worst b bit comparisons are required
      • We only need one comparison of the entire key
    • So, again, the benefit to RST is that the entire key must be compared only one time

Improving Tries

  • How can we improve tries?
    • Can we reduce the heights somehow?
      • Average height now is O(log2N)
    • Can we simplify the data structures needed (so different node types are not required)?
    • Can we simplify the Insert?
  • We will examine a couple of variations that improve over the basic Trie

Multiway Tries

  • RST that we have seen considers the key 1 bit at a time
    • This causes a maximum height in the tree of up to b, and gives an average height of O(log2N) for N keys
    • If we considered m bits at a time, then we could reduce the worst and average heights
      • Maximum height is now b/m since m bits are consumed at each level
      • Let M = 2m
        • Average height for N keys is now O(logMN), since we branch in M directions at each node

Multiway Tries

    • Let's look at an example
      • Consider 220 (1 meg) keys of length 32 bits
        • Simple RST will have
          • Worst Case height = 32
          • Ave Case height = O(log2[220])  20
        • Multiway Trie using 8 bits would have
          • Worst Case height = 32/8 = 4
          • Ave Case height = O(log256[220])  2.5
    • This is a considerable improvement
    • Let's look at an example using character data
      • We will consider a single character (8 bits) at each level
      • Go over on board

Multiway Tries

    • So what is the catch (or cost)?
      • Memory
        • Multiway Tries use considerably more memory than simple tries
      • Each node in the multiway trie contains M pointers/references
        • In example with ASCII characters, M = 256
      • Many of these are unused, especially
        • During common paths (prefixes), where there is no branching (or "one-way" branching)
          • Ex: through and throughout
        • At the lower levels of the tree, where previous branching has likely separated keys already

Patricia Trees

  • Idea:
    • Save memory and height by eliminating all nodes in which no branching occurs
      • See example on board
    • Note now that since some nodes are missing, level i does not necessarily correspond to bit (or character) i
      • So to do a search we need to store in each node which bit (character) the node corresponds to
        • However, the savings from the removed nodes is still considerable

Patricia Trees

    • Also, keep in mind that a key can match at every character that is checked, but still not be actually in the tree
      • Example for tree on board:
        • If we search for TWEEDLE, we will only compare the T**E**E
        • However, the next node after the E is at index 8. This is past the end of TWEEDLE so it is not found
    • Run-time?

Patricia Trees

    • So Patricia trees
      • Reduce tree height by removing "one-way" branching nodes
      • Text also shows how "upwards" links enable us to use only one node type
        • TEXT VERSION makes the nodes homogeneous by storing keys within the nodes and using "upwards" links from the leaves to access the nodes
          • So every node contains a valid key. However, the keys are not checked on the way "down" the tree – only after an upwards link is followed
      • Thus Patricia saves memory but makes the insert rather tricky, since new nodes may have to be inserted between other nodes
        • See text

de la Briandais Trees

  • Even with Patricia trees, there are many unused pointers/references, especially after the first few levels
    • Continuing our example of character data, each node has 256 pointers in it
      • Many of these will never be used in most applications
      • Consider words in English language
        • Not every permutation of letters forms a legal word
        • Especially after the first or second level, few pointers in a node will actually be used
    • How can we eliminate many of these references?

Download 186.06 Kb.

Share with your friends:
1   2   3   4   5   6   7   8   9   ...   24




The database is protected by copyright ©sckool.org 2020
send message

    Main page