

Page  4/24  Date  03.05.2017  Size  186.06 Kb. 
  Similar problem with keys of different lengths
 What if a key is a prefix of another key that is already present?
 Data is not sorted
 If we want sorted data, we would need to extract all of the data from the tree and sort it
 May do b comparisons (of entire key) to find a key
 If a key is long and comparisons are costly, this can be inefficient
Radix Search Tries  Let's address the last problem
 How to reduce the number of comparisons (of the entire key)?
 We'll modify our tree slightly
 All keys will be in exterior nodes at leaves in the tree
 Interior nodes will not contain keys, but will just direct us down the tree toward the leaves
 This gives us a Radix Search Trie
 Trie is from reTRIEval (see text)
Radix Search Tries  Benefit of simple Radix Search Tries
 Fewer comparisons of entire key than DSTs
 Drawbacks
 The tree will have more overall nodes than a DST
 Each external node with a key needs a unique bitpath to it
 Internal and External nodes are of different types
 Insert is somewhat more complicated
 Some insert situations require new internal as well as external nodes to be created
 We need to create new internal nodes to ensure that each object has a unique path to it
 See example
Radix Search Tries  Runtime is similar to DST
 Since tree is binary, average tree height for N keys is O(log2N)
 Worst case path length is again b
 However, now at worst b bit comparisons are required
 We only need one comparison of the entire key
 So, again, the benefit to RST is that the entire key must be compared only one time
Improving Tries  How can we improve tries?
 Can we reduce the heights somehow?
 Average height now is O(log2N)
 Can we simplify the data structures needed (so different node types are not required)?
 Can we simplify the Insert?
 We will examine a couple of variations that improve over the basic Trie
Multiway Tries  RST that we have seen considers the key 1 bit at a time
 This causes a maximum height in the tree of up to b, and gives an average height of O(log2N) for N keys
 If we considered m bits at a time, then we could reduce the worst and average heights
 Maximum height is now b/m since m bits are consumed at each level
 Let M = 2m
 Average height for N keys is now O(logMN), since we branch in M directions at each node
Multiway Tries  Let's look at an example
 Consider 220 (1 meg) keys of length 32 bits
 Simple RST will have
 Worst Case height = 32
 Ave Case height = O(log2[220]) 20
 Multiway Trie using 8 bits would have
 Worst Case height = 32/8 = 4
 Ave Case height = O(log256[220]) 2.5
 This is a considerable improvement
 Let's look at an example using character data
 We will consider a single character (8 bits) at each level
 Go over on board
Multiway Tries  So what is the catch (or cost)?
 Memory
 Multiway Tries use considerably more memory than simple tries
 Each node in the multiway trie contains M pointers/references
 In example with ASCII characters, M = 256
 Many of these are unused, especially
 During common paths (prefixes), where there is no branching (or "oneway" branching)
 Ex: through and throughout
 At the lower levels of the tree, where previous branching has likely separated keys already
Patricia Trees  Idea:
 Save memory and height by eliminating all nodes in which no branching occurs
 Note now that since some nodes are missing, level i does not necessarily correspond to bit (or character) i
 So to do a search we need to store in each node which bit (character) the node corresponds to
 However, the savings from the removed nodes is still considerable
Patricia Trees  Also, keep in mind that a key can match at every character that is checked, but still not be actually in the tree
 Example for tree on board:
 If we search for TWEEDLE, we will only compare the T**E**E
 However, the next node after the E is at index 8. This is past the end of TWEEDLE so it is not found
 Runtime?
Patricia Trees  So Patricia trees
 Reduce tree height by removing "oneway" branching nodes
 Text also shows how "upwards" links enable us to use only one node type
 TEXT VERSION makes the nodes homogeneous by storing keys within the nodes and using "upwards" links from the leaves to access the nodes
 So every node contains a valid key. However, the keys are not checked on the way "down" the tree – only after an upwards link is followed
 Thus Patricia saves memory but makes the insert rather tricky, since new nodes may have to be inserted between other nodes
 Even with Patricia trees, there are many unused pointers/references, especially after the first few levels
 Continuing our example of character data, each node has 256 pointers in it
 Many of these will never be used in most applications
 Consider words in English language
 Not every permutation of letters forms a legal word
 Especially after the first or second level, few pointers in a node will actually be used
 How can we eliminate many of these references?
Share with your friends: 
The database is protected by copyright ©sckool.org 2020
send message

