|
|
Page | 4/24 | Date | 03.05.2017 | Size | 186.06 Kb. |
| - Similar problem with keys of different lengths
- What if a key is a prefix of another key that is already present?
- Data is not sorted
- If we want sorted data, we would need to extract all of the data from the tree and sort it
- May do b comparisons (of entire key) to find a key
- If a key is long and comparisons are costly, this can be inefficient
Radix Search Tries - Let's address the last problem
- How to reduce the number of comparisons (of the entire key)?
- We'll modify our tree slightly
- All keys will be in exterior nodes at leaves in the tree
- Interior nodes will not contain keys, but will just direct us down the tree toward the leaves
- This gives us a Radix Search Trie
- Trie is from reTRIEval (see text)
Radix Search Tries - Benefit of simple Radix Search Tries
- Fewer comparisons of entire key than DSTs
- Drawbacks
- The tree will have more overall nodes than a DST
- Each external node with a key needs a unique bit-path to it
- Internal and External nodes are of different types
- Insert is somewhat more complicated
- Some insert situations require new internal as well as external nodes to be created
- We need to create new internal nodes to ensure that each object has a unique path to it
- See example
Radix Search Tries - Run-time is similar to DST
- Since tree is binary, average tree height for N keys is O(log2N)
- Worst case path length is again b
- However, now at worst b bit comparisons are required
- We only need one comparison of the entire key
- So, again, the benefit to RST is that the entire key must be compared only one time
Improving Tries - How can we improve tries?
- Can we reduce the heights somehow?
- Average height now is O(log2N)
- Can we simplify the data structures needed (so different node types are not required)?
- Can we simplify the Insert?
- We will examine a couple of variations that improve over the basic Trie
Multiway Tries - RST that we have seen considers the key 1 bit at a time
- This causes a maximum height in the tree of up to b, and gives an average height of O(log2N) for N keys
- If we considered m bits at a time, then we could reduce the worst and average heights
- Maximum height is now b/m since m bits are consumed at each level
- Let M = 2m
- Average height for N keys is now O(logMN), since we branch in M directions at each node
Multiway Tries - Let's look at an example
- Consider 220 (1 meg) keys of length 32 bits
- Simple RST will have
- Worst Case height = 32
- Ave Case height = O(log2[220]) 20
- Multiway Trie using 8 bits would have
- Worst Case height = 32/8 = 4
- Ave Case height = O(log256[220]) 2.5
- This is a considerable improvement
- Let's look at an example using character data
- We will consider a single character (8 bits) at each level
- Go over on board
Multiway Tries - So what is the catch (or cost)?
- Memory
- Multiway Tries use considerably more memory than simple tries
- Each node in the multiway trie contains M pointers/references
- In example with ASCII characters, M = 256
- Many of these are unused, especially
- During common paths (prefixes), where there is no branching (or "one-way" branching)
- Ex: through and throughout
- At the lower levels of the tree, where previous branching has likely separated keys already
Patricia Trees - Idea:
- Save memory and height by eliminating all nodes in which no branching occurs
- Note now that since some nodes are missing, level i does not necessarily correspond to bit (or character) i
- So to do a search we need to store in each node which bit (character) the node corresponds to
- However, the savings from the removed nodes is still considerable
Patricia Trees - Also, keep in mind that a key can match at every character that is checked, but still not be actually in the tree
- Example for tree on board:
- If we search for TWEEDLE, we will only compare the T**E**E
- However, the next node after the E is at index 8. This is past the end of TWEEDLE so it is not found
- Run-time?
Patricia Trees - So Patricia trees
- Reduce tree height by removing "one-way" branching nodes
- Text also shows how "upwards" links enable us to use only one node type
- TEXT VERSION makes the nodes homogeneous by storing keys within the nodes and using "upwards" links from the leaves to access the nodes
- So every node contains a valid key. However, the keys are not checked on the way "down" the tree – only after an upwards link is followed
- Thus Patricia saves memory but makes the insert rather tricky, since new nodes may have to be inserted between other nodes
- Even with Patricia trees, there are many unused pointers/references, especially after the first few levels
- Continuing our example of character data, each node has 256 pointers in it
- Many of these will never be used in most applications
- Consider words in English language
- Not every permutation of letters forms a legal word
- Especially after the first or second level, few pointers in a node will actually be used
- How can we eliminate many of these references?
Share with your friends: |
The database is protected by copyright ©sckool.org 2020
send message
|
|