2023-11-9-BinarySearchTree
Implementations and analysis of binary search tree.
Binary Search Tree
Basic definition
In the passage about binary trees, we want to improve the runtime for the search method. In order to find a node without traversing through the whole tree, we can make a sorted tree.
A binary search tree is defined as below:
- \( T = \lbrace \text{key}, T_{\text{left}}, T_{\text{right}} \rbrace \)
- All nodes to the left of a root have smaller keys than the root, and all nodes to the right of a root have greater keys.

1 |
|
BST implementations
BST find
Based on our definition of a binary search tree, we don’t have to traverse the whole tree while finding a key:

1 | template <typename K, typename V> |
Since at every level, we make one comparison with the node and look for the subtree that potentially contains the query, the runtime of _find is O(h), in which h denotes the height of the tree.
BST insert
Since the _find function returns the **reference to the pointer, **we can first use the _find function to locate the position we want to insert and then directly create a new node.

1 | template <typename K, typename V> |
Since insert() uses _find() and the additional step is only O(1), the running time for insert() is also O(h).
BST remove
Similarly, we can use _find(key) to find the node we are going to remove. However, we would want to preserve the feature of the BST after removing a node. There are three circumstances to consider.
No child remove
The simplest case is that the node we are going to remove has no child.

In this case, we can simply use _find() and deallocate the pointer to the node.
One-child remove
One-child remove is like the remove function of a linked list.

First, we allocate the place to remove with _find(), and preserve the node to be removed with a pointer. Using the reference to the pointer returned by _find(), we can replace the removed node with its only child.
Two-children remove
There are more things to consider when the node to be removed has two children: after the node is removed, which node are we going to fill in the blank so that we can still preserve the feature of the BST? (all the left children have keys smaller than the key in the subroot and all right children have keys larger than the key in the subroot)

1 | template <class K, class V> |
As shown in the graph, if we remove the node with key 17, the possible keys that fit in the blank location are values from 10 to 19. Additionally, we would want to replace the removed node with nodes existing in the tree instead of constructing a new node, so the two available nodes for replacing node(17) is node(10) and node(19). Generally, they are the in-order predecessor (the largest key smaller than the subroot) and in-order successor (the smallest key larger than the subroot).

In summary, for two-children removal, after finding the mode to remove, we find either the IOP or the IOS with respect to the node. We then swap the subroot (removal node) with the IOP/IOS, which still preserves the feature of BST. At last, we call remove on the removal node again, but this time, it is either a zero-child remove or a one-child remove.
As for remove(), we can see that we first use _find(), which is O(h), and we also have to find IOP/IOS, which is another O(h). The swapping is only O(1). In summary, the running time for remove() is O(h).
1 | template <class K, class V> |
BST running time analysis
| implementation | running time |
| find | O(h) |
| insert | O(h) |
| remove | O(h) |
| traverse | O(n) |
Now, most of the implementations run in O(h) time, but what exactly is O(h)?
To answer this question, we have to relate h, the height of a BST, to n, the number of nodes in a BST.

We can see that for a BST with n = 5, the height varies by the sequence in which the nodes are inserted. More generally, for a given n, h has an upper bound of O(n), when the nodes are arranged like a singly linked list, and a lower bound of O(logn), when almost every level at height i is fully loaded with 2^i nodes.

Although both forms are valid BST, there is no doubt that the form that guarantees the running time of O(logn) is preferable.
Then how can we modify our tree to guarantee the ideal running time?
2023-11-9-BinarySearchTree
http://gong208.github.io/2023/11/09/2023-11-9-BinarySearchTree/