Binary Trees
All nodes have at most 2 children
class BinaryNode {
public:
    // ...
private:
    int element;
    BinaryNode *left;
    BinaryNode *right;
};
Binary Trees: diagram details
Any child not shown in a diagram is really a NULL pointer; these are generally omitted from the diagrams
Binary Search Trees (BST)
- Each node has a key value that can be compared
- Binary search tree property:
- For a given node, which we will call the root...
- Every node in left subtree has a key whose value is less than the root's key value, AND
- Every node in right subtree has a key whose value is greater than the root's key value
- We assume that duplicate values are not allowed
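The binary search tree property can be checked mechanically. The following is a minimal sketch, not part of the original slides: `Node` is a hypothetical stand-in for `BinaryNode` with public members, and `isBST` is an assumed helper name. It passes down the open interval that every key in a subtree must fall inside, so a key that is fine relative to its parent but wrong relative to a more distant ancestor is still caught.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for the slides' BinaryNode, with public members.
struct Node {
    int element;
    Node *left;
    Node *right;
};

// True if every key in this subtree lies strictly between lo and hi,
// and the same holds recursively with the bounds tightened at each node.
bool isBST(const Node *node, long long lo, long long hi) {
    if (node == nullptr) return true;  // an empty tree is trivially a BST
    if (node->element <= lo || node->element >= hi) return false;
    return isBST(node->left, lo, node->element) &&
           isBST(node->right, node->element, hi);
}

bool isBST(const Node *root) {
    return isBST(root, LLONG_MIN, LLONG_MAX);
}

// Usage sketch: 2 with children 1 and 3 is a BST; 3 with left child 4 is not.
void isBstExample() {
    Node one{1, nullptr, nullptr}, three{3, nullptr, nullptr};
    Node good{2, &one, &three};
    assert(isBST(&good));

    Node four{4, nullptr, nullptr};
    Node bad{3, &four, nullptr};
    assert(!isBST(&bad));
}
```

Using strict comparisons on both bounds also rejects duplicate keys, matching the no-duplicates assumption.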
BST: Example

BST: Counter-example

The difference
- Both binary trees and binary search trees have zero, one, or two children per node
- But a binary search tree is sorted
- However, most people, when they say "binary tree", really mean a "binary search tree"
- Note that we assume that we can NOT have duplicate elements in a BST
BST: find
- Basic idea:
- Compare value to be found to key of the root of the tree
- If they are equal, then done
- If not equal, recurse into the left or right subtree, whichever one would contain the value if it is in the tree
- If you hit a NULL pointer, then you have "run off" the bottom of the tree, and the value is not in the tree
BST: Find

- Trying to find 3 will go, from the root, left → left → right
- Trying to find 6 will go, from the root, right → left → left
- At that point, we have "run off" the bottom of the tree (via 7's left-child pointer, which is NULL), and thus the value is not in the tree
BST: find
BinaryNode * BST::find(int x, BinaryNode *curNode) {
    // handle case where a NULL pointer could be passed
    // curNode->right or curNode->left might be NULL
    if (curNode == NULL) // we've "run" off the bottom
        return NULL;
    else if (x < curNode->element)
        return find(x, curNode->left); // search left
    else if (x > curNode->element)
        return find(x, curNode->right); // search right
    else
        return curNode; // matched
}
BST: insert
Do a find, and when we reach a NULL pointer, create a new node there
void BST::insert(int x, BinaryNode * & curNode) {
    if (curNode == NULL)
        curNode = new BinaryNode(x, NULL, NULL);
    else if (x < curNode->element)
        insert(x, curNode->left);
    else if (x > curNode->element)
        insert(x, curNode->right);
    else
        ; // duplicate... do nothing
}
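To try find and insert together outside of a class, here is a small self-contained sketch. It is an illustration under assumptions rather than the course's code: `Node` is a hypothetical public struct, the functions are free functions instead of `BST` members, and find is written iteratively (a loop instead of recursion) to show that the same walk works either way.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for the slides' BinaryNode, with public members.
struct Node {
    int element;
    Node *left;
    Node *right;
    explicit Node(int x) : element(x), left(nullptr), right(nullptr) {}
};

// Iterative variant of find: follow one child pointer per step.
Node *find(int x, Node *cur) {
    while (cur != nullptr && cur->element != x)
        cur = (x < cur->element) ? cur->left : cur->right;
    return cur;  // nullptr means we "ran off" the bottom of the tree
}

// Same logic as the slide: the reference lets us overwrite the null
// pointer we ran into with a pointer to the new node.
void insert(int x, Node *&cur) {
    if (cur == nullptr)
        cur = new Node(x);
    else if (x < cur->element)
        insert(x, cur->left);
    else if (x > cur->element)
        insert(x, cur->right);
    // equal: duplicate, do nothing
}

// Reclaim every node of the tree (post-order).
void destroy(Node *cur) {
    if (cur == nullptr) return;
    destroy(cur->left);
    destroy(cur->right);
    delete cur;
}

void insertFindExample() {
    Node *root = nullptr;
    for (int x : {5, 3, 8, 4, 3}) insert(x, root);  // second 3 is ignored
    assert(find(4, root) != nullptr);
    assert(find(6, root) == nullptr);
    destroy(root);
}
```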
BST: findMax(), findMin()
To find the maximum element in a BST, follow the right child links from the root until there is no right child
Similarly, follow the left child links for findMin()
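A minimal sketch of both traversals, assuming a hypothetical public `Node` struct in place of `BinaryNode` and free functions rather than class members:

```cpp
#include <cassert>

// Hypothetical stand-in for the slides' BinaryNode, with public members.
struct Node {
    int element;
    Node *left;
    Node *right;
};

// Smallest key: keep following left links. Returns nullptr for an empty tree.
Node *findMin(Node *cur) {
    if (cur == nullptr) return nullptr;
    while (cur->left != nullptr) cur = cur->left;
    return cur;
}

// Largest key: keep following right links. Returns nullptr for an empty tree.
Node *findMax(Node *cur) {
    if (cur == nullptr) return nullptr;
    while (cur->right != nullptr) cur = cur->right;
    return cur;
}

void minMaxExample() {
    // Tree: 5 with left child 2 (whose right child is 4) and right child 9.
    Node four{4, nullptr, nullptr};
    Node two{2, nullptr, &four};
    Node nine{9, nullptr, nullptr};
    Node five{5, &two, &nine};
    assert(findMin(&five)->element == 2);
    assert(findMax(&five)->element == 9);
}
```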
BST: remove
- Disrupts the tree structure
- Basic idea:
- Find node to be removed
- Three cases:
- Node has no children
- Node has one child
- Node has two children
BST: remove: no children
- Just remove the node (reclaiming memory), adjusting the parent pointer to NULL
- In this case, 9's left child link is changed to NULL
BST: remove: one child
- Adjust pointer of parent to point at child, and reclaim memory
- In this case, 4's left pointer is changed to point to 3
BST: remove: two children
- Replace node with successor, then remove successor from tree
- This requires running findMin() on the right sub-tree, and then removing that element
- In this case, 5 is replaced by 7 (and the node that had 7 is removed)
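The three cases can be combined into one recursive routine. This is a sketch under assumptions, not the slides' implementation: `Node` is a hypothetical public struct, `removeNode` is an assumed name for the remove operation, and the example tree is made up. The zero-child and one-child cases share a branch, because "point the parent at the only child" sets the parent pointer to null when there is no child.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for the slides' BinaryNode, with public members.
struct Node {
    int element;
    Node *left;
    Node *right;
    explicit Node(int x) : element(x), left(nullptr), right(nullptr) {}
};

void insert(int x, Node *&cur) {
    if (cur == nullptr) cur = new Node(x);
    else if (x < cur->element) insert(x, cur->left);
    else if (x > cur->element) insert(x, cur->right);
}

// Leftmost node of a non-empty subtree.
Node *findMin(Node *cur) {
    while (cur->left != nullptr) cur = cur->left;
    return cur;
}

// removeNode is a hypothetical name for the slides' remove operation.
void removeNode(int x, Node *&cur) {
    if (cur == nullptr) return;  // x is not in the tree
    if (x < cur->element) {
        removeNode(x, cur->left);
    } else if (x > cur->element) {
        removeNode(x, cur->right);
    } else if (cur->left != nullptr && cur->right != nullptr) {
        // Two children: copy the successor's key here, then remove the
        // successor (which has no left child) from the right subtree.
        cur->element = findMin(cur->right)->element;
        removeNode(cur->element, cur->right);
    } else {
        // Zero or one child: splice the node out and reclaim its memory.
        Node *old = cur;
        cur = (cur->left != nullptr) ? cur->left : cur->right;
        delete old;
    }
}

void removeExample() {
    Node *root = nullptr;
    for (int x : {5, 3, 9, 7, 8}) insert(x, root);
    removeNode(5, root);  // two children: the successor of 5 is 7
    assert(root->element == 7);
    assert(root->right->left->element == 8);  // 8 took the old 7's place
    for (int x : {3, 7, 8, 9}) removeNode(x, root);  // empty the tree
    assert(root == nullptr);
}
```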
BST Height
- n-node BST: Worst case depth is n-1
- This can easily happen if the data to be inserted is already sorted
- Claim: The maximum number of nodes in a binary tree of height h is $2^{h+1} - 1$
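The worst case is easy to reproduce. The sketch below (hypothetical `Node`, `height`, and `heightOf` names; not from the slides) inserts the same seven keys in two different orders into a plain BST and measures the resulting height, using the convention that a single node has height 0.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the slides' BinaryNode, with public members.
struct Node {
    int element;
    Node *left;
    Node *right;
    explicit Node(int x) : element(x), left(nullptr), right(nullptr) {}
};

void insert(int x, Node *&cur) {
    if (cur == nullptr) cur = new Node(x);
    else if (x < cur->element) insert(x, cur->left);
    else if (x > cur->element) insert(x, cur->right);
}

// Height of a subtree: -1 for an empty tree, 0 for a single node.
int height(const Node *cur) {
    if (cur == nullptr) return -1;
    return 1 + std::max(height(cur->left), height(cur->right));
}

void destroy(Node *cur) {
    if (cur == nullptr) return;
    destroy(cur->left);
    destroy(cur->right);
    delete cur;
}

// Build a BST by inserting the keys in the given order; return its height.
int heightOf(const std::vector<int> &keys) {
    Node *root = nullptr;
    for (int x : keys) insert(x, root);
    int h = height(root);
    destroy(root);
    return h;
}

void degenerateExample() {
    assert(heightOf({1, 2, 3, 4, 5, 6, 7}) == 6);  // sorted: a chain, n-1
    assert(heightOf({4, 2, 6, 1, 3, 5, 7}) == 2);  // same keys, full tree
}
```

With n = 7, the second ordering reaches the bound from the claim: 7 = 2^(2+1) - 1 nodes at height 2.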
Proof by Induction on h
- Claim: the maximum number of nodes in a binary tree of height h is $2^{h+1} - 1$
- Base case: for h = 0, the tree has one node, and $2^{0+1} - 1 = 1$
- Inductive hypothesis: assume the claim is true for every tree of height at most h
- We need to show that $n \le 2^{h+2} - 1$ for a tree of height h+1
- Any tree of height h+1 has at most 2 subtrees, each of height at most h; each subtree has at most $2^{h+1} - 1$ nodes; add one more for the root
- Thus, our new tree of height h+1 has at most:
- $2(2^{h+1} - 1) + 1 = 2^{h+2} - 1$ nodes
- This is exactly the claimed bound with h+1 in place of h; thus, it is proven
Relationship between h and n
- Given n nodes and height h, then by the claim (proven on the previous slide): $n \le 2^{h+1} - 1$
- We can simplify:
- $n + 1 \le 2^{h+1}$
- $\log_2(n+1) \le \log_2(2^{h+1})$
- $\log_2(n+1) \le h + 1$
- Thus $h \ge \log_2(n+1) - 1$
- This means that the height of the "shortest" tree we can achieve for n nodes is proportional to the base-2 log of n
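As a quick numeric check of this bound, the sketch below computes the smallest height whose capacity $2^{h+1} - 1$ is at least n. The function name `minHeight` is an assumption for illustration; it uses integer arithmetic only, so there are no floating-point rounding concerns.

```cpp
#include <cassert>

// Smallest height h such that a binary tree of height h can hold n nodes,
// i.e. the smallest h with n <= 2^(h+1) - 1. Requires n >= 1.
int minHeight(long long n) {
    assert(n >= 1);
    int h = 0;
    long long capacity = 1;  // 2^(h+1) - 1 for h = 0
    while (capacity < n) {
        capacity = 2 * capacity + 1;  // capacity of a tree one level taller
        ++h;
    }
    return h;
}
```

For example, 7 nodes fit in a tree of height 2, but an 8th node forces height 3, and a million nodes need a height of only 19.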
Perfect Binary Tree

- All leaves have the same depth
- And all nodes have zero or two children, but not one
- Number of leaves: $2^h$
- Number of nodes: $1 + 2 + 2^2 + 2^3 + \dots + 2^h = 2^{h+1} - 1$
- Problem: a perfect binary tree can only hold n values where $n = 2^{h+1} - 1$
- So you can't have, say, 5 values in a perfect binary tree!
- A good AVL tree animation tool is here
- A mirror that also contains the animation tool is here
- We'll be using this website throughout this slide set
AVL Trees
- Motivation: to guarantee Θ(log n) running time on find, insert, and remove
- Idea: Keep tree balanced after each operation
- Solution: AVL trees
- Named after the inventors, Adelson-Velskii and Landis
AVL Tree Structure Property
For every node in the tree, the heights of the left and right sub-trees differ by at most 1
AVL Tree

AVL balance factor
- Each node of a BST holds:
- The data
- Left and right child pointers
- Possibly a parent node pointer
- An AVL tree also holds a balance factor
- The height of the right subtree minus the height of the left subtree
- We could have it be left minus right, but the convention in this class is to always have it be right minus left
- It can also be computed on the fly, but that is VERY slow (it means visiting every node in both subtrees), and defeats the purpose of using AVL trees for speed
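For illustration, here is what the on-the-fly computation looks like, using a hypothetical public `Node` struct and assumed helper names `height` and `balanceFactor`. Each call to `height` walks an entire subtree, which is why a real AVL node stores the balance factor (or the height) instead.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the slides' BinaryNode, with public members.
struct Node {
    int element;
    Node *left;
    Node *right;
};

// Height of a subtree: -1 for an empty tree, 0 for a single node.
// This visits every node in the subtree.
int height(const Node *cur) {
    if (cur == nullptr) return -1;
    return 1 + std::max(height(cur->left), height(cur->right));
}

// Balance factor by the slides' convention: right height minus left height.
int balanceFactor(const Node *cur) {
    return height(cur->right) - height(cur->left);
}

void balanceFactorExample() {
    // The chain 1 -> 2 -> 3, each node the right child of the previous one.
    Node three{3, nullptr, nullptr};
    Node two{2, nullptr, &three};
    Node one{1, nullptr, &two};
    assert(balanceFactor(&three) == 0);
    assert(balanceFactor(&two) == 1);
    assert(balanceFactor(&one) == 2);  // outside -1..1: unbalanced
}
```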
AVL tree balance
- "Balanced" trees
- 0 means balanced
- 1 means the right subtree is one level taller than the left subtree
- -1 means the left subtree is one level taller than the right subtree
- "Unbalanced" trees
- A balance factor of -2 or 2
- We'll fix the tree
- Will we ever hit -3 or 3?
AVL Tree, with balance factors
By definition, a BST is a valid AVL tree if the balance factor for EVERY node is -1, 0, or 1
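This definition translates into a single bottom-up pass. The sketch below is an assumption-labeled illustration (`Node`, `checkedHeight`, `isBalanced`, and `kUnbalanced` are hypothetical names): it returns each subtree's height, or a sentinel as soon as any node's balance factor falls outside -1, 0, 1. It checks only the balance condition, not the BST ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the slides' BinaryNode, with public members.
struct Node {
    int element;
    Node *left;
    Node *right;
};

const int kUnbalanced = -2;  // sentinel; real heights are always >= -1

// Returns the subtree height (-1 for empty), or kUnbalanced if some node
// in the subtree has a balance factor outside {-1, 0, 1}.
int checkedHeight(const Node *cur) {
    if (cur == nullptr) return -1;
    int lh = checkedHeight(cur->left);
    if (lh == kUnbalanced) return kUnbalanced;
    int rh = checkedHeight(cur->right);
    if (rh == kUnbalanced) return kUnbalanced;
    if (std::abs(rh - lh) > 1) return kUnbalanced;
    return 1 + std::max(lh, rh);
}

bool isBalanced(const Node *root) {
    return checkedHeight(root) != kUnbalanced;
}

void balanceCheckExample() {
    Node one{1, nullptr, nullptr}, three{3, nullptr, nullptr};
    Node balanced{2, &one, &three};
    assert(isBalanced(&balanced));

    Node c{3, nullptr, nullptr};
    Node b{2, nullptr, &c};
    Node chain{1, nullptr, &b};  // 1 -> 2 -> 3: balance factor 2 at the root
    assert(!isBalanced(&chain));
}
```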
Not an AVL Tree
Not balanced: height difference greater than 1
AVL Trees: find, insert
- find: same as BST find
- insert: same as BST insert, except might need to "fix" the AVL tree after the insert (via rotations)
- Runtime analysis:
- Θ(d), where d is the depth of the node being found/inserted
- What is the maximum height of an n-node AVL tree?
AVL tree operations
- Perform the operation (insert, delete)
- Move back up to the root, updating the balance factors
- Why only those nodes?
- Because those are the only ones who have had their subtrees altered
- Do tree rotations where the balance factors are 2 or -2
How many times to "fix" the tree?
- Any single insert will change the balance factor of a node by at most one
- So we fix the lowest off-balance node
- Then everything above it is balanced again
- This means that we only have to look at the lowest unbalanced node and its child in the direction of the insert
AVL insert
- Let x be the deepest node where imbalance occurs
- Four cases where the insert happened:
- Case 1: in the left subtree of the left child of x
- Case 2: in the right subtree of the left child of x
- Case 3: in the left subtree of the right child of x
- Case 4: in the right subtree of the right child of x
- Cases 1 & 4: perform a single rotation
- Cases 2 & 3: perform a double rotation
AVL single right rotation
- The node just inserted was node 1 (blue)
- The lowest node, immediately after the insert, with an imbalance is node 3 (red)
- Because node 1 is in the "left subtree of the left child" of node 3, this means we need to perform a single right rotation
AVL single left rotation
- The node just inserted was node 3 (red)
- The lowest node, immediately after the insert, with an imbalance is node 1 (blue)
- Because node 3 is in the "right subtree of the right child" of node 1, this means we need to perform a single left rotation
A side-effect of tree rotations
- This is the single right rotation
- Note that at least one node moves "up" (depth decreases)
- In this case, nodes 1 and 2 both move up
- And at least one node moves "down" (depth increases)
- In this case, node 3 moves down
- Similarly for a left rotation
AVL single right rotation: before & after
- Node 1 (red) is what is being inserted
- The lowest node with an imbalance is node 5 (balance: -2)
- Because the insert was in 5's "left subtree of the left child", we perform a single right rotation on 5 (and its left child, 3)
AVL single right rotation: before & after
- From the previous slide, we know we perform a single right rotation on 5 (and its left child, 3)
- Thus, the two blue nodes are the 'pivots' of the rotation
- Note that node 4 changes parents (from 3's right to 5's left)
AVL single right rotation: general case
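X<b<Y<a<Z (a is the unbalanced node and b is its left child; X and Y are b's left and right sub-trees, and Z is a's right sub-tree)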
The insert is into sub-tree X, increasing its height to h+1
Notice how sub-tree Y changes parent
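In pointer terms, the general case comes down to three assignments. The sketch below is illustrative only: `Node` is a hypothetical public struct, and the rotation takes the parent's pointer by reference so that the rotated subtree is re-attached in place. The mirror-image left rotation is included for comparison. Balance factors are not updated here.

```cpp
#include <cassert>

// Hypothetical stand-in for the slides' BinaryNode, with public members.
struct Node {
    int element;
    Node *left;
    Node *right;
};

// Single right rotation around a, whose left child b moves up.
// Before: a(b(X, Y), Z)    After: b(X, a(Y, Z))
void rotateRight(Node *&a) {
    assert(a != nullptr && a->left != nullptr);
    Node *b = a->left;
    a->left = b->right;  // sub-tree Y changes parent: b's right -> a's left
    b->right = a;        // a moves down to become b's right child
    a = b;               // b takes a's place under a's old parent
}

// Single left rotation: the mirror image.
// Before: a(X, b(Y, Z))    After: b(a(X, Y), Z)
void rotateLeft(Node *&a) {
    assert(a != nullptr && a->right != nullptr);
    Node *b = a->right;
    a->right = b->left;
    b->left = a;
    a = b;
}

void rotationExample() {
    // 5 is unbalanced after inserting 1: 5(3(2(1, -), 4), 6).
    // The values 2 and 6 are made up to complete the tree.
    Node one{1, nullptr, nullptr}, four{4, nullptr, nullptr};
    Node six{6, nullptr, nullptr};
    Node two{2, &one, nullptr};
    Node three{3, &two, &four};
    Node five{5, &three, &six};
    Node *root = &five;
    rotateRight(root);
    assert(root->element == 3);
    assert(root->right->element == 5);
    assert(root->right->left->element == 4);  // 4 moved from 3 to 5
}
```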
Right and left rotations
Note that the trees shown are not necessarily AVL trees, but the rotations are correct

Cases 2 & 3: attempt a single rotation
X<b<Y<a<Z
The insert is into sub-tree Y, increasing its height to h+1
Failure! b's left subtree has height h+1; right is h+3
Double rotation
- Node 5 (red) was just inserted
- The lowest node with an imbalance is node 8 (balance factor: -2)
- When discussing these rotations, we will call this the "parent" node
- Because the insert happened in 8's "right subtree of the left child", we perform a double rotation
- This consists of a single left rotation on the "child" (node 4), followed by a single right rotation on the "parent" (node 8)
- Note that the two rotations are in different directions!
Double rotation, step 1
This is the single left rotation on the "child". The red node is what was inserted; the blue nodes are the 'pivots' of this single left rotation.
Double rotation, step 2
This is the single right rotation on the "parent". The red node is what was inserted; the green nodes are the 'pivots' of this single right rotation.
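The two steps compose directly from the single rotations. The following sketch uses hypothetical names (`Node`, `doubleLeftRight`, `doubleRightLeft`) and a made-up tree that is only loosely modeled on the slide's example (node 5 inserted below child 4 of parent 8; the values 2, 6, and 10 are assumptions). Balance factors are not updated here.

```cpp
#include <cassert>

// Hypothetical stand-in for the slides' BinaryNode, with public members.
struct Node {
    int element;
    Node *left;
    Node *right;
};

void rotateRight(Node *&a) {
    Node *b = a->left;
    a->left = b->right;
    b->right = a;
    a = b;
}

void rotateLeft(Node *&a) {
    Node *b = a->right;
    a->right = b->left;
    b->left = a;
    a = b;
}

// Insert was in the right subtree of the left child.
void doubleLeftRight(Node *&parent) {
    rotateLeft(parent->left);  // step 1: single left rotation on the child
    rotateRight(parent);       // step 2: single right rotation on the parent
}

// Mirror image: insert was in the left subtree of the right child.
void doubleRightLeft(Node *&parent) {
    rotateRight(parent->right);
    rotateLeft(parent);
}

void doubleRotationExample() {
    // Before: 8(4(2, 6(5, -)), 10), where 5 was just inserted.
    Node two{2, nullptr, nullptr}, five{5, nullptr, nullptr};
    Node ten{10, nullptr, nullptr};
    Node six{6, &five, nullptr};
    Node four{4, &two, &six};
    Node eight{8, &four, &ten};
    Node *root = &eight;
    doubleLeftRight(root);
    // After: 6(4(2, 5), 8(-, 10))
    assert(root->element == 6);
    assert(root->left->element == 4 && root->right->element == 8);
    assert(root->left->right->element == 5);
    assert(root->right->left == nullptr);
}
```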
AVL double rotation: before & after
The red node is what was inserted
AVL double rotation: general case
W<b<X<c<Y<a<Z
The insert happens into X
Notice sub-trees X and Y change parents
Ack! Terminology
- Some people will state a 'double left rotation'
- But is that a left-right? Or a right-left?
- We'll call them 'double left-right' and 'double right-left', which specifies the order to perform the operation on the child and then the parent
AVL insert, again
- Let x be the deepest node where imbalance occurs
- Four cases where the insert happened:
- Case 1: in the left subtree of the left child of x
- Case 2: in the right subtree of the left child of x
- Case 3: in the left subtree of the right child of x
- Case 4: in the right subtree of the right child of x
- Cases 1 & 4: perform a single rotation
- Cases 2 & 3: perform a double rotation
Algorithmic determination of rotation
left-left case | right-right case | left-right case | right-left case
- Given the lowest unbalanced node, and the child in the direction of the insert, compare the balance factors
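- Written as parent/child balance factors: -2/-1 means a single right, -2/+1 means a double left-right, +2/+1 means a single left, and +2/-1 means a double right-left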
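Putting the pieces together, here is one possible sketch of an AVL insert that chooses the rotation from the balance factors on the way back up the recursion. It is an illustration under assumptions, not the course's implementation: `AvlNode` and all function names are hypothetical, and each node stores its subtree height (from which the right-minus-left balance factor is derived) rather than the balance factor itself.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical AVL node; stores the height of the subtree rooted here.
struct AvlNode {
    int element;
    int height;  // a single node has height 0
    AvlNode *left;
    AvlNode *right;
    explicit AvlNode(int x)
        : element(x), height(0), left(nullptr), right(nullptr) {}
};

int heightOf(const AvlNode *n) { return n ? n->height : -1; }

// Balance factor by the slides' convention: right minus left.
int balance(const AvlNode *n) { return heightOf(n->right) - heightOf(n->left); }

void update(AvlNode *n) {
    n->height = 1 + std::max(heightOf(n->left), heightOf(n->right));
}

void rotateRight(AvlNode *&a) {
    AvlNode *b = a->left;
    a->left = b->right;
    b->right = a;
    update(a);  // a is now below b, so refresh it first
    update(b);
    a = b;
}

void rotateLeft(AvlNode *&a) {
    AvlNode *b = a->right;
    a->right = b->left;
    b->left = a;
    update(a);
    update(b);
    a = b;
}

void insert(int x, AvlNode *&cur) {
    if (cur == nullptr) { cur = new AvlNode(x); return; }
    if (x < cur->element) insert(x, cur->left);
    else if (x > cur->element) insert(x, cur->right);
    else return;  // duplicate: nothing changed
    update(cur);  // we are now moving back up toward the root
    if (balance(cur) == -2) {
        if (balance(cur->left) > 0)  // -2/+1: double left-right
            rotateLeft(cur->left);
        rotateRight(cur);            // -2/-1: single right
    } else if (balance(cur) == 2) {
        if (balance(cur->right) < 0)  // +2/-1: double right-left
            rotateRight(cur->right);
        rotateLeft(cur);              // +2/+1: single left
    }
}

void destroy(AvlNode *cur) {
    if (cur == nullptr) return;
    destroy(cur->left);
    destroy(cur->right);
    delete cur;
}

void avlInsertExample() {
    AvlNode *root = nullptr;
    for (int x = 1; x <= 7; ++x) insert(x, root);  // sorted input
    assert(root->element == 4 && root->height == 2);
    destroy(root);
}
```

Inserting 1 through 7 in sorted order, which would produce a chain of height 6 in a plain BST, yields a tree of height 2 rooted at 4.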
AVL Tree: Runtime Analysis
- Find: Θ(log n) time: height of tree is always Θ(log n)
- Insert: Θ(log n) time: find() takes Θ(log n), then we may have to visit every node on the path back up to the root, performing up to 2 single rotations (i.e. one double rotation)
- Remove: Θ(log n): left as an exercise
- Print: Θ(n): no matter the data structure, it will still take n steps to print n elements