Subtitles section Play video
In our previous lesson, we talked about binary trees in general. Now, in this lesson we are
going to talk about binary search tree, a special kind of binary tree which is an efficient
structure to organize data for quick search as well as quick update. But before I start
talking about binary search tree, I want you to think of a problem. What data structure
will you use to store a modifiable collection? So, lets say you have a collection and it
can be a collection of any data type, records in the collection can be of any type. Now,
you want to store this collection in computers memory in some structure and then you want
to be able to quickly search for a record in the collection and you also want to be
able to modify the collection. You want to be able to insert an element in the collection
or remove an element from the collection. So, what data structure will you use? Well,
you can use an array or a linked list. These are two well known data structure in which
we can store a collection. Now, what will be running time of these operations - search,
insertion or removal, If we will use an array or a linked list. Lets first talk about arrays
and for sale of simplicity, lets say we want to store integers. To store a modifiable list
or collection of integers, we can create a large enough array and we can store the records
in some part of the array. We can keep the end of the list marked. In this array that
I am showing here, we have integers from 0 till 3. We have records from 0 till 3 and
rest of the array is available space. Now to search some X in the collection, we will
have to scan the array from index 0 till end and in worst case, we may have to look at
all the elements in the list. If n is the number of elements in the list, time taken
will be proportional to n or in other words, we can say that time complexity will be big-oh
of n. Ok, now what will be the cost of insertion. Lets say we want to insert number 5 in this
list. So, if there is some available space, all the cells in yellow are available, we
can add one more cell by incrementing this marker 'end' and fill in the integer to be
added. Time taken for this operation will be constant. Running time will not depend
upon number of elements in the collection. So, we can say that time complexity will be
big-oh of 1. Ok, now what about removal. Lets say we want to remove 1 from the collection.
What we'll have to do is, we'll have to shift all records to the right of one by one position
to the left and then we can decrement end. The cost of removal once again will be big-oh
of n. In worst case, we will have to shift n-1 elements. Here, the cost of insertion
will be big-oh of 1, if the array will have some available space. So, the array has to
be large enough. If the array gets filled, what we can do is, we can create a new larger
array, typically we create an array twice the size of the filled up array. So, we can
create a new larger array and then we can copy the content of the filled up array into
this new larger array. The copy operation will cost us big-oh of n. We have discussed
this idea of dynamic array quite a bit in our previous lessons. So, insertion will be
big-oh of 1 if array is not filled up and it will be big-oh of n if array is filled
up. For now, lets just assume that the array will always be large enough. Lets now discuss
the cost of these operations if we will use a linked list. If we would use a linked list,
I have drawn a linked list of integers here, data type can be anything, the cost of search
operation once again will be big-oh of n where n is number of records in the collection or
number of nodes in the linked list. To search, in worst case, we will have to traverse the
whole list. We will have to look at all the nodes. The cost of insertion in a linked list
is big-oh of 1 at head and its big-oh of n at tail. We can choose to insert at head to
keep the cost low. So, running time of insertion, we can say is big-oh of 1 or in other words,
we will take constant time. Removal once again will be big-oh of n. We will first have to
traverse the linked list and search the record and in worst case, we may have to look at
all the nodes. Ok, so this is the cost of operations if we are going to use array or
linked list. Insertion definitely is fast. But, how good is big-oh of n for an operation
like search. What do you think? if we are searching for a record X, then in the worst
case, we will have to compare this record X with all the n records in the collection.
Lets say, our machine can perform a million comparisons in one second. So, we can say
that machine can perform 10 the power 6 comparisons in one second. So, cost of one comparison
will be 10 to the power -6 second. Machines in today's world deal with really large data.
Its very much possible for real world data to have 100 million records or billion records.
A lot of countries in this world have population more than 100 million. 2 countries have more
than a billion people living in them. If we will have data about all the people living
in a country, then it can easily be 100 million records. Ok, so if we are saying that the
cost of 1 comparison is 10 the power -6 second. If n will be 100 million, time taken will
be 100 seconds. 100 seconds for a search is not reasonable and search may be a frequently
performed operation. Can we do something better? Can we do better than big-oh of n. Well, in
an array, we can perform binary search if its sorted and the running time of binary
search is big-oh of log n which is the best running time to have. I have drawn this array
of integers here. Records in the array are sorted. Here the data type is integer. For
some other data type, for some complex data type, we should be able to sort the collection
based on some property or some key of the records. We should be able to compare the
keys of records and the comparison logic will be different for different data types. For
a collection of strings, for example, we may want to have the records sorted in dictionary
or lexicographical order. So, we will compare and see which string will come first in dictionary
order. Now this is the requirement that we have for binary search. The data structure
should be an array and the records must be sorted. Ok, so the cost of search operation
can be minimized if we will use a sorted array. But, in insertion or removal, we will have
to make sure that the array is sorted afterwards. In this array, if I want to insert number
5 at this stage, i can't simply put 5 at index 6. what I'll have to do is, I'll first have
to find the position at which I can insert 5 in the sorted list. We can find the position
in order of log n time using binary search. We can perform a binary search to find the
first integer greater than 5 in the list. So, we can find the position quickly. In this
case, its index 2. But then, we will have to shift all the records starting this position
one position to the right. And now, I can insert 5. So, even though we can find the
position at which a record should be inserted quickly in big-oh of log n, this shifting
in worst case will cost us big-oh of n. So, the running time overall for an insertion
will be big-oh of n and similarly, the cost of removal will also be big-oh of n. We will
have to shift some records. Ok, so when we are using sorted array, cost of search operation
is minimized. In binary search for n records, we will have at max log n to the base 2 comparisons.
So, if we can perform million comparisons in a second, then for n equal 2 to the power
31 which is greater than 2 billion, we are going to take only 31 micro-seconds. log of
2 to the power 31 to base 2 will be 31. Ok, we are fine with search now. we will be good
for any practical value of n. But what about insertion and removal. They are still big-oh
of n. Can we do something better here? Well, if we will use this data structure called
binary search tree, I am writing it in short - BST for binary search tree, then the cost
of all these 3 operations can be big-oh of log n in average case. The cost of all the
operations will be big-oh of n in worst case. But we can avoid the worst case by making
sure that the tree is always balanced. We had talked about balanced binary tree in our
previous lesson. Binary search tree is only a special kind of binary tree. To make sure
that the cost of these operations is always big-oh of log n, we should keep the binary
search tree balanced. We'll talk about this in detail later. Let's first see what a binary
search tree is and how cost of these operations is minimized when we use a binary search tree.
Binary search tree is a binary tree in which for each node, value of all the nodes in left
sub-tree is lesser and value of all the nodes in right sub-tree is greater. I have drawn
binary tree as a recursive structure here. As we know, in a binary tree, each node can
have at most 2 children. We can call one of the children left child. If we will look at
the tree as recursive structure, left child will be the root of left sub-tree and similarly,
right child will be the root of right sub-tree. Now, for a binary tree to be called binary
search tree, value of all the nodes in left sub-tree must be lesser or we can say lesser
or equal to handle duplicates and the value of all the nodes in right sub-tree must be
greater and this must be true for all the nodes. So, in this recursive structure here,
both left and right sub-trees must also be binary search trees. I'll draw a binary search
tree of integers. Now, I have drawn a binary search tree of integers here. Lets see whether
this property that for each node value of all the nodes in left subtree is lesser or
equal and the value of all the nodes in right sub-tree must be greater is true or not. Lets
first look at the root node. Nodes in the left subtree have values 10, 8 and 12. So,
they are all lesser than15 and in right subtree, we have 17, 20 and 25, they are all greater
than 15. So, we are good for the root node. Now, lets look at this node with value 10.
In left, we have 8 which is lesser. In right, we have 12 which is greater. So, we are good.
We are good for this node too having value 20 and we don't need to bother about leave
nodes because they do not have children. So, this is a binary search tree. Now, what if
I change this value 12 to 16. Now, is this still a binary search tree. Well, for node
with value 10, we are good. The node with value 16 is in its right. So, not a problem.
But for the root node, we have a node in left sub-tree with higher value now. So, this tree
is not a binary search tree. I'll revert back and make the value 12 again. Now, as we were
saying we can search, insert or delete in a binary search tree in big-oh of log n time
in average case. How is it really possible? Lets first talk about search. If these integers
that I have here in the tree were in a sorted array, we could have performed binary search
and what do we do in binary search. Lets say we want to search number 10 in this array.
What we do in binary search is, we first define the complete list as our search space. The
number can exist only within the search space. I'll mark search space using these two pointers
- start and end. Now, we compare the number to be searched or the element to be searched
with mid element of the search space or the median. And if the record being searched,
if the element being searched is lesser, we go searching in the left half, else we go
searching in the right half. In case of equality, we have found the element. In this case, 10
is lesser than 15. So, we will go searching towards left. Our search space is reduced
now to half. Once again, we will compare to the mid-element and bingo, this time, we have
got a match. In binary search, we start with n elements in search space and then if mid
element is not the element that we are looking for, we reduce the search space to n/2 and
we go on reducing the search space to half, till we either find the record that we are
looking for or we get to only one element in search space and be done. In this whole
reduction, if we will go from n to n/2 to n/4 to n/8 and so on, we will have log n to
the base 2 steps. If we are taking k steps, then n upon 2 to the power k will be equal
to 1 which will imply 2 to the power k will be equal to n and k will be equal to log n
to the base 2. So, this is why running time of binary search is big-oh of log n. Now,
if we will use this binary search tree to store the integers, search operation will
be very similar. Lets say, we want to search for number 12. What we'll do is, we start
at root and then we will compare the value to be searched, the integer to be searched
with value of the root. if its equal, we are done with the search, if its lesser, we know
that we need to go to the left sub-tree because in a binary search tree, all the elements
in left sub-tree are lesser and all the elements in right sub-tree are greater. Now, we will
go and look at the left child of node with value 15. We know that number 12 that we are
looking for can exist in this sub-tree only and anything apart from the sub-tree is discarded.
So, we have reduced the search space to only these 3 nodes having value 10, 8 and 12. Now,
once again, we will compare 12 with 10. We are not equal. 12 is greater, so we know that
we need to go looking in right sub-tree of this node with value 10. So, now our search
space is reduced to just one node. Once again, we will compare the value here at this node
and we have a match. So, searching an element in binary search tree is basically this traversal
in which at each step, we will go either towards left or right and hence at each step, we will
discard one of the sub-trees. if the tree is balanced, we call a tree balanced if for
all nodes, the difference between the heights of left and right sub-trees is not greater
than 1. So, if the tree is balanced, we will start with a search space of n nodes and when
we will discard one of the sub-trees, we will discard n/2 nodes. So, our search space will
be reduced to n/2 and then in next step, we will reduce the search space to n/4. We will
go on reducing like this till we find the element or till our search space is reduced
to only one node when we will be done. So, the search here is also a binary search. And
that's why the name - Binary Search Tree. This tree that I am showing here is balanced.
In fact this is a perfect binary tree. But with same records, we can have an unbalanced
tree like this. This tree has got the same integer values as we had in the previous structure
and this is also a binary search tree, but this is unbalanced. This is as good as a linked
list. In this tree there is no right sub-tree for any of the nodes. Search space will be
reduced by only one.at each step. From n nodes in search space, we will go to n-1 nodes and
then to n-2 nodes, all the way till 1 will be n steps. In binary search tree, in average
case, cost of search, insertion or deletion is big-oh of log n and in worst case, this
is the worst case arrangement that I am showing you. The running time will be big-oh of n.
We always try to avoid the worst case by trying to keep the binary search tree balanced. With
same records in the tree, there can be multiple possible arrangements. For these integers
in this tree, another arrangement is this. For all the nodes, we have nothing to discard
in left sub-tree in a search. This is another arrangement. This is still balanced because
for all the nodes, the difference between the heights of left and right sub-tree is
not greater than 1. But this is the best arrangement when we have a perfect binary tree. At each
step, we will have exactly n/2 nodes to discard. Ok, now to insert some records in binary search
tree, we will first have to find the position at which we can insert and we can find the
position in big-oh of log n time. Lets say we want to insert 19 in this tree, what we
will do is start at the root. If the value to be inserted is lesser or equal, if there
is no child, insert as left child or go left. If the value is greater and there is no right
child, insert as right child or go right. In this case, 19 is greater, so we will go
right. Now, we are at 20. 19 is lesser and left sub-tree is not empty. We have a left
child. So, we will go left. Now, we are at 17, 19 is greater than 17. So, it should go
in right of 17. There is no right child of 17. So, we will create a node with value 19
and link it to this node with value 17 as right child. Because we are using pointers
or references here just like linked list, no shifting is needed like an array. Creating
a link will take constant time. So, overall insertion will also cost us like search. To
delete also, we will first have to search the node. Search once again will be big-oh
of log n and deleting the node will only mean adjusting some links. So, removal also is
going to be like search - big-oh of log n in average case. Binary search tree gets unbalanced
during insertion and deletion. So, often during insertion and deletion, we restore the balancing.
There are ways to do it and we will talk about all of this in detail in later lessons. In
next lesson, we will discuss implementation of binary search tree in detail. This is it
for this lesson. Thanks for watching.