Subtitles section Play video
In our previous lesson, we saw what binary search trees are, now in this lesson we are
going to implement binary search tree. We will be writing some code for binary search
tree. prerequisite for this lesson is that you must understand the concepts of pointers
and dynamic memory allocation in C/C++. If you have already followed this series and
seen our lessons on linked list, then implementation of binary search tree or binary tree in general
is not going to be very different. We will have nodes and links here as well. Ok, so
lets get started. Binary search tree or BST as we know is a binary tree in which for each
node, value of all the nodes in left subtree is lesser or equal and value of all the nodes
in right subtree is greater. We can draw BST as a recursive structure like this. Value
of all the nodes in left subtree must be lesser or equal and value of all the nodes in right
subtree must be greater and this must be true for all nodes and not just the root node.
So, in this recursive definition here, both left and right subtrees must also be binary
search trees. I have drawn a binary search tree of integers here. Now, the question is,
how can we create this non-linear logical structure in computer's memory. I had talked
about this briefly when we had discussed binary trees. The most popular way is - dynamically
created nodes linked to each other using pointers or references just the way we do it for linked
lists. Because in a binary search tree, or in a binary tree in general, each node can
have at most 2 children, we can define node as an object with 3 fields something like
what I'm showing here. We can have a field to store data, another to store address or
reference of left child and another to store address or reference to right child. If there
is no left or right child for a node, reference can be set as NULL. In C or C++, we can define
node like this. There is a field to store data. Here the data type is integer, but it
can be anything. There is one field which is pointer to node. Node asterisk means pointer
to node. This one is to store the address of left child and we have another one to store
the address of right child. This definition of node is very similar to definition of Node
for doubly linked list. Remember in doubly linked list also, each node had two links
- one to previous node and another to next node. but doubly linked list was a linear
arrangement. This definition of node is for a binary tree. We could also name this something
like BstNode, but Node is also fine, lets go with Node. Now, in our implementation,
just like linked list, all the nodes will be created in dynamic memory or heap section
of application's memory using malloc function in 'C' or new operator in C++. We can use
malloc in C++ as well. Now, as we know any object created in dynamic memory or heap section
of applications memory cannot have a name or identifier. It has to be accessed through
a pointer. malloc or new operator return us pointer to the object created in heap. if
you want to revise some of these concepts of dynamic memory allocation, you can check
the description of this video for link to a lesson. Its really important that you understand
this concept of stack and heap in applications memory really well. Now, for a linked list,
if you remember, the information that we always keep with us is address of the head node.
If we know the head node, we can access all other nodes using links. In case of trees,
the information that we always keep with us is address of the root node. If we know the
root node, we can access all other nodes in the tree using links. To create a tree, we
first need to declare a pointer to BstNode. I'll rather call node BstNode here, BST for
binary search tree. So, to create a tree, we first need to declare a pointer to BstNode
that will always store the address of root node. I have declared a pointer to Node here
named rootPtr - Ptr for pointer. In C, you can't just write BstNode asterisk rootPtr.
You will have to write struct space BstNode asterisk, you will have to write struct here
as well. I am gonna write C++ here, but anyway right now I'm trying to explain the logic.
we will not bother about minute details of implementation. In this logical structure
of tree that I'm showing here, each Node as you can see has 3 fields, 3 cells. leftmost
cell is to store the address of left child and rightmost cell is to store the address
of right child. Lets say root node is at address 200 in memory and I'll assume some random
addresses for all other nodes as well. Now, I can fill in these left and right cells in
each node with addresses of left and right children. In our definition, data is first
field, but in this logical structure, I am showing data in the middle. Ok, so for each
Node, I have filled in addresses for both left and right child. Address is 0 or NULL
if there is no child. Now, as we were saying, identity of the tree is address of the root
node. We need to have a pointer to Node in which we can store the address of the root
node. We must have a variable of type pointer to node to store the address of root node.
All these rectangles with 3 cells are Nodes. They are created using malloc or new operator
and live in heap section of application's memory. We cannot have name or identifier
for them. They are always accessed through pointers. This rootPtr, root pointer has to
be a local or global variable. We will discuss this in little more detail in some time. Quite
often, we like to name this root pointer, just root. We can do so, but we must not confuse.
This is pointer to root and not the root itself. To create a BST, as I was saying, we first
need to declare this pointer. Initially, we can set this pointer as NULL to say that the
tree is empty. A tree with no node can be called empty tree and for empty tree, root
pointer should be set as NULL. We can do this declaration and setting the root as NULL in
main function in our program. Actually, lets just write this code in a real compiler. I
am writing C++ here. As you can see, in the main function I have declared this pointer
to Node which will always store the address of root node of my tree and I am initially
setting this as NULL to say that the tree is empty. With this much of code, we have
created an empty tree. But, whats the point of having an empty tree. We should have some
data in it. So, what I want to do now is I want to write a function to be able to insert
a node in the tree. I will write a function named Insert that will take address of the
root node and data to be inserted as argument and this function will Insert a node with
this data at appropriate position in the tree. In the main function, I'll make calls to this
insert function passing it address of root and the data to be inserted, lets say first
I want to insert number 15 and then I want to insert number 10 and then number 20. We
can insert more, but lets first write the logic for Insert function. Before I write
the logic for Insert function, I want to write a function to create a new Node in dynamic
memory or heap. This function GetNewNode should take an integer, the data to be inserted as
argument, create a node in heap using new or malloc and return back the address of this
new node. I am creating a new node here using this new operator. the operator will return
me the address of the newly created node which I am collecting in this variable of type pointer
to BstNode. In C, instead of new operator, we will have to use malloc. We can use malloc
in C++as well. C++ is only a super-set of C. malloc will also do here. Now, anything
in dynamic memory or heap is always accessed through pointer. Now, using this pointer - newNode,
we can access the fields of newly created Node. I will have to dereference this pointer
using asterisk operator. i am writing asterisk newNode and now I can access the fields. We
have 3 fields in node - data and 2 pointers to node , left and right;. i have set the
data here. Instead of writing asterisk newNode dot data, we have this alternate syntax that
we could use. We could simply write newNode->data and this will mean the same. We have used
this syntax quite a lot in our lessons on linked list. Now, for the new node, we can
set the left and right child as NULL and finally we can return the address of the new Node.
Ok, coming back to the insert function. We can have couple of cases in insertion. First
of all, tree may be empty. For this first insertion, when we are inserting this number
15, tree will be empty. if tree is empty, we can simply create a new node and set it
as root. With this statement, root equal GetNewNode, I am setting root as address of the new node.
But there is something not alright here. This root is local variable of Insert function
and its scope is only within this function. We want this root, root in main to be modified.
This guy is a local variable of main function. There are 2 ways of doing this. We can either
return the address of the new root. So, return type of insert function will be pointer to
BstNode and not void. And here, in the main function, we will have to write statement
like root equal Insert and the arguments. So, we will have to collect the return and
update our root in main function. Another way is that, we can pass the address of this
root of main to the Insert function. This root is already a pointer to Node. So, its
address can be collected in a pointer to pointer. So, Insert function , in insert function first
argument will be a pointer to pointer and here, we can pass the address. We will say
ampersand root to pass the address. We can name this argument root or we can name this
argument rootPtr. We can name this whatever. Now, what we need to do is, we need to dereference
this using asterisk operator to access the value in root of main and we can also set
the value in root of main. So, here with this statement, we are setting the value and the
return type now can be void. This pointer to pointer thing gets a little tricky. I'll
go with the former approach. Actually, there is another way. instead of declaring root
as a local variable in main function, we can declare root as global variable. Global variable,
as we know has to be declared outside all the functions. if root would be global variable,
it would be accessible to all the functions and we will not have to pass the address stored
in it as argument. Anyway, coming back to the logic for insertion. As we were saying,
if the tree is empty, we can simply create a new node and we can simply set it as root.
At this stage, we wanted to insert 15. If we will make call to the insert function,
address of root is 0 or NULL. NULL is only a macro for 0 and the second argument is the
number to be inserted. In this call to Insert function, we will make call to get new Node.
Lets say, we got this new Node at address 200. GetNewNode function will return us address
200 which we can set as root here, but this root is a local variable. We will return this
address 200 back to the main function and in the main function, we are actually doing
this root equal Insert. So, in the main function we are building this link. Ok, our next call
in the main function was to insert number 10. At this stage, root is 200. the address
in root is 200 and the value to be inserted is 10. Now, the tree is not empty. So, what
do we do. If the tree is not empty, we can basically have 2 cases. If the data to be
inserted is lesser or equal, we need to insert it in the left subtree of root and if the
data to be inserted is greater, we need to insert it in right subtree of the root. So,
we can reduce this problem in a self similar manner, in a recursive manner. Recursion is
one thing that we are going to use almost all the time while working with trees. In
this function, I'll say that if the data to be inserted is less than or equal to the data
in root, then make a recursive call to insert data in left subtree. The root of the left
subtree will be the left child. So, in this recursive call, we are passing address of
left child and data as argument and after the data is inserted in left subtree, the
root of the left subtree can change. Insert function will return the address of the new
root of the left subtree and we need to set it as left child of the current node. In this
example tree here, right now, both left and right subtree are empty. We are trying to
insert number 10, so we have made call to this function Insert. From main function,
we have called Insert passing it address 200 and value or data 10. Now, 10 is lesser than
15, so control will come to this line and a call will be made to Insert data in left
subtree. Now, left subtree is empty. So, address of root for left subtree is 0. Data passed,
data to be inserted passed as argument is 10. Now, this first insert call will wait
for this insert below to finish and return. For this last, insert call, root is NULL.
Lets say we got this node at address 150. Now, this Insert call will return back 150
and execution of first insert call will resume at this line and now this particular address
will be set as 150. So, we will build this link and now this insert call can finish.
It can return back the current root. Actually this return root should be there for all cases,
so I am taking it out and I have it after all these conditions. Of course, we will have
one more else here. If the data is greater, we need to go Insert in right subtree. the
third call in Insert function was to Insert number 20. Now this time, we will go to this
else statement, this statement in else. Lets say, we got this new Node at address 300.
So, this guy will return 300. For this node at 200, right child will be set as 300 and
now this call to Insert can finish. The return will be 200. Ok, at this stage what if a call
is made to Insert number 25. We are at root right now, the node with address 200. 25 is
greater, so we need to go and insert in right subtree. Right subtree is not empty this time.
So, once again, for this call also, we will come to this else, last else because 25 is
greater than 20. Now, in this call we will go tot the first if. A node will be created,
lets say, we got this Node in heap at address 500. This particular call Insert(0,25) will
return 500 and finish. Now, for the node at 300, right child will be set as 500. So, this
link will get built. Now, this guy will return 300. The root for this subtree has not changed.
And this first call to Insert will also wrap up. It will return 200. So, we are looking
good for all cases. This Insert function will work for all cases. We could write this Insert
function without using recursion. I encourage you to do so. You will have to use some temporary
pointer to Node and loops. Recursion is very intuitive here and recursion is intuitive
in pretty much everything that we do with trees. So, its really important that we understand
recursion really well. Ok, I will write one more function now to search some data in BST.
In the main function here, I have made some more calls to Insert. Now, i want to write
a function named Search that should take as argument, address of the root node and the
data to be searched and this function should return me true if data is there in the tree,
false otherwise. Once again, we will have couple of cases. If the root is NULL, then
we can return false. If the data in root is equal to the data that we are looking for,
then we can return true. Else, we can have 2 cases - either we need to go and search
in the left subtree or we need to go in the right subtree. So, once again, i am using
recursion here. I am making recursive call to search function in these two cases. If
you have understood the previous recursion, then this is very similar.Lets test this code
now. what I have dine here is I have asked the user to enter a number to be searched
and then I am making call to this search function and if this function is returning me true,
I am printing "Found", else I am printing "NotFound". Lets run this code and see what
happens. I have moved multiple Insert statements in one line because I am short of space here.
Lets say, we want to search for number 8. 8 is found. And now lets say, we want to search
for 22. 22 is not found. So, we are looking good. I'll stop here now. You can check the
description of this video for link to all the source code. We will do a lot more with
trees in coming lessons. In our next lesson, we will go a little deeper and try to see
how things move in various sections of application's memory. How things move in stack and heap
sections of memory when we execute these functions. It will give you a lot of clarity. This is
it for this lesson. Thanks for watching !