We will have a short lecture about… clustering of clusters!

Originally, cluster analysis was developed by anthropologists aiming to explain the origin of human beings. Later, it was adopted in psychology, intelligence research, and other areas. Nowadays, there are two broad types of clustering: flat and hierarchical. K-means is a flat method in the sense that there is no hierarchy; rather, we choose the number of clusters and the magic happens. The other type is hierarchical, and that's what we are going to discuss in this lecture. Historically, hierarchical clustering was developed first, so it makes sense to get acquainted with it.

An example of clustering with hierarchy is the taxonomy of the animal kingdom. For instance, there is the general term: animal. Sub-clusters are fish, mammals, and birds, for instance. There are birds that can fly and birds that can't. We can continue in this way until we reach dogs and cats. Even then, we can divide dogs and cats into different breeds. Moreover, some breeds have sub-breeds. This is called a hierarchy of clusters.

There are two types of hierarchical clustering: agglomerative, or 'bottom up', and divisive, or 'top down'. With divisive clustering, we start from a situation where all observations are in the same cluster. Like the dinosaurs. Then we split this big cluster into 2 smaller ones. Then we continue with 3, 4, 5, and so on, until each observation is its own separate cluster. However, in order to find the best split, we must explore all possibilities at each step. Therefore, faster methods have been developed, such as k-means; with k-means, we can simulate this divisive technique.

When it comes to agglomerative clustering, the approach is bottom up. We start from the different dog and cat breeds, cluster them into dogs and cats respectively, and then continue pairing up species until we reach the animal cluster. Agglomerative and divisive clustering should reach similar results, but agglomerative is much easier to solve mathematically. That's why it is the other clustering method we will explore: agglomerative hierarchical clustering.

In order to perform agglomerative hierarchical clustering, we start with each case being its own cluster, for a total of n clusters. Second, using some similarity measure such as the Euclidean distance, we group the two closest clusters together, reaching an 'n minus 1' cluster solution. Then we repeat this procedure until all observations are in a single cluster. The end result looks like this animal kingdom representation. The name for this type of graph is: a 'dendrogram'. A line starts from each observation. Then the two closest clusters are combined, then another two, and so on, until we are left with a single cluster. Note that all cluster solutions are nested inside the dendrogram.

Alright. Let's explore a dendrogram and see how it works. Here is the dendrogram created from our 'Country clusters' data. Okay. So, each line starts from a cluster. You can see the names of the countries at the beginning of those lines. This is to show that, at the start, each country is a separate cluster. The first two lines that merge are those of Germany and France. According to the dendrogram, these two countries are the closest in terms of the features considered. At this point, there are 5 clusters: Germany and France form one, and each other country is its own cluster. From this point on, going up, Germany and France will be considered one cluster. Here's where it becomes interesting.
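Before we continue reading the dendrogram, here is a minimal sketch of how such a dendrogram can be produced in code, using SciPy's hierarchical-clustering tools. The latitude/longitude values below are rough, illustrative approximations, not the exact numbers from the lecture's 'Country clusters' data, so the merge order and distances may differ slightly from the figure described here.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Illustrative (approximate) latitude/longitude per country -- not the lecture's exact data.
countries = ['USA', 'Canada', 'France', 'UK', 'Germany', 'Australia']
coords = np.array([
    [44.97, -103.77],   # USA
    [62.40,  -96.80],   # Canada
    [46.75,    2.40],   # France
    [54.01,   -2.53],   # UK
    [51.15,   10.40],   # Germany
    [-25.45, 133.11],   # Australia
])

# Agglomerative clustering: each observation starts as its own cluster, and the
# two closest clusters are merged repeatedly until only one cluster remains.
Z = linkage(coords, method='ward')   # Ward's method, mentioned later in the lecture

dendrogram(Z, labels=countries)
plt.ylabel('Distance at which clusters merge')
plt.show()
```

The linkage matrix `Z` records every merge and the distance at which it happened, which is exactly the nested set of cluster solutions the dendrogram visualizes.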
The next two lines that merge are those of the Germany and France cluster and the UK. At this point, there are 4 clusters: Germany, France, and the UK form one, and the rest are single-observation clusters. At the next stage of the hierarchy, Canada and the US join forces. The next step is to unite the Germany, France, and UK cluster with the Canada-US one. Australia is still alone. Finally, all countries become one big cluster, representing the whole sample.

Okay. Cool! What other information can we get from the dendrogram? Well, the bigger the distance between two links, the bigger the difference in terms of the chosen features. As you can see, Germany, France, and the UK merged into one cluster very quickly. This shows us that they are very similar in terms of 'longitude' and 'latitude'. Moreover, Germany and France are closer than Germany and the UK, or France and the UK. The USA and Canada came together not long after. However, it took half of the dendrogram to join these 5 countries together. This indicates that the Europe cluster and the North America cluster are not that alike. Finally, the distance needed for Australia to join the other 5 countries was the other half of the dendrogram, meaning it is extremely different from them. To sum up, the distance between the links shows similarity, or rather dissimilarity, in terms of the chosen features.

Alright. Next on our list is the choice of the number of clusters. If we draw a straight line piercing these two links, we will be left with two clusters, right? Australia in one, and all the rest in the other. Instead, if we pierce them here, we will get three clusters: North America, Europe, and Australia. The general rule is: when you draw a straight line, you should count the number of links that have been broken. In this case, we have broken 3 links, so we will be left with 3 clusters, because the links were coming out of those 3 clusters. Should we break the links here, there will be 4 clusters, and so on.

Great! Finally, how should we decide where to draw the line? Well, there is no specific rule, but after solving several problems, you kind of develop an intuition: when the distance between two stages is too big, it is probably a good idea to stop there. For our case, I would draw the line at 3 clusters and remain with North America, Europe, and Australia.

Okay. When most people get acquainted with dendrograms, they like them a lot, and I presume that is the case with you, too. Let's see some pros and cons. The biggest pro is that hierarchical clustering shows all the possible linkages between clusters, which helps us understand the data much, much better. Moreover, we don't need to preset the number of clusters; we just observe the dendrogram and make a decision. Another pro is that there are many different methods to perform hierarchical clustering, the most famous of which is the Ward method. Different data behaves in different ways, so it is nice to be able to choose the method that works best for you. K-means is a one-size-fits-all method, so you don't have that luxury.

How about a con? The biggest con, which is also one of the reasons hierarchical clustering is far from perfect, is scalability. I will just show you a single dendrogram of 1,000 observations and you will know what I mean: with 1,000 observations, the dendrogram is extremely hard to examine. You know what else? It is also extremely computationally expensive. The more observations there are, the slower it gets, while k-means hardly has this issue.

Thanks for watching!
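As a closing note, the 'drawing a straight line' step from this lecture has a direct counterpart in code: cutting the linkage matrix into a fixed number of flat clusters. The sketch below continues the earlier snippet (so `Z` and `countries` are assumed to exist) and asks for three clusters; with country-coordinate data we would expect roughly a North America group, a Europe group, and Australia on its own, though the exact assignment depends on the data used.

```python
from scipy.cluster.hierarchy import fcluster

# Cut the hierarchy so that at most 3 flat clusters remain --
# the code equivalent of drawing the line through 3 links.
flat_labels = fcluster(Z, t=3, criterion='maxclust')

for country, cluster_id in zip(countries, flat_labels):
    print(f'{country}: cluster {cluster_id}')
```

Because every cluster solution is nested inside the dendrogram, changing `t` to 2 or 4 simply corresponds to drawing the line higher or lower.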