Links

F. Husson website

Book Exploratory Multivariate Analysis Using R

Quiz 3 on hierarchical clustering

For each question, tick the correct answer or answers.

Q1) The K-means algorithm
allows us to determine the number of clusters
is iterative
requires the number of clusters to be defined

Q2) The K-means algorithm
always leads to the same solution on the same data set
can be launched with several starting conditions in order to find the best solution
always gives the solution which minimizes the between-class inertia divided by the total inertia
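
Q1 and Q2 are about how a K-means run proceeds. Below is a minimal plain-Python sketch of Lloyd's algorithm. It assumes Euclidean distance, random starting centers drawn from the data, and hypothetical helper names (`kmeans`, `best_of`) that do not come from the course material. It shows the alternation of assignment and update steps and how several random starts can be compared on their within-cluster inertia:

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(data, k, seed, max_iter=100):
    """One run of Lloyd's algorithm; k must be fixed before the run starts."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # random starting centers drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        new = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new == labels:
            break  # no point changed cluster: this run has converged
        labels = new
        # Update step: move each center to the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid(members)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return labels, within

def best_of(data, k, n_starts):
    """Run K-means from several seeds and keep the run with the smallest within-cluster inertia."""
    runs = [kmeans(data, k, seed) for seed in range(n_starts)]
    return min(runs, key=lambda run: run[1])

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, within = best_of(data, k=2, n_starts=5)
print(labels, within)
```

The total inertia is fixed for a given data set, so keeping the run with the smallest within-cluster inertia is equivalent to keeping the run with the largest ratio of between-cluster inertia to total inertia.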

Q3) Using hierarchical clustering and K-means together.
The clusters obtained by making a cut in the hierarchical tree can be used to initialize the K-means algorithm
The K-means algorithm determines the number of clusters at which to cut the hierarchical tree
We can use hierarchical clustering to determine a number of clusters with which to run K-means
The K-means algorithm can be used to robustify the clustering obtained using hierarchical clustering
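
Q3 combines a cut of the hierarchical tree with K-means. The following small plain-Python sketch builds a naive agglomerative tree with Ward's criterion, stops it at k clusters, and uses the resulting cluster centers as the starting point of K-means. It assumes Euclidean distance, and the helper names (`ward_cut`, `consolidate`) are hypothetical. It is not the course's implementation:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def ward_cut(data, k):
    """Naive agglomerative clustering with Ward's criterion, stopped when k clusters remain."""
    clusters = [[i] for i in range(len(data))]  # start with one cluster per individual
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                ca = centroid([data[i] for i in clusters[a]])
                cb = centroid([data[i] for i in clusters[b]])
                # Ward cost: increase in within-cluster inertia caused by merging a and b.
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters

def consolidate(data, clusters, max_iter=100):
    """K-means initialized with the centers of the partition obtained by cutting the tree."""
    centers = [centroid([data[i] for i in c]) for c in clusters]
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new == labels:
            break  # the partition is stable
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
tree_clusters = ward_cut(data, k=2)
print(tree_clusters, consolidate(data, tree_clusters))
```

In this sketch the number of clusters k is chosen when the tree is cut, and K-means only refines the assignment of individuals starting from that partition.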

Q4) High-dimensional data.
When there are many individuals, we can run a hierarchical clustering before doing K-means
When there are many individuals, we can first group individuals together using K-means, then run hierarchical clustering
When there are many variables, we can run a factor analysis, retain the first factor dimensions and then run a clustering algorithm on these dimensions
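
When there are many individuals, one way to combine the two methods is to summarize them with a K-means run that uses many clusters. A Ward hierarchy is then built on the resulting centers, each weighted by its cluster size. The sketch below assumes Euclidean distance and an arbitrary choice of m = 8 pre-clusters. The helper names (`precluster`, `weighted_ward`) are hypothetical:

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def precluster(data, m, seed=0, max_iter=100):
    """K-means with a deliberately large number of clusters m; returns (center, size) pairs."""
    rng = random.Random(seed)
    centers = rng.sample(data, m)
    labels = None
    for _ in range(max_iter):
        new = [min(range(m), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new == labels:
            break
        labels = new
        for j in range(m):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    sizes = [labels.count(j) for j in range(m)]
    return [(c, s) for c, s in zip(centers, sizes) if s > 0]  # drop empty clusters

def weighted_ward(groups, k):
    """Ward agglomeration of (center, size) pairs until k clusters remain."""
    clusters = list(groups)
    while len(clusters) > k:
        # Merge the pair whose fusion increases the within-cluster inertia the least.
        _, a, b = min(
            (wa * wb / (wa + wb) * sq_dist(ca, cb), a, b)
            for a, (ca, wa) in enumerate(clusters)
            for b, (cb, wb) in enumerate(clusters)
            if a < b
        )
        (ca, wa), (cb, wb) = clusters[a], clusters[b]
        merged = tuple((wa * x + wb * y) / (wa + wb) for x, y in zip(ca, cb))
        clusters[a] = (merged, wa + wb)
        del clusters[b]
    return clusters

# 50 individuals in two well-separated square grids of 25 points each.
data = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
data += [(10 + i * 0.1, 10 + j * 0.1) for i in range(5) for j in range(5)]
groups = precluster(data, m=8)
final = weighted_ward(groups, k=2)
print([size for _, size in final])
```

The hierarchy is built on at most m weighted centers instead of all n individuals, which is what keeps the agglomerative step affordable when n is large.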

Q5) Factor analysis and clustering. Running a clustering algorithm on the first factor dimensions rather than the original data
corresponds to deleting information from some variables
removes noise contained in later dimensions
gives a more stable clustering
gives a general view of the information via the clustering, and a more detailed view via the factor analysis
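
The many-variables case in Q4 and all of Q5 describe clustering on the first factor dimensions rather than on the raw variables. Below is a minimal plain-Python sketch with several simplifying assumptions. It runs a PCA on centered but unscaled variables, keeps a single dimension found by power iteration, and then splits the scores in two with a one-dimensional K-means. The helper names (`first_axis`, `two_means_1d`, and so on) are hypothetical:

```python
def center(data):
    """Subtract the mean of each variable (column)."""
    means = [sum(col) / len(data) for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]

def covariance(X):
    """Covariance matrix (divided by n) of centered data X."""
    n, p = len(X), len(X[0])
    return [[sum(r[i] * r[j] for r in X) / n for j in range(p)] for i in range(p)]

def first_axis(C, n_iter=200):
    """Leading eigenvector of C by power iteration (assumes a clearly dominant eigenvalue)."""
    v = [1.0] * len(C)
    for _ in range(n_iter):
        w = [sum(c * x for c, x in zip(row, v)) for row in C]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def two_means_1d(scores, n_iter=100):
    """K-means with k=2 on one-dimensional scores, started from the min and the max."""
    c0, c1 = min(scores), max(scores)
    if c0 == c1:
        return [0] * len(scores)  # constant scores: nothing to split
    labels = []
    for _ in range(n_iter):
        labels = [0 if abs(s - c0) <= abs(s - c1) else 1 for s in scores]
        g0 = [s for s, lab in zip(scores, labels) if lab == 0]
        g1 = [s for s, lab in zip(scores, labels) if lab == 1]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels

data = [(0.0, 0.0, 0.1), (0.2, 0.1, 0.0), (0.1, 0.2, 0.05),
        (5.0, 5.0, 0.0), (5.1, 5.2, 0.1), (5.2, 5.0, 0.05)]
X = center(data)
axis = first_axis(covariance(X))
scores = [sum(x * a for x, a in zip(row, axis)) for row in X]  # coordinates on dimension 1
labels = two_means_1d(scores)
print(labels)
```

Here the small third variable barely contributes to the first dimension, so it is mostly set aside before the clustering step. With real data one would usually retain several dimensions rather than one.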