Exercise.

A sensory evaluation was organized to characterize 16 fruit cocktails, each a mixture of orange, banana, mango and lemon juice in varying proportions. The name of each cocktail encodes its composition: o stands for orange juice, b for banana, c for lemon and m for mango. A letter is omitted if the percentage of that juice is low, written in lower case if it is average, and capitalized if it is high. For example, Moc has a high percentage of mango juice (50%), an average percentage of orange juice (45%), an average percentage of lemon juice (5%) and a low percentage of banana juice (0%). A * at the end of the cocktail name indicates that an odorless, tasteless red dye (Colorant) has been added.
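The naming convention described above can be decoded mechanically. The following is a minimal Python sketch, not part of the exercise material; the function name `decode_cocktail` and the labels "low", "average" and "high" are assumptions chosen for illustration.

```python
# Letter codes taken from the exercise text:
# o = orange, b = banana, c = lemon, m = mango.
JUICES = {"o": "orange", "b": "banana", "c": "lemon", "m": "mango"}


def decode_cocktail(name):
    """Return (levels, dyed) for a cocktail name such as 'Moc' or 'OCb*'.

    Hypothetical helper: 'levels' maps each juice to 'low', 'average' or 'high'.
    """
    dyed = name.endswith("*")      # a trailing * marks the added red dye
    letters = name.rstrip("*")
    # A juice whose letter is absent from the name has a low percentage.
    levels = {juice: "low" for juice in JUICES.values()}
    for letter in letters:
        juice = JUICES[letter.lower()]
        # Upper case = high percentage, lower case = average percentage.
        levels[juice] = "high" if letter.isupper() else "average"
    return levels, dyed


levels, dyed = decode_cocktail("Moc")
print(levels, dyed)
```

For Moc this gives mango as high, orange and lemon as average, banana as low, and no dye, which matches the example in the text.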

A jury of 12 judges assessed each of the 16 cocktails during 2 tasting sessions, giving scores between 0 and 10: a score of 0 means that the evaluated odor or flavor is extremely weak, and a score of 10 means that it is very strong. The scores were averaged per cocktail and per sensory variable, and these averages are the data available.
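The averaging step can be pictured as follows. This is only a sketch in Python: the raw scores are made up (the exercise provides the averages only), and the record layout and the function name `average_scores` are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Made-up raw scores for illustration: (cocktail, judge, session, variable, score).
raw_scores = [
    ("Moc", 1, 1, "Acid", 4.0),
    ("Moc", 1, 2, "Acid", 6.0),
    ("Moc", 2, 1, "Acid", 5.0),
    ("Moc", 2, 2, "Acid", 5.0),
    ("Bo", 1, 1, "Acid", 2.0),
    ("Bo", 2, 1, "Acid", 3.0),
]


def average_scores(records):
    """Average the scores over judges and sessions for each (cocktail, variable) pair."""
    grouped = defaultdict(list)
    for cocktail, _judge, _session, variable, score in records:
        grouped[(cocktail, variable)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


averages = average_scores(raw_scores)
print(averages)
```

The result is one value per cocktail and per sensory variable, which is the shape of the table analyzed in the rest of the exercise.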

In addition, a physico-chemical analysis of the cocktails was made, and their composition is known.

The file given here contains the sensory characterization data, the physico-chemical data, as well as the composition data.

Using the sensory variables only, provide a sensory description of the cocktails, which amounts to building what is also called a product space, describing the cocktails in a multidimensional way.

Q1) What is the percentage of inertia represented by the first axis?
40.28%
47.92%
57.30%
72.16%

Q2) Which supplementary variable is the most associated with the 1st dimension?
%Orange
%Banana
%Mango
%Lemon
pH
Total acidity
Dry extract
Citric acid
L-malic acid
Sucrose
Fructose
Glucose
Protein
Vitamin C

Q3) In principle, the dye should not affect the results, since the sensory variables only describe odor, flavor and taste. Which of the following statements is the most accurate?
The dye significantly affects sensory perception
The dye has a significant effect on the second dimension of the product space, and thus a significant effect on variables strongly correlated with the second dimension
The dye is related to the second dimension, but not significantly, because the test statistic values for the categories are less than 2 in absolute value.

Q4) Which composition variable is the best represented in the principal plane?
%Banana
%Lemon
%Mango
%Orange

We now want to perform clustering on the cocktails.

To do this, choose the number of dimensions to be retained in the PCA in order to be able to retrieve 95% of the information contained in the sensory variables, before doing clustering.
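The 95% rule above comes down to accumulating the percentages of inertia until the threshold is reached. Here is a minimal Python sketch of that computation, not the course's own workflow; the eigenvalues are made up and the function names `inertia_percentages` and `dimensions_for_inertia` are hypothetical.

```python
def inertia_percentages(eigenvalues):
    """Percentage of total inertia carried by each dimension."""
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]


def dimensions_for_inertia(eigenvalues, threshold=0.95):
    """Smallest number of leading dimensions whose cumulative inertia reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    # Eigenvalues are sorted in decreasing order, as PCA dimensions are.
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Made-up eigenvalues, NOT those of the cocktail data.
toy_eigenvalues = [5.0, 2.5, 1.5, 0.6, 0.4]
print(inertia_percentages(toy_eigenvalues))
print(dimensions_for_inertia(toy_eigenvalues))
```

With these toy values, three dimensions carry 90% of the inertia and four carry 96%, so four dimensions would be kept.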

Q5) How many PCA dimensions should we keep?
2
3
4
5
7
10
12

Q6) Choose an optimal number of clusters for the partition. How many clusters do you retain?
2
3
4
5
7
10

Q7) Which variable best characterizes the partition overall?
Acid
Bitterness
%Orange
pH
Glucose
Vitamin.C

Q8) Which variable best characterizes the class containing a lot of banana?
Acid
Bitterness
%Orange
pH
Glucose
Vitamin.C

Q9) Which cocktail is the "model individual" of the class of cocktails containing a lot of banana?
MCb*
Mb
obmc*
Bo
OCb*
Obc
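A "model individual" of a class is usually taken to be the individual closest to the center of that class. The following Python sketch illustrates the idea under that assumption; the coordinates and the labels A, B and C are made up, and the function name `model_individual` is hypothetical.

```python
import math


def model_individual(coordinates):
    """Return the label of the individual closest to the centroid of its class.

    'coordinates' maps each individual of one class to its coordinates
    on the retained dimensions.
    """
    points = list(coordinates.values())
    # Centroid: the mean of each coordinate over the members of the class.
    centroid = [sum(axis) / len(points) for axis in zip(*points)]
    # Euclidean distance from each individual to the centroid.
    distances = {
        label: math.dist(point, centroid)
        for label, point in coordinates.items()
    }
    return min(distances, key=distances.get)


# Made-up coordinates for three individuals of one class.
toy_class = {"A": (1.0, 0.0), "B": (2.0, 1.0), "C": (3.0, 0.5)}
print(model_individual(toy_class))
```

Here the centroid is (2.0, 0.5), and B, at distance 0.5, is the closest individual.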

Q10) Which dimension best characterizes the class of cocktails containing a lot of banana?
1
2
3
4
5

Q11) Analyze this data set in more detail: interpret the axes of the factor analysis and the clustering results further.
