Make a 3D scatter plot of the data on PC1, PC2, and PC3, with points colored by cluster and labeled by subgroup

After clustering a dataset with three or more dimensions, we often want to visualize the clustering result in a 3D plot. If the data have more than three dimensions, they must first be reduced to three, which can be done with PCA. Once PCA is complete, this function prepares the values needed to plot the data.
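As a rough illustration of the reduction step described above, the sketch below runs stats::prcomp() on a made-up dataset with five features and keeps the scores for the first three principal components. The data, the object names (toy.data, toy.pca, toy.scores), and the choice of scale. = TRUE are assumptions made for this example only.

```r
# Hypothetical data: 20 observations of 5 numeric features
set.seed(1)
toy.data <- as.data.frame(matrix(rnorm(100), nrow = 20, ncol = 5))

# PCA with stats::prcomp(); scaling is a choice made for this sketch
toy.pca <- stats::prcomp(toy.data, scale. = TRUE)

# Scores on the first three principal components (one row per observation)
toy.scores <- toy.pca$x[, 1:3]
dim(toy.scores)
```

The toy.pca object, not toy.scores, is what would be passed as pca.results.input, since the function expects the full prcomp() output.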

generate.3D.clustering.with.labeled.subgroup(
  pca.results.input,
  cluster.labels.input,
  subgroup.labels.input
)

Arguments

pca.results.input	An object returned by stats::prcomp(): the PCA of all the features used for clustering. There should be at least 3 features.
cluster.labels.input	A vector of integers that specify which cluster each observation belongs to.
subgroup.labels.input	A vector of strings that specify an additional label for each observation. Every point needs a label.

Value

A list with 8 objects:

String specifying x axis label with percent variance for PC1.
String specifying y axis label with percent variance for PC2.
String specifying z axis label with percent variance for PC3.
A vector specifying values for x coordinates of points. PC1 values.
A vector specifying values for y coordinates of points. PC2 values.
A vector specifying values for z coordinates of points. PC3 values.
A chisq.test() result object.
A table used for chisq.test().

Details

This function outputs values that can be used to plot PC1 vs PC2 vs PC3 with the rgl package. It takes the output of stats::prcomp() as input; the data passed to prcomp() must have at least 3 dimensions. PC = principal component.
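The axis labels returned by the function include the percent variance for each PC. One way such labels could be derived from a prcomp() object is sketched below. The label format and the object names (example.pca, pct.var, axis.labels) are assumptions for illustration; the function's own labels may be formatted differently.

```r
# PCA on the four numeric columns of the built-in iris dataset
example.pca <- stats::prcomp(datasets::iris[, 1:4], scale. = FALSE)

# Percent of total variance carried by each principal component
pct.var <- 100 * example.pca$sdev^2 / sum(example.pca$sdev^2)

# Hypothetical label format: "PC1 (xx.x%)"
axis.labels <- paste0("PC", 1:3, " (", round(pct.var[1:3], 1), "%)")
axis.labels
```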

This function also creates a contingency table of the cluster labels against the subgroup labels and runs chisq.test() on it to test whether the two sets of labels are significantly associated.
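A minimal sketch of this kind of association test, using base R only, is shown below. The label vectors are invented for illustration, and the function's internal implementation is not shown in this page, so treat this as an approximation of the idea and not the actual code.

```r
# Hypothetical cluster assignments and subgroup labels for 12 observations
cluster.labels <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3)
subgroup.labels <- c("A", "A", "A", "B", "B", "B", "B", "B", "A", "B", "A", "B")

# Cross-tabulate clusters against subgroups
contingency <- table(cluster = cluster.labels, subgroup = subgroup.labels)

# Chi-squared test of independence; with counts this small, R warns that
# the chi-squared approximation may be incorrect
test.result <- stats::chisq.test(contingency)
test.result$p.value
```

With small expected counts, as here and in the example for this function, chisq.test() emits the "Chi-squared approximation may be incorrect" warning, so the p-value should be read with caution.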

Examples


example.data <- data.frame(x = c(18, 21, 22, 24, 26, 26, 27, 30, 31, 35,
                                 39, 40, 41, 42, 44, 46, 47, 48, 49, 54, 35, 30),
                           y = c(10, 11, 22, 15, 12, 13, 14, 33, 39, 37, 44,
                                 27, 29, 20, 28, 21, 30, 31, 23, 24, 40, 45),
                           z = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3,
                                 3, 3, 3, 3, 3, 3, 3, 3))

#dev.new()
plot(example.data$x, example.data$y)

km.res <- stats::kmeans(example.data[,c("x", "y", "z")], 3, nstart = 25, iter.max=10)

grouped <- km.res$cluster

pca.results <- stats::prcomp(example.data[,c("x", "y", "z")], scale. = FALSE)

actual.group.label <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B")

results <- generate.3D.clustering.with.labeled.subgroup(pca.results, grouped, actual.group.label)
#> Warning: Chi-squared approximation may be incorrect

xlab.values <- results[[1]]
ylab.values <- results[[2]]
zlab.values <- results[[3]]
xdata.values <- results[[4]]
ydata.values <- results[[5]]
zdata.values <- results[[6]]

#rgl::rgl.bg(color = "white")

#rgl::plot3d(x= xdata.values, y= ydata.values, z= zdata.values,
#xlab = xlab.values, ylab = ylab.values, zlab = zlab.values, col=(grouped+1), pch=20, cex=2)

#rgl::text3d(x= xdata.values, y= ydata.values, z= zdata.values, text= actual.group.label, cex=1)
