For an explanation of options on the k-Means Clustering - Step 1 of 3 dialog, see the Common Dialog Options section in the Introduction to Analytic Solver Data Mining. The following section explains the options belonging to k-Means Clustering - Step 2 of 3 and Step 3 of 3 dialogs.
Normalize input data
If this option is selected, Analytic Solver normalizes the input data before applying the k-Means Clustering algorithm. Normalizing the data is important to ensure that the distance measure accords equal weight to each variable. Without normalization, the variable with the largest scale will dominate the measure. Note: The related outputs will be reported in their original, unnormalized scale.
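The effect of scale on the distance measure can be illustrated with a small Python sketch. This is not Analytic Solver's implementation: the normalization formula it uses is not stated here, so the sketch assumes z-score standardization, and the function names and sample data are hypothetical.

```python
import math

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores).

    Assumption: z-score scaling is used only for illustration.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation; a constant column falls back to 1.0
    # so that the division below never divides by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical records: income in dollars, age in years.
data = [[30000.0, 25.0], [30500.0, 60.0], [90000.0, 26.0]]
scaled = standardize(data)

# Raw distances: the income column dwarfs the age column.
raw_01 = euclidean(data[0], data[1])
raw_02 = euclidean(data[0], data[2])

# Scaled distances: both columns now contribute on a similar footing.
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_02 = euclidean(scaled[0], scaled[2])

print(raw_01, raw_02)
print(scaled_01, scaled_02)
```

In the raw data, the 35-year age gap between the first two records barely registers next to the income differences. After scaling, that age gap counts about as much as the 60,000-dollar income gap between the first and third records.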
# Clusters
Enter the number of final clusters (k) to be formed here. The number of clusters must be at least 1 and at most one less than the number of observations in the data range. This value should be based on your knowledge of the data and the number of projected clusters. It is recommended that the procedure be repeated with several different values of k.
# Iterations
Enter the number of times the program will perform the clustering algorithm. The configuration of clusters (and data separation) may differ from one starting partition to another. The algorithm will complete the specified number of iterations and select the cluster configuration that minimizes the distance measure.
Options
If Fixed start is selected, Analytic Solver builds the model with a single fixed starting point.
If Random starts is selected, the algorithm starts at a randomly selected point. Enter the number of desired starting points for Random starts. This value specifies the number of times Analytic Solver starts the algorithm using randomly selected starting points. The sum of squared distances for each randomly selected starting point is compared, and the starting point with the lowest value is chosen. Analytic Solver starts the clustering algorithm from this point.
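The Random starts selection described above can be sketched in Python as follows. This is a minimal illustration, not Analytic Solver's code: it assumes each starting point is a set of k observations drawn at random to serve as initial centroids, and that the clustering step is the standard Lloyd iteration. All function names and the sample data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def best_random_start(points, k, n_starts, rng):
    """Draw n_starts candidate centroid sets and keep the one with the lowest SSE."""
    candidates = [rng.sample(points, k) for _ in range(n_starts)]
    return min(candidates, key=lambda c: sse(points, c))

def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd iterations from the given starting centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Hypothetical data with two visible groups.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
          [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
rng = random.Random(12345)
start = best_random_start(points, k=2, n_starts=5, rng=rng)
final = kmeans(points, start)
print(sse(points, start), sse(points, final))
```

Raising the number of starts makes it more likely that at least one candidate lands near a good configuration, at the cost of evaluating more candidates before clustering begins.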
Set Seed
This option initializes the random number generator that is used to calculate the initial cluster centroids. Setting the random number seed to a non-zero value (default 12345) ensures that the same sequence of random numbers is used each time the initial cluster centroids are calculated. When the seed is zero, the random number generator is initialized from the system clock, so the sequence of random numbers is different each time the centroids are initialized. Set the seed if the results from successive runs of the clustering method need to be comparable. To do this, select Set seed and enter a number into the edit box. This option accepts both positive and negative integers with up to nine digits.
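The seed convention can be mimicked in Python as a rough sketch. Analytic Solver's own random number generator is not described here, so this only illustrates the idea. The helper name is hypothetical, and a zero seed is mapped to Python's unseeded default because Python itself treats 0 as an ordinary fixed seed.

```python
import random

def initial_centroid_indices(n_obs, k, seed=12345):
    """Pick k distinct row indices to serve as initial centroids.

    Assumption for illustration: a non-zero seed gives a reproducible
    sequence, while a zero seed defers to system entropy, which is what
    Python uses when the seed is None.
    """
    rng = random.Random(seed if seed != 0 else None)
    return rng.sample(range(n_obs), k)

# The same non-zero seed yields the same starting centroids on every run.
run_a = initial_centroid_indices(100, 3, seed=12345)
run_b = initial_centroid_indices(100, 3, seed=12345)
print(run_a == run_b)

# A zero seed is expected to give a different draw from one run to the next.
unseeded = initial_centroid_indices(100, 3, seed=0)
print(unseeded)
```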
Show data summary
Select this option to display the data summary in the k-Means Clustering output.
Show distances from each cluster center
Select this option to display the distances from each cluster center in the k-Means Clustering output.
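As a rough illustration of what such a distance report contains, the following Python sketch computes the distance from each record to each cluster center and the resulting cluster assignment. It assumes Euclidean distance; the layout of the actual Analytic Solver output is not reproduced, and the names and data are hypothetical.

```python
import math

def distances_to_centers(points, centers):
    """Return one row per record holding its distance to every cluster center."""
    return [[math.dist(p, c) for c in centers] for p in points]

def assign_clusters(distance_rows):
    """Assign each record to the 1-based index of its nearest center."""
    return [row.index(min(row)) + 1 for row in distance_rows]

# Hypothetical cluster centers and records.
centers = [[0.0, 0.0], [3.0, 4.0]]
points = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]

table = distances_to_centers(points, centers)
labels = assign_clusters(table)

for record, (row, label) in enumerate(zip(table, labels), start=1):
    print(record, label, row)
```

A record whose distances to two centers are nearly equal sits close to a cluster boundary, which is a useful signal when comparing results for different values of k.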