Options for Principal Components Analysis are displayed on the Step 2 of 3 and Step 3 of 3 dialogs.
For more information on the Step 1 of 3 dialog, please see the Common Dialog Options page in the Introduction to Analytic Solver Data Mining section.
Select the number of principal components displayed in the output.
Fixed # components
Specify a fixed number of components by entering an integer from 1 to n, where n is the number of input variables selected in the Step 1 of 3 dialog. This option is selected by default, and the default value is n, the number of input variables.
Smallest #components explaining
Select this option to specify a percentage. Analytic Solver calculates the minimum number of principal components required to account for that percentage of variance.
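The selection rule described above can be sketched in Python with NumPy. This is only an illustration, not Analytic Solver's implementation: the function name smallest_k_explaining is hypothetical, and it assumes the variance of each principal component is given by its eigenvalue.

```python
import numpy as np

def smallest_k_explaining(eigenvalues, pct):
    """Hypothetical helper: smallest number of components whose
    cumulative share of variance reaches pct (a value from 0 to 100)."""
    # Sort variances from largest to smallest, as components are usually ordered.
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # Cumulative percentage of total variance explained by the first k components.
    cum_pct = 100.0 * np.cumsum(vals) / vals.sum()
    # First position where the cumulative share reaches pct
    # (small tolerance guards against floating-point round-off).
    k = int(np.searchsorted(cum_pct, pct - 1e-9)) + 1
    return min(k, len(vals))
```

For example, with component variances 4, 3, 2 and 1, the first two components account for 70 percent of the total, so a request for 70 percent yields two components and a request for 71 percent yields three.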
To compute principal components, the data matrix is multiplied by a transformation matrix. Use this option to choose how the transformation matrix is calculated.
Use Covariance Matrix
The covariance matrix is a square, symmetric matrix of size n x n (number of variables by number of variables). The diagonal elements are variances, and the off-diagonal elements are covariances. The eigenvalues and eigenvectors of the covariance matrix are computed, and the transformation matrix is defined as the transpose of this eigenvector matrix. If the covariance method is selected, the data set should first be normalized by dividing each variable by its standard deviation. Normalizing gives all variables equal importance in terms of variability [1].
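The covariance approach described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the product's code: the function name covariance_transformation is hypothetical, the sample standard deviation (ddof=1) is assumed for the normalization, and eigenvectors are ordered from largest to smallest eigenvalue.

```python
import numpy as np

def covariance_transformation(X):
    """Hypothetical helper: eigenvalues and transformation matrix
    derived from the covariance matrix of normalized data."""
    X = np.asarray(X, dtype=float)
    # Normalize by dividing each variable (column) by its standard deviation.
    Xn = X / X.std(axis=0, ddof=1)
    # n x n covariance matrix: variances on the diagonal, covariances elsewhere.
    cov = np.cov(Xn, rowvar=False)
    # eigh is intended for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Transformation matrix = transpose of the eigenvector matrix.
    return eigvals, eigvecs.T
```

Because every normalized variable has unit variance, the eigenvalues returned by this sketch sum to the number of variables.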
Use Correlation Matrix (Use Standardized Variables)
An alternative method is to derive the transformation matrix from the eigenvectors of the correlation matrix instead of the covariance matrix. The correlation matrix is equivalent to a covariance matrix for the data where each variable has been standardized to zero mean and unit variance. This method tends to equalize the influence of each variable, inflating the influence of variables with relatively small variance and reducing the influence of variables with high variance. This option is selected by default.
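The equivalence stated above can be checked numerically. The short NumPy sketch below uses made-up random data with very different column scales (an assumption for illustration only) and compares the correlation matrix of the raw data with the covariance matrix of the standardized data.

```python
import numpy as np

# Illustrative data only: three variables on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])

# Standardize each variable to zero mean and unit (sample) variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

corr_of_X = np.corrcoef(X, rowvar=False)  # correlation matrix of the raw data
cov_of_Z = np.cov(Z, rowvar=False)        # covariance matrix of standardized data

# The two matrices agree up to floating-point round-off.
same = np.allclose(corr_of_X, cov_of_Z)
```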
Show principal components score
This option results in the display of a matrix in which the columns are the principal components, the rows are the individual data records, and the value in each cell is the calculated score for that record on the relevant principal component. This option is selected by default.
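A score matrix with this layout can be sketched in NumPy as shown below. The function name pca_scores is hypothetical, and the sketch assumes the correlation-matrix method (standardized variables); Analytic Solver's own output may differ in sign or scaling conventions.

```python
import numpy as np

def pca_scores(X, n_components):
    """Hypothetical helper: rows are records, columns are principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize so the covariance of Z equals the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    # Keep the eigenvectors with the largest eigenvalues.
    keep = np.argsort(eigvals)[::-1][:n_components]
    transformation = eigvecs[:, keep].T   # k x n transformation matrix
    # Each cell is the score of one record on one component.
    return Z @ transformation.T
```

In this sketch the score columns are mutually uncorrelated, which is the property that makes principal components useful as replacement variables.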
Show Q-Statistics and Show Hotelling's T-Squared Statistics
Q-statistics (residuals) and Hotelling's T-Squared statistics are summary statistics that help explain how well a model fits the sample data and can also be used to detect outliers in the data. A detailed explanation of each is beyond the scope of this guide. Please see the literature for more information on these statistics.
Show Q-Statistics
If this option is selected, Analytic Solver includes Q-statistics in the output. Q-statistics (or residuals) measure the difference between the sample data and its projection onto the model. These statistics can also be used to determine whether any outliers exist in the data. Low values for Q-statistics indicate a well-fit model.
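One common way to compute such a residual is the squared distance between each standardized record and its reconstruction from the retained components. The NumPy sketch below follows that definition; the function name q_statistics is hypothetical, and the exact formula used by Analytic Solver is not stated in this guide.

```python
import numpy as np

def q_statistics(X, n_components):
    """Hypothetical helper: squared residual of each record after
    projecting it onto the retained principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    keep = np.argsort(eigvals)[::-1][:n_components]
    P = eigvecs[:, keep]                  # n x k matrix of retained eigenvectors
    # Part of each record that the model does not reproduce.
    residual = Z - (Z @ P) @ P.T
    return np.sum(residual ** 2, axis=1)  # one Q value per record
```

When all components are retained the residuals vanish, so every Q value is zero up to round-off. Records with unusually large Q values are candidates for outliers.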
Show Hotelling's T-Squared Statistics
If this option is selected, Analytic Solver includes Hotelling's T-Squared statistics in the output. T-Squared statistics measure the variation in the sample data within the model and indicate how far the sample data is from the center of the model. These statistics can also be used to detect outliers in the sample data. Low T-Squared statistics indicate a well-fit model.
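A standard textbook form of this statistic sums each record's squared scores, with each score divided by the variance (eigenvalue) of its component. The NumPy sketch below uses that form; the function name t_squared is hypothetical, and whether Analytic Solver applies the same scaling is an assumption.

```python
import numpy as np

def t_squared(X, n_components):
    """Hypothetical helper: Hotelling's T-Squared for each record,
    computed from scores on the retained components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    keep = np.argsort(eigvals)[::-1][:n_components]
    scores = Z @ eigvecs[:, keep]
    # Scale each squared score by its component's variance, then sum per record.
    return np.sum(scores ** 2 / eigvals[keep], axis=1)
```

Records far from the center of the model along the retained components receive large values, which is why the statistic is useful for flagging outliers.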
1. Shmueli, Galit, Nitin R. Patel, and Peter C. Bruce. Data Mining for Business Intelligence. 2nd ed. New Jersey: Wiley, 2010.