In this study, principal component analysis (PCA) was applied to the fatty acid and triglyceride (TAG) profiles of extra virgin olive oils (n=40) obtained from Kahramanmaraş (Türkiye) during the 2023/2024 harvest season to diagnose and address matrix instabilities caused by multicollinearity and linear dependence. A total of 34 variables (23 experimental and 11 derived) were used to reduce data dimensionality and determine sample variability. The initial PCA attempt with standardized variables showed poor factorability (Kaiser-Meyer-Olkin measure, KMO=0.13; measure of sampling adequacy, MSA<0.40), and Bartlett’s test of sphericity could not be calculated because the correlation matrix was not positive definite. Multicollinearity and linear dependence were assessed using Pearson correlations and regression-based diagnostics (variance inflation factor, VIF; tolerance index, TI; condition index, CI; and variance decomposition proportions, VDP). Most derived variables showing high correlations and redundant information were removed from the dataset, reducing the number of variables to 23. In the repeated PCA, Bartlett’s test of sphericity became significant (P<0.001), but the KMO value of 0.49 indicated that the model still had insufficient factorability. An optimized 17-variable model was obtained through a stepwise screening based on MSA (<0.40) and multicollinearity criteria (VIF>10; TI<0.10). The final model produced 5 principal components (PCs) that explained 74% of the total variance and reached an acceptable level of sampling adequacy (KMO=0.70). After Promax rotation, most variables loaded uniquely and strongly on the relevant PCs in the pattern matrix, while secondary loadings were limited in the structure matrix. In the score analysis, most samples showed separation on the PC1-PC2 plane. Additionally, only 10 samples (25%) exceeded the standardized z-score threshold (|z|>2).
Overall, the results indicated that reliable and interpretable PCA modelling of the olive oil data requires explicitly managing factorability and multicollinearity issues and carefully examining the correlation matrix structure and the score distributions.
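
The diagnostics named above (VIF, TI, CI, MSA and KMO) can all be derived from the Pearson correlation matrix. The following is a minimal sketch of how such a screening might be computed, not the procedure used in the study. The function name `collinearity_diagnostics`, the synthetic data, and the simplification of computing condition indices from the eigenvalues of the correlation matrix are assumptions made for illustration. The cut-offs VIF>10, TI<0.10 and MSA<0.40 are the ones stated in the abstract.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Compute VIF, tolerance, condition indices, per-variable MSA and overall KMO.

    X is an (n_samples, n_variables) array. Every quantity is derived from
    the Pearson correlation matrix of its columns (hypothetical helper).
    """
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)  # raises or becomes unstable if R is singular

    # VIF of variable j is the j-th diagonal element of the inverse
    # correlation matrix; tolerance is its reciprocal.
    vif = np.diag(R_inv).copy()
    tolerance = 1.0 / vif

    # Condition indices, here taken as sqrt(largest eigenvalue / each
    # eigenvalue) of R (a simplification of the regression-based version).
    eigvals = np.linalg.eigvalsh(R)
    cond_index = np.sqrt(eigvals.max() / eigvals)

    # Partial correlations from the scaled inverse (anti-image) matrix.
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)

    # MSA per variable and overall KMO: squared correlations relative to
    # squared correlations plus squared partial correlations (off-diagonal).
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off, R ** 2, 0.0)
    p2 = np.where(off, partial ** 2, 0.0)
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    kmo = r2.sum() / (r2.sum() + p2.sum())

    return {"vif": vif, "tolerance": tolerance,
            "cond_index": cond_index, "msa": msa, "kmo": kmo}

# Synthetic data only (NOT the olive oil dataset): four independent
# variables plus one "derived" variable that is almost their exact sum.
rng = np.random.default_rng(0)
base = rng.normal(size=(40, 4))
derived = base[:, 0] + base[:, 1] + 0.01 * rng.normal(size=40)
X_full = np.column_stack([base, derived])

full = collinearity_diagnostics(X_full)
reduced = collinearity_diagnostics(base)  # derived variable removed

# Flag variables with the thresholds quoted in the abstract.
flagged = np.where((full["vif"] > 10) | (full["tolerance"] < 0.10))[0]
print("Flagged columns (VIF>10 or TI<0.10):", flagged)
print("Variables with MSA<0.40:", np.where(full["msa"] < 0.40)[0])
print("KMO with derived variable:", round(float(full["kmo"]), 2))
print("Largest VIF after removal:", round(float(reduced["vif"].max()), 2))
```

In this toy setup the derived column is a near-linear combination of two others, so it and its parent columns are the ones expected to be flagged; removing it should bring the remaining VIF values back below the cut-off. A stepwise screening would repeat this computation after each removal.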