Data used: claus.mat, fluorescence excitation-emission data from five samples containing tryptophan, phenylalanine, and tyrosine.
Purpose: Learning to use PARAFAC modeling for more advanced tasks, namely second-order calibration and models with constraints.
Information: R. Bro, PARAFAC: Tutorial & applications. Chemom. Intell. Lab. Syst., 1997, 38, 149-171.
Prerequisites: Be sure to understand the basics of handling multi-way arrays in MATLAB (Chapter 1). You should be acquainted with concepts such as nestedness, uniqueness, convergence, constraints and second-order calibration.
Ad a): Due to its uniqueness properties the PARAFAC model is ideally suited for curve resolution and certain kinds of calibration problems. Since the PARAFAC model is unique and coincides with several physical models (within fluorescence spectroscopy, spectrally detected chromatography etc.) it is possible to decompose such data into (chemically) meaningful parameters. As shown in the previous chapter, fully overlapping fluorescence data may be decomposed into score and loading vectors that are estimates of excitation and emission spectra and concentrations of chemical analytes. Thus the PARAFAC model can perform mathematical chromatography on mixture data enabling identification and quantification of specific analytes.
Ad b): Seen as a generalization of two-way PCA, the PARAFAC model may evidently be used for extracting latent variables regardless of its uniqueness properties. The scores and loadings thus obtained may be used in complete analogy with ordinary two-way models for exploration and for quantitative analysis such as regression and classification.
In this chapter we will elaborate on some of the properties relating
to the uniqueness of the PARAFAC model as well as on how to improve a given
model by incorporating a priori knowledge directly into the model.
Xnew = X([1 4 5],:);
Run PARAFAC with an appropriate number of components (three in this case, but not in all other cases; why?) and use the scores plus the known concentration of tryptophan in sample one to estimate the concentrations in the other samples. Compare with the reference concentrations.
Compare the predictions obtained with PARAFAC with those that can be
achieved with GRAM or DTLD.
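The calibration step in the task above can be sketched as follows. Because PARAFAC scores are proportional to concentration, a single sample of known concentration is enough to fix the scale. The score values, the concentration, and the variable names below are made up for illustration; they are not results from claus.mat.

```matlab
% Hypothetical first-mode scores of the tryptophan component in a
% three-sample model (rows: samples 1, 4 and 5). Made-up numbers,
% not results from claus.mat.
a_trp = [0.82; 0.41; 0.55];

% Assumed known tryptophan concentration in sample one (arbitrary units).
c_known = 2.0;

% Scores are proportional to concentration, so one standard sets the scale.
scale = c_known / a_trp(1);
c_est = scale * a_trp;      % estimated concentrations for all three samples

disp(c_est(2:3))            % estimates for samples 4 and 5
```

In practice the tryptophan component first has to be identified, for example by comparing the estimated emission and excitation loadings with the pure spectra.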
A constrained model will fit the data more poorly than an unconstrained model (or at best equally well), but if the constrained model is more interpretable and realistic this may justify the decrease in fit. Constraints should be applied carefully: consider their appropriateness beforehand, consider why the unconstrained model is unsatisfactory, and critically evaluate the effect afterwards. In some cases there is good reason to be confident that the constraint is appropriate.
Sometimes an intuitive approach is used for imposing constraints. Taking non-negativity as an example, a common approach is to use an unconstrained estimation procedure and then set negative values to zero afterwards. Naturally this leads to a feasible solution, i.e., a solution with no negative elements. This approach cannot be recommended, for several reasons. First of all, an estimate obtained this way has no well-defined optimality property. This can make it difficult to distinguish between problems pertaining to the algorithm, the model, and the data. That is, if the fitted model is unsatisfactory, it becomes more difficult to assess the cause of the problem, because an additional source of error has been introduced, namely the properties of the constraint.
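The lack of optimality can be seen in a small least-squares problem. The sketch below is not the toolbox's algorithm; it uses MATLAB's built-in lsqnonneg on made-up numbers, chosen so that the unconstrained solution has a negative element, and compares the residual after setting negatives to zero with the residual of a proper non-negative least-squares solution.

```matlab
% Small least-squares problem: min ||Z*b - x|| subject to b >= 0.
% Z and x are made-up numbers, chosen so that the unconstrained
% solution contains a negative element.
Z = [1 0.9; 0.9 1; 0.5 0.6];
x = [1; 0.2; 0.3];

b_ls   = Z \ x;               % unconstrained least squares
b_clip = max(b_ls, 0);        % negative values set to zero afterwards
b_nnls = lsqnonneg(Z, x);     % non-negative least squares

% Residual norms of the two feasible solutions
r_clip = norm(Z*b_clip - x);
r_nnls = norm(Z*b_nnls - x);
fprintf('clipped: %.4f   nnls: %.4f\n', r_clip, r_nnls);
```

Both solutions are feasible, but only the second minimizes the residual over the non-negative region, so its residual can never be larger than that of the clipped solution.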
Task: Try to fit a three-component PARAFAC model to samples four and five, with and without non-negativity constraints. Such restrictions are set in the input const. Do the estimated emission and excitation spectra change? Do the same using samples one and four. See the help in the PARAFAC m-file for how to impose constraints.
Task: Repeat the models investigated previously in the second-order calibration problems and use non-negativity on all modes. Do the parameters change? Do the predictions?
When analyzing spectral data there is often a strong rationale behind using the PARAFAC model with its intrinsic uniqueness properties. Unfortunately, the PARAFAC model is sometimes very difficult to fit and it would be nice to be able to speed up the algorithm. If the data analyzed are not spectral or similar data it is often of little practical importance whether the determined loading vectors are orthogonal or not. With respect to relative interpretation and for quantitative purposes such as using the scores in a subsequent regression model, the primary aim is to span the systematic variation, and one may therefore e.g. use orthogonality constrained models in order to avoid numerical algorithmic problems.
Task: To see how orthogonality constraints work, try to fit a five-component model several times with and without orthogonality constraints. Compare the computation times.
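A rough illustration of why orthogonal loadings ease the numerical problems is given below. It is not the toolbox's algorithm; it only assumes that a cross-product of loading matrices has to be inverted when the other modes are updated by least squares, and it uses made-up, strongly overlapping profiles.

```matlab
% Made-up data: two strongly overlapping loading vectors versus an
% orthonormal basis spanning the same space.
t = linspace(0, 1, 50)';
B = [exp(-(t-0.48).^2/0.02), exp(-(t-0.52).^2/0.02)];  % overlapping peaks

% Orthonormal basis for the same column space (economy-size QR).
[Q, ~] = qr(B, 0);

% Condition number of the cross-product matrix that enters the
% least-squares update of the other modes.
cond_B = cond(B'*B);
cond_Q = cond(Q'*Q);
fprintf('cond(B''*B) = %.1f   cond(Q''*Q) = %.1f\n', cond_B, cond_Q);
```

With orthonormal loadings the cross-product is the identity matrix, so its inversion is trivial and well conditioned. The price is that the loadings no longer estimate the individual underlying profiles.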
Additionally, some remarks have been made on the use of constraints in modeling. If additional knowledge about the data is available, it may be incorporated directly into the model by means of constraints. This can help, e.g., in providing more sensible and robust models.
The N-way tutorial
Copyright © 1998