Interactive introduction to multi-way analysis in MATLAB
Next Chapter: Multi-way calibration Previous Chapter: Basic PARAFAC modeling First Chapter: Contents 

 

CHAPTER 3

ADVANCED PARAFAC MODELING



 
 
 
Contents
 
  1. Introduction
  2. Second-order calibration
  3. Using constraints
  4. Summary
  5. Comments
     
Data used: claus.mat contains fluorescence excitation emission data from five samples containing tryptophan, phenylalanine, and tyrosine. 

Purpose: Learning to use PARAFAC modeling for more advanced tasks, namely second-order calibration and constrained modeling. 

Information: R. Bro, PARAFAC: Tutorial & applications. Chemom. Intell. Lab. Syst., 1997, 38, 149-171. 

Prerequisites: Be sure to understand the basics of handling multi-way arrays in MATLAB (Chapter 1). You should be acquainted with concepts such as nestedness, uniqueness, convergence, constraints and second-order calibration.



 

1. Introduction

In this chapter some of the more advanced aspects of PARAFAC modeling will be described. There are two main ways of seeing the PARAFAC model: a) as a hard model for parameter estimation, and b) as a soft model for decomposing multi-collinear data.

Ad a): Due to its uniqueness properties the PARAFAC model is ideally suited for curve resolution and certain kinds of calibration problems. Since the PARAFAC model is unique and coincides with several physical models (within fluorescence spectroscopy, spectrally detected chromatography etc.) it is possible to decompose such data into (chemically) meaningful parameters. As shown in the previous chapter, fully overlapping fluorescence data may be decomposed into score and loading vectors that are estimates of excitation and emission spectra and concentrations of chemical analytes. Thus the PARAFAC model can perform mathematical chromatography on mixture data enabling identification and quantification of specific analytes.

Ad b): Seeing the PARAFAC model as a generalization of two-way PCA, it is apparent that the model may be used for extracting latent variables regardless of its uniqueness properties. The scores and loadings thus obtained may be used in complete analogy with ordinary two-way models for exploration and for quantitative analysis such as regression and classification.

In this chapter we will elaborate on some of the properties relating to the uniqueness of the PARAFAC model as well as on how to improve a given model by incorporating a priori knowledge directly into the model.
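
The trilinear structure referred to under a) can be sketched in a few lines of MATLAB. The sizes and all values in A, B and C below are made up for illustration and are not taken from claus.mat. The sketch builds a three-way array from concentrations and two sets of spectral profiles, unfolds it so that each row holds one sample (assumed here to be the layout behind expressions such as X(1,:)), and checks that the unfolded array equals the product of the scores and the column-wise Kronecker product of the loadings.

```matlab
% Hypothetical sizes: I samples, J and K spectral channels, F analytes.
I = 5; J = 4; K = 3; F = 2;

A = [1 0; 0 1; 1 1; 2 1; 1 3];               % I x F concentrations (made-up)
B = [0.1 0.4; 0.5 0.3; 0.3 0.2; 0.1 0.1];    % J x F profiles in the second mode (made-up)
C = [0.2 0.6; 0.5 0.3; 0.3 0.1];             % K x F profiles in the third mode (made-up)

% Build the noise-free trilinear array x(i,j,k) = sum over f of a(i,f)*b(j,f)*c(k,f).
X = zeros(I, J, K);
for f = 1:F
    for k = 1:K
        X(:,:,k) = X(:,:,k) + C(k,f) * (A(:,f) * B(:,f)');
    end
end

% Unfold to an I x (J*K) matrix: row i contains everything measured on sample i.
Xunf = reshape(X, I, J*K);

% The same matrix written as scores times the column-wise Kronecker product.
Z = zeros(J*K, F);
for f = 1:F
    Z(:,f) = kron(C(:,f), B(:,f));
end
Xmodel = A * Z';

maxDiff = max(abs(Xunf(:) - Xmodel(:)));
fprintf('Largest difference between the two constructions: %g\n', maxDiff);
```

Fitting PARAFAC is the reverse problem: given X, estimate A, B and C.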
 

2. Second-order calibration

Explore the concept of second-order calibration using PARAFAC. Use, for example, sample number one, X(1,:), as a standard of pure tryptophan. Use this standard to estimate the amount of tryptophan in samples two, three, four, and five. You can do this separately or simultaneously. To estimate the concentration of tryptophan in the fourth and fifth samples, create a data set consisting of the first (standard), fourth, and fifth samples as

Xnew = X([1 4 5],:);

Run PARAFAC with an appropriate number of components (three in this case, but not in all other cases; why?) and use the scores together with the known concentration of tryptophan in sample one to estimate the concentrations. Compare with the reference concentrations.
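
The final step, going from scores to a concentration estimate, might look like the sketch below. The score matrix A and the standard concentration cStd are invented illustration values, not output from a model of claus.mat; in practice A would be the first-mode loadings returned by PARAFAC for Xnew. The sketch assumes that the component with the largest score in the pure standard is the tryptophan component.

```matlab
% Hypothetical first-mode scores from a three-component model of Xnew.
% Rows: sample 1 (pure standard), sample 4, sample 5. Columns: components.
A = [0.80 0.01 0.02;
     0.40 0.30 0.10;
     0.20 0.50 0.60];

cStd = 2.0;   % assumed known tryptophan concentration of the standard (arbitrary units)

% The standard is pure, so the component dominating row 1 is taken to be tryptophan.
[~, f] = max(abs(A(1,:)));

% Scores are proportional to concentration, so scale them by the standard.
cHat = cStd * A(:,f) / A(1,f);

fprintf('Estimated concentrations: %s\n', mat2str(cHat', 3));
```

The first element of cHat reproduces the standard by construction; the remaining elements are the predictions to compare with the reference values.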

Compare the predictions obtained with PARAFAC with those that can be achieved with GRAM or DTLD.
 

3. Using constraints

Constraining a model can sometimes be helpful. For example, the aim may be to resolve spectra. To ensure that the estimated spectra make sense, it may be reasonable to estimate them under non-negativity constraints, as most spectral parameters are known to be non-negative.

Some argue that constraining, e.g., the PARAFAC model is superfluous, as the structural model in itself should be unique. However, there are several good reasons for using constraints. Firstly, not all models are unique in the way the PARAFAC model is. Secondly, even though the model is unique, it may not provide a completely satisfactory description of the data. Rayleigh scatter in fluorescence spectroscopy is but one instance where slight model inadequacy can cause the model parameters to be misleading, and constraints can help prevent that. In other situations numerical problems or intrinsic ill-conditioning can make a model problematic to fit. At a more general level, constraints may be applied simply because they are known to be valid, which can give better estimates of the model parameters and of the data.

A constrained model will never fit the data better than the corresponding unconstrained model, and will usually fit it worse, but if the constrained model is more interpretable and realistic this may justify the decrease in fit. Constraints should be applied carefully: consider their appropriateness beforehand, consider why the unconstrained model is unsatisfactory, and critically evaluate the effect afterwards. In some cases one can be confident that the constraint is appropriate.

Sometimes an intuitive approach is used for imposing constraints. Taking non-negativity constraints as an example, a common approach is to use an unconstrained estimation procedure and then subsequently set negative values to zero. Naturally this will lead to a feasible solution, i.e., a solution that is entirely non-negative. This approach cannot be recommended for several reasons. First of all, an estimate obtained in this way has no well-defined optimality property. This can make it difficult to distinguish between problems pertaining to the algorithm, the model, and the data. That is, if the fitted model is unsatisfactory, it becomes more difficult to assess the cause of the problem, because an additional source of error has been introduced, namely the properties of the constraint.
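
The loss of optimality can be seen on a single small regression problem. The sketch below uses a made-up matrix Z and response d, chosen so that the unconstrained least-squares solution has a negative coefficient. It compares setting that coefficient to zero afterwards with a proper non-negativity constrained fit using MATLAB's lsqnonneg. PARAFAC is commonly fitted by alternating least squares, where each loading update is a regression of this kind, but the sketch is only an illustration and not the algorithm used in the PARAFAC m-file.

```matlab
% Hypothetical two-component least-squares problem: d is modeled as Z*x.
Z = [1.0 0.9;
     0.8 1.0;
     0.5 0.6];
d = Z * [1; -0.3];            % chosen so that the unconstrained solution has a negative entry

xUnc   = Z \ d;               % unconstrained least squares
xTrunc = max(xUnc, 0);        % ad hoc approach: set negative values to zero afterwards
xNnls  = lsqnonneg(Z, d);     % least squares under non-negativity constraints

resTrunc = norm(d - Z*xTrunc);
resNnls  = norm(d - Z*xNnls);

fprintf('Residual norm, truncated solution:   %g\n', resTrunc);
fprintf('Residual norm, constrained solution: %g\n', resNnls);
```

Both xTrunc and xNnls are feasible, but only xNnls minimizes the residual among non-negative solutions, so its residual norm cannot exceed that of the truncated solution.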

Task: Try to fit a three-component PARAFAC model to samples four and five with and without non-negativity constraints. Such restrictions are set in the input const. Do the estimated emission and excitation spectra change? Do the same using samples one and four. See the help in the PARAFAC m-file for how to impose constraints.

Task: Repeat the models investigated previously in the second-order calibration problems and use non-negativity on all modes. Do the parameters change? Do the predictions?

When analyzing spectral data there is often a strong rationale behind using the PARAFAC model with its intrinsic uniqueness properties. Unfortunately, the PARAFAC model is sometimes very difficult to fit, and it would be useful to be able to speed up the algorithm. If the data analyzed are not spectral or similar data, it is often of little practical importance whether the estimated loading vectors are orthogonal or not. For relative interpretation and for quantitative purposes, such as using the scores in a subsequent regression model, the primary aim is to span the systematic variation. One may therefore use, e.g., orthogonality-constrained models in order to avoid numerical problems in the algorithm.

Task: To exemplify how orthogonality constraints work, try to fit a five-component model several times with and without orthogonality constraints. Compare the computation times.

 

4. Summary

In this chapter some of the advanced aspects of PARAFAC modeling have been discussed, most notably second-order calibration. In its most extreme version, second-order calibration allows quantification in situations where there is only one pure standard, that is, with no knowledge of the interferences and no knowledge of the normal variation. Of course this is not an optimal situation, but it illustrates the power of second-order calibration.

Additionally, some remarks have been made on the use of constraints in modeling. If additional knowledge about the data is available, it may be incorporated directly into the model by means of constraints. This can help, e.g., in providing more sensible and robust models.

5. Comments please

Please comment here if the above exercises help in understanding the basics of multi-way data, or give suggestions for improvements.



The N-way tutorial
Copyright © 1998
R. Bro