Addressing and using the Y data wave simultaneously in an all-at-once fitting function

Hi (to whom it may concern)

I am trying to write an all-at-once fitting function that I can later use in Global Fits. The specific problem is that the function I am trying to create should not only fit a "simple" function to my data points but also do some additional calculations with the input data, whose outcome goes into the final fit, too...

To be more specific: I am fitting X-ray photoelectron spectra, to which I want to apply functions that contain Voigt lineshapes (Gaussian-broadened Lorentzian peaks) and Shirley- or Tougaard-type backgrounds. In Multi-Peak-Fitting (v2) I got this working nicely: there it takes all-at-once functions for the peaks and structure fit functions for the backgrounds, and all is perfectly fine. However, a Global Fit apparently needs a pure all-at-once function, and that is what I am struggling with.

To do a Shirley or Tougaard background calculation one needs to take the input data and perform specific iterative, integration-like calculations on them. To do that in an all-at-once function, there would thus have to be a way to address the input Y data and use them for such calculations in parallel to the peak fitting itself, because the resulting fit consists of the sum of background and spectral features (just as in Multi-Peak-Fitting).

Unfortunately I could not find the internal variable into which Igor puts the name of the Y data wave in curve fitting (assuming that is what Igor does there...). Thus all I can do right now is to explicitly specify the Y data wave in my function. That way, however, I would have to write an individual function for each data set (which I do not want to do).

So if anybody knows the name of that (string) variable or has some other advice, I would be very happy to hear from you!

Best regards

Daniel
Yes, Global Fit doesn't understand a structure fit function. Making that work would be really hard!

I wonder- these background types depend on areas under the peaks, right? Can the fitted peaks stand in for the actual data? It seems like by the time you get to a final fit, the peaks should be (are *supposed* to be) a pretty good approximation of the actual data. So why not compute the background from the fitted peaks instead?

You actually can get access to the original data waves in a standard fit function by going behind Igor's back and looking them up explicitly. Like this:
Function myAAOFitfunc(Wave pw, Wave yw, Wave xw) : FitFunc
    Wave ydata = root:myYWave

    ...
end

You will have trouble with this in Global Fit, however, because you will need to figure out inside the fit function which data wave to look up. Perhaps a two-step process could be made to work: establish a fit coefficient that isn't really a fit coefficient but rather a data set number. Hold that coefficient during fits (if you don't, disaster will result!) and use it to figure out which data wave to look up. Something like:
Function myAAOFitfunc(Wave pw, Wave yw, Wave xw) : FitFunc
    Variable setNumber = pw[0]      // doesn't have to be [0]; pick what's convenient
    SVAR datawavename = $("NumberData"+num2str(setNumber))
    WAVE ydata = $datawavename
    ...
end

The set-up will not be easy: you will have to create the global string variables, set the data set number in the Initial Guesses tab for each data set in Global Fit, and make sure everything matches correctly. And don't forget to turn on the Hold checkbox.
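For concreteness, the set-up might look something like this (just a sketch; the wave paths and the function name are placeholders you would replace, but the "NumberData" naming must match what the fit function builds with num2str):

Function SetUpDataSetGlobals()
    String/G NumberData0 = "root:yWave0"    // data set 0
    String/G NumberData1 = "root:yWave1"    // data set 1
    // ...one global string per data set, named to match the
    // "NumberData"+num2str(setNumber) look-up in the fit function
End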

I have one more thought that might make the look-up and management a bit easier, though still cumbersome. Instead of a bunch of string variable globals, use a WAVE wave. To set up, you would create a wave:
Make/WAVE/N=(number of data sets) datasets
datasets[0] = yWave0
datasets[1] = yWave1
etc.

The look-up in the function is then a bit easier:
Function myAAOFitfunc(Wave pw, Wave yw, Wave xw) : FitFunc
    Wave/WAVE datasets
    Variable setNumber = pw[0]      // doesn't have to be [0]; pick what's convenient
    WAVE ydata = datasets[setNumber]
    ...
end



John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
If you are using pseudo-Voigt peak shapes (sum of a Gaussian and a Lorentzian) the Shirley background can be expressed as the integral of the Gaussian and Lorentzian parts of the peak, which are given by the error function erf() and the arctangent atan(), respectively. Both functions are available in Igor. That way the Shirley background simply becomes another peak parameter.
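As an illustration only (a sketch; the parameter names are mine): a pseudo-Voigt with mixing parameter m, plus its analytic integral as a Shirley-type step of relative height s, could be written as

Function PseudoVoigtShirley(x, A, x0, sG, wL, m, s)
    Variable x, A, x0, sG, wL, m, s
    // unit-area Gaussian (sigma sG) and Lorentzian (HWHM wL)
    Variable gss = exp(-(x-x0)^2/(2*sG^2)) / (sG*sqrt(2*pi))
    Variable lor = (wL/pi) / ((x-x0)^2 + wL^2)
    // cumulative integrals of the two parts, each running from 0 to 1
    Variable stepG = 0.5*(1 + erf((x-x0)/(sG*sqrt(2))))
    Variable stepL = 0.5 + atan((x-x0)/wL)/pi
    return A*((1-m)*gss + m*lor) + s*A*((1-m)*stepG + m*stepL)
End

(The sign and direction of the step depend on whether the energy axis runs in kinetic or binding energy.)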

I have never worked with Tougaard backgrounds, though...
John Weeks's suggestion to access the resultant peak fits to generate the background seems at first not to work, since both Shirley and Tougaard require integrals of the entire peak envelope: you can only generate the integral after the component peaks are generated, not during their generation. Then, Ole Lytken's suggestion to use the erf() and atan() functions as the analytical integral expressions for Gauss + Lorentz seems to resolve the problem. It gives a true first-principles approach to generate the Shirley iteratively during the peak fitting. The caution, however, is that the Voigt is a complex combination of Gauss and Lorentz. In principle it is a convolution of the two functions; in practice it is generated using the equivalent of a lookup table. In other words, the Voigt function that is being fit is an approximation (albeit a good one) to the true analytical Voigt function. The Tougaard baseline adds one additional parameter over the Shirley. As far as I recall, the Tougaard is best used to fit a survey scan; only then do you have a sufficient range of baseline to give high enough confidence in the additional fitting parameter.

I generally considered the approach of generating a Shirley baseline from first principles from the fitted peaks to be overkill. I would always integrate the raw data curve, generate a smooth polynomial function, adjust it to the two ends of the spectrum, and subtract it. Then I would fit peaks to the flattened spectrum. The caution is that one must have a sufficiently flat range on the high and low BE sides of the XPS peaks. I applied a few tricks to handle cases where the baseline was not sufficiently flat, especially on the high BE side of the peak.

Otherwise, have you considered using either of the existing packages for XPS analysis before writing your own?

http://www.igorexchange.com/project/XPStools

http://www.igorexchange.com/project/FitXPS

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAH
Dear John, Ole, and Jeffrey

Thank you very much for your swift replies (I wasn't aware that I would not be notified by email when somebody answered my posting) and of course for your input!

To John:
Thanks for confirming that structure functions are incompatible with global fits!
Yes, the Shirley and Tougaard backgrounds use the spectral area, but, alas, they both take more than a simple integration.
And yes, the fit peaks do in part represent the spectral area (if the chosen model is appropriate). They do not contain the background, however. I aim at fitting background and peaks in one go: the resulting fit curve consists of peaks plus background, and that requires fitting both simultaneously...
I will definitely give your suggestion to address data waves by defining a held "not-to-be-fit" variable a try, even though it will involve some extra caution! I will report back on success or failure as soon as I get to try these things out...

To Ole:
I am using "true" Voigt lineshapes (actual Lorentzian peaks with Gaussian broadening) and "true" Shirley and Tougaard backgrounds: There your approach does not apply unfortunately.
And, if I may, some words of clarification: an actual Shirley background is generated by an iterative process. It uses area ratios between the measured spectrum and the generated background (starting from a constant background) on the left- and right-hand sides of each X value (assuming spectra plotted vs. increasing kinetic or decreasing binding energy). In the end, the Shirley background at any point X in a spectrum is proportional to the quotient of the area on the right-hand side of X and the total area between background and spectrum. Even though a simple integration can yield a somewhat similar-looking background, it is unfortunately not the same...
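In code, that iteration could be sketched roughly like this (a sketch only, not production code; it assumes the spectrum wave is plotted vs. increasing kinetic energy with reasonably flat regions at both ends, and the function and variable names are placeholders):

Function/WAVE MakeShirley(spec, niter)
    WAVE spec
    Variable niter
    Variable I1 = spec[0]                  // high-background (low-KE) endpoint
    Variable I2 = spec[numpnts(spec)-1]    // low-background (high-KE) endpoint
    Duplicate/FREE spec, bg, diff, cum
    bg = I2                                // start from a constant background
    Variable i, total
    for(i=0; i<niter; i+=1)
        diff = spec - bg                   // spectrum minus current background
        Integrate diff /D=cum              // running area from the left
        total = cum[numpnts(cum)-1]
        // background at each point proportional to the area
        // on its right-hand side divided by the total area
        bg = I2 + (I1-I2)*(total - cum[p])/total
    endfor
    return bg
End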

To Jeffrey:
Yes, I am afraid John's first suggestion would not work, because a Shirley or Tougaard background requires more than just a simple integral (unless Igor would accept feeding the intermediate fit curves back into a "parallel" process), and the resulting fit curve itself contains both the peaks and the background. His second suggestion, however, looks promising!
I am not using the built-in Voigt functions in Igor; I wrote my own convolution functions. But thanks for pointing out the difference: many people seem to be unaware of it!
The "beauty" of going all the way with Shirley and Tougaard backgrounds is that their application allows a very detailed quantitative analysis of the resulting spectra and individual spectral components, as well as of spectral shapes. Yes, it is more complex, and even Tougaard's approaches to explicitly account for inelastic electron scattering are far from giving the true background. Still, for me the outcome has always been worth the effort!
For individual spectra I use Igor's Multipeak Fit 2 package amended with my own peak and background functions, and that works like a charm. The real advantage is that fitting both peaks and background in one go minimizes spectral shape distortions and prevents possible erroneous removal of spectral intensity on the high binding-energy sides of spectra, effects that can occur when doing background subtraction before peak fitting.
I have had a look at both "XPStools" and "FitXPS" in the past, and they are without a doubt nice packages. However, neither allowed me to do everything I wanted. That is why I (reluctantly at first) went my own way.

Thank you all again very much for your input!

Kind regards

Daniel
The Shirley background is defined as the integral of the peak. If the integral of the peak fit function is known analytically, as it is for a pseudo-Voigt peak, you can easily include the Shirley background in the fit. However, if you use a fit function whose integral cannot be expressed analytically, as for a true Voigt function, you will have to calculate the integral numerically. You can do this based on the measured data, which is the traditional approach, but the accuracy is limited by the spacing of the data points in your measurement. You can also calculate the numeric integral from the fit function itself, using as fine an x-wave spacing as you desire; this slows down the calculation but gives you an increasingly accurate integral.
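A sketch of that fine-grid approach might look like this (illustration only; myPeak(pw, x) stands in for whatever peak model you actually fit, and the factor of ten is arbitrary):

Function/WAVE FinePeakIntegral(pw, xw)
    WAVE pw, xw
    Variable n = numpnts(xw)
    // evaluate the model on a grid ten times finer than the data
    Make/FREE/N=((n-1)*10+1) fine
    SetScale/I x, xw[0], xw[n-1], fine
    fine = myPeak(pw, x)
    Duplicate/FREE fine, finecum
    Integrate fine /D=finecum    // cumulative peak area on the fine grid
    return finecum
End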

The nice thing about including the background as part of the fit (either analytically or numerically) is that it removes the need for someone to manually make selections that may vary from person to person. I am certainly never going back to manually selecting where my background should start and end.

The deviation between my measured data and my fits is usually dominated by factors other than the small difference between Voigt and pseudo-Voigt, so I am happy with the convenience of the pseudo-Voigt. If you are using a true Voigt function, numeric integration of the fit function with an x-spacing ten times finer than your measured data is probably the way to go. But it will be slow.
A couple of thoughts-

It still seems to me that, to the extent that the fitted peaks are a reasonable representation of the underlying "true" peaks, the Shirley or Tougaard backgrounds could be computed from the model peaks instead of the data. This won't work with Multipeak Fit 2, unfortunately, because of a decision I made early on to use the "sum of fit functions" method; consequently, the peaks aren't available to the background function (at least not easily). But in your case, it seems that you are writing your own single fit function, and you know the latest guess at the peaks before you need to compute the background. True, during intermediate iterations the peaks aren't the best representation of the peaks in the data, but that's why it iterates. By the end, it should all converge to a good representation of the data, as long as the model is appropriate and reflects the actual physics of the problem.

Maybe this is relevant: http://www.qro.cinvestav.mx/~aherrera/reportesInternos/peakShirley.pdf

As for the use of a "true convolution" to compute the peaks, instead of using the built-in VoigtPeak function: In Igor 8, we are using a numerical approximation of a true Voigt peak shape that the author claims has accuracy of "typically at least 13 significant digits". It's hard to imagine that you can do a numerical convolution that achieves better accuracy. And the built-in function will be *very* much faster than the numerical convolution.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Your thoughts are very much appreciated! I guess you are right: using the fit peaks instead of the spectral data should work as well, and arguably that approach would also be more elegant. I have not tried it yet, however. I probably will have to anyhow, because using the original data by referencing it through a WAVE wave has failed so far. I must be doing something profoundly wrong there, but I still haven't figured out what exactly. Igor claims that it is missing a wave reference. It seems to not like

WAVE ydata = datasets[setnumber]

It makes a wave reference "ydata" pointing to a null wave, which of course makes things fail. Could I be unaware of any naming conventions for the yWaves that I put into the WAVE wave "datasets"?

I know the report you linked to, and it is indeed relevant (as are other reports and works by Alberto Herrera-Gomez!) and illustrates nicely the reason why I am trying to do all this. (And again, in Multipeak Fit 2 I am already using all this successfully and happily.) For anyone interested, Proctor and Sherwood show very comprehensibly how to create a Shirley background by iterating over the original data in Anal. Chem. 1982, 54, 13-19 (https://pubs.acs.org/doi/10.1021/ac00238a008).

The reason I chose to use true convolution (despite the resulting slower performance) is that I like (and need) to have the Gaussian broadening as an explicit fitting parameter, not only for Voigt but e.g. also for the Doniach-Sunjic lines that I use. The approximation of the Voigt lineshape already in Igor version 6 is fine (I only chose to modify it in order to improve the accuracy of the calculated widths and errors in Multipeak Fit 2... maybe that would be a good amendment to Igor 8, too?)
johnweeks wrote:
A couple of thoughts- It still seems to me that, to the extent that the fitted peaks are a reasonable representation of the underlying "true" peaks, the Shirley or Tougaard backgrounds could be computed from the model peaks instead of the data. [...]
Okay, I now know where I went wrong. My bad: I did not take your code literally:
Make/WAVE/N=(number of data sets) datasets
datasets[0] = yWave0
datasets[1] = yWave1
etc.
Instead I created a text wave called "datasets" into which I typed the names (or path:names) of the waves I wanted to use. Apparently it's illegal to reference rows in text waves like that? I admittedly have not tried programming with text waves before...
If your datasets wave is a text wave, then you need the $ operator to look up the wave named by a particular row of the wave:
WAVE ywave = $(datasets[setnumber])
One of the reasons I suggested the WAVE wave approach is that it is potentially faster than the text wave approach. In most cases not much faster, but if you have a large number of waves and you're using Igor 7 or earlier, it could be significant.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com