Is a spline fit "legal" for removing background from Raman spectra?

I need to remove the background from some Raman spectra. However, my spectra vary a lot. Sometimes I'll see a lot of fluorescence that I can ignore, including wide-band fluorescence; other times I'll have a nice flat background, all depending on what I'm looking at (and at what temperature).

What I'm saying is that I can't always fit the background to a single function. I generally only care about a few peaks in each spectrum, since I'm comparing peak heights rather than looking at whether peaks appear or not.

Can I "legally" use a spline fit with the background removal programs on this website to remove the background? That is, is this OK to use and then publish the results? I just don't know how else I can remove the background from my spectra.

Any advice? The spline fit removes absolutely all of the background, but it also removes wide, low-intensity peaks. I generally don't care about those, however; I'm just looking for a consistent way to find a baseline for my spectra.
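For concreteness, the kind of spline baseline I mean looks roughly like this (sketched in Python with scipy rather than the programs on this site; the synthetic spectrum and the choice of "peak-free" anchor regions are purely illustrative, and choosing those regions is the subjective step):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic spectrum: two narrow peaks riding on a broad fluorescence background.
x = np.linspace(200, 2000, 1800)
fluorescence = 50 * np.exp(-((x - 800) / 900) ** 2)
peaks = 30 / (1 + ((x - 1350) / 20) ** 2) + 20 / (1 + ((x - 1580) / 25) ** 2)
y = fluorescence + peaks

# Fit a smoothing spline only through regions judged to be peak-free,
# then subtract its interpolation across the peak region.
anchors = (x < 1200) | (x > 1750)
spline = UnivariateSpline(x[anchors], y[anchors], s=1.0)
corrected = y - spline(x)
```

With real, noisy data the smoothing factor `s` would be chosen from the noise level rather than set near zero as here.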
What's proper or not is going to be up to you to decide.

Why do you need to remove the background? Is there some further analysis you're doing that is dependent on that? I know that in my x-ray diffraction papers, I never remove the background, but some analyses do require it.

Have a look at this paper for ideas on model-free background removal: "Estimation of the background in powder diffraction patterns through a robust smoothing procedure", Sergio Bruckner, J. Appl. Cryst. 33:977-979.
Thanks for the insight. I do HAVE to remove the background for a certain part of my analysis, but I think I'm going to leave the background in for the actual spectra. I usually don't have many spectra in the papers I write, only because we do kinetics with Raman, so we generally report those instead.

Basically, I need to remove the background to find an accurate peak height. I was previously getting away with fitting the peak to a Gaussian, taking an average of the 5-10 points on each side of the peak, and then using a linear fit between those two averages as my "background." That worked great for a while, but now I'm looking at peaks that have other peaks near them, so the "average of points on each side of the peak" often lands right on another peak, making my linear background removal useless.
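For reference, the side-average method I described is essentially this (sketched in Python; the function name, window sizes, and point counts are made up for illustration, not my actual routine):

```python
import numpy as np

def linear_baseline_height(x, y, center, half_width, n_side=8):
    """Peak height above a straight line drawn between the averages of
    n_side points just outside each side of the peak window."""
    i_lo = np.searchsorted(x, center - half_width)
    i_hi = np.searchsorted(x, center + half_width)
    # Average intensity (and position) of the flanking regions.
    left_y = y[max(i_lo - n_side, 0):i_lo].mean()
    right_y = y[i_hi:i_hi + n_side].mean()
    left_x = x[max(i_lo - n_side, 0):i_lo].mean()
    right_x = x[i_hi:i_hi + n_side].mean()
    # Linear background between the two side averages.
    slope = (right_y - left_y) / (right_x - left_x)
    baseline_at_center = left_y + slope * (center - left_x)
    return y[i_lo:i_hi].max() - baseline_at_center
```

This is exactly the method that fails when another peak sits where one of the flanking averages is taken.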

You could argue that I could pick the peak heights by hand, but I don't like that for a few reasons:
1. It can be (subconsciously) dishonest.
2. I have many hundreds of spectra to look at.
3. Computers produce more easily reproduced data and remove any day-to-day variation in the analysis.

I think the spline fit will do fine. Thanks for the info, I'll take a look into that paper.
If you have a good model and can use a fitting procedure to get the peak info, you are best off fitting a background model along with the peaks. That is especially true if you have overlapping peaks, but even when peaks are close yet well separated, it is easy to take areas as "background" that are contaminated by the tails of your peaks.

Have you tried the Multipeak Fit package?

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
johnweeks wrote:

Have you tried the Multipeak Fit package?

I just did. It's a very well-developed program. (I'm not just saying that because you developed it, either.) For some of my spectra, though, it gives odd backgrounds and/or odd peaks that aren't there. Admittedly, most of my spectra don't look like this, but some do, and I need to be able to handle all of them. I realize the program works by trying to minimize the residual after removal of the background and peaks, but it seems to create peaks that don't exist in my data. EDIT: I forgot that you can manually pick peaks as well. That mitigates the problem.

For the rest of the normal spectra, my program returns amplitudes within a few units of the Multipeak Fit results. (Since I also use a Gaussian to fit the peaks, that's what I would expect.)

Anyway, you seem to be a good person to ask. I've been looking for a book that describes the different techniques used to analyze Raman data (or vibrational data in general). Do you have any suggestions? Something that describes when to use a Gaussian or Lorentzian fit, when you can safely (legally) smooth your data, or when and how to remove your background. I know that you can find much of the information online, but I'd like to find an actual book. I can't seem to find anything, even through Google. I constantly find "analysis WITH Raman spectroscopy" or "vibrational spectroscopy analysis" (which isn't what I want; those just tell you how to analyze things with vibrational spectroscopy.)

My advisor suggested "An Introduction to Error Analysis" by Taylor (the book with the train falling out of a building.) I'll probably pick it up, but there has to be something more specific. I mean, vibrational spectroscopy has been around for a while now!
UglyRaman.pxp
I am no expert on Raman spectroscopy. I am just barely an expert on curve fitting.

What aspects of that data set are you interested in? The entire X range?

Do you know what kind of background you might expect? I know that in some cases you can get theoretical guidance even on the background shape.

My recollection is that one of our customers once suggested that Raman peaks are Lorentzian; don't take my word for it, though. I think your data set fits better with a Lorentzian peak shape. I know just enough to know that Gaussian vs. Lorentzian is generally explained by different line-broadening processes. I have now exhausted my knowledge on this matter!

I have attached a fit to your data that I think is pretty good. But I cheated: I used an ArcTan baseline function that I wrote for another customer, and you don't have that one :) I think that customer was doing gas chromatography, so I don't know whether it's justified in your case. And yes, I did fiddle some with the number of peaks. It looks like there might be another peak at the right edge, but I didn't include it. I think the peaks I used are all justified by "wiggles" in the data set.

Here is the code for my ArcTan baseline:
Function/S ArcTangentBP_BLFuncInfo(InfoDesired)
    Variable InfoDesired

    String info=""

    switch(InfoDesired)
        case BLFuncInfo_ParamNames:
            info = "y0;x0;deltaY;dilation;"
            break;
        case BLFuncInfo_BaselineFName:
            info = "arctan_BLFunc"
            break;
    endswitch

    return info
end

static Constant pio2 = 1.5707963267949

Function arctan_BLFunc(s)
    STRUCT MPF2_BLFitStruct &s
   
    Variable xx = (s.x-s.cwave[1])/s.cwave[3]
    Variable scale = s.cwave[2]/pi
   
    return s.cwave[0] + (pio2 + atan(xx))*scale
end

If you use it, be aware that you have to enter initial guesses for the coefficients yourself; there is no auto-guess for a baseline function. But they are straightforward: y0 is the Y level at the left (low-X) end, x0 is the middle of the sigmoid shape, deltaY is the difference between y0 and the right end, and dilation sets the width of the sigmoid transition, with larger values making it wider. You probably don't have to be very accurate with any of the coefficients.
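For anyone who wants to experiment outside Igor, here is the same formula as a Python function, just to illustrate what the coefficients do (a sketch, not part of Multipeak Fit):

```python
import math

def arctan_baseline(x, y0, x0, deltaY, dilation):
    """Sigmoid baseline: approaches y0 at the far left, y0 + deltaY at the
    far right, centered at x0, with dilation setting the transition width."""
    return y0 + (math.pi / 2 + math.atan((x - x0) / dilation)) * deltaY / math.pi
```

At x = x0 the function sits exactly halfway between the two plateau levels, which is a handy check on your initial guesses.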

Since I know so little about Raman, I suggest you ask your older colleagues for suggestions.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
MultipeakFit_Set3.png
reepingk wrote:

I've been looking for a book that describes the different techniques used to analyze Raman data (or vibrational data in general). Do you have any suggestions? Something that describes when to use a Gaussian or Lorentzian fit, when you can safely (legally) smooth your data, or when and how to remove your background. I know that you can find much of the information online, but I'd like to find an actual book. I can't seem to find anything, even through Google. I constantly find "analysis WITH Raman spectroscopy" or "vibrational spectroscopy analysis" (which isn't what I want; those just tell you how to analyze things with vibrational spectroscopy.)

My advisor suggested "An Introduction to Error Analysis" by Taylor (the book with the train falling out of a building.) I'll probably pick it up, but there has to be something more specific. I mean, vibrational spectroscopy has been around for a while now!

I did a lot of Raman work in the 1990's and early 2000's, but very little since then. My thoughts are as follows:

For a simple system (a simple molecule or a well-ordered crystal, for example) the Raman bands are usually Lorentzian in shape. But the spectrometer has a finite resolution, which usually imposes a Gaussian distribution on the measured band. If the resolution is of the same order as the natural width of the Raman band, then I found a good fit using a shape that is a mixture of both Lorentzian and Gaussian functions. This was usually the case with the 520 cm-1 band from silicon, which I used as my basic instrument QC before and after most experiments.
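Such a mixed shape is often written as a weighted sum of the two profiles (a pseudo-Voigt). A minimal Python sketch, with parameter names of my own choosing:

```python
import numpy as np

def pseudo_voigt(x, amp, center, fwhm, eta):
    """Weighted sum of a Lorentzian and a Gaussian sharing one FWHM.
    eta = 1 gives a pure Lorentzian, eta = 0 a pure Gaussian."""
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))   # Gaussian width from FWHM
    gauss = np.exp(-((x - center) ** 2) / (2 * sigma ** 2))
    lorentz = 1 / (1 + ((x - center) / (fwhm / 2)) ** 2)
    return amp * (eta * lorentz + (1 - eta) * gauss)
```

Because both components are normalized to unit height and share the same FWHM, the profile reaches exactly half its maximum at center ± fwhm/2 for any mixing fraction.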

If your sample is less uniform (on the length scale that is relevant for the Raman band in question), then you may have a distribution of contributions to your measured spectrum, resulting in some (significant) broadening. The detail of this is very much dependent upon the sample in question.

As for the question of smoothing - this is a matter of opinion - personally I would only smooth data for presentation purposes, and then only if it was necessary to clarify my interpretation - any fitting would be done on the un-smoothed data unless I had a good reason to do otherwise (one example of the latter is a median smooth which can eliminate cosmic-ray spikes in the data).
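To illustrate the median-smooth point: a single-point cosmic-ray spike is replaced by the median of its neighbourhood, while broader bands pass through largely intact. A minimal Python sketch (scipy assumed; the numbers are made up):

```python
import numpy as np
from scipy.signal import medfilt

# A flat-ish trace with a one-point cosmic-ray spike at index 3.
y = np.array([10.0, 11.0, 10.5, 250.0, 10.8, 11.2, 10.9])

# A 3-point median filter replaces the spike with a neighbouring value,
# since the spike can never be the median of its own window.
despiked = medfilt(y, kernel_size=3)
```

An averaging smooth of the same width would instead smear the spike into its neighbours, which is why the median is the right tool for this particular artifact.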

For background removal/subtraction, I think it is important to choose a function which has as few parameters as possible. I like the look of the arctan function that John Weeks used for this particular case, but I am not sure whether it would be appropriate in general. It is a matter of trial and error, unless your knowledge of the sample suggests a more appropriate function. [One interesting method is to use an excitation wavelength that is slightly shifted: the Raman bands will move with the excitation, but the background should stay more or less the same.]

I don't think there is a single 'right' way to handle your data - the important thing is that whatever you do is sensible, and you explain what you have done (and why) in sufficient detail that others can understand and reproduce the analysis.

As for a book - there are plenty out there. Again it is all a matter of opinion as to which are appropriate or not. I can suggest, for example, 'Handbook of Raman Spectroscopy: From the Research Laboratory to the Process Line' Ed. Edwards & Lewis. Have a look at the start of chapter 7 for some discussion on smoothing and baseline removal.

For me, the most important thing is to clearly have in mind the specific question that you are asking the data to answer. If you get this straight, then the question of what is an appropriate analysis method will often become much clearer.

Finally, I had a quick look at your spectrum. I do not know what your sample is, but it reminds me of disordered carbon - the two bands are the 'D' and 'G' bands which are related to sp2 and sp3 moieties (it is some 20 years since I looked at this type of sample, so my memory may be a little rusty). Back then, there was much debate in the literature as to the detail of their origin.

Hope this helps,
Kurt

KurtB wrote:

I did a lot of Raman work in the 1990's and early 2000's, but very little since then. My thoughts are as follows:

...

Hope this helps,
Kurt

Kurt, thank you very much for your detailed response. You are correct that the spectrum posted shows the D and G bands of graphitic carbon. Unfortunately, the problem I have is that my sample is heterogeneous and also changing with time, so no one background will always fit nicely. I'd like to fit each and every one of my spectra by hand with a background function, but that's simply not feasible. I have many hundreds of spectra for each and every experiment. (Usually 2-3 experiments per week.)

I agree with your opinions on smoothing. I run my fits on the raw data, and only smooth if I need to clarify something. I have been running into some problems recently, though, with large bits of (locally inhomogeneous) noise landing directly over the center of my peak (at relatively low signal intensities) and throwing off the amplitude returned from the fit. (This generally happens when the carbon peak intensities approach zero and begin getting overrun with noise.)

I think I may add some error checking to my fitting routine: if the amplitude returned is significantly different from the past few, smooth the data, run the fit again, and see whether the results differ. If they do, ask me whether that's OK; if not, just report the original result. Generally, if I smooth the data even slightly, the fit returns more reasonable amplitudes. For example, I recently had a data set whose kinetics trace was trending around 20 (arbitrary intensity units). Suddenly one point jumped to 120, then back down to ~20 for the next point. Obviously this didn't happen in the experiment, and sure enough, when I went back and looked at the data, there was a large bit of noise right in the center of my peak. This scenario doesn't happen often, but when it does it puts a large spike into my kinetics traces. Maybe I could color that specific point differently to differentiate it from normal, un-smoothed data. I wonder if that's even possible...

I do already have a cosmic ray removal technique. Basically, I scan through the data point by point and check whether adjacent points are relatively close to each other in intensity. Cosmic rays generally have huge intensities and are almost always a single point, so by comparing point "a" with point "a+1" I can tell whether a cosmic ray is there with a simple comparison of adjacent intensities. Even very sharp peaks like the silicon 520 band mentioned above don't trip up this method.
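Roughly, the idea looks like this in Python (a sketch only, not my actual routine; the threshold and the robust noise estimate are illustrative):

```python
import numpy as np

def remove_cosmic_rays(y, threshold=5.0):
    """Replace single-point spikes: a point is flagged when it exceeds BOTH
    neighbors by more than `threshold` times a robust noise estimate, so a
    sharp but multi-point real peak is left alone."""
    y = y.astype(float).copy()
    noise = np.median(np.abs(np.diff(y)))  # robust point-to-point noise scale
    for i in range(1, len(y) - 1):
        if (y[i] - y[i - 1] > threshold * noise and
                y[i] - y[i + 1] > threshold * noise):
            y[i] = 0.5 * (y[i - 1] + y[i + 1])  # interpolate over the spike
    return y
```

A real peak spanning several points never exceeds both of its immediate neighbors by a large margin, which is why this simple comparison doesn't clip sharp bands.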

Anyway, I believe I've gotten off topic. Thank you for the book suggestion, I'll go pick it up at the library and give it a read. I'll strive to make my data analysis well researched, thought out, and repeatable.
Your noise doesn't have zero mean?

Since standard smoothing is basically an averaging process, I would guess that the result of smoothing biased noise would be a biased fit.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com