how to extract binary data from an XML file

Project: XMLutils - XOP to facilitate working with XML files
Version: IGOR.5.04.x-1.x-dev
Component: Miscellaneous
Category: support request
Priority: normal
Assigned: andyfaff
Status: closed
Description

Hi,
First of all, thank you very much for this package! I am trying to use it to import mass spectra stored in mzXML, which is a sort of standard format in that field (Nat. Biotechnol. 22 (11): 1459–66; Expert Review of Proteomics 2 (6): 839–45).
My main problem is that the spectra are stored as pairs of intensity and mass in binary.
Using the XMLwaveFmXpath function it seems I can access those data, but they are converted to ASCII.
Thank you for helping.
Regards,
Alex

#1

Please attach a test file and the node that you want to extract. In the meantime you may want to experiment with setting the delimiterStr in XMLwaveFmXpath to "". Alternatively use XMLstrFmXpath.

XMLwaveFmXpath will give you the node content as a single entry in a text wave (if you use the "" delimiterStr).
XMLstrFmXpath will give you a string.

You may then need to convert the string data to a numerical wave using the SOCKITstringtowave operation (supplied by the SOCKIT xop).

If it is compressed you may need to do extra jiggery pokery, but I'm fairly sure what you want to do will be possible.

A.
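
For the compressed case mentioned above, the extra step is usually a single decompression call between the base64 decoding and the numeric conversion. A minimal sketch in Python (outside Igor, purely for illustration), assuming the payload was zlib-compressed before being base64 encoded; the function name and the `compressed` flag are hypothetical and not part of XMLutils or SOCKIT:

```python
import base64
import zlib


def decode_node_text(text, compressed=False):
    """Return the raw bytes behind a base64-encoded XML text node.

    `compressed` is an assumption for illustration: set it when the
    payload was zlib-compressed before being base64 encoded.
    """
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return raw


# Round trip on made-up bytes to exercise both paths.
payload = b"\x00\x01\x02\x03" * 4
plain_text = base64.b64encode(payload).decode("ascii")
packed_text = base64.b64encode(zlib.compress(payload)).decode("ascii")
```

Whether a given file is compressed at all has to be read from the file itself; the sketch only shows where the step would sit.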

#2

Thank you for your reply Andy.
You'll find attached a sample file called test.txt; it is the mzXML file, for which I just changed an attribute. I hope you can read it.
I access the binary data with the following commands:

Variable fileID
String tempStr

fileID = xmlopenfile("/Users/alexandre/Desktop/mzXML2Igor/CytC_M15_01.mzXML")
xmlelemlist(fileID)
XMLwaveFmXpath(fileID,"/*/*[1]/*[4]","http://sashimi.sourceforge.net/schema_revision/mzXML_3.1","")
tempStr = M_xmlcontent[2]

But I cannot convert this string into a proper wave using the SOCKIT functions.
Do you have any idea?
Regards,
Alex

Attachment: test.txt (223.59 KB)

#3

Assigned to: Igor Pro User » andyfaff

It would help if I knew what the data was supposed to look like :-).
It turns out the reason you can't see the data very well is because it's base64 encoded - http://sashimi.sourceforge.net/schema_revision/mzXML_2.1/Doc/mzXML_2.1_t...

Try this out for size:

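Variable fileID
String tempStr
// Open the file (Igor-style path) and print the returned file ID
fileID = xmlopenfile("foobar:Users:anz:Desktop:tmp:test.xml")
print fileID
// Fetch the content of the peaks node as a string
string peaks = XMLstrFmXpath(fileID,"//ns:peaks[1]","ns=http://sashimi.sourceforge.net/schema_revision/mzXML_3.1","")
print strlen(peaks)
xmlclosefile(fileID, 0)
// Decode the base64 text into raw binary
string data = base64decode(peaks)
print strlen(data)
// Convert the raw bytes to a numeric wave; 96 = 32 bit unsigned integer?
sockitstringtowave/E 96, data
Display/K=0 root:W_stringToWave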

You will need to install the base64 xop. However, the version currently available is not able to do it. I need to recompile it tonight as I had to improve it to work with binary data. The only step I'm not sure about is the sockitstringtowave step.
In the specification document it says that the peaks are supposed to be m/z – intensity pairs; from the example it looks like uint32. But I can't be sure about the layout if I don't know what the example file is supposed to look like.
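
To make the open question about layout concrete, here is a minimal sketch in Python (not Igor code) of the same two steps: base64 decoding, then reinterpreting the bytes as numbers. It assumes big-endian IEEE floats interleaved as m/z, intensity, m/z, intensity, and so on; the element width is left as a parameter because it has to be checked against the file. The function name `decode_peaks` and the sample values are hypothetical.

```python
import base64
import struct


def decode_peaks(text, precision=32):
    """Decode base64 peak data into (m/z, intensity) tuples.

    Assumptions (not confirmed by the thread): values are big-endian
    IEEE floats stored as m/z, intensity, m/z, intensity, ...
    `precision` is the width of each value in bits, 32 or 64.
    """
    raw = base64.b64decode(text)
    code = {32: "f", 64: "d"}[precision]
    size = struct.calcsize(">" + code)
    count = len(raw) // size
    # Ignore any trailing bytes that do not form a whole value.
    values = struct.unpack(">%d%s" % (count, code), raw[:count * size])
    return list(zip(values[0::2], values[1::2]))


# Made-up sample: two peaks packed as 64-bit floats.
sample = base64.b64encode(
    struct.pack(">4d", 100.5, 10.0, 200.25, 20.0)
).decode("ascii")
peaks = decode_peaks(sample, precision=64)
```

Decoding with the wrong width or number type still returns values, just meaningless ones, so comparing the result against a known spectrum is the practical check.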

#4

The new version of the base64 XOP is now available.

#5

Dear Andy,
Everything worked just fine (apart from the data being 64-bit floats)!
Many, many thanks for your invaluable help. This will save me a lot of difficulty.
Yours,
Alex

#6

Status: active » closed
