Only some special characters get escaped
Posted March 31st, 2010 by RGerkin
| Project: | XMLutils - XOP to facilitate working with XML files |
| Version: | IGOR.5.04.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
If I use XMLsetNodeStr to add a string like
"\"&"
(that is, the string equivalent to num2char(34)+num2char(38)), it does not get set correctly. The ampersand gets escaped to &amp; in the resulting XML, as it should, but the quotation mark is left alone. If I try to add the escaped quotation mark &quot; directly, the resulting XML contains &amp;quot;, which means the ampersand has been escaped a second time. So I have no idea how I am supposed to add a quotation mark to an XML string. In my case, the goal is to list the quotation mark among the special characters that a syntax highlighter should color correctly. In fact, I am using the wonderful XMLUtils XOP to make an Igor syntax highlighter for other text editors, and this is the last holdup.

#1
I just noticed that IgorExchange mangled the bug report I sent (ironically, because of an escaping issue), but if you use num2char(34)+num2char(38) with XMLsetNodeStr you will get the picture.
#2
Certain characters have to be escaped so that they never appear raw in an XML file; otherwise the file is not valid XML and could not be read back in. Strictly, you can't have a bare "<" or "&" in XML files. The following text is quoted from the W3C website:
This basically says you can't have "&" anywhere unless it is escaped to &amp;. You can have quote marks in element content, but if you want a quote (") inside a double-quoted attribute value it has to be escaped as &quot;.
The following code illustrates that:
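To make the content-versus-attribute rule concrete, here is a minimal C sketch. The XOP itself is C built on libxml2, but `xml_escape` and its `for_attribute` flag are hypothetical names, not part of XMLutils. It escapes `&`, `<` and `>` everywhere, and `"` only when the result is destined for a double-quoted attribute value. Escaping `>` in content is not strictly required, but it is harmless.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `in` escaped for XML.
 * '&', '<' and '>' are always escaped; '"' is escaped only when the
 * result is meant for a double-quoted attribute value.
 * Caller must free() the result; returns NULL if allocation fails. */
char *xml_escape(const char *in, int for_attribute)
{
    assert(in != NULL);

    /* Worst case: every byte becomes "&quot;" (6 bytes). */
    size_t len = strlen(in);
    char *out = malloc(len * 6 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (; *in != '\0'; in++) {
        const char *rep = NULL;
        switch (*in) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;"; break;
        case '>': rep = "&gt;"; break;
        case '"': rep = for_attribute ? "&quot;" : NULL; break;
        default: break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);
            p += n;
        } else {
            *p++ = *in;   /* ordinary character: copy unchanged */
        }
    }
    *p = '\0';
    return out;
}
```

Passing the string from the original report, `"&`, in content mode gives `"&amp;`, with the quote left alone, which is valid XML. Passing an already-escaped `&quot;` gives `&amp;quot;`: an encoder like this cannot know its input was pre-escaped, so the raw character is what should be handed to it.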
This should give:
#3
I see, I didn't realize that it was okay to have a double quote appear unescaped in the content of the XML file. I think that resolves the issue.
#4
I am still unable to figure out how to get a < symbol into a CDATA section. The CDATA section is a node string that looks like
I am using XMLstrFmXpath to extract it, doing a string substitution and then using XMLsetNodeStr to put it back. No matter how I escape it, the < and > get mangled to
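Inside a CDATA section `<` and `&` are literal characters, so they must not be entity-encoded at all; the only sequence that cannot appear there is `]]>`. A minimal C sketch of building such a section by hand follows. `cdata_section` is a hypothetical helper, not an XMLutils function; it splits any `]]>` in the text across two adjacent CDATA sections.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Wrap `text` in a CDATA section. Inside CDATA, '<' and '&' are literal
 * characters and are copied unchanged (no entity encoding). The only
 * forbidden sequence is "]]>", so each occurrence is split across two
 * adjacent sections by emitting "]]]]><![CDATA[>" in its place.
 * Caller must free() the result; returns NULL if allocation fails. */
char *cdata_section(const char *text)
{
    static const char cdata_open[] = "<![CDATA[";
    static const char cdata_close[] = "]]>";
    static const char cdata_split[] = "]]]]><![CDATA[>";
    const size_t open_len = sizeof cdata_open - 1;
    const size_t close_len = sizeof cdata_close - 1;
    const size_t split_len = sizeof cdata_split - 1;

    assert(text != NULL);

    /* Count the "]]>" occurrences so the buffer can be sized exactly. */
    size_t hits = 0;
    for (const char *q = strstr(text, cdata_close); q != NULL;
         q = strstr(q + close_len, cdata_close))
        hits++;

    size_t total = open_len + strlen(text) + hits * (split_len - close_len)
                   + close_len + 1;
    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    char *p = out;
    memcpy(p, cdata_open, open_len);
    p += open_len;

    const char *s = text;
    const char *hit;
    while ((hit = strstr(s, cdata_close)) != NULL) {
        size_t n = (size_t)(hit - s);
        memcpy(p, s, n);                    /* text before the "]]>" */
        p += n;
        memcpy(p, cdata_split, split_len);
        p += split_len;
        s = hit + close_len;                /* continue after the original "]]>" */
    }
    size_t rest = strlen(s);
    memcpy(p, s, rest);
    p += rest;
    memcpy(p, cdata_close, close_len + 1);  /* closing "]]>" plus '\0' */
    return out;
}
```

The key point is that nothing else should touch the text once it is inside the section: any later pass that entity-encodes the content would turn the literal `<` back into `&lt;`.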
#5
I think that's probably because, when it replaces or creates nodes, the XOP runs the string through xmlEncodeEntitiesReentrant, a libxml2 function that encodes special characters such as & and < as entities. As such it's a one-stop shop for putting stuff into an XML file. It's fine for the majority of things, but there are places where it doesn't work.
For example, I designed the XOP to be able to extract everything, but I only really thought about writing a subset of XML (I didn't worry about CDATA, comments, etc.). This was not an oversight, but a practical realisation that I didn't have time to do everything.
The code in XMLsetNodeStr that does the business is the following. You'll see that I blindly encode the string, whatever type of node it is going into. This way I don't need to test "is it a CDATA block, a comment block, a processing instruction?". It also means it's guaranteed not to crash or corrupt the document, apart from mangling what the user actually wanted. If you feel keen, you could create a patch and I could commit it to the project.
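One possible shape for such a patch, sketched in C under the assumption that the caller knows which kind of node it is writing to: entity-encode only element text, and store CDATA and comment content verbatim after checking for the sequences the XML spec forbids there (`]]>` in CDATA; `--` or a trailing `-` in comments). `node_kind` and `prepare_node_content` are hypothetical names; this is not the XMLutils code, and rejecting `]]>` outright is a simpler choice than splitting the section.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds; XMLutils itself does not define this enum. */
typedef enum { NODE_TEXT, NODE_CDATA, NODE_COMMENT } node_kind;

/* Copy `len` bytes of `s` into a new NUL-terminated buffer. */
static char *copy_n(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    if (out != NULL) {
        memcpy(out, s, len);
        out[len] = '\0';
    }
    return out;
}

/* Prepare `text` for storage in a node of the given kind.
 * Returns a newly allocated string, or NULL if `text` cannot legally
 * appear in that kind of node (or if allocation fails). */
char *prepare_node_content(node_kind kind, const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);

    if (kind == NODE_CDATA) {
        /* CDATA is stored verbatim; only "]]>" would end it early. */
        return strstr(text, "]]>") != NULL ? NULL : copy_n(text, len);
    }

    if (kind == NODE_COMMENT) {
        /* Comments are verbatim too, but may not contain "--"
         * or end with '-'. */
        if (strstr(text, "--") != NULL || (len > 0 && text[len - 1] == '-'))
            return NULL;
        return copy_n(text, len);
    }

    /* NODE_TEXT: '&' and '<' must be escaped in element content;
     * '>' is escaped too, which is always allowed. Quotes are left alone. */
    char *out = malloc(len * 5 + 1);   /* "&amp;" is the longest replacement */
    if (out == NULL)
        return NULL;
    char *p = out;
    for (const char *s = text; *s != '\0'; s++) {
        if (*s == '&')      { memcpy(p, "&amp;", 5); p += 5; }
        else if (*s == '<') { memcpy(p, "&lt;", 4);  p += 4; }
        else if (*s == '>') { memcpy(p, "&gt;", 4);  p += 4; }
        else                { *p++ = *s; }
    }
    *p = '\0';
    return out;
}
```

With this split, the CDATA case from the thread would keep its `<` as a literal character, while ordinary element text still gets the blanket protection the current code provides.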