Only some special characters get escaped

Project:XMLutils - XOP to facilitate working with XML files
Version:IGOR.5.04.x-1.x-dev
Component:Code
Category:bug report
Priority:normal
Assigned:Unassigned
Status:active
Description

If I use XMLsetNodeStr to add a string like

"\"&"

that is the string that is equivalent to num2char(34)+num2char(38), it does not get set correctly. The ampersand gets escaped to &amp; in the resulting XML, as it should, but the quotation mark is left alone. If I try to add the escaped quotation mark &quot; directly, the resulting XML is &amp;quot;, which means it has escaped the ampersand of my escape sequence. So I have no idea how I am supposed to add a quotation mark to an XML string. In my case, the goal is to list the quotation mark among the special characters that should be colored correctly by a syntax highlighter. In fact, I am using the wonderful XMLUtils XOP to make an Igor syntax highlighter for other text editors, and this is the last holdup.

#1

I just noticed that IgorExchange mangled the bug report I just sent (due, ironically, to some escape code issue), but use num2char(34)+num2char(38) with XMLsetNodeStr and you will get the picture.

#2

Certain characters have to be escaped so that they never appear literally in an XML file; otherwise the file could not be parsed when it is read back. Strictly, a literal "<" or "&" cannot appear in the character data of an XML file. The following text is quoted from the W3C website:

XML text consists of intermingled character data and markup. Markup takes the form of start-tags, end-tags, empty elements, entity references, character references, comments, CDATA sections, document type declarations, and processing instructions.

All text that is not markup constitutes the character data of the document.

The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within comments, processing instructions, or CDATA sections. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;". The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be so represented when it appears in the string "]]>", when that string is not marking the end of a CDATA section.

In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

This basically says you can't have a literal "&" anywhere unless it is escaped as &amp;. You can have quote marks in the content of an element node, but if you want a quote (") in an attribute value it has to be escaped as &quot;.
The following code illustrates that:

variable fileID
//create an XML file
fileID = XMLcreatefile("/Users/anz/Desktop/test.xml","bar","","")
xmladdnode(fileID, "/bar", "", "poop", "\"", 1)
xmlsetattr(fileID, "/bar/poop", "", "flor","\"")
xmlsavefile(fileID)
xmlclosefile(fileID, 1)

This should give:

<?xml version="1.0"?>
<bar xmlns="">
  <poop flor="&quot;">"</poop>
</bar>
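
The rule quoted above (always escape "&" and "<", escape the double quote only inside attribute values) can be sketched in plain C. This is only an illustration of the W3C rule, not the code the XOP or libxml2 actually uses; the function name xml_escape and its for_attribute flag are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of XMLutils or libxml2.
 * Returns a newly allocated copy of src in which XML special characters
 * are replaced by the predefined entity references. The double quote is
 * escaped only when for_attribute is nonzero, because it is legal as-is
 * in element content. The caller must free() the result. */
char *xml_escape(const char *src, int for_attribute)
{
    assert(src != NULL);

    /* Worst case: every character expands to "&quot;" (6 bytes). */
    size_t len = strlen(src);
    char *out = malloc(len * 6 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const char *s = src; *s != '\0'; s++) {
        const char *rep = NULL;
        switch (*s) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '"': if (for_attribute) rep = "&quot;"; break;
        default:  break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);      /* copy the entity reference */
            p += n;
        } else {
            *p++ = *s;              /* copy the character unchanged */
        }
    }
    *p = '\0';
    return out;
}
```

Called as xml_escape(str, 0) for element content and xml_escape(str, 1) for an attribute value, this leaves the quote alone in the first case and turns it into &quot; in the second, which matches the test.xml output shown above.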

#3

I see, I didn't realize that it was okay for a double quote to appear unescaped in the content of an XML element. I think that resolves the issue.

#4

I am still unable to figure out how to get a < symbol into a CDATA section. The CDATA section is a node string that looks like

<![CDATA[...]]>

I am using XMLstrFmXpath to extract it, doing a string substitution and then using XMLsetNodeStr to put it back. No matter how I escape it, the < and > get mangled to
&lt; and &gt;
Is this type of substitution possible with your XOP? Forgive my use of Igor blocks in this post, but it's the easiest way to make sure my comment text gets escaped correctly.

#5

I think that's probably because I use a function that escapes all entities when it replaces/creates nodes, xmlEncodeEntitiesReentrant. As such it's a one stop shop for putting stuff into an XML file. It's fine for the majority of things, but there are places it doesn't work.

e.g. I designed the XOP to be able to extract everything, but I only really thought about writing a subset of XML (didn't worry about CDATA, comments, etc). This was not an oversight, but a practical realisation that I didn't have time to do everything.

The code in XMLsetNodeStr that does the business is the following. You'll see that I do a blind encoding of the string, regardless of the type of node it is going into. This way I don't need to test for "is it a CDATA block, comment block, processing instruction?". It also means it's guaranteed not to crash or corrupt the document, at the cost of sometimes mangling what the user really wants. If you felt keen you could create a patch and I could commit it into the project.

static int
update_xpath_nodes(xmlNodeSetPtr nodes, const xmlChar* value) {
    int err = 0;
    int size;
    int i;
    xmlChar *encValue = NULL;

    if(nodes->nodeTab && nodes->nodeTab[0]!= NULL){
        encValue = xmlEncodeEntitiesReentrant(nodes->nodeTab[0]->doc, value);
    }
    size = (nodes) ? nodes->nodeNr : 0;
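
For anyone attempting such a patch, here is a rough sketch of the CDATA side of the problem in plain C, independent of libxml2. Inside a CDATA section nothing is entity-encoded, so "<" and "&" pass through untouched; the only sequence that needs care is "]]>", which has to be split across two sections. The name xml_wrap_cdata is hypothetical and this is not code from the XOP.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of XMLutils or libxml2.
 * Returns a newly allocated string holding text wrapped in a CDATA
 * section. No entity encoding is applied. Any "]]>" inside text is
 * split so that "]]" ends one section and ">" starts the next.
 * The caller must free() the result. */
char *xml_wrap_cdata(const char *text)
{
    static const char cd_open[]  = "<![CDATA[";
    static const char cd_close[] = "]]>";
    /* Replacement for one embedded "]]>": close after "]]", reopen, emit ">". */
    static const char cd_split[] = "]]]]><![CDATA[>";
    const size_t open_len  = sizeof cd_open - 1;
    const size_t close_len = sizeof cd_close - 1;
    const size_t split_len = sizeof cd_split - 1;

    assert(text != NULL);

    /* Count embedded terminators to size the output buffer. */
    size_t count = 0;
    for (const char *s = strstr(text, cd_close); s != NULL;
         s = strstr(s + close_len, cd_close))
        count++;

    size_t total = open_len + strlen(text)
                 + count * (split_len - close_len) + close_len + 1;
    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    char *p = out;
    memcpy(p, cd_open, open_len);
    p += open_len;

    const char *s = text;
    const char *hit;
    while ((hit = strstr(s, cd_close)) != NULL) {
        size_t n = (size_t)(hit - s);
        memcpy(p, s, n);                 /* text before the terminator */
        p += n;
        memcpy(p, cd_split, split_len);  /* split the terminator */
        p += split_len;
        s = hit + close_len;
    }
    size_t rest = strlen(s);
    memcpy(p, s, rest);                  /* remaining text */
    p += rest;
    memcpy(p, cd_close, close_len + 1);  /* closing "]]>" plus NUL */
    return out;
}
```

A patched update_xpath_nodes would presumably check the node type first and skip xmlEncodeEntitiesReentrant for CDATA nodes; how that fits into the rest of the function is not shown here, since only the first few lines of it are quoted above.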
