Preserve CDATA sections, don't convert them to generic Text nodes. #3368

Closed
scabug opened this issue Apr 28, 2010 · 14 comments

scabug commented Apr 28, 2010

Sort of defeats part of the purpose of using a CDATA...

scala> <hi><![CDATA[This & That]]></hi>
res0: scala.xml.Elem = <hi>This &amp; That</hi>

scabug commented Apr 28, 2010

Imported From: https://issues.scala-lang.org/browse/SI-3368?orig=1
Reporter: @acruise

scabug commented Apr 28, 2010

Dmitry Grigoriev (dimgel) said:
The task is incorrectly formulated IMHO. I consulted with XML pros and here's the situation.

  1. XML 1.0 [http://www.w3.org/TR/REC-xml/#sec-cdata-sect] says: "Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest."

  2. DOM Level 3 Core [http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#DOMConfiguration] defines DOMConfiguration parameter "cdata-sections" which determines whether CDATA must be converted to text nodes on serialization or not. There's also an entry in DOM FAQ for how this conversion must be done: [http://www.w3.org/DOM/faq.html#CDTA-text]

Currently Scala converts CDATA nodes to Text nodes (cdata-sections=false). The behaviour I (and David Pollak for 3 years ;) advocate for is to see CDATA sections serialized unchanged (cdata-sections=true), because:

  1. CDATA sections are necessary to embed Javascript code into XHTML pages;

  2. One already can define text nodes in XML literals;

  3. Generally, Scala should check XML literal well-formedness, but NOT perform any conversions. The same problem is tracked by ticket "Represent empty XML elements in short form" #1118. The latest comments there show that people settled on preserving the element's emptiness mode as specified in the XML literal, and that's great.

So I suggest ticket should be renamed to "Preserve CDATA sections".
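
The two behaviours described above can be seen in scala.xml's own node classes. This is a minimal sketch, assuming the scala-xml library is on the classpath; the object name `CDataVsText` and the helper `hi` are illustrative only. It builds the element programmatically instead of using an XML literal, so the result does not depend on how the compiler translates literals:

```scala
import scala.xml.{Elem, Node, Null, PCData, Text, TopScope}

object CDataVsText {
  // Hypothetical helper: wraps one child node in an unprefixed <hi> element.
  // The boolean is minimizeEmpty; it is irrelevant here because a child is present.
  private def hi(child: Node): Elem =
    Elem(null, "hi", Null, TopScope, false, child)

  // PCData writes its content verbatim inside a CDATA section.
  val asCData: String = hi(new PCData("This & That")).toString

  // Text escapes markup characters when it is serialized.
  val asText: String = hi(new Text("This & That")).toString
}
```

Printing `CDataVsText.asCData` and `CDataVsText.asText` side by side shows the difference: only the second form has the ampersand escaped, which is what the literal in the original report produces.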

scabug commented Apr 28, 2010

@acruise said:
Good suggestion on the title, thanks.

scabug commented May 5, 2012

Sebastian Nozzi (sebnozzi) said:
Is there a workaround to this?

scabug commented Nov 1, 2013

Christian Zahl (aixpower) said:
Is there any hope that this issue will ever be solved? In our application we need this in order to transfer data through an interface between two applications. As this is not working as intended (i.e. unparsed data transfer), we cannot make use of it.

We see two alternatives:

  • encoding everything in base64
  • using a custom printer
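
The first alternative can be sketched with the JDK alone. This is a minimal sketch, assuming Java 8 or later for `java.util.Base64`; the object and method names are illustrative and not part of any existing API:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

object Base64Payload {
  // Encode the raw payload so it contains no characters that XML would escape.
  def encode(raw: String): String =
    Base64.getEncoder.encodeToString(raw.getBytes(UTF_8))

  // Recover the original payload on the receiving side.
  def decode(encoded: String): String =
    new String(Base64.getDecoder.decode(encoded), UTF_8)
}
```

Because the encoded string uses only the Base64 alphabet, it survives a round trip through text nodes unchanged. The cost is that the payload is no longer human-readable and grows by roughly a third.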

scabug commented Nov 1, 2013

@acruise said:
The separation of scala-xml into a separate project bodes well for making this kind of change more quickly, but unfortunately I think this particular one will require a change in the compiler XML API.

scabug commented Dec 18, 2014

Michael Beckerle (mbeckerle.dfdl) said:
Just upvoted this. It's been quiet for a long time now.
Our example: We have XML like this:

<dfdl:assert testKind="pattern"><![CDATA[(?x) # free form regex
abc # and a comment
#  more comment
def # that's all
]]></dfdl:assert>

We have unit tests where that appears as literal XML text in a scala/junit-like test.

 @Test def testRegexWithFreeFormAndComments3() = {
    val testSuite =
      <tdml:testSuite suiteName="theSuiteName" xmlns:tns={ tns } xmlns:tdml={ tdml } xmlns:dfdl={ dfdl } xmlns:xsd={ xsd } xmlns:xs={ xsd } xmlns:xsi={ xsi }>
        <tdml:defineSchema name="mySchema">
          <dfdl:format ref="tns:daffodilTest1"/>
          <xsd:element name="data" type="xsd:string" dfdl:lengthKind="delimited">
            <xsd:annotation>
              <xsd:appinfo source="http://www.ogf.org/dfdl/">
                <!-- This assert passes only if free form works, and comments work. -->
                <dfdl:assert testKind='pattern'><![CDATA[(?x) # free form
abcd # a comment
# a line with only a comment
123 # another comment
]]></dfdl:assert>
              </xsd:appinfo>
            </xsd:annotation>
          </xsd:element>
        </tdml:defineSchema>
        <tdml:parserTestCase xmlns={ tdml } name="testRegex" root="data" model="mySchema">
          <tdml:document>
            <tdml:documentPart type="text"><![CDATA[abcdef]]></tdml:documentPart>
          </tdml:document>
          <tdml:errors>
            <tdml:error>assertion failed</tdml:error>
          </tdml:errors>
        </tdml:parserTestCase>
      </tdml:testSuite>
    val ts = new DFDLTestSuite(testSuite)
    ts.runOneTest("testRegex")
  }

Obviously this regex language, and many other embedded languages, must have the whitespace preserved (especially line endings).
The workaround is to put PCData nodes in explicitly:

 @Test def testRegexWithFreeFormAndComments3() = {
    val cdataText = """(?x) # free form
abcd # a comment
# a line with only a comment
123 # another comment
"""
    val cdata = new scala.xml.PCData(cdataText)
    val testSuite =
      <tdml:testSuite suiteName="theSuiteName" xmlns:tns={ tns } xmlns:tdml={ tdml } xmlns:dfdl={ dfdl } xmlns:xsd={ xsd } xmlns:xs={ xsd } xmlns:xsi={ xsi }>
        <tdml:defineSchema name="mySchema">
          <dfdl:format ref="tns:daffodilTest1"/>
          <xsd:element name="data" type="xsd:string" dfdl:lengthKind="delimited">
            <xsd:annotation>
              <xsd:appinfo source="http://www.ogf.org/dfdl/">
                <!-- This assert passes only if free form works, and comments work. -->
                <dfdl:assert testKind='pattern'>{ cdata }</dfdl:assert>
              </xsd:appinfo>
            </xsd:annotation>
          </xsd:element>
        </tdml:defineSchema>
        <tdml:parserTestCase xmlns={ tdml } name="testRegex" root="data" model="mySchema">
          <tdml:document>
            <tdml:documentPart type="text"><![CDATA[abcdef]]></tdml:documentPart>
          </tdml:document>
          <tdml:errors>
            <tdml:error>assertion failed</tdml:error>
          </tdml:errors>
        </tdml:parserTestCase>
      </tdml:testSuite>
    val ts = new DFDLTestSuite(testSuite)
    ts.runOneTest("testRegex")
  }

But.... then that fails because we pretty print these test suites to temp files, and the pretty printer doesn't preserve the contents of the PCData nodes either!

scabug commented Dec 21, 2014

@som-snytt said (edited on Apr 10, 2015 2:08:23 AM UTC):
Anyone who still cares, or used to care, may volunteer to review

scala/scala#4306

There's a flag -Xxml:coalescing that converts to text nodes and merges.
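
For anyone building with sbt, passing the flag might look like the fragment below. This is only a sketch: it assumes an sbt build and a compiler release that contains the change under review in scala/scala#4306, and the flag name is taken as written in this comment.

```scala
// build.sbt (sketch): ask the compiler to convert CDATA sections
// to text nodes and merge adjacent text, as described for this flag.
scalacOptions += "-Xxml:coalescing"
```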

I don't know whether it's considered gauche to track the PR in the comments.

scabug commented Apr 10, 2015

@adriaanm said:
Au contraire, it's the droite thing to do!

scabug commented Apr 10, 2015

@som-snytt said:
This isn't the droite you're looking for.

scabug commented Apr 13, 2015

@som-snytt said:
Reopening long enough to preserve the current behavior, i.e., converting to text nodes. Our descendants may choose to default to a more correct lifestyle.

scabug commented Apr 16, 2015

@som-snytt said:
scala/scala#4451

scabug closed this as completed Apr 16, 2015

scabug commented Jul 29, 2015

Michael Beckerle (mbeckerle.dfdl) said (edited on Jul 29, 2015 1:57:15 PM UTC):
Is this issue resolved? It seems that work has been done to put in a switch that provides the behavior needed. If so, then this issue seems done. If not, should this be re-created on the scala-xml GitHub issues list?

(Note: I created scala/scala-xml#74, but will close it out if we determine this issue is actually complete already.)

scabug commented Jul 29, 2015

@som-snytt said:
Status = CLOSED AND Resolution = Fixed is Jira speak for "resolved." "Merged" is github's way of saying that if I'm run over by a bus next week while on holiday, and there really is a bus involved, then some tiny trace of this sorry life will attain to -- if not immortality in any measure -- then at least to the tiresome afterlife of jira commentary.
