<br><font size=2 face="sans-serif">Hi guys,</font>
<br>
<br><font size=2 face="sans-serif">For reference, from Wikipedia (search
"newline"):</font>
<br>
<br><font size=2 face="sans-serif">"</font><font size=2>The </font><a href=http://en.wikipedia.org/wiki/Unicode><font size=2 color=blue><u>Unicode</u></font></a><font size=2>
standard addresses the problem by defining a large number of characters
that conforming applications should recognize as line terminators:</font>
<p><tt><font size=2> LF</font></tt><font size=2>:</font><tt><font size=2> </font></tt><font size=2>
Line Feed, </font><tt><font size=2>U+000A<br>
CR</font></tt><font size=2>:</font><tt><font size=2> </font></tt><font size=2>
</font><a href=http://en.wikipedia.org/wiki/Carriage_return><font size=2 color=blue><u>Carriage
Return</u></font></a><font size=2>, </font><tt><font size=2>U+000D<br>
CR</font></tt><font size=2>+</font><tt><font size=2>LF</font></tt><font size=2>:
</font><tt><font size=2>CR</font></tt><font size=2> followed by </font><tt><font size=2>LF</font></tt><font size=2>,
</font><tt><font size=2>U+000D</font></tt><font size=2> followed by </font><tt><font size=2>U+000A<br>
NEL</font></tt><font size=2>:</font><tt><font size=2> </font></tt><font size=2>
Next Line, </font><tt><font size=2>U+0085<br>
FF</font></tt><font size=2>:</font><tt><font size=2> </font></tt><font size=2>
Form Feed, </font><tt><font size=2>U+000C<br>
LS</font></tt><font size=2>:</font><tt><font size=2> </font></tt><font size=2>
Line Separator, </font><tt><font size=2>U+2028<br>
PS</font></tt><font size=2>:</font><tt><font size=2> </font></tt><font size=2>
Paragraph Separator, </font><tt><font size=2>U+2029</font></tt><font size=2 face="sans-serif">"</font>
<br>
<br><font size=2 face="sans-serif">...so I guess that, during parsing, any of
these sequences should match %NL; (perhaps excluding FF and PS, since they
signify more than a single newline?). I agree with Mike: for unparsing we'd
presumably need a new property to specify which sequence to emit.</font>
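<br>
<br><font size=2 face="sans-serif">A minimal Python sketch of that idea, not taken from the proposal: on parse, %NL; matches any terminator in the quoted list, with CR+LF treated as a single terminator; on unparse, the sequence comes from a property. The names match_nl and unparse_nl, and the "newline" property itself, are hypothetical. FF and PS are included here; dropping them only means removing them from the character class.</font>

```python
import re

# Line terminators from the quoted list. CR+LF is tried first so that it is
# consumed as one terminator rather than as CR followed by LF.
NL_PATTERN = re.compile("\r\n|[\n\r\u0085\u000c\u2028\u2029]")


def match_nl(data: str, pos: int = 0) -> int:
    """Return the length of the line terminator at pos, or 0 if there is none."""
    m = NL_PATTERN.match(data, pos)
    return len(m.group()) if m else 0


def unparse_nl(newline: str = "\n") -> str:
    """Emit the terminator chosen by a hypothetical 'newline' property."""
    if not NL_PATTERN.fullmatch(newline):
        raise ValueError("newline property must be exactly one line terminator")
    return newline
```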
<br>
<br>
<br><font size=2 face="sans-serif">Again from Wikipedia, this time regarding
whitespace:</font>
<br>
<br><font size=2 face="sans-serif">"</font><font size=2>In </font><a href=http://en.wikipedia.org/wiki/Unicode><font size=2 color=blue><u>Unicode</u></font></a><font size=2>
(Unicode Character Database) the following codepoints are defined as whitespace:</font>
<ul>
<li><font size=2>U0009-U000D (Control characters, containing TAB, </font><a href=http://en.wikipedia.org/wiki/CR><font size=2 color=blue><u>CR</u></font></a><font size=2>
and </font><a href=http://en.wikipedia.org/wiki/LF><font size=2 color=blue><u>LF</u></font></a><font size=2>)</font>
<li><font size=2>U0020 SPACE</font>
<li><font size=2>U0085 NEL</font>
<li><font size=2>U00A0 NBSP</font>
<li><font size=2>U1680 OGHAM SPACE MARK</font>
<li><font size=2>U180E MONGOLIAN VOWEL SEPARATOR</font>
<li><font size=2>U2000-U200A (different sorts of spaces)</font>
<li><font size=2>U2028 LSP</font>
<li><font size=2>U2029 PSP</font>
<li><font size=2>U202F NARROW NBSP</font>
<li><font size=2>U205F MEDIUM MATHEMATICAL SPACE</font>
<li><font size=2>U3000 IDEOGRAPHIC SPACE</font><font size=2 face="sans-serif">"</font></ul>
<br><font size=2 face="sans-serif">...so presumably &WSP; would match
any of these characters on parse. What should it generate on unparse?</font>
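<br>
<br><font size=2 face="sans-serif">The parse side could be sketched in Python roughly as follows, building the set directly from the code points in the quoted list. The names is_wsp and skip_wsp are made up for illustration, and the sketch deliberately leaves the unparse question open.</font>

```python
# Code points from the quoted list, written as inclusive (first, last) ranges.
WSP_RANGES = [
    (0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085), (0x00A0, 0x00A0),
    (0x1680, 0x1680), (0x180E, 0x180E), (0x2000, 0x200A), (0x2028, 0x2029),
    (0x202F, 0x202F), (0x205F, 0x205F), (0x3000, 0x3000),
]

# Expand the ranges once into a set of single-character strings.
WSP_CHARS = frozenset(
    chr(cp) for first, last in WSP_RANGES for cp in range(first, last + 1)
)


def is_wsp(ch: str) -> bool:
    """True if ch is one of the whitespace characters in the quoted list."""
    return ch in WSP_CHARS


def skip_wsp(data: str, pos: int = 0) -> int:
    """Return the index of the first non-whitespace character at or after pos."""
    while pos != len(data) and is_wsp(data[pos]):
        pos += 1
    return pos
```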
<br>
<br><font size=2 face="sans-serif">Cheers,</font>
<br>
<br><font size=2 face="sans-serif">Ian</font>
<br><font size=2 face="sans-serif"><br>
Ian Parkinson<br>
WebSphere ESB Development<br>
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK</font>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">From:</font>
<td><font size=1 face="sans-serif">Alan Powell/UK/IBM@IBMGB</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">To:</font>
<td><font size=1 face="sans-serif">"Mike Beckerle" &lt;mbeckerle@OCO-INC.COM&gt;</font>
<tr>
<td valign=top><font size=1 color=#5f5f5f face="sans-serif">Cc:</font>
<td><font size=1 face="sans-serif">dfdl-wg@ogf.org, DFDL-Technical-Core%IBMGB@uk.ibm.com</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Date:</font>
<td><font size=1 face="sans-serif">22/01/2008 14:41</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Subject:</font>
<td><font size=1 face="sans-serif">Re: [DFDL-WG] Action 14: Propose DFDL
entity scheme</font></table>
<br>
<hr noshade>
<br>
<br>
<br><font size=2 face="sans-serif"><br>
Hi Mike</font><font size=3> <br>
</font><font size=2 face="sans-serif"><br>
%NL; is the single character &lt;LF&gt; on target platforms where
that is the convention, &lt;CR&gt;&lt;LF&gt; on others, and so on. This is
intended to make it easier for the same DFDL schema to parse messages from
different platforms. I know we have avoided the notion of a target platform in
DFDL, and I was expecting that this would cause some debate.</font><font size=3> <br>
</font><font size=2 face="sans-serif"><br>
This will be a good topic for discussion on tomorrow's call.</font><font size=3>
</font><font size=2 face="sans-serif"><br>
<br>
Alan Powell<br>
<br>
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England<br>
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
<br>
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898</font><font size=3><br>
<br>
<br>
</font>
<table width=100%>
<tr valign=top>
<td width=34%><font size=1 face="sans-serif"><b>"Mike Beckerle"
&lt;mbeckerle@OCO-INC.COM&gt;</b> </font>
<p><font size=1 face="sans-serif">22/01/2008 13:29</font><font size=3>
</font>
<td width=65%>
<br>
<table width=100%>
<tr valign=top>
<td width=7%>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td width=92%><font size=1 face="sans-serif">Alan Powell/UK/IBM@IBMGB,
&lt;DFDL-Technical-Core%IBMGB@uk.ibm.com&gt;, &lt;dfdl-wg@ogf.org&gt;</font><font size=3>
</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">RE: [DFDL-WG] Action 14: Propose DFDL
entity scheme</font></table>
<br>
<br>
<br></table>
<br><font size=3><br>
<br>
</font><font size=2 color=#004080 face="sans-serif"><br>
</font><font size=3> </font><font size=2 color=#004080 face="sans-serif"><br>
Is the &NL; supposed to represent a single character? Or can it be
a CRLF?</font><font size=3> </font><font size=2 color=#004080 face="sans-serif"><br>
</font><font size=3> </font><font size=2 color=#004080 face="sans-serif"><br>
There’s no notion of “the target platform” in DFDL; we’ve deliberately
avoided it. So if we want &NL; to be meaningful, we need a separate property
such as newline=”&CR;&LF;” or newline=”&LF;”, unless some
other property is suitable.</font><font size=3> </font><font size=2 color=#004080 face="sans-serif"><br>
</font><font size=3> </font><font size=2 color=#004080 face="sans-serif"><br>
There are some other Unicode whitespace and Unicode line-ending characters.
Do we want to include those in the definitions of WSP and NL? I recall
there are four line endings in Unicode.</font><font size=3> </font><font size=2 color=#004080 face="sans-serif"><br>
</font><font size=3> </font><font size=2 color=#004080 face="sans-serif"><br>
…mikeb</font><font size=3> </font><font size=2 color=#004080 face="sans-serif"><br>
</font><font size=3> </font><font size=2 face="Tahoma"><b><br>
From:</b> dfdl-wg-bounces@ogf.org [</font><a href="mailto:dfdl-wg-bounces@ogf.org"><font size=2 face="Tahoma">mailto:dfdl-wg-bounces@ogf.org</font></a><font size=2 face="Tahoma">]
<b>On Behalf Of </b>Alan Powell<b><br>
Sent:</b> Tuesday, January 22, 2008 6:32 AM<b><br>
To:</b> DFDL-Technical-Core%IBMGB@uk.ibm.com; dfdl-wg@ogf.org<b><br>
Subject:</b> Re: [DFDL-WG] Action 14: Propose DFDL entity scheme</font><font size=3>
</font><font size=3 face="Times New Roman"><br>
</font><font size=3> </font><font size=2 face="Arial"><br>
<br>
All</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
<br>
Attached is the latest proposal for DFDL 'entities'</font><font size=3 face="Times New Roman">
</font><font size=2 face="Arial"><br>
<br>
The main changes are:</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
- No longer using XML entities, as these proved not to meet all the requirements</font><font size=3 face="Times New Roman">
</font><font size=2 face="Arial"><br>
- New generic mnemonics for &lt;NL&gt; and others to represent the NL on
the target platform.</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="Arial"><br>
<br>
<br>
Alan Powell<br>
<br>
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England<br>
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
<br>
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898</font><font size=3 face="Arial"><br>
</font>
<div align=center>
<br><font size=3><br>
</font>
<hr></div>
<br><font size=3 face="Times New Roman"><br>
</font><font size=3> </font>
<p><font size=2 face="Arial"><i>Unless stated otherwise above:<br>
IBM United Kingdom Limited - Registered in England and Wales with number
741598. <br>
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU</i></font><font size=3 face="Times New Roman"> </font>
<p><font size=3 face="Times New Roman"><br>
<br>
</font><font size=3><br>
</font><font size=3 face="sans-serif"><br>
</font><font size=3><br>
</font><font size=3 face="sans-serif"><br>
</font><font size=3><br>
</font>
<hr><font size=2 face="sans-serif"><i><br>
</i></font>
<p><font size=3 face="sans-serif"><br>
</font><font size=3><br>
<br>
</font><font size=3 face="sans-serif"><br>
</font><tt><font size=2>--<br>
dfdl-wg mailing list<br>
dfdl-wg@ogf.org<br>
</font></tt><a href="http://www.ogf.org/mailman/listinfo/dfdl-wg"><tt><font size=2>http://www.ogf.org/mailman/listinfo/dfdl-wg</font></tt></a>
<br><font size=3 face="sans-serif"><br>
</font>
<br><font size=3 face="sans-serif"><br>
</font>
<hr><font size=2 face="sans-serif"><br>
<i><br>
</i></font>
<p><font size=2 face="sans-serif"><br>
</font><font size=3 face="sans-serif"><br>
</font>
<br>
<br><font size=3 face="sans-serif"><br>
</font>