[DFDL-WG] Fw: pattern based lengths - suggested revised language
smh at uk.ibm.com
Tue Aug 16 09:25:09 CDT 2011
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
----- Forwarded by Steve Hanson/UK/IBM on 16/08/2011 15:28 -----
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB
Subject: Re: pattern based lengths - suggested revised language
I support what you call the conservative approach. I.e. require text when
patterns are used.
On Jul 27, 2011 5:53 AM, "Steve Hanson" <smh at uk.ibm.com> wrote:
> Hi Mike
> I don't think we can reduce the wording that much. The second paragraph
> is needed because it covers the binary case, where encoding is not
> actually used.
> I think we either need to be conservative and disallow the combination
> of binary and pattern, or leave the second paragraph as-is and effectively
> say that if you use binary with pattern then that is the behaviour.
> If we are to be conservative then:
> - For a simple element or simple type, disallow lengthKind="pattern"
> with binary representation.
> - For a complex element with lengthKind = "pattern", all children must
> have lengthUnits = "characters" (so text only) and the encoding of the
> children must be the same as the encoding of the parent. (We already
> have a similar rule for complex elements with specified length and
> lengthUnits = "characters").
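For anyone who wants to see the two conservative rules as a check, here is a minimal sketch. The Element class and check_pattern_rules function are hypothetical names of my own, not anything from the DFDL spec or from an implementation, and the sketch assumes the complex-element rule applies to direct children only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    # Hypothetical model of a DFDL element; fields loosely mirror DFDL properties.
    name: str
    length_kind: str = "delimited"
    representation: str = "text"      # "text" or "binary"
    length_units: str = "characters"  # "characters" or "bytes"
    encoding: str = "utf-8"
    children: List["Element"] = field(default_factory=list)


def check_pattern_rules(elem: Element) -> List[str]:
    """Collect schema definition errors under the conservative rules."""
    errors = []
    if elem.length_kind == "pattern":
        if not elem.children:
            # Simple element: pattern is disallowed with binary representation.
            if elem.representation == "binary":
                errors.append(f"{elem.name}: lengthKind 'pattern' with binary representation")
        else:
            # Complex element: children must be character-based and share
            # the parent's encoding (assumption: direct children only).
            for child in elem.children:
                if child.length_units != "characters":
                    errors.append(f"{child.name}: lengthUnits must be 'characters'")
                if child.encoding.lower() != elem.encoding.lower():
                    errors.append(f"{child.name}: encoding differs from parent {elem.name}")
    # Each child is then checked against the same rules in its own right.
    for child in elem.children:
        errors.extend(check_pattern_rules(child))
    return errors
```

A complex element with lengthKind "pattern" whose child uses lengthUnits "bytes" and a different encoding would yield two errors from this check.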
> We also allow asserts and discriminators to carry patterns which are
> applied straight at the current position in the data stream. It would be
> difficult to police the conservative rules here. But we need to say what
> encoding is used and we currently do not. I would say it must be the
> encoding of the element or group that carries the assert/discriminator.
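To illustrate why the encoding has to be pinned down for assert/discriminator patterns, here is a rough sketch. The helper name and signature are hypothetical, and decoding with errors="replace" is just a convenience for the sketch; a real processor might well raise an error on undecodable bytes instead.

```python
import re


def pattern_matches_at(data: bytes, pos: int, pattern: str, encoding: str) -> bool:
    """Hypothetical helper: test a pattern at byte offset pos in the data stream."""
    # Decode from the current position using the encoding of the element
    # or group that carries the assert/discriminator.
    text = data[pos:].decode(encoding, errors="replace")
    # re.match anchors at the start of the string, i.e. the current position.
    return re.match(pattern, text) is not None


# The same bytes give different answers under different encodings.
data = "é".encode("utf-8")                                 # two bytes: 0xC3 0xA9
utf8_result = pattern_matches_at(data, 0, "é$", "utf-8")
latin1_result = pattern_matches_at(data, 0, "é$", "iso-8859-1")
```

Here the pattern matches when the carrier's encoding is utf-8 but not when it is iso-8859-1, because the two bytes decode to two different characters in the latter.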
> I said on the call that we had extended DFDL regular expressions so that
> raw hex bytes could be specified. However I don't see any evidence of
> this in the DFDL spec. This facility was something we added to IBM MRM for a
> retail format called TLOG which consists of delimited packed decimal
> with hex indicator bytes, so we needed a way to match the hex indicator
> bytes as part of the regexp. However, I think this was only necessary
> because MRM has neither speculation nor discriminators, and in a DFDL
> version of TLOG I would use a discriminator. So I think my statement was
> in error, and I don't believe raw hex in DFDL regexps is needed.
> Steve Hanson
> From: "Mike Beckerle" <mbeckerle.dfdl at gmail.com>
> To: Steve Hanson/UK/IBM at IBMGB
> Date: 26/07/2011 17:30
> Subject: pattern based lengths - suggested revised language
> I suggest this language to tighten up this whole section (replace both
> paragraphs). Given the concerns of Tim, that we make sure DFDL
> implementations don’t have to reimplement regexp matching, I think this
> 18.104.22.168 Pattern-Based Lengths - Scanability
> Any element (complex, simple text, simple binary) may have a
> dfdl:lengthKind 'pattern'. When an element contains binary data, and
> lengthKind='pattern' is used, then it is a schema definition error if
> the character set encoding is not iso-8859-1.
> (Possible generalization 1: allow other character sets, e.g., iso-8859-15,
> as well. This is ok because 8859-15 still maps all 256 codepoints. But
> this is a slippery slope.)
> (Possible generalization 2: allow any character set, ASCII, EBCDIC,
> utf-16be, etc. Note that using any character encoding other than one that
> maps a valid character to any 8-bit byte creates ambiguity: e.g., the
> regexp “.” is one where we normally think it means “any character”. But
> do we really mean “any byte”? If the character set encoding doesn’t define
> a given byte as a codepoint, then this question really matters.)
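The "any character" versus "any byte" question can be seen in a small sketch. This uses Python's own codecs and regex engine purely as a stand-in; pattern_length_in_bytes is a hypothetical helper, and nothing here is claimed about how a DFDL processor would actually do it.

```python
import re
from typing import Optional


def pattern_length_in_bytes(data: bytes, pattern: str, encoding: str) -> Optional[int]:
    """Hypothetical helper: length in bytes of a pattern match at the start of data."""
    # Strict decode: raises UnicodeDecodeError if a byte has no character
    # in the chosen encoding.
    text = data.decode(encoding)
    # DOTALL so that "." also matches the newline character.
    m = re.match(pattern, text, re.DOTALL)
    if m is None:
        return None
    # Convert the matched characters back to a byte count.
    return len(m.group(0).encode(encoding))


data = bytes([0x41, 0x80, 0xFF, 0x0A])  # 'A', two high bytes, newline

# iso-8859-1 gives every byte value exactly one character, so "." behaves
# as "any byte" and the decode/encode round trip is lossless.
latin1_len = pattern_length_in_bytes(data, r".{3}", "iso-8859-1")

# ASCII has no characters for 0x80 or 0xFF, so the question of what "."
# means for those bytes never even gets asked: decoding fails first.
try:
    pattern_length_in_bytes(data, r".{3}", "ascii")
    ascii_decodes = True
except UnicodeDecodeError:
    ascii_decodes = False
```

With iso-8859-1 the three-character match corresponds to exactly three bytes; with ASCII the same data cannot be decoded at all, which is the ambiguity described in generalization 2.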
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU