From smh at uk.ibm.com Thu Feb 2 09:42:32 2012 From: smh at uk.ibm.com (Steve Hanson) Date: Thu, 2 Feb 2012 14:42:32 +0000 Subject: [DFDL-WG] OGF DFDL WG Call Minutes 2012-01-31 Message-ID: Please find minutes from the above call on GridForge at https://forge.ogf.org/sf/docman/do/downloadDocument/projects.dfdl-wg/docman.root.current_0.calls/doc16388 Latest errata document may be found via https://forge.ogf.org/sf/projects/dfdl-wg Next call will be hosted by Mike Beckerle who will distribute call-in details. Regards Steve Hanson Architect, DFDL, IBM SWG Co-Chair, OGF DFDL Working Group Hursley, UK smh at uk.ibm.com tel:+44-1962-815848 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbeckerle.dfdl at gmail.com Mon Feb 6 11:52:25 2012 From: mbeckerle.dfdl at gmail.com (Mike Beckerle) Date: Mon, 06 Feb 2012 16:52:25 +0000 Subject: [DFDL-WG] Invitation: OGF DFDL Workgroup Weekly Call for 2012-02-07T10:00:00-05:00 @ Tue Feb 7 10am - 11am (dfdl-wg@ogf.org) Message-ID: <20cf3074b182b2889b04b84e7cfc@google.com> You have been invited to the following event. Title: OGF DFDL Workgroup Weekly Call for 2012-02-07T10:00:00-05:00 This week only we have a different dial-in arrangement: Please call USA: +1 (781) 330-0114, and I will add you to the conference call. (or use google talk/voice to call mbeckerle at gmail.com) Agenda will be sent separately. When: Tue Feb 7 10am ? 
11am Eastern Time Where: Please call this USA number: +1 781 330 0114 (or google voice chat mbeckerle at gmail.com) Calendar: dfdl-wg at ogf.org Who: * mbeckerle.dfdl at gmail.com - organizer * dfdl-wg at ogf.org Event details: https://www.google.com/calendar/event?action=VIEW&eid=NGpxbDA3cHVkdTdlcHVsbHUycnVrcXEwb3MgZGZkbC13Z0BvZ2Yub3Jn&tok=MjQjbWJlY2tlcmxlLmRmZGxAZ21haWwuY29tMGQ4Yjg5MDRjNGM1YzlhNzhhNWZlNGVmZjI2NWQ0ZjlkOTAxMjk5NA&ctz=America%2FNew_York&hl=en Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account dfdl-wg at ogf.org because you are an attendee of this event. To stop receiving future notifications for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 1416 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 1448 bytes Desc: not available URL: From mbeckerle.dfdl at gmail.com Mon Feb 6 12:04:25 2012 From: mbeckerle.dfdl at gmail.com (Mike Beckerle) Date: Mon, 6 Feb 2012 12:04:25 -0500 Subject: [DFDL-WG] Agenda for 2012-02-07 WG Call Message-ID: Attached is a document with minutes from last call, and current status of action items. Current Agenda for the call: - Status/Updates - Discussion of specific Action Item issues (whichever we can make progress on): - 159 (documentFinalTerminatorCanBeMissing), - 164 (ignoreCase for nilValue), - 165 (end of parent) - 166 (property and enum names for separatorPolicy) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: DFDL WG Call Agenda 2012-02-07.pdf Type: application/octet-stream Size: 65397 bytes Desc: not available URL: From mbeckerle.dfdl at gmail.com Mon Feb 6 14:46:25 2012 From: mbeckerle.dfdl at gmail.com (Mike Beckerle) Date: Mon, 06 Feb 2012 19:46:25 +0000 Subject: [DFDL-WG] Updated Invitation: OGF DFDL Workgroup Weekly Call for 2012-02-07T10:00:00-05:00 @ Tue Feb 7 10am - 11am (dfdl-wg@ogf.org) Message-ID: <20cf300fb1eff3e4a904b850eacb@google.com> This event has been changed. Title: OGF DFDL Workgroup Weekly Call for 2012-02-07T10:00:00-05:00 Updated: Screen sharing (if needed) http://www.yuuguu.com/share PIN 034159 -------------------- This week only we have a different dial-in arrangement: Please call USA: +1 (781) 330-0114, and I will add you to the conference call. (or use google talk/voice to call mbeckerle at gmail.com) Agenda will be sent separately. (changed) When: Tue Feb 7 10am ? 11am Eastern Time Where: Please call this USA number: +1 781 330 0114 (or google voice chat mbeckerle at gmail.com) Calendar: dfdl-wg at ogf.org Who: * mbeckerle.dfdl at gmail.com - organizer * dfdl-wg at ogf.org Event details: https://www.google.com/calendar/event?action=VIEW&eid=NGpxbDA3cHVkdTdlcHVsbHUycnVrcXEwb3MgZGZkbC13Z0BvZ2Yub3Jn&tok=MjQjbWJlY2tlcmxlLmRmZGxAZ21haWwuY29tMGQ4Yjg5MDRjNGM1YzlhNzhhNWZlNGVmZjI2NWQ0ZjlkOTAxMjk5NA&ctz=America%2FNew_York&hl=en Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account dfdl-wg at ogf.org because you are an attendee of this event. To stop receiving future notifications for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: text/calendar Size: 1507 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 1540 bytes Desc: not available URL: From mbeckerle.dfdl at gmail.com Tue Feb 7 10:23:55 2012 From: mbeckerle.dfdl at gmail.com (Mike Beckerle) Date: Tue, 7 Feb 2012 10:23:55 -0500 Subject: [DFDL-WG] Fail: DFDL WG Call 2012-02-06 - Apologies - conference calling failure. Message-ID: My apologies folks. I didn't have sufficient group call-in capabilities tested in advance of the call time. The conference number I gave did not perform as advertised (by Google/Grand Central). So, unfortunately, we will not hold the WG call this week. -- Mike Beckerle | OGF DFDL WG Co-Chair Tel: 781-330-0412 -------------- next part -------------- An HTML attachment was scrubbed... URL: From smh at uk.ibm.com Mon Feb 13 12:26:42 2012 From: smh at uk.ibm.com (Steve Hanson) Date: Mon, 13 Feb 2012 17:26:42 +0000 Subject: [DFDL-WG] OGF DFDL WG Call Agenda 2012-02-14 Message-ID: Please find agenda attached for the above call on GridForge at https://forge.ogf.org/sf/docman/do/downloadDocument/projects.dfdl-wg/docman.root.current_0.calls/doc16393 Note that calls are back to the original time 3pm GMT / 10am Eastern. Regards Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh at uk.ibm.com tel:+44-1962-815848 -------------- next part -------------- An HTML attachment was scrubbed... URL: From smh at uk.ibm.com Tue Feb 14 08:30:44 2012 From: smh at uk.ibm.com (Steve Hanson) Date: Tue, 14 Feb 2012 13:30:44 +0000 Subject: [DFDL-WG] Fw: Issue 140 and empty string - question on escape schemes as empty-string qualifiers Message-ID: Not sure that we had discussed this on a WG call, so adding to today's agenda. There's a potential spec update needed. 
Regards Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh at uk.ibm.com tel:+44-1962-815848 ----- Forwarded by Steve Hanson/UK/IBM on 14/02/2012 13:30 ----- From: Steve Hanson/UK/IBM To: Mike Beckerle Cc: Tim Kimber/UK/IBM at IBMGB Date: 31/01/2012 13:58 Subject: Re: Issue 140 and empty string - question on escape schemes as empty-string qualifiers Mike - some replies below Regards Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh at uk.ibm.com tel:+44-1962-815848 From: Mike Beckerle To: Steve Hanson/UK/IBM at IBMGB, Tim Kimber/UK/IBM at IBMGB Date: 30/01/2012 20:13 Subject: Issue 140 and empty string - question on escape schemes as empty-string qualifiers I think we forgot about escape schemes and how they are used to quote around empty strings, and possibly nil indicators, or I'd like clarification anyway. E.g., Now, if data is [nil, nil] I get two nils. What if data is ['nil','nil'] - either I still get two nils, or I get two non-nil strings with "nil" as their contents: nilnil Which is it? SMH: According to the property precedence order in section 22 of the spec, the escape scheme is applied before nil value processing when parsing, and after nil value processing on unparsing. That is independent of the nilKind. So in your example you would get two nils in the infoset. Similarly, assume please that empty string matches the syntax for empty per initiator/terminator and emptyValueDelimiterPolicy, Now if I have It's all optional, so if the data is ['',''] then I either get nothing in the infoset (because empty creates nothing for optionals), or I get two empty strings in the infoset. Which is it? SMH: I would look at this from the unparsing angle. If there is nothing in the infoset then I would expect to see nothing in the data, I would not expect to see escaped nothing. 
That's true whether generateEscapeBlock is 'always' or 'whenNeeded'. If I had an empty string in the infoset then I would expect it to be escaped in the data if I said 'always' but not if I said 'whenNeeded' (because %ES; is not allowed as a delimiter or as a value of extraEscapedCharacters, so escaping an empty string can never be needed). From this, the only way I could get '' in the data would be if I had escaped an empty string. Therefore on parsing, I would treat '' as an escaped empty string and add an empty string to the infoset.

This sounds right to me. In our action 140 document, we have defined 'empty' to mean that the returned length (however obtained) is 0. If I encounter escape characters then I would claim that slot in the data is not 'empty'.

We should check that this is consistent with how emptyValueDelimiterPolicy is applied. For parsing, section 22 has this correct, and emptyValueDelimiterPolicy is examined before the escape scheme is applied. But for unparsing, section 22 has it the wrong way round - the property should be applied after any escaping/padding has taken place.

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From smh at uk.ibm.com Tue Feb 14 19:53:01 2012 From: smh at uk.ibm.com (Steve Hanson) Date: Wed, 15 Feb 2012 00:53:01 +0000 Subject: [DFDL-WG] OGF DFDL WG Call Minutes 2012-02-14 Message-ID: Please find minutes from the above call on GridForge at https://forge.ogf.org/sf/docman/do/downloadDocument/projects.dfdl-wg/docman.root.current_0.calls/doc16393 Errata document has been updated to version 8 and may be found via https://forge.ogf.org/sf/projects/dfdl-wg Regards Steve Hanson Architect, DFDL, IBM SWG Co-Chair, OGF DFDL Working Group Hursley, UK smh at uk.ibm.com tel:+44-1962-815848 -------------- next part -------------- An HTML attachment was scrubbed... URL: From smh at uk.ibm.com Tue Feb 21 06:27:37 2012 From: smh at uk.ibm.com (Steve Hanson) Date: Tue, 21 Feb 2012 11:27:37 +0000 Subject: [DFDL-WG] Fw: Fw: * DFDL Errata* Clarification: Limitations on use of endOfParent Message-ID: Let's try and close on this today. Need to agree on list of restrictions below, and decide whether 'endOfParent' is allowed on root element (to handle case where data is ended by EOF). Regards Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh at uk.ibm.com tel:+44-1962-815848 ----- Forwarded by Steve Hanson/UK/IBM on 21/02/2012 10:45 ----- From: Steve Hanson/UK/IBM To: dfdl-wg at ogf.org Date: 31/01/2012 17:35 Subject: Fw: [DFDL-WG] Fw: * DFDL Errata* Clarification: Limitations on use of endOfParent Here are the updates to the restrictions that we discussed on the call. Action 165 raised. Constraints on element lengthKind 'endOfParent'... 
- element maxOccurs = 1
- no terminator on element
- no trailingSkip on element
- if element is in a sequence
  - sequence must be the content of a complex type
  - element must be the last object in the sequence
  - separatorPosition of sequence must not be 'postFix'
  - sequenceKind of sequence must be 'ordered'
  - no terminator on sequence
  - no trailingSkip on sequence
  - no floating elements in the sequence
- if element is in a choice
  - if choiceLengthKind is 'implicit'
    - choice must be the content of a complex type
    - no terminator on choice
    - no trailingSkip on choice
- parent element lengthKind must not be 'implicit' or 'delimited'
- not sensitive to any in-scope markup

Notes:
- complex element can have 'endOfParent', & its last child element can be any lengthKind including 'endOfParent'
- element must be the last thing in its box
- a box is defined as a portion of the data stream that has an established length prior to the parsing of its children
  - element with lengthKind 'explicit', 'prefixed', 'pattern' & no (sequence right framing or sequence postfix separator or choice right framing)
  - choice with choiceLengthKind 'explicit'

Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 31/01/2012 16:54 -----
From: Steve Hanson/UK/IBM
To: Mike Beckerle
Cc: dfdl-wg at ogf.org
Date: 23/01/2012 17:53
Subject: Re: [DFDL-WG] Fw: * DFDL Errata* Clarification: Limitations on use of endOfParent

I have typically used 'implicit' for this because the children of the root element should match the data leaving no bytes unconsumed. Alternatively using 'delimited' should be equivalent, you are never actually scanning. However when the last child is endOfParent according to the rules below I can't use 'implicit' or 'delimited'....so in that scenario I am stuck. Need a rethink.
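As an illustrative aside (not taken from the thread), the 'endOfParent' restrictions listed earlier in this message can be read as a static check on a schema. The following is a minimal Python sketch under stated assumptions: the `Element` and `Sequence` classes and all of their field names are hypothetical, invented only for this example, and the rules for choices are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sequence:
    # Hypothetical model of the enclosing sequence; field names are illustrative.
    separator_position: str = "infix"
    sequence_kind: str = "ordered"
    has_terminator: bool = False
    has_trailing_skip: bool = False
    has_floating_elements: bool = False


@dataclass
class Element:
    # Hypothetical model of an element declared with lengthKind 'endOfParent'.
    max_occurs: int = 1
    has_terminator: bool = False
    has_trailing_skip: bool = False
    is_last_in_sequence: bool = True
    parent_length_kind: str = "explicit"
    sequence: Optional[Sequence] = None


def end_of_parent_violations(elem: Element) -> List[str]:
    """Return the listed restrictions that `elem` breaks (choice rules omitted)."""
    problems = []
    if elem.max_occurs != 1:
        problems.append("element maxOccurs must be 1")
    if elem.has_terminator:
        problems.append("no terminator on element")
    if elem.has_trailing_skip:
        problems.append("no trailingSkip on element")
    if elem.parent_length_kind in ("implicit", "delimited"):
        problems.append("parent lengthKind must not be 'implicit' or 'delimited'")
    seq = elem.sequence
    if seq is not None:
        if not elem.is_last_in_sequence:
            problems.append("element must be the last object in the sequence")
        if seq.separator_position.lower() == "postfix":
            problems.append("separatorPosition of sequence must not be postfix")
        if seq.sequence_kind != "ordered":
            problems.append("sequenceKind of sequence must be 'ordered'")
        if seq.has_terminator:
            problems.append("no terminator on sequence")
        if seq.has_trailing_skip:
            problems.append("no trailingSkip on sequence")
        if seq.has_floating_elements:
            problems.append("no floating elements in the sequence")
    return problems
```

A root element whose children must consume all the data is the case the sketch cannot express: its `parent_length_kind` would have to be 'implicit' or 'delimited', which the list forbids, and that is the scenario described above as stuck.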
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle
To: Steve Hanson/UK/IBM at IBMGB
Cc: dfdl-wg at ogf.org
Date: 23/01/2012 17:00
Subject: Re: [DFDL-WG] Fw: * DFDL Errata* Clarification: Limitations on use of endOfParent

Only question I have is on "endOfParent" not being allowed on root element. If you have a DFDL implementation processing message buffers, and the root element's content ends at the end of the buffer/end-of-data, how do we express that? I expected that to be end-of-parent, the notion being that there's an implicit parent for all content, which has an end which is the true end-of-data.

...mikeb

On Mon, Jan 23, 2012 at 6:19 AM, Steve Hanson wrote:

----- Forwarded by Steve Hanson/UK/IBM on 23/01/2012 11:16 -----
From: Steve Hanson/UK/IBM
To: Mike Beckerle , Tim Kimber/UK/IBM
Date: 18/01/2012 16:26
Subject: * DFDL Errata* Clarification: Limitations on use of endOfParent

As agreed on WG extra call on 18th Jan. Will be raised as a separate issue on next DFDL WG call.

Constraints on element lengthKind 'endOfParent'...
- element maxOccurs = 1
- no terminator on element
- if element is in a sequence
  - separatorPosition of sequence must not be 'postFix'
  - sequenceKind of sequence must be 'ordered'
  - no floating elements in the sequence
  - must be the 'last' in the sequence statically **
- if element is in a choice it is always 'last' statically **
- parent element lengthKind must not be 'implicit' or 'delimited'
- if element is complex then all possible 'last' elements ** must also be 'endOfParent'
- not sensitive to any in-scope markup
- not allowed on root element

** Need a concise description of walking the content of a complex element and building the list of 'last elements'. Involves factoring out local sequences and coping with choices.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From: Steve Hanson/UK/IBM
To: Mike Beckerle
Cc: Tim Kimber/UK/IBM at IBMGB
Date: 08/11/2011 18:22
Subject: Re: Coping with Character code U+0000 - and how to end-of-parent in an array

Hi Mike

For the record, I knew we said something about U+0000 - it's in section 5...

String - In DFDL a string can contain any character codes. None are reserved. (Including the character with character code U+0000, which is not permitted in XML documents.)

After discussion with Sandy Gao, this is what we wrote in the DFDL to XDM mapping document:

Note: SimpleElement [datavalue] values may contain characters that are illegal in XML, for example, DFDL strings can contain the character code 0 (zero) within them, but XML does not allow this character code in any XML content even if it is represented as a character entity. Nevertheless, a DFDL described string is mapped to an XDM string data value.

and later for the actual mapping to XDM:

SimpleElement: If the value of [datavalue] is special value 'nil', then the empty string, otherwise the value of [datavalue] converted to its canonical lexical representation.

On to your examples (and assuming separatorPosition is 'infix')...

You are right about the first (endOfParent) example being odd. This example would work fine if the lengthKind was 'delimited'. Remember that the 'explicit' length of the parent element creates a box which scopes the delimited behaviour.

The second (delimited) works fine.

endOfParent and delimited behave almost identically most of the time. When the element is an array, this looks to be one of the differences. I am thinking that endOfParent should not be allowed when maxOccurs > 1.
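To make the infoset under discussion in this exchange concrete, here is a small plain-Python illustration (not DFDL, and not from the original messages) of a fixed-length field split on NUL so that no U+0000 reaches the infoset. The helper name `split_on_nul` and its parameters are hypothetical.

```python
from typing import List

NUL = "\x00"


def split_on_nul(field: str, length: int) -> List[str]:
    """Split a fixed-length field into its NUL-separated substrings.

    The fixed length plays the role of the parent's 'box': every substring
    except the last ends at a NUL separator, and the last one ends where
    the box ends.
    """
    if len(field) != length:
        raise ValueError(f"expected {length} characters, got {len(field)}")
    return field.split(NUL)


# No NULs: a single substring that fills the whole box.
whole = split_on_nul("A" * 80, 80)

# NULs in the middle: earlier substrings end at a separator,
# only the final one ends at the end of the box.
parts = split_on_nul("AB\x00CDE\x00F", 8)
```

The two calls show why a single length kind is awkward for the array element: the same declaration has to describe both the separator-ended substrings and the final one that runs to the end of the parent.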
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle
To: Steve Hanson/UK/IBM at IBMGB, Tim Kimber/UK/IBM at IBMGB
Date: 08/11/2011 17:02
Subject: Coping with Character code U+0000 - and how to end-of-parent in an array

We have this incompatibility with the XML infoset around U+0000 aka NUL. However, one can model data containing character code U+0000 in the content as an array of strings with NUL termination. That is, we split the string on the NUL characters so as to avoid putting them in the infoset.

So I tried to do this and ran into issues. E.g., if data contains a string of length 80, but inside it the character code 0 can appear, then this could be modeled as:

Problem: is that use of endOfParent length kind right? It's the last thing in the group, but the same element decl also describes the prior elements. If there are no NULs in the string, then endOfParent is exactly what you want. There will be only one substring, it will have all 80 characters. But if there are NULs in the middle, then you want the earlier array elements to be delimited by the sequence's separators, and only the last element to be delimited by endOfParent. This semantics, where the parent provides the constraint on length but sometimes it's the separator and just for the last thing it's endOfParent, is not something we can express, I believe.

I was actually even unclear on this one: If the data 'string' has a terminator of ! then perhaps:

Is the array element delimited? Is that the right length kind for this situation?

Thanks for comments

...mikeb

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412

From smh at uk.ibm.com Tue Feb 21 06:44:10 2012
From: smh at uk.ibm.com (Steve Hanson)
Date: Tue, 21 Feb 2012 11:44:10 +0000
Subject: [DFDL-WG] OGF DFDL WG Call Agenda 2012-02-21
Message-ID:

Please find agenda attached for the above call (there is a problem with GridForge so I can't post the document there)

Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DFDL WG Call Agenda 2012-02-21.pdf Type: application/octet-stream Size: 66282 bytes Desc: not available URL: From mbeckerle.dfdl at gmail.com Tue Feb 21 08:21:59 2012 From: mbeckerle.dfdl at gmail.com (Mike Beckerle) Date: Tue, 21 Feb 2012 08:21:59 -0500 Subject: [DFDL-WG] test message Message-ID: Please ignore this message. -- Mike Beckerle | OGF DFDL WG Co-Chair Tel: 781-330-0412 -------------- next part -------------- An HTML attachment was scrubbed... URL: From smh at uk.ibm.com Tue Feb 21 08:24:21 2012 From: smh at uk.ibm.com (Steve Hanson) Date: Tue, 21 Feb 2012 13:24:21 +0000 Subject: [DFDL-WG] Testing mailing list responsiveness Message-ID: Regards Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh at uk.ibm.com tel:+44-1962-815848 Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbeckerle.dfdl at gmail.com Tue Feb 21 08:41:19 2012 From: mbeckerle.dfdl at gmail.com (Mike Beckerle) Date: Tue, 21 Feb 2012 08:41:19 -0500 Subject: [DFDL-WG] test message 2 Message-ID: Please ignore. -- Mike Beckerle | OGF DFDL WG Co-Chair Tel: 781-330-0412 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbeckerle.dfdl at gmail.com Tue Feb 21 16:08:35 2012 From: mbeckerle.dfdl at gmail.com (Mike Beckerle) Date: Tue, 21 Feb 2012 16:08:35 -0500 Subject: [DFDL-WG] Clarification on Escape Schemes being Optional Message-ID: Are escape schemes entirely optional in DFDL v1.0, or just "defineEscapeScheme" named ones? -- Mike Beckerle | OGF DFDL WG Co-Chair Tel: 781-330-0412 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mbeckerle.dfdl at gmail.com Thu Feb 23 09:04:15 2012 From: mbeckerle.dfdl at gmail.com (Mike Beckerle) Date: Thu, 23 Feb 2012 09:04:15 -0500 Subject: [DFDL-WG] OGF DFDL WG Call Minutes 2012-02-21 In-Reply-To: References: Message-ID: Please find minutes from the above call on GridForge at https://forge.ogf.org/sf/docman/do/downloadDocument/projects.dfdl-wg/docman.root.current_0.calls/doc16397 Errata document has been updated to version 8 and may be found via https://forge.ogf.org/sf/projects/dfdl-wg Regards Steve Hanson Architect, DFDL, IBM SWG Co-Chair, OGF DFDL Working Group Hursley, UK smh at uk.ibm.com tel:+44-1962-815848 ________________________________ Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From mbeckerle.dfdl at gmail.com Thu Feb 23 09:10:45 2012 From: mbeckerle.dfdl at gmail.com (Mike Beckerle) Date: Thu, 23 Feb 2012 09:10:45 -0500 Subject: [DFDL-WG] Spec errata 3.3 In-Reply-To: References: Message-ID: ----- Message from Steve Hanson on Tue, 21 Feb 2012 17:24:59 +0000 ----- To be discussed on next DFDL WG call. As a result of action 136, the spec errata document currently says: 3.3. Section 12.3. Clarify that when property is lengthKind 'explicit', 'implicit', 'prefixed' or 'pattern', it means that delimiter scanning is turned off and in-scope delimiters are not looked for within or between elements. Consequently remove the last paragraph of section 5.2.2 starting "It is a processing error when a fixed-length string is found to have a number of characters not equal to the fixed number". It has been pointed out that for a complex element with lengthKind 'implicit', turning off in-scope delimiters is not a consistent behaviour. Even though implicit means that the length is defined by the child content, this is still subject to constraints imposed by the parent. 
Further, as lengthKind 'implicit' is the implied lengthKind for local sequences and choices, it should be possible to wrap such a sequence or choice in an element with lengthKind 'implicit' and no framing, and experience no behaviour change in parsing other than the addition to the infoset of the element.

The ability to selectively switch off in-scope delimiters is something that could be useful for both 'implicit' and 'delimited' lengthKinds, and if so could be added post 1.0 as a separate control.

The proposal is to change errata 3.3 to read:

3.3. Section 12.3. Clarify that when property lengthKind is 'explicit', 'prefixed' or 'pattern', then delimiter scanning is turned off and in-scope delimiters are not looked for within or between elements. Clarify that when property lengthKind is 'implicit' and type is simple, then delimiter scanning is turned off and in-scope delimiters are not looked for within or between elements. Clarify that when property lengthKind is 'implicit' and type is complex, then delimiter scanning is not turned off and in-scope delimiters are looked for. Consequently remove the last paragraph of section 5.2.2 starting "It is a processing error when a fixed-length string is found to have a number of characters not equal to the fixed number".

Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
________________________________
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

From bradley.r.sexton at gmail.com Thu Feb 23 14:06:29 2012
From: bradley.r.sexton at gmail.com (Bradley Sexton)
Date: Thu, 23 Feb 2012 14:06:29 -0500
Subject: [DFDL-WG] DFDL Modeling Question
Message-ID:

Hello,

I've been looking at modeling Rich Text Format (RTF) files using the IBM Message Broker DFDL implementation, and ran into an issue. For some background, here's a small example of an RTF file:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\froman\fprq2\fcharset0 Times New Roman;}{\f1\fswiss\fcharset0 Arial;}}{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\f0\fs24 This is an example document of an RTF file.\f1\fs20\par{\*\passwordhash 010000004c000000010000000480000050c300001400000010000000f89c360d0c9d360d000000008bc29e2f78a2144122ed68a1701e2ea50bbbbeaf7333c40dfe048ccf55f709b8cc7e8b49}}

'\' and '\*\' mark the beginning of control words, and the curly braces mark the beginning and end of control groups that contain control words and data. My issue is that control words and data do not have suitable terminators for parsing. The end of a control word is signified by a space when trailing data is present, but typically it is ended by a '\' signalling the beginning of a new word, or by a curly brace signalling the end of the current control group or the beginning of a new one. Similarly, data is typically ended by the '}' of the parent control group. With the exception of a small header, the value and placement of control words, groups, and data vary by file.

My issue with modeling this is that I was going to use dfdl:lengthKind="pattern" in lieu of suitable delimiters, but this feature is not implemented by IBM. I'm looking for an alternative way to model the data, and was hoping someone on the mailing list might have suggestions.
My goal is to model control words and groups in as general a manner as possible given IBM's implementation restrictions, since RTF has over 1800 defined control words and gives you the ability to create your own. Ideal output for the above sample would be something along these lines:

rtf1 ansi ansicpg1252 deff0 deflang1033 fonttbl f0 froman fprq2 fcharset0 Times New Roman; f1 fswiss fcharset0 Arial; generator Msftedit 5.41.15.1515; viewkind4 uc1 pard f0 fs24 This is an example document of an RTF file. f1 fs20 par passwordhash 010000004c000000010000000480000050c3. . .

IBM Unsupported Features: http://publib.boulder.ibm.com/infocenter/wmbhelp/v8r0m0/index.jsp?topic=%2Fcom.ibm.dfdl.editor.messagebroker.doc%2Fdf00150_.html

I know that's a lot of info out of left field, but I wanted to try and explain it as thoroughly as possible to avoid any confusion. Thanks in advance for any advice you might have and let me know if I've been unclear in any areas.

Bradley Sexton

From smh at uk.ibm.com Thu Feb 23 16:31:52 2012
From: smh at uk.ibm.com (Steve Hanson)
Date: Thu, 23 Feb 2012 21:31:52 +0000
Subject: [DFDL-WG] DFDL Modeling Question
In-Reply-To:
References:
Message-ID:

Hi Bradley

Yes dfdl:lengthKind "pattern" is the ideal way to model this.

I'm struggling to find a way to model this that preserves the nested groups and separates the trailing data from the control word. However if you were prepared to lose the group structure and treat the trailing data as part of the control word, then you could model a completely flat structure with the various delimiters interpreted as a prefix separator.
dfdl:separator="\ }\ }}\ }}}\ {\ }{\ }}{\ }}}{\" dfdl:separatorPosition="prefix" That would give you an infoset like: rtf1 ansi ansicpg1252 deff0 deflang1033 fonttbl f0 froman fprq2 fcharset0 Times New Roman; f1 fswiss fcharset0 Arial; * generator Msftedit 5.41.15.1515; viewkind4 uc1 pard f0 fs24 This is an example document of an RTF file. f1 fs20 par * passwordhash 010000004c000000010000000480000050c3. . . Not ideal. I'll carry on thinking about the problem. If you like I'll add you to the invite list for the DFDL WG call next Tuesday and we can discuss further? Regards Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh at uk.ibm.com tel:+44-1962-815848 From: Bradley Sexton To: dfdl-wg at ogf.org Date: 23/02/2012 19:07 Subject: [DFDL-WG] DFDL Modeling Question Sent by: dfdl-wg-bounces at ogf.org Hello, I've been looking at modeling Rich Text Format (RTF) files using the IBM Message Broker DFDL implementation, and ran into an issue. For some background, here's a small example of an RTF file: {\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\froman\fprq2\fcharset0 Times New Roman;}{\f1\fswiss\fcharset0 Arial;}}{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\f0\fs24 This is an example document of an RTF file.\f1\fs20\par{\*\passwordhash 010000004c000000010000000480000050c300001400000010000000f89c360d0c9d360d000000008bc29e2f78a2144122ed68a1701e2ea50bbbbeaf7333c40dfe048ccf55f709b8cc7e8b49}} '\' and '\*\' mark the beginning of control words, and the curly braces mark the beginning and end of control groups that contain control words and data. My issue is that control words and data do not have suitable terminators for parsing. The end of control words is signified by a space when trailing data is present, but typically they are ended by '\' signalling the beginning of a new word or a curly brace signalling the end of the current of beginning of a new control group. 
Similarly, data is typically ended by the '}' of the parent control group.

With the exception of a small header, the value and placement of control words, groups, and data vary by file.

My issue with modeling this is that I was going to use dfdl:lengthKind="pattern" in lieu of suitable delimiters, but this feature is not implemented by IBM. I'm looking for an alternative way to model the data, and was hoping someone on the mailing list might have suggestions. My goal is to model control words and groups in as general a manner as possible given IBM's implementation restrictions, since RTF has over 1800 defined control words and gives you the ability to create your own.

Ideal output for the above sample would be something along these lines:

rtf1
ansi
ansicpg1252
deff0
deflang1033

fonttbl

f0
froman
fprq2
fcharset0
Times New Roman;

f1
fswiss
fcharset0
Arial;

generator
Msftedit 5.41.15.1515;

viewkind4
uc1
pard
f0
fs24
This is an example document of an RTF file.
f1
fs20
par

passwordhash
010000004c000000010000000480000050c3. . .

IBM Unsupported Features:
http://publib.boulder.ibm.com/infocenter/wmbhelp/v8r0m0/index.jsp?topic=%2Fcom.ibm.dfdl.editor.messagebroker.doc%2Fdf00150_.html

I know that's a lot of info out of left field, but I wanted to explain it as thoroughly as possible to avoid any confusion. Thanks in advance for any advice you might have, and let me know if I've been unclear in any areas.

Bradley Sexton

--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
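
For readers following the thread, the flat prefix-separator idea in the reply above can be approximated outside DFDL with a few lines of Python. This is only a sketch, not DFDL and not IBM's parser: the function name `flat_infoset`, the shortened sample, and the choice to strip the trailing braces (standing in for a terminator on the sequence) are assumptions made for illustration. The separators are tried longest first as a precaution, although in this particular set no separator is a prefix of another.

```python
import re

# The eight separators listed in the reply's dfdl:separator property.
SEPARATORS = ["\\", "}\\", "}}\\", "}}}\\", "{\\", "}{\\", "}}{\\", "}}}{\\"]

# Build one alternation, longest separator first.
SEP_RE = re.compile(
    "|".join(re.escape(s) for s in sorted(SEPARATORS, key=len, reverse=True))
)

# Shortened sample for illustration (assumption: not the full file from the thread).
SAMPLE = (
    r"{\rtf1\ansi{\fonttbl{\f0\froman Times New Roman;}"
    r"{\f1\fswiss Arial;}}{\*\generator Msftedit;}\pard\fs24 Hello.\par}"
)


def flat_infoset(rtf):
    """Split RTF text into a flat list of 'control word plus trailing data' items."""
    # Assumption: the closing braces at the very end act as a final terminator.
    body = rtf.rstrip("}")
    # With a prefix separator, the piece before the first separator is empty.
    return [piece for piece in SEP_RE.split(body) if piece]


if __name__ == "__main__":
    for item in flat_infoset(SAMPLE):
        print(item)
```

As in the infoset shown in the reply, the group nesting is lost and any trailing data stays attached to its control word (for example `froman Times New Roman;`), which is exactly the trade-off being discussed.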

From bradley.r.sexton at gmail.com Fri Feb 24 10:31:20 2012
From: bradley.r.sexton at gmail.com (Bradley Sexton)
Date: Fri, 24 Feb 2012 10:31:20 -0500
Subject: [DFDL-WG] DFDL Modeling Question

Steve,

The order of nested groups is somewhat fluid in RTF, and my concern is whether modeling everything completely flat would preserve the structure and formatting properly. If you modify the text formatting in a file, for example by inserting a comment, a new group is created, and any data entered within the comment, or any previously existing text that the comment highlights, is moved into new groups to signify the link.

Feel free to put me down for the WG call; just let me know the time and call info.

Thanks,
Bradley Sexton

On Thu, Feb 23, 2012 at 4:31 PM, Steve Hanson wrote:
> Hi Bradley
>
> Yes dfdl:lengthKind "pattern" is the ideal way to model this.
>
> I'm struggling to find a way to model this that preserves the nested
> groups and separates the trailing data from the control word. However if
> you were prepared to lose the group structure and treat the trailing data
> as part of the control word, then you could model a completely flat
> structure with the various delimiters interpreted as a prefix separator.
>
> dfdl:separator="\ }\ }}\ }}}\ {\ }{\ }}{\ }}}{\"
> dfdl:separatorPosition="prefix"
>
> That would give you an infoset like:
>
> rtf1
> ansi
> ansicpg1252
> deff0
> deflang1033
> fonttbl
> f0
> froman
> fprq2
> fcharset0 Times New Roman;
> f1
> fswiss
> fcharset0 Arial;
> *
> generator Msftedit 5.41.15.1515;
> viewkind4
> uc1
> pard
> f0
> fs24 This is an example document of an RTF file.
> f1
> fs20
> par
> *
> passwordhash 010000004c000000010000000480000050c3. . .
>
> Not ideal. I'll carry on thinking about the problem.
>
> If you like I'll add you to the invite list for the DFDL WG call next
> Tuesday and we can discuss further?
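
The concern raised in this reply, keeping the group nesting rather than flattening it, can be illustrated with a small stack-based parser in Python. This is a sketch under stated assumptions and not a DFDL model: the names `TOKEN_RE` and `parse_rtf` are hypothetical, and the token grammar is deliberately simplified to the pieces described in the thread (a control word is a backslash followed by letters and an optional number, or `\*`; one trailing space ends a control word; braces open and close groups). Other escapes that real RTF allows are not handled and raise `ValueError`.

```python
import re

# Assumed, simplified token grammar (not the full RTF syntax):
#   group 1: "{"    group 2: "}"
#   group 3: control word after a backslash: "*" or letters plus an optional
#            number; one trailing space is treated as the delimiter and consumed
#   group 4: plain text up to the next backslash or brace
TOKEN_RE = re.compile(r"(\{)|(\})|\\(\*|[A-Za-z]+-?\d*) ?|([^\\{}]+)")


def parse_rtf(rtf):
    """Return nested lists: each RTF group becomes a list of tokens and subgroups."""
    root = []
    stack = [root]  # innermost open group is stack[-1]
    pos = 0
    while pos < len(rtf):
        match = TOKEN_RE.match(rtf, pos)
        if match is None:
            raise ValueError(f"unsupported input at offset {pos}")
        pos = match.end()
        kind = match.lastindex  # which of the four alternatives matched
        if kind == 1:
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif kind == 2:
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
        elif kind == 3:
            stack[-1].append(("word", match.group(3)))
        else:
            stack[-1].append(("text", match.group(4)))
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return root
```

Because each group becomes its own list, a change such as inserting a comment shows up as a new nested list instead of disappearing into a flat sequence, and trailing data is kept separate from the control word that precedes it.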

From smh at uk.ibm.com Mon Feb 27 13:59:25 2012
From: smh at uk.ibm.com (Steve Hanson)
Date: Mon, 27 Feb 2012 18:59:25 +0000
Subject: [DFDL-WG] OGF DFDL WG Call Agenda 2012-02-28

Please find agenda attached for the above call on GridForge at
https://forge.ogf.org/sf/docman/do/downloadDocument/projects.dfdl-wg/docman.root.current_0.calls/doc16399

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From smh at uk.ibm.com Tue Feb 28 11:16:48 2012
From: smh at uk.ibm.com (Steve Hanson)
Date: Tue, 28 Feb 2012 16:16:48 +0000
Subject: [DFDL-WG] OGF DFDL WG Call Minutes 2012-02-28

Please find minutes from the above call on GridForge at
https://forge.ogf.org/sf/docman/do/downloadDocument/projects.dfdl-wg/docman.root.current_0.calls/doc16399

Errata document has been updated to version 8 and may be found via
https://forge.ogf.org/sf/projects/dfdl-wg

Regards

Steve Hanson
Architect, DFDL, IBM SWG
Co-Chair, OGF DFDL Working Group
Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848