DFDL Proposal to Simplify Null, Default, Optional Handling

2007-08-16 V002

Introduction

Null values, default values, and optional data handling make up one of the most complex areas of the DFDL specification. It is important to simplify this area, but certain functionality is required and cannot be omitted.

Note that optionality and variable length are indistinguishable in DFDL. A variable-length array simply has minOccurs < maxOccurs. An optional element is just the special case where minOccurs=0 and maxOccurs=1. So in this note we will discuss only variable-length arrays.

Several changes in the DFDL spec, and in the approach to it, allow us to radically simplify this complex area. These are:

- speculative parsing as the general method for dealing with uncertainty
- unordered groups expressed as equivalent DFDL for an array of choice
- initiators expressed as equivalent to a hidden element with an assertion about what it must contain/match
- wildcards explained away in terms of a rewrite to other equivalent DFDL constructs

Those, combined with the proposals below, result in very much reduced overall complexity.

Proposal: Simplify Output/Unparse and Defaulting

We should adopt this policy:

    DFDL unparsing does not make an invalid logical data item valid.

This is consistent with and symmetric to the parser behavior.

Implementations of DFDL-based systems must provide APIs or other means to construct logical data instances. These can then be unparsed, a.k.a. written to output. It is the business of those systems to assist users with creation of valid logical data if they desire, but DFDL can reasonably stay out of this and thereby eliminate some confusing and subtle properties.

DFDL can stay out of this because one does not need any DFDL properties to carry out the process of making invalid logical data into valid logical data.
Almost by definition, since this is a transformation from logical to logical, it cannot involve DFDL properties. So, given the implied XML Schema (which is logical only) from the DFDL, one could provide a service to a programmer whereby a partially constructed logical data instance (e.g., a DOM tree) is automatically expanded into a valid one based on the information in the XML Schema facets and the minOccurs/maxOccurs information alone.

Hence, this issue can reasonably be said to be out of scope for DFDL, which can focus only on the format for valid logical data. A note to DFDL implementers to this effect could go in the spec, to provide guidance that while DFDL itself doesn't address this issue, it will be an issue for users.

Proposal: Use Speculative Parsing to Eliminate Initiator-based Complexities

Because we use speculative parsing to explain the DFDL parsing semantics, initiators can be described as hidden string data fields along with an assertion that their contents match a pattern or literal. (Detail: for unordered groups, this assertion is a discriminator for the array-of-choice semantics.)

By depending on this we can eliminate properties specific to the interaction of initiated data with default values.

An element, initiated or not, is either required (minOccurs not yet satisfied, or a specified length, e.g., stored or computed, not yet achieved) or optional.

If it is required, but speculative parsing does not find it, then if a default value is specified it is used as the logical value. If there is no default value then it is a processing error. (Suggestion: a warning is appropriate for DFDL schemas with non-zero minOccurs and no default value.)

If it is optional, and speculative parsing does not find it, then it is not present, and this determines that we're at the end of the variable-length array.
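The required/optional rules above can be sketched in Python. This is only an illustration under stated assumptions, not part of the proposal: the names parse_occurrences, try_parse_number, TryParse, and ProcessingError are hypothetical, the speculative parser is modeled as a function that returns None on failure, maxOccurs is assumed to be finite, and the toy element format (digits followed by ";") is invented for the example.

```python
import re
from typing import Callable, List, Optional, Tuple


class ProcessingError(Exception):
    """A required occurrence was not found and no default value exists."""


# Hypothetical speculative parser: returns (value, next_position) on success,
# or None on failure, in which case the caller stays at the same position.
TryParse = Callable[[str, int], Optional[Tuple[str, int]]]


def parse_occurrences(
    data: str,
    pos: int,
    try_parse: TryParse,
    min_occurs: int,
    max_occurs: int,
    default: Optional[str] = None,
) -> Tuple[List[str], int]:
    """Collect occurrences of one element following the required/optional rules."""
    values: List[str] = []
    while len(values) < max_occurs:
        result = try_parse(data, pos)
        if result is not None:
            # Speculation succeeded: keep the value and advance.
            value, pos = result
            values.append(value)
        elif len(values) < min_occurs:
            # Required but not found: use the default, or fail.
            if default is None:
                raise ProcessingError(
                    f"required occurrence {len(values) + 1} missing at offset {pos}"
                )
            values.append(default)
        else:
            # Optional and not found: this is the end of the array.
            break
    return values, pos


# Assumed toy representation: a run of digits terminated by ';'.
_ITEM = re.compile(r"(\d+);")


def try_parse_number(data: str, pos: int) -> Optional[Tuple[str, int]]:
    match = _ITEM.match(data, pos)
    if match is None:
        return None
    return match.group(1), match.end()
```

For example, parse_occurrences("7;x", 0, try_parse_number, 2, 5, default="0") yields two values: "7" from the data and "0" from the default, because the second required occurrence is not found. With min_occurs=0 and max_occurs=1 the same function covers the optional-element case.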
Note that the above paragraphs are roughly equivalent to pseudo-code for the algorithm for processing variable-length arrays with default-valued elements, including the special case of optionals (minOccurs=0, maxOccurs=1).

Proposal: Simplify Input Defaulting for Empty Strings

If an element is of type string and has a default value specified, it is not clear whether the empty string should be an allowed value or whether the empty string, when found in the representation, should trigger use of the default value instead.

The following makes this unambiguous:

    Type string with minLength of zero and a default value are incompatible.

It is a schema definition error if a variable-length string for which zero length is valid also has a default value specified.

This eliminates complexities around the issue of empty content. Empty content always triggers use of the default value. If the type is string and the empty string is a legal value, then there cannot be a default value.

Proposal: Explain xs:fixed facet in terms of Assertion

When an element has the fixed facet, this can be explained as if we removed the xs:fixed facet and instead gave the element a defaultValue facet with the same value as the fixed facet.

Note that DFDL does not insist that the value read and the fixed facet value match; that's a choice about document validation, which users might turn on or off. The DFDL standard doesn't require that this be checked, and speculative parsing cannot be guided by it. Hence, specifying the fixed facet for an element is NOT equivalent to a DFDL assertion that the value (when not defaulted) is equal to the fixed value specified.

So, to DFDL, fixed and default-value are synonymous. This means the xs:fixed facet need not be considered further in the analysis of defaults and optional behavior.

Implications

Null Handling Decoupled.
The above proposals allow us to almost completely decouple null value handling from default value and optional handling, with the exception of:

- useNullValueForDefault
- nullValueHasInitiator (this is a new property)

The useNullValueForDefault property really just indicates a different value (null) to be used as the default value.

The nullValueHasInitiator property indicates that when a value is null, one will still find the initiator. It is a modifier on what initiators mean for nullable data types. This interacts with choices of elements having initiators, since without the initiators we cannot recognize which element is present. Our usual rule here is that the first successfully parsed choice branch wins. So in this case below, if the data looks like null; you will always get a null-valued f1 element, never the f2, because the initiator isn't required. (DFDL implementations might sensibly issue a warning about such a DFDL schema.)

Everything else about null value handling and detection is orthogonal to defaults and optionality.

Summary

We end up with these properties and XSD facets for specifying default and optionality behavior:

- useNullValueForDefault
- nullValueHasInitiator
- XSD facet defaultValue
- XSD attributes minOccurs and maxOccurs
- XSD facets minLength and length

The above depend on the fact that the properties listed below control the size of arrays via normal parsing (with speculation as normal):

- occursKind, occursPath, occursPathUnits, occursSeparator
- initiator, terminator on array elements
- separator, terminator, occursSeparator on enclosing constructs

Properties eliminated (from the set provided in Draft 019 of DFDL Core):

- defaultWhenMissing
- initiatedElementMissingWhen
- initiatedElementNullWhen
Michael J. Beckerle (IBM)