DFDL Proposal to Simplify Opaque Types
2007-08-10, V001
Michael J. Beckerle

Introduction

A user who wishes to pass through or copy a piece of data without describing its structure has too many alternatives. Currently, one option is to use the hexBinary and base64Binary logical types. However, these type names imply a particular representation, and that causes confusion. They were proposed for consistency with current commercial usage of pre-DFDL technologies for mapping binary data into XML, but that consistency does not seem worth the confusion it causes.

The problem arises from the fact that hexBinary and base64Binary aren't really logical types in XML. They are escape-hatch mechanisms for carrying binary data in what is essentially a textual format.

Proposal: Let's drop the hexBinary and base64Binary types from DFDL, and represent opaque data as one of:

- byte strings
- arrays of byte integers
- 'any' wildcards

Each of these is explained below.

Byte String

A byte string is described as a string element whose encoding is "bytes". The name "bytes" is a special encoding name which indicates that no character set translation is to be done on the data. So, a string with encoding="bytes" is a byte string, and this is the general means of expressing opaque data when a byte blob is what is desired. There are a number of interesting cases where encoding="bytes" is desirable for reasons other than opaque blobs, so if we're going to need it anyway, we might as well offer people byte strings.

Byte Array

Suppose we wanted to model this same data logically as an array of bytes. This alternative is always available: it requires no special types, keywords, or encoding names.
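To make the difference between the two views concrete, here is a small Python sketch (an analogy, not DFDL syntax) of one sequence of bytes seen as a byte string and as an array of byte integers. It assumes that a string with encoding="bytes" behaves like a string with exactly one character per byte. Latin-1 decoding is used purely as a stand-in for "no character set translation", because it maps each byte value 0x00-0xFF to the code point with the same value. It also assumes the array elements are signed, like the XML Schema byte type (-128..127). The sample bytes and the helper names as_byte_string and as_byte_array are hypothetical.

```python
# Illustrative sketch only: two logical views of the same opaque bytes.
# Helper names and sample data are hypothetical, not part of DFDL.

# Sample opaque data (these happen to be the UTF-8 bytes used in the appendix).
SAMPLE = bytes.fromhex("32303033e5b9b43038e69c883237e697a5")


def as_byte_string(raw: bytes) -> str:
    """Byte-string view: one character per byte, no charset translation.

    Latin-1 is used only because it maps every byte value 0x00-0xFF to the
    code point with the same value, so no bytes are merged or reinterpreted.
    """
    return raw.decode("latin-1")


def as_byte_array(raw: bytes) -> list[int]:
    """Byte-array view: one signed 8-bit integer (-128..127) per byte,
    matching the value range of the XML Schema byte type."""
    return [b - 256 if b > 127 else b for b in raw]


s = as_byte_string(SAMPLE)
a = as_byte_array(SAMPLE)

# Both views have exactly one item per byte of data.
print(len(SAMPLE), len(s), len(a))                  # 17 17 17
# Each character of the byte string has the same value as its byte.
print(all(ord(c) == b for c, b in zip(s, SAMPLE)))  # True
print(a[:5])                                        # [50, 48, 48, 51, -27]
```

Neither view interprets the data; they differ only in whether the logical model sees one string value or a sequence of integers.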
'any' wildcard

Suppose we wanted to model this same opaque data as a wildcard. The v019 spec draft for DFDL suggests that an 'any' wildcard is translated into a hidden element, and hence does not become visible in the data model or in the implied schema of the data. This has the disadvantage that one cannot address this data in the logical model. For example, you can't treat it as a byte string or byte array; you would have no ability at all to deal with it. By enclosing the wildcard within a sequence group inside another element, you could carry it and copy it, but you would still have no way to examine it as part of the logical data.

Appendix: Bit String

As with the byte string proposal, it has been suggested that bit strings would sometimes be useful. In a manner analogous to encoding="bytes", we could allow encoding="bits", where this character set encoding has exactly two code points, 0 and 1. (Or perhaps they should be 0x30 and 0x31, the Unicode code points for the characters '0' and '1'?) This is just food for thought; it is not part of this proposal for DFDL v1.0.

Appendix: Prior Clarifying Examples for Opaque and HexBinary

This section shows how the hexBinary type was expected to work, together with comments on it from Simon Parker indicating how confusing he found it.

String Type

This string contains Japanese characters:

    "2003年08月27日"

In UTF-8 encoding, the hexadecimal data bytes are these:

    32 30 30 33 e5 b9 b4 30 38 e6 9c 88 32 37 e6 97 a5

Let us assume we have this collection of bytes in a file. The length is 11 characters, or 17 bytes. In DFDL, we can describe this as:

    <element name="d" type="string" dfdl:representation="text" dfdl:encoding="utf-8" dfdl:length="11" dfdl:lengthUnitKind="characters"/>

Now consider this XPath expression involving element d:

    substring(d, 1, 10)   // substring starting at position 1 for 10 chars

It's pretty clear that this should return a 10-character string containing "2003年08月27", which is missing the final character of the string.
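The character-versus-byte distinction in this example can be checked with a short Python sketch. It decodes the 17 bytes as UTF-8 and mimics XPath substring() with a hypothetical helper, xpath_substring, which handles only the simple case of integer arguments with a start position of 1 or more.

```python
# Sketch of the String Type example: lengths and positions count characters.
RAW = bytes.fromhex("32303033e5b9b43038e69c883237e697a5")


def xpath_substring(s: str, start: int, length: int) -> str:
    """Hypothetical helper mimicking XPath substring() for the simple case
    of integer arguments and start >= 1 (XPath positions are 1-based)."""
    return s[start - 1 : start - 1 + length]


d = RAW.decode("utf-8")  # logical string value of element d

print(len(d), len(RAW))  # 11 17  (characters vs. bytes)

first_ten = xpath_substring(d, 1, 10)
print(len(first_ten))                          # 10
# "2003" + year + "08" + month + "27": only the final character is missing.
print(first_ten == "2003\u5e7408\u670827")     # True
```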
XPath expressions on strings support access to the data only as strings and characters. In addition,

    string-to-codepoints(substring(d, 5, 1))

would return the character code point value 0x5E74 (24180 decimal), which is the Unicode code point for the 5th character (XPath uses 1-based indexing), the "年" or 'year' character.

HexBinary Type

Suppose we wanted to model this same data logically as a hexBinary 'blob':

    <element name="d" type="hexBinary" dfdl:representation="text" dfdl:encoding="utf-8" dfdl:length="11" dfdl:lengthUnitKind="characters"/>

Now consider this XPath:

    substring(d, 1, 10)   // substring starting at position 1 for 10 chars

This should return "32303033e5", that is, the first ten hex digit characters. This is consistent with the type being hexBinary, and not string.

Comment (Simon Parker): HexBinary confuses me. We're counting nybbles, not characters, aren't we? Perhaps dfdl:encoding is the problem. HexBinary is not a representation but an encoding (or rather a decoding, since the text encodes the binary rather than the other way round). There are two decodings: hexbinary and base64binary (for completeness, perhaps we should add a base2binary decoding?). The representation is 'binary', never text. dfdl:encoding="utf-8" is misleading. As I say, I'm confused!
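For contrast, the hexBinary reading of the same bytes can be sketched the same way. Here the "characters" being counted are hex digits, two per byte (one per nybble), so the first ten of them cover only the first five bytes. The helper name hex_lexical is hypothetical. Python's bytes.hex() produces lowercase digits, matching the example above, whereas the canonical lexical form of XML Schema hexBinary uses uppercase.

```python
# Sketch of the HexBinary Type example: substring() counts hex digits.
RAW = bytes.fromhex("32303033e5b9b43038e69c883237e697a5")


def hex_lexical(raw: bytes) -> str:
    """Hypothetical helper: hexBinary-style lexical form, two hex digits per
    byte (one digit per nybble). bytes.hex() emits lowercase digits."""
    return raw.hex()


h = hex_lexical(RAW)
first_ten = h[:10]  # same result as XPath substring(h, 1, 10)

print(len(h))                               # 34 hex digits for 17 bytes
print(first_ten)                            # 32303033e5
print(bytes.fromhex(first_ten) == RAW[:5])  # True: ten nybbles = five bytes
```

The fifth byte, 0xe5, is only the first of the three UTF-8 bytes of "年", so positions in the hexBinary reading bear no relationship to character boundaries in the string reading.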