--- 1/draft-ietf-cbor-7049bis-10.txt 2019-12-18 08:14:29.753671972 -0800 +++ 2/draft-ietf-cbor-7049bis-11.txt 2019-12-18 08:14:29.905675849 -0800 @@ -1,19 +1,19 @@ Network Working Group C. Bormann Internet-Draft Universitaet Bremen TZI Obsoletes: 7049 (if approved) P. Hoffman Intended status: Standards Track ICANN Expires: June 20, 2020 December 18, 2019 Concise Binary Object Representation (CBOR) - draft-ietf-cbor-7049bis-10 + draft-ietf-cbor-7049bis-11 Abstract The Concise Binary Object Representation (CBOR) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation. These design goals make it different from earlier binary serializations such as ASN.1 and MessagePack. This document is a revised edition of RFC 7049, with editorial @@ -172,83 +172,83 @@ of the format. 1.1. Objectives The objectives of CBOR, roughly in decreasing order of importance, are: 1. The representation must be able to unambiguously encode most common data formats used in Internet standards. - * It must represent a reasonable set of basic data types and + - It must represent a reasonable set of basic data types and structures using binary encoding. "Reasonable" here is largely influenced by the capabilities of JSON, with the major addition of binary byte strings. The structures supported are limited to arrays and trees; loops and lattice-style graphs are not supported. - * There is no requirement that all data formats be uniquely + - There is no requirement that all data formats be uniquely encoded; that is, it is acceptable that the number "7" might be encoded in multiple different ways. 2. The code for an encoder or decoder must be able to be compact in order to support systems with very limited memory, processor power, and instruction sets. 
- * An encoder and a decoder need to be implementable in a very + - An encoder and a decoder need to be implementable in a very small amount of code (for example, in class 1 constrained nodes as defined in [RFC7228]). - * The format should use contemporary machine representations of + - The format should use contemporary machine representations of data (for example, not requiring binary-to-decimal conversion). 3. Data must be able to be decoded without a schema description. - * Similar to JSON, encoded data should be self-describing so + - Similar to JSON, encoded data should be self-describing so that a generic decoder can be written. 4. The serialization must be reasonably compact, but data compactness is secondary to code compactness for the encoder and decoder. - * "Reasonable" here is bounded by JSON as an upper bound in + - "Reasonable" here is bounded by JSON as an upper bound in size, and by implementation complexity maintaining a lower bound. Using either general compression schemes or extensive bit-fiddling violates the complexity goals. 5. The format must be applicable to both constrained nodes and high- volume applications. - * This means it must be reasonably frugal in CPU usage for both + - This means it must be reasonably frugal in CPU usage for both encoding and decoding. This is relevant both for constrained nodes and for potential usage in applications with a very high volume of data. 6. The format must support all JSON data types for conversion to and from JSON. - * It must support a reasonable level of conversion as long as + - It must support a reasonable level of conversion as long as the data represented is within the capabilities of JSON. It must be possible to define a unidirectional mapping towards JSON for all types of data. 7. The format must be extensible, and the extended data must be decodable by earlier decoders. - * The format is designed for decades of use. + - The format is designed for decades of use. 
- * The format must support a form of extensibility that allows + - The format must support a form of extensibility that allows fallback so that a decoder that does not understand an extension can still decode the message. - * The format must be able to be extended in the future by later + - The format must be able to be extended in the future by later IETF standards. 1.2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. @@ -324,63 +324,63 @@ data model, generic CBOR encoders and decoders can be implemented (which usually involves defining additional implementation data types for those data items that do not already have a natural representation in the environment). The ability to provide generic encoders and decoders is an explicit design goal of CBOR; however many applications will provide their own application-specific encoders and/or decoders. 
In the basic (un-extended) generic data model, a data item is one of: - o an integer in the range -2**64..2**64-1 inclusive - o a simple value, identified by a number between 0 and 255, but + * an integer in the range -2**64..2**64-1 inclusive + * a simple value, identified by a number between 0 and 255, but distinct from that number - o a floating-point value, distinct from an integer, out of the set + * a floating-point value, distinct from an integer, out of the set representable by IEEE 754 binary64 (including non-finites) [IEEE754] - o a sequence of zero or more bytes ("byte string") + * a sequence of zero or more bytes ("byte string") - o a sequence of zero or more Unicode code points ("text string") + * a sequence of zero or more Unicode code points ("text string") - o a sequence of zero or more data items ("array") + * a sequence of zero or more data items ("array") - o a mapping (mathematical function) from zero or more data items + * a mapping (mathematical function) from zero or more data items ("keys") each to a data item ("values"), ("map") - o a tagged data item ("tag"), comprising a tag number (an integer in + * a tagged data item ("tag"), comprising a tag number (an integer in the range 0..2**64-1) and a tagged value (a data item) Note that integer and floating-point values are distinct in this model, even if they have the same numeric value. Also note that serialization variants, such as the number of bytes of the encoded floating value, or the choice of one of the ways in which an integer, the length of a text or byte string, the number of elements in an array or pairs in a map, or a tag number, (collectively "the argument", see Section 3) can be encoded, are not visible at the generic data model level. 2.1. 
Extended Generic Data Models This basic generic data model comes pre-extended by the registration of a number of simple values and tag numbers right in this document, such as: - o "false", "true", "null", and "undefined" (simple values identified + * "false", "true", "null", and "undefined" (simple values identified by 20..23) - o integer and floating-point values with a larger range and + * integer and floating-point values with a larger range and precision than the above (tag numbers 2 to 5) - o application data types such as a point in time or an RFC 3339 + * application data types such as a point in time or an RFC 3339 date/time string (tag numbers 1, 0) Further elements of the extended generic data model can be (and have been) defined via the IANA registries created for CBOR. Even if such an extension is unknown to a generic encoder or decoder, data items using that extension can be passed to or from the application by representing them at the interface to the application within the basic generic data model, i.e., as generic values of a simple type or generic tags. @@ -1119,49 +1119,49 @@ 3.4.5.3. Encoded Text Some text strings hold data that have formats widely used on the Internet, and sometimes those formats can be validated and presented to the application in appropriate form by the decoder. There are tags for some of these formats. As with tag numbers 21 to 23, if these tags are applied to an item other than a text string, they apply to all text string data items it contains. - o Tag number 32 is for URIs, as defined in [RFC3986]. If the text + * Tag number 32 is for URIs, as defined in [RFC3986]. If the text string doesn't match the "URI-reference" production, the string is invalid. - o Tag numbers 33 and 34 are for base64url- and base64-encoded text + * Tag numbers 33 and 34 are for base64url- and base64-encoded text strings, as defined in [RFC4648]. 
If any of: - * the encoded text string contains non-alphabet characters or + - the encoded text string contains non-alphabet characters or only 1 character in the last block of 4, or - * the padding bits in a 2- or 3-character block are not 0, or + - the padding bits in a 2- or 3-character block are not 0, or - * the base64 encoding has the wrong number of padding characters, + - the base64 encoding has the wrong number of padding characters, or - * the base64url encoding has padding characters, + - the base64url encoding has padding characters, the string is invalid. - o Tag number 35 is for regular expressions that are roughly in Perl + * Tag number 35 is for regular expressions that are roughly in Perl Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a version of the JavaScript regular expression syntax [ECMA262]. (Note that more specific identification may be necessary if the actual version of the specification underlying the regular expression, or more than just the text of the regular expression itself, need to be conveyed.) Any contained string value is valid. - o Tag number 36 is for MIME messages (including all headers), as + * Tag number 36 is for MIME messages (including all headers), as defined in [RFC2045]. A text string that isn't a valid MIME message is invalid. (For this tag, validity checking may be particularly onerous for a generic decoder and might therefore not be offered. Note that many MIME messages are general binary data and can therefore not be represented in a text string; [IANA.cbor-tags] lists a registration for tag number 257 that is similar to tag number 36 but is used with an enclosed byte string.) Note that tag numbers 33 and 34 differ from 21 and 22 in that the @@ -1246,45 +1246,45 @@ protocols are free to define what they mean by a "deterministic format" and what encoders and decoders are expected to do. This section defines a set of restrictions that can serve as the base of such a deterministic format. 4.2.1. 
Core Deterministic Encoding Requirements A CBOR encoding satisfies the "core deterministic encoding requirements" if it satisfies the following restrictions: - o Preferred serialization MUST be used. In particular, this means + * Preferred serialization MUST be used. In particular, this means that arguments (see Section 3) for integers, lengths in major types 2 through 5, and tags MUST be as short as possible, for instance: - * 0 to 23 and -1 to -24 MUST be expressed in the same byte as the + - 0 to 23 and -1 to -24 MUST be expressed in the same byte as the major type; - * 24 to 255 and -25 to -256 MUST be expressed only with an + - 24 to 255 and -25 to -256 MUST be expressed only with an additional uint8_t; - * 256 to 65535 and -257 to -65536 MUST be expressed only with an + - 256 to 65535 and -257 to -65536 MUST be expressed only with an additional uint16_t; - * 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed + - 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed only with an additional uint32_t. Floating point values also MUST use the shortest form that preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5 as 0xfa49742408. - o Indefinite-length items MUST NOT appear. They can be encoded as + * Indefinite-length items MUST NOT appear. They can be encoded as definite-length items instead. - o The keys in every map MUST be sorted in the bytewise lexicographic + * The keys in every map MUST be sorted in the bytewise lexicographic order of their deterministic encodings. For example, the following keys are sorted correctly: 1. 10, encoded as 0x0a. 2. 100, encoded as 0x1864. 3. -1, encoded as 0x20. 4. "z", encoded as 0x617a. @@ -1317,21 +1317,21 @@ position, treating the latter as if they were tagged), the deterministic format would not allow them. In a protocol that requires tags in certain places to obtain specific semantics, the tag needs to appear in the deterministic format as well. 
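As a non-normative illustration of the core deterministic encoding requirements in Section 4.2.1, the following Python sketch encodes integers and short text strings with the shortest possible argument and then sorts map keys by the bytewise lexicographic order of their encodings. The function names (encode_head, encode_int, encode_text, sort_map_keys) are assumptions made for this example and do not appear in the draft; floating-point values, tags, and nested items are deliberately left out.

```python
import struct


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR head using the shortest form of the argument."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must be in 0..2**64-1")
    mt = major_type << 5
    if argument < 24:            # fits in the initial byte
        return bytes([mt | argument])
    if argument < 2**8:          # one additional uint8_t
        return bytes([mt | 24, argument])
    if argument < 2**16:         # one additional uint16_t
        return bytes([mt | 25]) + struct.pack(">H", argument)
    if argument < 2**32:         # one additional uint32_t
        return bytes([mt | 26]) + struct.pack(">I", argument)
    return bytes([mt | 27]) + struct.pack(">Q", argument)


def encode_int(n: int) -> bytes:
    """Major type 0 for n >= 0; major type 1 carries -1 - n."""
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)


def encode_text(s: str) -> bytes:
    """Definite-length text string (major type 3)."""
    data = s.encode("utf-8")
    return encode_head(3, len(data)) + data


def sort_map_keys(encoded_keys):
    """Python compares bytes objects bytewise and lexicographically."""
    return sorted(encoded_keys)


keys = [encode_text("z"), encode_int(-1), encode_int(100), encode_int(10)]
ordered = sort_map_keys(keys)
```

With the keys used in the example of Section 4.2.1, ordered comes out as 0x0a, 0x1864, 0x20, 0x617a. A complete deterministic encoder would additionally have to reject indefinite-length items and apply the shortest-form rule to floating-point values.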
Deterministic encoding considerations also apply to the content of tags. Protocols that include floating, big integer, or other complex values need to define extra requirements on their deterministic encodings. For example: - o If a protocol includes a field that can express floating-point + * If a protocol includes a field that can express floating-point values (Section 3.3), the protocol's deterministic encoding needs to specify whether the integer 1.0 is encoded as 0x01, 0xf93c00, 0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for this are: 1. Encode integral values that fit in 64 bits as values from major types 0 and 1, and other values as the smallest of 16-, 32-, or 64-bit floating point that accurately represents the value, @@ -1349,27 +1349,27 @@ payloads or signaling NaNs, the protocol needs to pick a single representation, for example 0xf97e00. If that simple choice is not possible, specific attention will be needed for NaN handling. Subnormal numbers (nonzero numbers with the lowest possible exponent of a given IEEE 754 number format) may be flushed to zero outputs or be treated as zero inputs in some floating point implementations. A protocol's deterministic encoding may want to exclude them from interchange, interchanging zero instead. - o If a protocol includes a field that can express integers with an + * If a protocol includes a field that can express integers with an absolute value of 2^64 or larger using tag numbers 2 or 3 (Section 3.4.3), the protocol's deterministic encoding needs to specify whether small integers are expressed using the tag or major types 0 and 1. - o A protocol might give encoders the choice of representing a URL as + * A protocol might give encoders the choice of representing a URL as either a text string or, using Section 3.4.5.3, tag number 32 containing a text string. This protocol's deterministic encoding needs to either require that the tag is present or require that it's absent, not allow either one. 4.2.3. 
Length-first map key ordering The core deterministic encoding requirements sort map keys in a different order from the one suggested by Section 3.9 of [RFC7049] (called "Canonical CBOR" there). Protocols that need to be @@ -1569,26 +1569,26 @@ needs to have an API that reports an error (and does not return data) for a CBOR data item that contains any of the validity errors listed in the previous subsection. The set of tags defined in the tag registry (Section 9.2), as well as the set of simple values defined in the simple values registry (Section 9.1), can grow at any time beyond the set understood by a generic decoder. A validity-checking decoder can do one of two things when it encounters such a case that it does not recognize: - o It can report an error (and not return data). Note that this + * It can report an error (and not return data). Note that this error is not a validity error per se. This kind of error is more likely to be raised by a decoder that would be performing validity checking if this were a known case. - o It can emit the unknown item (type, value, and, for tags, the + * It can emit the unknown item (type, value, and, for tags, the decoded tagged data item) to the application calling the decoder, with an indication that the decoder did not recognize that tag number or simple value. The latter approach, which is also appropriate for decoders that do not support validity checking, provides forward compatibility with newly registered tags and simple values without the requirement to update the encoder at the same time as the calling application. (For this, the API for the decoder needs to have a way to mark unknown items so that the calling application can handle them in a manner @@ -1756,101 +1756,101 @@ bytes, not characters. 6.1. Converting from CBOR to JSON Most of the types in CBOR have direct analogs in JSON. However, some do not, and someone implementing a CBOR-to-JSON converter has to consider what to do in those cases. 
The following non-normative advice deals with these by converting them to a single substitute value, such as a JSON null. - o An integer (major type 0 or 1) becomes a JSON number. + * An integer (major type 0 or 1) becomes a JSON number. - o A byte string (major type 2) that is not embedded in a tag that + * A byte string (major type 2) that is not embedded in a tag that specifies a proposed encoding is encoded in base64url without padding and becomes a JSON string. - o A UTF-8 string (major type 3) becomes a JSON string. Note that + * A UTF-8 string (major type 3) becomes a JSON string. Note that JSON requires escaping certain characters ([RFC8259], Section 7): quotation mark (U+0022), reverse solidus (U+005C), and the "C0 control characters" (U+0000 through U+001F). All other characters are copied unchanged into the JSON UTF-8 string. - o An array (major type 4) becomes a JSON array. + * An array (major type 4) becomes a JSON array. - o A map (major type 5) becomes a JSON object. This is possible + * A map (major type 5) becomes a JSON object. This is possible directly only if all keys are UTF-8 strings. A converter might also convert other keys into UTF-8 strings (such as by converting integers into strings containing their decimal representation); however, doing so introduces a danger of key collision. Note also that, if tags on UTF-8 strings are ignored as proposed below, this will cause a key collision if the tags are different but the strings are the same. - o False (major type 7, additional information 20) becomes a JSON + * False (major type 7, additional information 20) becomes a JSON false. - o True (major type 7, additional information 21) becomes a JSON + * True (major type 7, additional information 21) becomes a JSON true. - o Null (major type 7, additional information 22) becomes a JSON + * Null (major type 7, additional information 22) becomes a JSON null. 
- o A floating-point value (major type 7, additional information 25 + * A floating-point value (major type 7, additional information 25 through 27) becomes a JSON number if it is finite (that is, it can be represented in a JSON number); if the value is non-finite (NaN, or positive or negative Infinity), it is represented by the substitute value. - o Any other simple value (major type 7, any additional information + * Any other simple value (major type 7, any additional information value not yet discussed) is represented by the substitute value. - o A bignum (major type 6, tag number 2 or 3) is represented by + * A bignum (major type 6, tag number 2 or 3) is represented by encoding its byte string in base64url without padding and becomes a JSON string. For tag number 3 (negative bignum), a "~" (ASCII tilde) is inserted before the base-encoded value. (The conversion to a binary blob instead of a number is to prevent a likely numeric overflow for the JSON decoder.) - o A byte string with an encoding hint (major type 6, tag number 21 + * A byte string with an encoding hint (major type 6, tag number 21 through 23) is encoded as described and becomes a JSON string. - o For all other tags (major type 6, any other tag number), the + * For all other tags (major type 6, any other tag number), the enclosed CBOR item is represented as a JSON value; the tag number is ignored. - o Indefinite-length items are made definite before conversion. + * Indefinite-length items are made definite before conversion. 6.2. Converting from JSON to CBOR All JSON values, once decoded, directly map into one or more CBOR values. As with any kind of CBOR generation, decisions have to be made with respect to number representation. 
In a suggested conversion: - o JSON numbers without fractional parts (integer numbers) are + * JSON numbers without fractional parts (integer numbers) are represented as integers (major types 0 and 1, possibly major type 6 tag number 2 and 3), choosing the shortest form; integers longer than an implementation-defined threshold may instead be represented as floating-point values. The default range that is represented as integer is -2**53+1..2**53-1 (fully exploiting the range for exact integers in the binary64 representation often used for decoding JSON [RFC7493]). A CBOR-based protocol, or a generic converter implementation, may choose -2**32..2**32-1 or -2**64..2**64-1 (fully using the integer ranges available in CBOR with uint32_t or uint64_t, respectively) or even -2**31..2**31-1 or -2**63..2**63-1 (using popular ranges for two's complement signed integers). (If the JSON was generated from a JavaScript implementation, its precision is already limited to 53 bits maximum.) - o Numbers with fractional parts are represented as floating-point + * Numbers with fractional parts are represented as floating-point values, performing the decimal-to-binary conversion based on the precision provided by IEEE 754 binary64. Then, when encoding in CBOR, the preferred serialization uses the shortest floating-point representation exactly representing this conversion result; for instance, 1.5 is represented in a 16-bit floating-point value (not all implementations will be capable of efficiently finding the minimum form, though). Instead of using the default binary64 precision, there may be an implementation-defined limit to the precision of the conversion that will affect the precision of the represented values. Decimal representation should only be used on @@ -1899,38 +1899,38 @@ protocol is designed to tolerate and embrace implementations that start using more codepoints than initially allocated. 
Sizing the codepoint space may be difficult because the range required may be hard to predict. An attempt should be made to make the codepoint space large enough so that it can slowly be filled over the intended lifetime of the protocol. CBOR has three major extension points: - o the "simple" space (values in major type 7). Of the 24 efficient + * the "simple" space (values in major type 7). Of the 24 efficient (and 224 slightly less efficient) values, only a small number have been allocated. Implementations receiving an unknown simple data item may be able to process it as such, given that the structure of the value is indeed simple. The IANA registry in Section 9.1 is the appropriate way to address the extensibility of this codepoint space. - o the "tag" space (values in major type 6). Again, only a small + * the "tag" space (values in major type 6). Again, only a small part of the codepoint space has been allocated, and the space is abundant (although the early numbers are more efficient than the later ones). Implementations receiving an unknown tag number can choose to simply ignore it or to process it as an unknown tag number wrapping the enclosed data item. The IANA registry in Section 9.2 is the appropriate way to address the extensibility of this codepoint space. - o the "additional information" space. An implementation receiving + * the "additional information" space. An implementation receiving an unknown additional information value has no way to continue decoding, so allocating codepoints to this space is a major step. There are also very few codepoints left. 7.2. Curating the Additional Information Space The human mind is sometimes drawn to filling in little perceived gaps to make something neat. We expect the remaining gaps in the codepoint space for the additional information values to be an attractor for new ideas, just because they are there. @@ -2056,29 +2056,29 @@ Tags" registry at [IANA.cbor-tags]. 
The tags that were defined in [RFC7049] are described in detail in Section 3.4, but other tags have already been defined. New entries in the range 0 to 23 are assigned by Standards Action. New entries in the range 24 to 255 are assigned by Specification Required. New entries in the range 256 to 18446744073709551615 are assigned by First Come First Served. The template for registration requests is: - o Data item + * Data item - o Semantics (short form) + * Semantics (short form) In addition, First Come First Served requests should include: - o Point of contact + * Point of contact - o Description of semantics (URL) - This description is optional; the + * Description of semantics (URL) - This description is optional; the URL can point to something like an Internet-Draft or a web page. 9.3. Media Type ("MIME Type") The Internet media type [RFC6838] for a single encoded CBOR data item is application/cbor. Type name: application Subtype name: cbor @@ -2727,36 +2727,36 @@ | 0xff | "break" stop code | +------------+------------------------------------------------------+ Table 6: Jump Table for Initial Byte Appendix C. Pseudocode The well-formedness of a CBOR item can be checked by the pseudocode in Figure 1. The data is well-formed if and only if: - o the pseudocode does not "fail"; + * the pseudocode does not "fail"; - o after execution of the pseudocode, no bytes are left in the input + * after execution of the pseudocode, no bytes are left in the input (except in streaming applications) The pseudocode has the following prerequisites: - o take(n) reads n bytes from the input data and returns them as a + * take(n) reads n bytes from the input data and returns them as a byte string. If n bytes are no longer available, take(n) fails. - o uint() converts a byte string into an unsigned integer by + * uint() converts a byte string into an unsigned integer by interpreting the byte string in network byte order. - o Arithmetic works as in C. + * Arithmetic works as in C. 
- o All variables are unsigned integers of sufficient range. + * All variables are unsigned integers of sufficient range. Note that "well_formed" returns the major type for well-formed definite length items, but 0 for an indefinite length item (or -1 for a break stop code, only if "breakable" is set). This is used in "well_formed_indefinite" to ascertain that indefinite length strings only contain definite length strings as chunks. well_formed (breakable = false) { // process initial bytes ib = uint(take(1)); @@ -2995,80 +2995,80 @@ +-------------+--------------------------+--------------------------+ Table 7: Examples for Different Levels of Conciseness Appendix F. Changes from RFC 7049 The following is a list of known changes from RFC 7049. This list is non-authoritative. It is meant to help reviewers see the significant differences. - o Updated reference for [RFC4627] to [RFC8259] in many places + * Updated reference for [RFC4627] to [RFC8259] in many places - o Updated reference for [CNN-TERMS] to [RFC7228] + * Updated reference for [CNN-TERMS] to [RFC7228] - o Added a comment to the last example in Section 2.2.1 (added + * Added a comment to the last example in Section 2.2.1 (added "Second value") - o Fixed a bug in the example in Section 2.4.2 ("29" -> "49") + * Fixed a bug in the example in Section 2.4.2 ("29" -> "49") - o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> + * Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> "0b000_11001") Appendix G. Well-formedness errors and examples There are three basic kinds of well-formedness errors that can occur in decoding a CBOR data item: - o Too much data: There are input bytes left that were not consumed. + * Too much data: There are input bytes left that were not consumed. This is only an error if the application assumed that the input bytes would span exactly one data item. 
Where the application uses the self-delimiting nature of CBOR encoding to permit additional data after the data item, as is for example done in CBOR sequences [I-D.ietf-cbor-sequence], the CBOR decoder can simply indicate what part of the input has not been consumed. - o Too little data: The input data available would need additional + * Too little data: The input data available would need additional bytes added at their end for a complete CBOR data item. This may indicate the input is truncated; it is also a common error when trying to decode random data as CBOR. For some applications, however, this may not actually be an error, as the application may not be certain it has all the data yet and can obtain or wait for additional input bytes. Some of these applications may have an upper limit for how much additional data can show up; here the decoder may be able to indicate that the encoded CBOR data item cannot be completed within this limit. - o Syntax error: The input data are not consistent with the + * Syntax error: The input data are not consistent with the requirements of the CBOR encoding, and this cannot be remedied by adding (or removing) data at the end. In Appendix C, errors of the first kind are addressed in the first paragraph/bullet list (requiring "no bytes are left"), and errors of the second kind are addressed in the second paragraph/bullet list (failing "if n bytes are no longer available"). 
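As a non-normative illustration, the three kinds of well-formedness errors can be told apart by a direct Python port of the Appendix C pseudocode. This is a sketch under stated assumptions: the names Checker, TooLittleData, Malformed, and classify are invented for this example, and the sentinel values BREAK and INDEFINITE are arbitrary choices that only need to differ from the major types 0..7.

```python
class TooLittleData(Exception):
    """Input ended before the data item was complete (kind 2)."""


class Malformed(Exception):
    """Input violates the encoding rules (kind 3, syntax error)."""


BREAK = -1        # returned for a "break" stop code
INDEFINITE = 99   # returned for a complete indefinite-length item


class Checker:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        # Check the remaining length first so that a huge declared
        # length fails without allocating anything.
        if len(self.data) - self.pos < n:
            raise TooLittleData()
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def well_formed(self, breakable: bool = False) -> int:
        ib = self.take(1)[0]
        mt, ai = ib >> 5, ib & 0x1F
        val = ai
        if 24 <= ai <= 27:
            # 1, 2, 4, or 8 following bytes in network byte order
            val = int.from_bytes(self.take(1 << (ai - 24)), "big")
        elif ai in (28, 29, 30):
            raise Malformed("reserved additional information value")
        elif ai == 31:
            return self.well_formed_indefinite(mt, breakable)
        if mt in (2, 3):                      # byte or text string
            self.take(val)
        elif mt == 4:                         # array of val items
            for _ in range(val):
                self.well_formed()
        elif mt == 5:                         # map of val pairs
            for _ in range(val * 2):
                self.well_formed()
        elif mt == 6:                         # tag: one enclosed item
            self.well_formed()
        elif mt == 7 and ai == 24 and val < 32:
            raise Malformed("two-byte encoding of a simple value < 32")
        return mt

    def well_formed_indefinite(self, mt: int, breakable: bool) -> int:
        if mt in (2, 3):
            while True:
                it = self.well_formed(True)
                if it == BREAK:
                    break
                if it != mt:   # chunks: definite length, same type
                    raise Malformed("bad chunk in indefinite string")
        elif mt == 4:
            while self.well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while self.well_formed(True) != BREAK:
                self.well_formed()   # value position: no break here
        elif mt == 7:
            if breakable:
                return BREAK
            raise Malformed("break outside an indefinite-length item")
        else:
            raise Malformed("additional information 31 with mt 0/1/6")
        return INDEFINITE


def classify(data: bytes) -> str:
    checker = Checker(data)
    try:
        checker.well_formed()
    except TooLittleData:
        return "too little data"
    except Malformed:
        return "syntax error"
    if checker.pos != len(data):
        return "too much data"
    return "well-formed"
```

For instance, classify(bytes.fromhex("0a00")) reports "too much data", classify(bytes.fromhex("1901")) reports "too little data", and classify(bytes.fromhex("1c")) reports "syntax error". Whether the first two outcomes are treated as errors is up to the application, as described above. The sketch recurses once per nesting level, so a production decoder would want an explicit depth limit.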
Errors of the third kind are identified in the pseudocode by specific instances of calling fail(), in order: - o a reserved value is used for additional information (28, 29, 30) + * a reserved value is used for additional information (28, 29, 30) - o major type 7, additional information 24, value < 32 (incorrect or + * major type 7, additional information 24, value < 32 (incorrect or incorrectly encoded simple type) - o incorrect substructure of indefinite length byte/text string (may + * incorrect substructure of indefinite length byte/text string (may only contain definite length strings of the same major type) - o break stop code (mt=7, ai=31) occurs in a value position of a map + * break stop code (mt=7, ai=31) occurs in a value position of a map or except at a position directly in an indefinite length item where also another enclosed data item could occur - o additional information 31 used with major type 0, 1, or 6 + * additional information 31 used with major type 0, 1, or 6 G.1. Examples for CBOR data items that are not well-formed This subsection shows a few examples for CBOR data items that are not well-formed. Each example is a sequence of bytes each shown in hexadecimal; multiple examples in a list are separated by commas. Examples for well-formedness error kind 1 (too much data) can easily be formed by adding data to a well-formed encoded CBOR data item. @@ -3076,77 +3076,77 @@ data) can be formed by truncating a well-formed encoded CBOR data item. In test suites, it may be beneficial to specifically test with incomplete data items that would require large amounts of addition to be completed (for instance by starting the encoding of a string of a very large size). A premature end of the input can occur in a head or within the enclosed data, which may be bare strings or enclosed data items that are either counted or should have been ended by a break stop code. 
- o End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 + * End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa 00 00, fb 00 00 00 - o Definite length strings with short data: 41, 61, 5a ff ff ff ff + * Definite length strings with short data: 41, 61, 5a ff ff ff ff 00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f ff ff ff ff ff ff ff 01 02 03 - o Definite length maps and arrays not closed with enough items: 81, + * Definite length maps and arrays not closed with enough items: 81, 81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 00 - o Indefinite length strings not closed by a break stop code: 5f 41 + * Indefinite length strings not closed by a break stop code: 5f 41 00, 7f 61 00 - o Indefinite length maps and arrays not closed by a break stop code: + * Indefinite length maps and arrays not closed by a break stop code: 9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff A few examples for the five subkinds of well-formedness error kind 3 (syntax error) are shown below. 
Subkind 1: - o Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e, + * Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e, 5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc, fd, fe, Subkind 2: - o Reserved two-byte encodings of simple types: f8 00, f8 01, f8 18, + * Reserved two-byte encodings of simple types: f8 00, f8 01, f8 18, f8 1f Subkind 3: - o Indefinite length string chunks not of the correct type: 5f 00 ff, + * Indefinite length string chunks not of the correct type: 5f 00 ff, 5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff, 7f 41 00 ff - o Indefinite length string chunks not definite length: 5f 5f 41 00 + * Indefinite length string chunks not definite length: 5f 5f 41 00 ff ff, 7f 7f 61 00 ff ff Subkind 4: - o Break occurring on its own outside of an indefinite length item: + * Break occurring on its own outside of an indefinite length item: ff - o Break occurring in a definite length array or map or a tag: 81 ff, + * Break occurring in a definite length array or map or a tag: 81 ff, 82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82 9f 81 9f 9f ff ff ff ff - o Break in indefinite length map would lead to odd number of items + * Break in indefinite length map would lead to odd number of items (break in a value position): bf 00 ff, bf 00 00 00 ff Subkind 5: - o Major type 0, 1, 6 with additional information 31: 1f, 3f, df + * Major type 0, 1, 6 with additional information 31: 1f, 3f, df Acknowledgements CBOR was inspired by MessagePack. MessagePack was developed and promoted by Sadayuki Furuhashi ("frsyuki"). This reference to MessagePack is solely for attribution; CBOR is not intended as a version of or replacement for MessagePack, as it has different design goals and requirements. The need for functionality beyond the original MessagePack