draft-ietf-cbor-7049bis-10.txt   draft-ietf-cbor-7049bis-11.txt 
Network Working Group C. Bormann Network Working Group C. Bormann
Internet-Draft Universitaet Bremen TZI Internet-Draft Universitaet Bremen TZI
Obsoletes: 7049 (if approved) P. Hoffman Obsoletes: 7049 (if approved) P. Hoffman
Intended status: Standards Track ICANN Intended status: Standards Track ICANN
Expires: June 20, 2020 December 18, 2019 Expires: June 20, 2020 December 18, 2019
Concise Binary Object Representation (CBOR) Concise Binary Object Representation (CBOR)
draft-ietf-cbor-7049bis-10 draft-ietf-cbor-7049bis-11
Abstract Abstract
The Concise Binary Object Representation (CBOR) is a data format The Concise Binary Object Representation (CBOR) is a data format
whose design goals include the possibility of extremely small code whose design goals include the possibility of extremely small code
size, fairly small message size, and extensibility without the need size, fairly small message size, and extensibility without the need
for version negotiation. These design goals make it different from for version negotiation. These design goals make it different from
earlier binary serializations such as ASN.1 and MessagePack. earlier binary serializations such as ASN.1 and MessagePack.
This document is a revised edition of RFC 7049, with editorial This document is a revised edition of RFC 7049, with editorial
skipping to change at page 4, line 41 skipping to change at page 4, line 41
of the format. of the format.
1.1. Objectives 1.1. Objectives
The objectives of CBOR, roughly in decreasing order of importance, The objectives of CBOR, roughly in decreasing order of importance,
are: are:
1. The representation must be able to unambiguously encode most 1. The representation must be able to unambiguously encode most
common data formats used in Internet standards. common data formats used in Internet standards.
* It must represent a reasonable set of basic data types and - It must represent a reasonable set of basic data types and
structures using binary encoding. "Reasonable" here is structures using binary encoding. "Reasonable" here is
largely influenced by the capabilities of JSON, with the major largely influenced by the capabilities of JSON, with the major
addition of binary byte strings. The structures supported are addition of binary byte strings. The structures supported are
limited to arrays and trees; loops and lattice-style graphs limited to arrays and trees; loops and lattice-style graphs
are not supported. are not supported.
* There is no requirement that all data formats be uniquely - There is no requirement that all data formats be uniquely
encoded; that is, it is acceptable that the number "7" might encoded; that is, it is acceptable that the number "7" might
be encoded in multiple different ways. be encoded in multiple different ways.
2. The code for an encoder or decoder must be able to be compact in 2. The code for an encoder or decoder must be able to be compact in
order to support systems with very limited memory, processor order to support systems with very limited memory, processor
power, and instruction sets. power, and instruction sets.
* An encoder and a decoder need to be implementable in a very - An encoder and a decoder need to be implementable in a very
small amount of code (for example, in class 1 constrained small amount of code (for example, in class 1 constrained
nodes as defined in [RFC7228]). nodes as defined in [RFC7228]).
* The format should use contemporary machine representations of - The format should use contemporary machine representations of
data (for example, not requiring binary-to-decimal data (for example, not requiring binary-to-decimal
conversion). conversion).
3. Data must be able to be decoded without a schema description. 3. Data must be able to be decoded without a schema description.
* Similar to JSON, encoded data should be self-describing so - Similar to JSON, encoded data should be self-describing so
that a generic decoder can be written. that a generic decoder can be written.
4. The serialization must be reasonably compact, but data 4. The serialization must be reasonably compact, but data
compactness is secondary to code compactness for the encoder and compactness is secondary to code compactness for the encoder and
decoder. decoder.
* "Reasonable" here is bounded by JSON as an upper bound in - "Reasonable" here is bounded by JSON as an upper bound in
size, and by implementation complexity maintaining a lower size, and by implementation complexity maintaining a lower
bound. Using either general compression schemes or extensive bound. Using either general compression schemes or extensive
bit-fiddling violates the complexity goals. bit-fiddling violates the complexity goals.
5. The format must be applicable to both constrained nodes and high- 5. The format must be applicable to both constrained nodes and high-
volume applications. volume applications.
* This means it must be reasonably frugal in CPU usage for both - This means it must be reasonably frugal in CPU usage for both
encoding and decoding. This is relevant both for constrained encoding and decoding. This is relevant both for constrained
nodes and for potential usage in applications with a very high nodes and for potential usage in applications with a very high
volume of data. volume of data.
6. The format must support all JSON data types for conversion to and 6. The format must support all JSON data types for conversion to and
from JSON. from JSON.
* It must support a reasonable level of conversion as long as - It must support a reasonable level of conversion as long as
the data represented is within the capabilities of JSON. It the data represented is within the capabilities of JSON. It
must be possible to define a unidirectional mapping towards must be possible to define a unidirectional mapping towards
JSON for all types of data. JSON for all types of data.
7. The format must be extensible, and the extended data must be 7. The format must be extensible, and the extended data must be
decodable by earlier decoders. decodable by earlier decoders.
* The format is designed for decades of use. - The format is designed for decades of use.
* The format must support a form of extensibility that allows - The format must support a form of extensibility that allows
fallback so that a decoder that does not understand an fallback so that a decoder that does not understand an
extension can still decode the message. extension can still decode the message.
* The format must be able to be extended in the future by later - The format must be able to be extended in the future by later
IETF standards. IETF standards.
1.2. Terminology 1.2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
skipping to change at page 7, line 51 skipping to change at page 7, line 51
data model, generic CBOR encoders and decoders can be implemented data model, generic CBOR encoders and decoders can be implemented
(which usually involves defining additional implementation data types (which usually involves defining additional implementation data types
for those data items that do not already have a natural for those data items that do not already have a natural
representation in the environment). The ability to provide generic representation in the environment). The ability to provide generic
encoders and decoders is an explicit design goal of CBOR; however, encoders and decoders is an explicit design goal of CBOR; however,
many applications will provide their own application-specific many applications will provide their own application-specific
encoders and/or decoders. encoders and/or decoders.
In the basic (un-extended) generic data model, a data item is one of: In the basic (un-extended) generic data model, a data item is one of:
o an integer in the range -2**64..2**64-1 inclusive * an integer in the range -2**64..2**64-1 inclusive
o a simple value, identified by a number between 0 and 255, but * a simple value, identified by a number between 0 and 255, but
distinct from that number distinct from that number
o a floating-point value, distinct from an integer, out of the set * a floating-point value, distinct from an integer, out of the set
representable by IEEE 754 binary64 (including non-finites) representable by IEEE 754 binary64 (including non-finites)
[IEEE754] [IEEE754]
o a sequence of zero or more bytes ("byte string") * a sequence of zero or more bytes ("byte string")
o a sequence of zero or more Unicode code points ("text string") * a sequence of zero or more Unicode code points ("text string")
o a sequence of zero or more data items ("array") * a sequence of zero or more data items ("array")
o a mapping (mathematical function) from zero or more data items * a mapping (mathematical function) from zero or more data items
("keys") each to a data item ("values"), ("map") ("keys") each to a data item ("values"), ("map")
o a tagged data item ("tag"), comprising a tag number (an integer in * a tagged data item ("tag"), comprising a tag number (an integer in
the range 0..2**64-1) and a tagged value (a data item) the range 0..2**64-1) and a tagged value (a data item)
Note that integer and floating-point values are distinct in this Note that integer and floating-point values are distinct in this
model, even if they have the same numeric value. model, even if they have the same numeric value.
Also note that serialization variants, such as the number of bytes of Also note that serialization variants, such as the number of bytes of
the encoded floating value, or the choice of one of the ways in which the encoded floating value, or the choice of one of the ways in which
an integer, the length of a text or byte string, the number of an integer, the length of a text or byte string, the number of
elements in an array or pairs in a map, or a tag number, elements in an array or pairs in a map, or a tag number,
(collectively "the argument", see Section 3) can be encoded, are not (collectively "the argument", see Section 3) can be encoded, are not
visible at the generic data model level. visible at the generic data model level.
2.1. Extended Generic Data Models 2.1. Extended Generic Data Models
This basic generic data model comes pre-extended by the registration This basic generic data model comes pre-extended by the registration
of a number of simple values and tag numbers right in this document, of a number of simple values and tag numbers right in this document,
such as: such as:
o "false", "true", "null", and "undefined" (simple values identified * "false", "true", "null", and "undefined" (simple values identified
by 20..23) by 20..23)
o integer and floating-point values with a larger range and * integer and floating-point values with a larger range and
precision than the above (tag numbers 2 to 5) precision than the above (tag numbers 2 to 5)
o application data types such as a point in time or an RFC 3339 * application data types such as a point in time or an RFC 3339
date/time string (tag numbers 1, 0) date/time string (tag numbers 1, 0)
Further elements of the extended generic data model can be (and have Further elements of the extended generic data model can be (and have
been) defined via the IANA registries created for CBOR. Even if such been) defined via the IANA registries created for CBOR. Even if such
an extension is unknown to a generic encoder or decoder, data items an extension is unknown to a generic encoder or decoder, data items
using that extension can be passed to or from the application by using that extension can be passed to or from the application by
representing them at the interface to the application within the representing them at the interface to the application within the
basic generic data model, i.e., as generic values of a simple type or basic generic data model, i.e., as generic values of a simple type or
generic tags. generic tags.
skipping to change at page 25, line 22 skipping to change at page 25, line 22
3.4.5.3. Encoded Text 3.4.5.3. Encoded Text
Some text strings hold data that have formats widely used on the Some text strings hold data that have formats widely used on the
Internet, and sometimes those formats can be validated and presented Internet, and sometimes those formats can be validated and presented
to the application in appropriate form by the decoder. There are to the application in appropriate form by the decoder. There are
tags for some of these formats. As with tag numbers 21 to 23, if tags for some of these formats. As with tag numbers 21 to 23, if
these tags are applied to an item other than a text string, they these tags are applied to an item other than a text string, they
apply to all text string data items it contains. apply to all text string data items it contains.
o Tag number 32 is for URIs, as defined in [RFC3986]. If the text * Tag number 32 is for URIs, as defined in [RFC3986]. If the text
string doesn't match the "URI-reference" production, the string is string doesn't match the "URI-reference" production, the string is
invalid. invalid.
o Tag numbers 33 and 34 are for base64url- and base64-encoded text * Tag numbers 33 and 34 are for base64url- and base64-encoded text
strings, as defined in [RFC4648]. If any of: strings, as defined in [RFC4648]. If any of:
* the encoded text string contains non-alphabet characters or - the encoded text string contains non-alphabet characters or
only 1 character in the last block of 4, or only 1 character in the last block of 4, or
* the padding bits in a 2- or 3-character block are not 0, or - the padding bits in a 2- or 3-character block are not 0, or
* the base64 encoding has the wrong number of padding characters, - the base64 encoding has the wrong number of padding characters,
or or
* the base64url encoding has padding characters, - the base64url encoding has padding characters,
the string is invalid. the string is invalid.
o Tag number 35 is for regular expressions that are roughly in Perl * Tag number 35 is for regular expressions that are roughly in Perl
Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a
version of the JavaScript regular expression syntax [ECMA262]. version of the JavaScript regular expression syntax [ECMA262].
(Note that more specific identification may be necessary if the (Note that more specific identification may be necessary if the
actual version of the specification underlying the regular actual version of the specification underlying the regular
expression, or more than just the text of the regular expression expression, or more than just the text of the regular expression
itself, need to be conveyed.) Any contained string value is itself, need to be conveyed.) Any contained string value is
valid. valid.
o Tag number 36 is for MIME messages (including all headers), as * Tag number 36 is for MIME messages (including all headers), as
defined in [RFC2045]. A text string that isn't a valid MIME defined in [RFC2045]. A text string that isn't a valid MIME
message is invalid. (For this tag, validity checking may be message is invalid. (For this tag, validity checking may be
particularly onerous for a generic decoder and might therefore not particularly onerous for a generic decoder and might therefore not
be offered. Note that many MIME messages are general binary data be offered. Note that many MIME messages are general binary data
and can therefore not be represented in a text string; and can therefore not be represented in a text string;
[IANA.cbor-tags] lists a registration for tag number 257 that is [IANA.cbor-tags] lists a registration for tag number 257 that is
similar to tag number 36 but is used with an enclosed byte similar to tag number 36 but is used with an enclosed byte
string.) string.)
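
The validity conditions listed for tag numbers 33 and 34 can be sketched as a pair of checks on the text string. This is only an illustration under stated assumptions: the function names are hypothetical, the alphabets are those of [RFC4648], and a real decoder may well structure the check differently.

```python
import string

_COMMON = string.ascii_uppercase + string.ascii_lowercase + string.digits
BASE64_ALPHABET = _COMMON + "+/"      # RFC 4648, Section 4
BASE64URL_ALPHABET = _COMMON + "-_"   # RFC 4648, Section 5


def _body_is_valid(body: str, alphabet: str) -> bool:
    """Check the unpadded characters of an encoded text string."""
    if any(ch not in alphabet for ch in body):
        return False                  # non-alphabet character
    tail = len(body) % 4
    if tail == 1:
        return False                  # only 1 character in the last block
    if tail == 0:
        return True
    # A 2-character block carries 12 bits for 1 byte (4 padding bits);
    # a 3-character block carries 18 bits for 2 bytes (2 padding bits).
    unused_bits = 4 if tail == 2 else 2
    return (alphabet.index(body[-1]) & ((1 << unused_bits) - 1)) == 0


def is_valid_base64url_text(text: str) -> bool:
    """Tag number 33: padding characters are not allowed."""
    return "=" not in text and _body_is_valid(text, BASE64URL_ALPHABET)


def is_valid_base64_text(text: str) -> bool:
    """Tag number 34: padding must complete the last block of 4."""
    body = text.rstrip("=")
    padding = len(text) - len(body)
    expected = (-len(body)) % 4
    return padding == expected and _body_is_valid(body, BASE64_ALPHABET)
```

For example, "AQI" passes the tag number 33 check, while "AQJ" fails it because the two padding bits of its last character are not 0.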
Note that tag numbers 33 and 34 differ from 21 and 22 in that the Note that tag numbers 33 and 34 differ from 21 and 22 in that the
skipping to change at page 28, line 10 skipping to change at page 28, line 10
protocols are free to define what they mean by a "deterministic protocols are free to define what they mean by a "deterministic
format" and what encoders and decoders are expected to do. This format" and what encoders and decoders are expected to do. This
section defines a set of restrictions that can serve as the base of section defines a set of restrictions that can serve as the base of
such a deterministic format. such a deterministic format.
4.2.1. Core Deterministic Encoding Requirements 4.2.1. Core Deterministic Encoding Requirements
A CBOR encoding satisfies the "core deterministic encoding A CBOR encoding satisfies the "core deterministic encoding
requirements" if it satisfies the following restrictions: requirements" if it satisfies the following restrictions:
o Preferred serialization MUST be used. In particular, this means * Preferred serialization MUST be used. In particular, this means
that arguments (see Section 3) for integers, lengths in major that arguments (see Section 3) for integers, lengths in major
types 2 through 5, and tags MUST be as short as possible, for types 2 through 5, and tags MUST be as short as possible, for
instance: instance:
* 0 to 23 and -1 to -24 MUST be expressed in the same byte as the - 0 to 23 and -1 to -24 MUST be expressed in the same byte as the
major type; major type;
* 24 to 255 and -25 to -256 MUST be expressed only with an - 24 to 255 and -25 to -256 MUST be expressed only with an
additional uint8_t; additional uint8_t;
* 256 to 65535 and -257 to -65536 MUST be expressed only with an - 256 to 65535 and -257 to -65536 MUST be expressed only with an
additional uint16_t; additional uint16_t;
* 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed - 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
only with an additional uint32_t. only with an additional uint32_t.
Floating point values also MUST use the shortest form that Floating point values also MUST use the shortest form that
preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5 preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5
as 0xfa49742408. as 0xfa49742408.
o Indefinite-length items MUST NOT appear. They can be encoded as * Indefinite-length items MUST NOT appear. They can be encoded as
definite-length items instead. definite-length items instead.
o The keys in every map MUST be sorted in the bytewise lexicographic * The keys in every map MUST be sorted in the bytewise lexicographic
order of their deterministic encodings. For example, the order of their deterministic encodings. For example, the
following keys are sorted correctly: following keys are sorted correctly:
1. 10, encoded as 0x0a. 1. 10, encoded as 0x0a.
2. 100, encoded as 0x1864. 2. 100, encoded as 0x1864.
3. -1, encoded as 0x20. 3. -1, encoded as 0x20.
4. "z", encoded as 0x617a. 4. "z", encoded as 0x617a.
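
The shortest-argument, shortest-float, and key-ordering rules above can be illustrated with a minimal sketch. The helper names are hypothetical and not part of this document; the sketch covers only integers, text strings, and floating-point values other than NaN (NaN needs the separate handling discussed for protocols).

```python
import struct


def encode_head(major: int, argument: int) -> bytes:
    """Encode a major type with the shortest possible argument."""
    if argument < 24:
        # The argument fits in the same byte as the major type.
        return bytes([(major << 5) | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_int(value: int) -> bytes:
    # Major type 0 carries n; major type 1 carries -1 - n.
    return encode_head(0, value) if value >= 0 else encode_head(1, -1 - value)


def encode_text(value: str) -> bytes:
    data = value.encode("utf-8")
    return encode_head(3, len(data)) + data


def encode_float(value: float) -> bytes:
    """Use the shortest of binary16/32/64 that preserves the value."""
    for fmt, head in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # too large for this format, try the next one
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([head]) + packed
    return bytes([0xFB]) + struct.pack(">d", value)


def sort_encoded_keys(encoded_keys):
    # Python compares bytes objects bytewise and lexicographically.
    return sorted(encoded_keys)
```

Applied to the keys listed above, sorting the encodings of "z", -1, 100, and 10 yields 0x0a, 0x1864, 0x20, 0x617a, in that order.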
skipping to change at page 29, line 32 skipping to change at page 29, line 32
position, treating the latter as if they were tagged), the position, treating the latter as if they were tagged), the
deterministic format would not allow them. In a protocol that deterministic format would not allow them. In a protocol that
requires tags in certain places to obtain specific semantics, the tag requires tags in certain places to obtain specific semantics, the tag
needs to appear in the deterministic format as well. Deterministic needs to appear in the deterministic format as well. Deterministic
encoding considerations also apply to the content of tags. encoding considerations also apply to the content of tags.
Protocols that include floating, big integer, or other complex values Protocols that include floating, big integer, or other complex values
need to define extra requirements on their deterministic encodings. need to define extra requirements on their deterministic encodings.
For example: For example:
o If a protocol includes a field that can express floating-point * If a protocol includes a field that can express floating-point
values (Section 3.3), the protocol's deterministic encoding needs values (Section 3.3), the protocol's deterministic encoding needs
to specify whether the integer 1.0 is encoded as 0x01, 0xf93c00, to specify whether the integer 1.0 is encoded as 0x01, 0xf93c00,
0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for 0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for
this are: this are:
1. Encode integral values that fit in 64 bits as values from 1. Encode integral values that fit in 64 bits as values from
major types 0 and 1, and other values as the smallest of 16-, major types 0 and 1, and other values as the smallest of 16-,
32-, or 64-bit floating point that accurately represents the 32-, or 64-bit floating point that accurately represents the
value, value,
skipping to change at page 30, line 16 skipping to change at page 30, line 16
payloads or signaling NaNs, the protocol needs to pick a single payloads or signaling NaNs, the protocol needs to pick a single
representation, for example 0xf97e00. If that simple choice is representation, for example 0xf97e00. If that simple choice is
not possible, specific attention will be needed for NaN handling. not possible, specific attention will be needed for NaN handling.
Subnormal numbers (nonzero numbers with the lowest possible Subnormal numbers (nonzero numbers with the lowest possible
exponent of a given IEEE 754 number format) may be flushed to zero exponent of a given IEEE 754 number format) may be flushed to zero
outputs or be treated as zero inputs in some floating point outputs or be treated as zero inputs in some floating point
implementations. A protocol's deterministic encoding may want to implementations. A protocol's deterministic encoding may want to
exclude them from interchange, interchanging zero instead. exclude them from interchange, interchanging zero instead.
o If a protocol includes a field that can express integers with an * If a protocol includes a field that can express integers with an
absolute value of 2^64 or larger using tag numbers 2 or 3 absolute value of 2^64 or larger using tag numbers 2 or 3
(Section 3.4.3), the protocol's deterministic encoding needs to (Section 3.4.3), the protocol's deterministic encoding needs to
specify whether small integers are expressed using the tag or specify whether small integers are expressed using the tag or
major types 0 and 1. major types 0 and 1.
o A protocol might give encoders the choice of representing a URL as * A protocol might give encoders the choice of representing a URL as
either a text string or, using Section 3.4.5.3, tag number 32 either a text string or, using Section 3.4.5.3, tag number 32
containing a text string. This protocol's deterministic encoding containing a text string. This protocol's deterministic encoding
needs to either require that the tag is present or require that needs to either require that the tag is present or require that
it's absent, not allow either one. it's absent, not allow either one.
4.2.3. Length-first map key ordering 4.2.3. Length-first map key ordering
The core deterministic encoding requirements sort map keys in a The core deterministic encoding requirements sort map keys in a
different order from the one suggested by Section 3.9 of [RFC7049] different order from the one suggested by Section 3.9 of [RFC7049]
(called "Canonical CBOR" there). Protocols that need to be (called "Canonical CBOR" there). Protocols that need to be
skipping to change at page 34, line 42 skipping to change at page 34, line 42
needs to have an API that reports an error (and does not return data) needs to have an API that reports an error (and does not return data)
for a CBOR data item that contains any of the validity errors listed for a CBOR data item that contains any of the validity errors listed
in the previous subsection. in the previous subsection.
The set of tags defined in the tag registry (Section 9.2), as well as The set of tags defined in the tag registry (Section 9.2), as well as
the set of simple values defined in the simple values registry the set of simple values defined in the simple values registry
(Section 9.1), can grow at any time beyond the set understood by a (Section 9.1), can grow at any time beyond the set understood by a
generic decoder. A validity-checking decoder can do one of two generic decoder. A validity-checking decoder can do one of two
things when it encounters such a case that it does not recognize: things when it encounters such a case that it does not recognize:
o It can report an error (and not return data). Note that this * It can report an error (and not return data). Note that this
error is not a validity error per se. This kind of error is more error is not a validity error per se. This kind of error is more
likely to be raised by a decoder that would be performing validity likely to be raised by a decoder that would be performing validity
checking if this were a known case. checking if this were a known case.
o It can emit the unknown item (type, value, and, for tags, the * It can emit the unknown item (type, value, and, for tags, the
decoded tagged data item) to the application calling the decoder, decoded tagged data item) to the application calling the decoder,
with an indication that the decoder did not recognize that tag with an indication that the decoder did not recognize that tag
number or simple value. number or simple value.
The latter approach, which is also appropriate for decoders that do The latter approach, which is also appropriate for decoders that do
not support validity checking, provides forward compatibility with not support validity checking, provides forward compatibility with
newly registered tags and simple values without the requirement to newly registered tags and simple values without the requirement to
update the decoder at the same time as the calling application. (For update the decoder at the same time as the calling application. (For
this, the API for the decoder needs to have a way to mark unknown this, the API for the decoder needs to have a way to mark unknown
items so that the calling application can handle them in a manner items so that the calling application can handle them in a manner
skipping to change at page 38, line 37 skipping to change at page 38, line 37
bytes, not characters. bytes, not characters.
6.1. Converting from CBOR to JSON 6.1. Converting from CBOR to JSON
Most of the types in CBOR have direct analogs in JSON. However, some Most of the types in CBOR have direct analogs in JSON. However, some
do not, and someone implementing a CBOR-to-JSON converter has to do not, and someone implementing a CBOR-to-JSON converter has to
consider what to do in those cases. The following non-normative consider what to do in those cases. The following non-normative
advice deals with these by converting them to a single substitute advice deals with these by converting them to a single substitute
value, such as a JSON null. value, such as a JSON null.
o An integer (major type 0 or 1) becomes a JSON number. * An integer (major type 0 or 1) becomes a JSON number.
o A byte string (major type 2) that is not embedded in a tag that * A byte string (major type 2) that is not embedded in a tag that
specifies a proposed encoding is encoded in base64url without specifies a proposed encoding is encoded in base64url without
padding and becomes a JSON string. padding and becomes a JSON string.
o A UTF-8 string (major type 3) becomes a JSON string. Note that * A UTF-8 string (major type 3) becomes a JSON string. Note that
JSON requires escaping certain characters ([RFC8259], Section 7): JSON requires escaping certain characters ([RFC8259], Section 7):
quotation mark (U+0022), reverse solidus (U+005C), and the "C0 quotation mark (U+0022), reverse solidus (U+005C), and the "C0
control characters" (U+0000 through U+001F). All other characters control characters" (U+0000 through U+001F). All other characters
are copied unchanged into the JSON UTF-8 string. are copied unchanged into the JSON UTF-8 string.
o An array (major type 4) becomes a JSON array. * An array (major type 4) becomes a JSON array.
o A map (major type 5) becomes a JSON object. This is possible * A map (major type 5) becomes a JSON object. This is possible
directly only if all keys are UTF-8 strings. A converter might directly only if all keys are UTF-8 strings. A converter might
also convert other keys into UTF-8 strings (such as by converting also convert other keys into UTF-8 strings (such as by converting
integers into strings containing their decimal representation); integers into strings containing their decimal representation);
however, doing so introduces a danger of key collision. Note also however, doing so introduces a danger of key collision. Note also
that, if tags on UTF-8 strings are ignored as proposed below, this that, if tags on UTF-8 strings are ignored as proposed below, this
will cause a key collision if the tags are different but the will cause a key collision if the tags are different but the
strings are the same. strings are the same.
o False (major type 7, additional information 20) becomes a JSON * False (major type 7, additional information 20) becomes a JSON
false. false.
o True (major type 7, additional information 21) becomes a JSON * True (major type 7, additional information 21) becomes a JSON
true. true.
o Null (major type 7, additional information 22) becomes a JSON * Null (major type 7, additional information 22) becomes a JSON
null. null.
o A floating-point value (major type 7, additional information 25 * A floating-point value (major type 7, additional information 25
through 27) becomes a JSON number if it is finite (that is, it can through 27) becomes a JSON number if it is finite (that is, it can
be represented in a JSON number); if the value is non-finite (NaN, be represented in a JSON number); if the value is non-finite (NaN,
or positive or negative Infinity), it is represented by the or positive or negative Infinity), it is represented by the
substitute value. substitute value.
o Any other simple value (major type 7, any additional information * Any other simple value (major type 7, any additional information
value not yet discussed) is represented by the substitute value. value not yet discussed) is represented by the substitute value.
o A bignum (major type 6, tag number 2 or 3) is represented by * A bignum (major type 6, tag number 2 or 3) is represented by
encoding its byte string in base64url without padding and becomes encoding its byte string in base64url without padding and becomes
a JSON string. For tag number 3 (negative bignum), a "~" (ASCII a JSON string. For tag number 3 (negative bignum), a "~" (ASCII
tilde) is inserted before the base-encoded value. (The conversion tilde) is inserted before the base-encoded value. (The conversion
to a binary blob instead of a number is to prevent a likely to a binary blob instead of a number is to prevent a likely
numeric overflow for the JSON decoder.) numeric overflow for the JSON decoder.)
o A byte string with an encoding hint (major type 6, tag number 21 * A byte string with an encoding hint (major type 6, tag number 21
through 23) is encoded as described and becomes a JSON string. through 23) is encoded as described and becomes a JSON string.
o For all other tags (major type 6, any other tag number), the * For all other tags (major type 6, any other tag number), the
enclosed CBOR item is represented as a JSON value; the tag number enclosed CBOR item is represented as a JSON value; the tag number
is ignored. is ignored.
o Indefinite-length items are made definite before conversion. * Indefinite-length items are made definite before conversion.
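The bignum and floating-point rules above can be sketched in Python as follows. This is a minimal illustration, not a normative converter: the function names are hypothetical, and the choice of JSON null (Python None) as the default substitute value is an assumption made only for this example.

```python
import base64
import math


def bignum_to_json_string(tag_number: int, content: bytes) -> str:
    """Turn the byte string of a tag 2 or tag 3 bignum into a JSON string."""
    if tag_number not in (2, 3):
        raise ValueError("only tag numbers 2 and 3 denote bignums")
    # base64url, with the trailing "=" padding removed
    text = base64.urlsafe_b64encode(content).rstrip(b"=").decode("ascii")
    # A negative bignum (tag number 3) gets a leading ASCII tilde
    return ("~" if tag_number == 3 else "") + text


def float_to_json(value: float, substitute=None):
    """Keep finite floats; replace NaN and the infinities by the substitute.

    The default substitute (None, i.e. JSON null) is an assumption.
    """
    return value if math.isfinite(value) else substitute
```

For instance, the nine-byte content 01 00 00 00 00 00 00 00 00 becomes "AQAAAAAAAAAA" under tag number 2 and "~AQAAAAAAAAAA" under tag number 3.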
6.2. Converting from JSON to CBOR 6.2. Converting from JSON to CBOR
All JSON values, once decoded, directly map into one or more CBOR All JSON values, once decoded, directly map into one or more CBOR
values. As with any kind of CBOR generation, decisions have to be values. As with any kind of CBOR generation, decisions have to be
made with respect to number representation. In a suggested made with respect to number representation. In a suggested
conversion: conversion:
o JSON numbers without fractional parts (integer numbers) are * JSON numbers without fractional parts (integer numbers) are
represented as integers (major types 0 and 1, possibly major type represented as integers (major types 0 and 1, possibly major type
6 tag number 2 and 3), choosing the shortest form; integers longer 6 tag number 2 and 3), choosing the shortest form; integers longer
than an implementation-defined threshold may instead be than an implementation-defined threshold may instead be
represented as floating-point values. The default range that is represented as floating-point values. The default range that is
represented as integer is -2**53+1..2**53-1 (fully exploiting the represented as integer is -2**53+1..2**53-1 (fully exploiting the
range for exact integers in the binary64 representation often used range for exact integers in the binary64 representation often used
for decoding JSON [RFC7493]). A CBOR-based protocol, or a generic for decoding JSON [RFC7493]). A CBOR-based protocol, or a generic
converter implementation, may choose -2**32..2**32-1 or converter implementation, may choose -2**32..2**32-1 or
-2**64..2**64-1 (fully using the integer ranges available in CBOR -2**64..2**64-1 (fully using the integer ranges available in CBOR
with uint32_t or uint64_t, respectively) or even -2**31..2**31-1 with uint32_t or uint64_t, respectively) or even -2**31..2**31-1
or -2**63..2**63-1 (using popular ranges for two's complement or -2**63..2**63-1 (using popular ranges for two's complement
signed integers). (If the JSON was generated from a JavaScript signed integers). (If the JSON was generated from a JavaScript
implementation, its precision is already limited to 53 bits implementation, its precision is already limited to 53 bits
maximum.) maximum.)
o Numbers with fractional parts are represented as floating-point * Numbers with fractional parts are represented as floating-point
values, performing the decimal-to-binary conversion based on the values, performing the decimal-to-binary conversion based on the
precision provided by IEEE 754 binary64. Then, when encoding in precision provided by IEEE 754 binary64. Then, when encoding in
CBOR, the preferred serialization uses the shortest floating-point CBOR, the preferred serialization uses the shortest floating-point
representation exactly representing this conversion result; for representation exactly representing this conversion result; for
instance, 1.5 is represented in a 16-bit floating-point value (not instance, 1.5 is represented in a 16-bit floating-point value (not
all implementations will be capable of efficiently finding the all implementations will be capable of efficiently finding the
minimum form, though). Instead of using the default binary64 minimum form, though). Instead of using the default binary64
precision, there may be an implementation-defined limit to the precision, there may be an implementation-defined limit to the
precision of the conversion that will affect the precision of the precision of the conversion that will affect the precision of the
represented values. Decimal representation should only be used on represented values. Decimal representation should only be used on
skipping to change at page 41, line 43 skipping to change at page 41, line 43
protocol is designed to tolerate and embrace implementations that protocol is designed to tolerate and embrace implementations that
start using more codepoints than initially allocated. start using more codepoints than initially allocated.
Sizing the codepoint space may be difficult because the range Sizing the codepoint space may be difficult because the range
required may be hard to predict. An attempt should be made to make required may be hard to predict. An attempt should be made to make
the codepoint space large enough so that it can slowly be filled over the codepoint space large enough so that it can slowly be filled over
the intended lifetime of the protocol. the intended lifetime of the protocol.
CBOR has three major extension points: CBOR has three major extension points:
o the "simple" space (values in major type 7). Of the 24 efficient * the "simple" space (values in major type 7). Of the 24 efficient
(and 224 slightly less efficient) values, only a small number have (and 224 slightly less efficient) values, only a small number have
been allocated. Implementations receiving an unknown simple data been allocated. Implementations receiving an unknown simple data
item may be able to process it as such, given that the structure item may be able to process it as such, given that the structure
of the value is indeed simple. The IANA registry in Section 9.1 of the value is indeed simple. The IANA registry in Section 9.1
is the appropriate way to address the extensibility of this is the appropriate way to address the extensibility of this
codepoint space. codepoint space.
o the "tag" space (values in major type 6). Again, only a small * the "tag" space (values in major type 6). Again, only a small
part of the codepoint space has been allocated, and the space is part of the codepoint space has been allocated, and the space is
abundant (although the early numbers are more efficient than the abundant (although the early numbers are more efficient than the
later ones). Implementations receiving an unknown tag number can later ones). Implementations receiving an unknown tag number can
choose to simply ignore it or to process it as an unknown tag choose to simply ignore it or to process it as an unknown tag
number wrapping the enclosed data item. The IANA registry in number wrapping the enclosed data item. The IANA registry in
Section 9.2 is the appropriate way to address the extensibility of Section 9.2 is the appropriate way to address the extensibility of
this codepoint space. this codepoint space.
o the "additional information" space. An implementation receiving * the "additional information" space. An implementation receiving
an unknown additional information value has no way to continue an unknown additional information value has no way to continue
decoding, so allocating codepoints to this space is a major step. decoding, so allocating codepoints to this space is a major step.
There are also very few codepoints left. There are also very few codepoints left.
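The remarks on efficiency above (24 one-byte simple values, 224 two-byte ones, and early tag numbers being cheaper than later ones) follow from how many bytes the head of a data item needs. A small Python sketch, with hypothetical function names, makes the sizes concrete:

```python
def head_length(argument: int) -> int:
    """Bytes needed for the head of an item whose argument (for example a
    tag number) has the given value."""
    if not 0 <= argument < 2**64:
        raise ValueError("argument must fit in 64 bits")
    if argument < 24:
        return 1          # value carried in the additional information
    if argument < 2**8:
        return 2          # additional information 24, one more byte
    if argument < 2**16:
        return 3          # additional information 25, two more bytes
    if argument < 2**32:
        return 5          # additional information 26, four more bytes
    return 9              # additional information 27, eight more bytes


def simple_value_length(value: int) -> int:
    """Bytes needed to encode a simple value (major type 7)."""
    if 0 <= value < 24:
        return 1
    if 32 <= value <= 255:
        return 2
    raise ValueError("simple values 24..31 cannot be encoded")
```

Under this sketch, tag number 23 costs one byte, tag number 24 costs two, and tag number 256 costs three.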
7.2. Curating the Additional Information Space 7.2. Curating the Additional Information Space
The human mind is sometimes drawn to filling in little perceived gaps The human mind is sometimes drawn to filling in little perceived gaps
to make something neat. We expect the remaining gaps in the to make something neat. We expect the remaining gaps in the
codepoint space for the additional information values to be an codepoint space for the additional information values to be an
attractor for new ideas, just because they are there. attractor for new ideas, just because they are there.
skipping to change at page 45, line 7 skipping to change at page 45, line 7
Tags" registry at [IANA.cbor-tags]. The tags that were defined in Tags" registry at [IANA.cbor-tags]. The tags that were defined in
[RFC7049] are described in detail in Section 3.4, but other tags have [RFC7049] are described in detail in Section 3.4, but other tags have
already been defined. already been defined.
New entries in the range 0 to 23 are assigned by Standards Action. New entries in the range 0 to 23 are assigned by Standards Action.
New entries in the range 24 to 255 are assigned by Specification New entries in the range 24 to 255 are assigned by Specification
Required. New entries in the range 256 to 18446744073709551615 are Required. New entries in the range 256 to 18446744073709551615 are
assigned by First Come First Served. The template for registration assigned by First Come First Served. The template for registration
requests is: requests is:
o Data item * Data item
o Semantics (short form) * Semantics (short form)
In addition, First Come First Served requests should include: In addition, First Come First Served requests should include:
o Point of contact * Point of contact
o Description of semantics (URL) - This description is optional; the * Description of semantics (URL) - This description is optional; the
URL can point to something like an Internet-Draft or a web page. URL can point to something like an Internet-Draft or a web page.
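As an illustration only, the three assignment ranges above can be written as a lookup. The function name is hypothetical; the policy names are the ones used in the text.

```python
def tag_registration_policy(tag_number: int) -> str:
    """Return the registration policy that applies to a new tag number."""
    if 0 <= tag_number <= 23:
        return "Standards Action"
    if 24 <= tag_number <= 255:
        return "Specification Required"
    if 256 <= tag_number <= 18446744073709551615:
        return "First Come First Served"
    raise ValueError("tag number outside the 64-bit unsigned range")
```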
9.3. Media Type ("MIME Type") 9.3. Media Type ("MIME Type")
The Internet media type [RFC6838] for a single encoded CBOR data item The Internet media type [RFC6838] for a single encoded CBOR data item
is application/cbor. is application/cbor.
Type name: application Type name: application
Subtype name: cbor Subtype name: cbor
skipping to change at page 60, line 17 skipping to change at page 60, line 17
| 0xff | "break" stop code | | 0xff | "break" stop code |
+------------+------------------------------------------------------+ +------------+------------------------------------------------------+
Table 6: Jump Table for Initial Byte Table 6: Jump Table for Initial Byte
Appendix C. Pseudocode Appendix C. Pseudocode
The well-formedness of a CBOR item can be checked by the pseudocode The well-formedness of a CBOR item can be checked by the pseudocode
in Figure 1. The data is well-formed if and only if: in Figure 1. The data is well-formed if and only if:
o the pseudocode does not "fail"; * the pseudocode does not "fail";
o after execution of the pseudocode, no bytes are left in the input * after execution of the pseudocode, no bytes are left in the input
(except in streaming applications) (except in streaming applications)
The pseudocode has the following prerequisites: The pseudocode has the following prerequisites:
o take(n) reads n bytes from the input data and returns them as a * take(n) reads n bytes from the input data and returns them as a
byte string. If n bytes are no longer available, take(n) fails. byte string. If n bytes are no longer available, take(n) fails.
o uint() converts a byte string into an unsigned integer by * uint() converts a byte string into an unsigned integer by
interpreting the byte string in network byte order. interpreting the byte string in network byte order.
o Arithmetic works as in C. * Arithmetic works as in C.
o All variables are unsigned integers of sufficient range. * All variables are unsigned integers of sufficient range.
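One possible reading of the first two prerequisites, sketched in Python. The helper make_take and the use of EOFError to signal failure are assumptions of this example; the pseudocode itself only says that take(n) "fails".

```python
def make_take(data: bytes):
    """Return a take(n) function that reads successive bytes from data."""
    position = 0

    def take(n: int) -> bytes:
        nonlocal position
        if n > len(data) - position:
            # Fewer than n bytes are left: take(n) fails
            raise EOFError("fewer than %d bytes available" % n)
        chunk = data[position:position + n]
        position += n
        return chunk

    return take


def uint(octets: bytes) -> int:
    """Interpret a byte string as an unsigned integer, network byte order."""
    value = 0
    for octet in octets:
        value = (value << 8) | octet   # most significant byte comes first
    return value
```

With take = make_take(bytes.fromhex("190100")), uint(take(1)) yields 0x19 and uint(take(2)) yields 256.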
Note that "well_formed" returns the major type for well-formed Note that "well_formed" returns the major type for well-formed
definite length items, but 0 for an indefinite length item (or -1 for definite length items, but 0 for an indefinite length item (or -1 for
a break stop code, only if "breakable" is set). This is used in a break stop code, only if "breakable" is set). This is used in
"well_formed_indefinite" to ascertain that indefinite length strings "well_formed_indefinite" to ascertain that indefinite length strings
only contain definite length strings as chunks. only contain definite length strings as chunks.
well_formed (breakable = false) { well_formed (breakable = false) {
// process initial bytes // process initial bytes
ib = uint(take(1)); ib = uint(take(1));
skipping to change at page 66, line 32 skipping to change at page 66, line 32
+-------------+--------------------------+--------------------------+ +-------------+--------------------------+--------------------------+
Table 7: Examples for Different Levels of Conciseness Table 7: Examples for Different Levels of Conciseness
Appendix F. Changes from RFC 7049 Appendix F. Changes from RFC 7049
The following is a list of known changes from RFC 7049. This list is The following is a list of known changes from RFC 7049. This list is
non-authoritative. It is meant to help reviewers see the significant non-authoritative. It is meant to help reviewers see the significant
differences. differences.
o Updated reference for [RFC4627] to [RFC8259] in many places * Updated reference for [RFC4627] to [RFC8259] in many places
o Updated reference for [CNN-TERMS] to [RFC7228] * Updated reference for [CNN-TERMS] to [RFC7228]
o Added a comment to the last example in Section 2.2.1 (added * Added a comment to the last example in Section 2.2.1 (added
"Second value") "Second value")
o Fixed a bug in the example in Section 2.4.2 ("29" -> "49") * Fixed a bug in the example in Section 2.4.2 ("29" -> "49")
o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> * Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" ->
"0b000_11001") "0b000_11001")
Appendix G. Well-formedness errors and examples Appendix G. Well-formedness errors and examples
There are three basic kinds of well-formedness errors that can occur There are three basic kinds of well-formedness errors that can occur
in decoding a CBOR data item: in decoding a CBOR data item:
o Too much data: There are input bytes left that were not consumed. * Too much data: There are input bytes left that were not consumed.
This is only an error if the application assumed that the input This is only an error if the application assumed that the input
bytes would span exactly one data item. Where the application bytes would span exactly one data item. Where the application
uses the self-delimiting nature of CBOR encoding to permit uses the self-delimiting nature of CBOR encoding to permit
additional data after the data item, as is for example done in additional data after the data item, as is for example done in
CBOR sequences [I-D.ietf-cbor-sequence], the CBOR decoder can CBOR sequences [I-D.ietf-cbor-sequence], the CBOR decoder can
simply indicate what part of the input has not been consumed. simply indicate what part of the input has not been consumed.
o Too little data: The input data available would need additional * Too little data: The input data available would need additional
bytes added at their end for a complete CBOR data item. This may bytes added at their end for a complete CBOR data item. This may
indicate the input is truncated; it is also a common error when indicate the input is truncated; it is also a common error when
trying to decode random data as CBOR. For some applications trying to decode random data as CBOR. For some applications
however, this may not actually be an error, as the application however, this may not actually be an error, as the application
may not be certain it has all the data yet and can obtain or wait may not be certain it has all the data yet and can obtain or wait
for additional input bytes. Some of these applications may have for additional input bytes. Some of these applications may have
an upper limit for how much additional data can show up; here the an upper limit for how much additional data can show up; here the
decoder may be able to indicate that the encoded CBOR data item decoder may be able to indicate that the encoded CBOR data item
cannot be completed within this limit. cannot be completed within this limit.
o Syntax error: The input data are not consistent with the * Syntax error: The input data are not consistent with the
requirements of the CBOR encoding, and this cannot be remedied by requirements of the CBOR encoding, and this cannot be remedied by
adding (or removing) data at the end. adding (or removing) data at the end.
In Appendix C, errors of the first kind are addressed in the first In Appendix C, errors of the first kind are addressed in the first
paragraph/bullet list (requiring "no bytes are left"), and errors of paragraph/bullet list (requiring "no bytes are left"), and errors of
the second kind are addressed in the second paragraph/bullet list the second kind are addressed in the second paragraph/bullet list
(failing "if n bytes are no longer available"). Errors of the third (failing "if n bytes are no longer available"). Errors of the third
kind are identified in the pseudocode by specific instances of kind are identified in the pseudocode by specific instances of
calling fail(), in order: calling fail(), in order:
o a reserved value is used for additional information (28, 29, 30) * a reserved value is used for additional information (28, 29, 30)
o major type 7, additional information 24, value < 32 (incorrect or * major type 7, additional information 24, value < 32 (incorrect or
incorrectly encoded simple type) incorrectly encoded simple type)
o incorrect substructure of indefinite length byte/text string (may * incorrect substructure of indefinite length byte/text string (may
only contain definite length strings of the same major type) only contain definite length strings of the same major type)
o break stop code (mt=7, ai=31) occurs in a value position of a map * break stop code (mt=7, ai=31) occurs in a value position of a map
or anywhere other than a position directly in an indefinite length or anywhere other than a position directly in an indefinite length
item where another enclosed data item could also occur item where another enclosed data item could also occur
o additional information 31 used with major type 0, 1, or 6 * additional information 31 used with major type 0, 1, or 6
G.1. Examples for CBOR data items that are not well-formed G.1. Examples for CBOR data items that are not well-formed
This subsection shows a few examples for CBOR data items that are not This subsection shows a few examples for CBOR data items that are not
well-formed. Each example is a sequence of bytes each shown in well-formed. Each example is a sequence of bytes each shown in
hexadecimal; multiple examples in a list are separated by commas. hexadecimal; multiple examples in a list are separated by commas.
Examples for well-formedness error kind 1 (too much data) can easily Examples for well-formedness error kind 1 (too much data) can easily
be formed by adding data to a well-formed encoded CBOR data item. be formed by adding data to a well-formed encoded CBOR data item.
skipping to change at page 68, line 16 skipping to change at page 68, line 16
data) can be formed by truncating a well-formed encoded CBOR data data) can be formed by truncating a well-formed encoded CBOR data
item. In test suites, it may be beneficial to specifically test with item. In test suites, it may be beneficial to specifically test with
incomplete data items that would require large amounts of addition to incomplete data items that would require large amounts of addition to
be completed (for instance by starting the encoding of a string of a be completed (for instance by starting the encoding of a string of a
very large size). very large size).
A premature end of the input can occur in a head or within the A premature end of the input can occur in a head or within the
enclosed data, which may be bare strings or enclosed data items that enclosed data, which may be bare strings or enclosed data items that
are either counted or should have been ended by a break stop code. are either counted or should have been ended by a break stop code.
o End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 * End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02
03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa 03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa
00 00, fb 00 00 00 00 00, fb 00 00 00
o Definite length strings with short data: 41, 61, 5a ff ff ff ff * Definite length strings with short data: 41, 61, 5a ff ff ff ff
00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f 00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f
ff ff ff ff ff ff ff 01 02 03 ff ff ff ff ff ff ff 01 02 03
o Definite length maps and arrays not closed with enough items: 81, * Definite length maps and arrays not closed with enough items: 81,
81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00
00 00
o Indefinite length strings not closed by a break stop code: 5f 41 * Indefinite length strings not closed by a break stop code: 5f 41
00, 7f 61 00 00, 7f 61 00
o Indefinite length maps and arrays not closed by a break stop code: * Indefinite length maps and arrays not closed by a break stop code:
9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f 9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f
ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff
A few examples for the five subkinds of well-formedness error kind 3 A few examples for the five subkinds of well-formedness error kind 3
(syntax error) are shown below. (syntax error) are shown below.
Subkind 1: Subkind 1:
o Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e, * Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e,
5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc, 5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
fd, fe, fd, fe,
Subkind 2: Subkind 2:
o Reserved two-byte encodings of simple types: f8 00, f8 01, f8 18, * Reserved two-byte encodings of simple types: f8 00, f8 01, f8 18,
f8 1f f8 1f
Subkind 3: Subkind 3:
o Indefinite length string chunks not of the correct type: 5f 00 ff, * Indefinite length string chunks not of the correct type: 5f 00 ff,
5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff, 5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff,
7f 41 00 ff 7f 41 00 ff
o Indefinite length string chunks not definite length: 5f 5f 41 00 * Indefinite length string chunks not definite length: 5f 5f 41 00
ff ff, 7f 7f 61 00 ff ff ff ff, 7f 7f 61 00 ff ff
Subkind 4: Subkind 4:
o Break occurring on its own outside of an indefinite length item: * Break occurring on its own outside of an indefinite length item:
ff ff
o Break occurring in a definite length array or map or a tag: 81 ff, * Break occurring in a definite length array or map or a tag: 81 ff,
82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82 82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82
9f 81 9f 9f ff ff ff ff 9f 81 9f 9f ff ff ff ff
o Break in indefinite length map would lead to odd number of items * Break in indefinite length map would lead to odd number of items
(break in a value position): bf 00 ff, bf 00 00 00 ff (break in a value position): bf 00 ff, bf 00 00 00 ff
Subkind 5: Subkind 5:
o Major type 0, 1, 6 with additional information 31: 1f, 3f, df * Major type 0, 1, 6 with additional information 31: 1f, 3f, df
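The examples in this subsection can be exercised with a small checker. The following Python sketch is modeled on the approach that Appendix C describes; it is not the draft's pseudocode. The names check, classify, TooLittleData, and SyntaxProblem are assumptions of this example, and an indefinite length item is reported as None here rather than 0.

```python
class TooLittleData(Exception):
    """The input ended before the data item was complete (error kind 2)."""


class SyntaxProblem(Exception):
    """The input violates the CBOR encoding rules (error kind 3)."""


BREAK = object()  # marks a break stop code seen where one is allowed


def check(data: bytes) -> int:
    """Check the first data item in data; return the bytes it occupies."""
    pos = 0

    def take(n):
        nonlocal pos
        if n > len(data) - pos:
            raise TooLittleData("need %d more byte(s)" % (n - (len(data) - pos)))
        pos += n
        return data[pos - n:pos]

    def item(breakable=False):
        ib = take(1)[0]
        mt, ai = ib >> 5, ib & 0x1F
        val = ai
        if 24 <= ai <= 27:
            val = int.from_bytes(take(1 << (ai - 24)), "big")
        elif 28 <= ai <= 30:
            raise SyntaxProblem("reserved additional information")
        elif ai == 31:
            return indefinite(mt, breakable)
        if mt in (2, 3):
            take(val)                      # string content
        elif mt == 4:
            for _ in range(val):
                item()
        elif mt == 5:
            for _ in range(2 * val):       # keys and values
                item()
        elif mt == 6:
            item()                         # the tagged data item
        elif mt == 7 and ai == 24 and val < 32:
            raise SyntaxProblem("bad two-byte simple value")
        return mt                          # definite length item

    def indefinite(mt, breakable):
        if mt in (2, 3):
            while (chunk := item(True)) is not BREAK:
                if chunk != mt:            # must be a definite chunk, same type
                    raise SyntaxProblem("bad chunk in indefinite length string")
        elif mt == 4:
            while item(True) is not BREAK:
                pass
        elif mt == 5:
            while item(True) is not BREAK:
                item()                     # value: a break is not allowed here
        elif mt == 7:
            if breakable:
                return BREAK
            raise SyntaxProblem("break stop code not allowed here")
        else:
            raise SyntaxProblem("additional information 31 with major type 0, 1, or 6")
        return None                        # indefinite length item

    item()
    return pos


def classify(data: bytes) -> str:
    """Name the outcome using the three error kinds of this appendix."""
    try:
        used = check(data)
    except TooLittleData:
        return "too little data"
    except SyntaxProblem:
        return "syntax error"
    return "well-formed" if used == len(data) else "too much data"
```

For example, classify(bytes.fromhex("9f 01 02")) reports "too little data", while classify(bytes.fromhex("bf 00 ff")) reports "syntax error". An application that accepts trailing data, such as a CBOR sequence reader, would call check directly and continue from the returned offset.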
Acknowledgements Acknowledgements
CBOR was inspired by MessagePack. MessagePack was developed and CBOR was inspired by MessagePack. MessagePack was developed and
promoted by Sadayuki Furuhashi ("frsyuki"). This reference to promoted by Sadayuki Furuhashi ("frsyuki"). This reference to
MessagePack is solely for attribution; CBOR is not intended as a MessagePack is solely for attribution; CBOR is not intended as a
version of or replacement for MessagePack, as it has different design version of or replacement for MessagePack, as it has different design
goals and requirements. goals and requirements.
The need for functionality beyond the original MessagePack The need for functionality beyond the original MessagePack
 End of changes. 97 change blocks. 
98 lines changed or deleted 98 lines changed or added
