draft-ietf-cbor-7049bis-06.txt | draft-ietf-cbor-7049bis-07.txt | |||
---|---|---|---|---|
Network Working Group C. Bormann | Network Working Group C. Bormann | |||
Internet-Draft Universitaet Bremen TZI | Internet-Draft Universitaet Bremen TZI | |||
Intended status: Standards Track P. Hoffman | Intended status: Standards Track P. Hoffman | |||
Expires: January 3, 2020 ICANN | Expires: February 26, 2020 ICANN | |||
July 02, 2019 | August 25, 2019 | |||
Concise Binary Object Representation (CBOR) | Concise Binary Object Representation (CBOR) | |||
draft-ietf-cbor-7049bis-06 | draft-ietf-cbor-7049bis-07 | |||
Abstract | Abstract | |||
The Concise Binary Object Representation (CBOR) is a data format | The Concise Binary Object Representation (CBOR) is a data format | |||
whose design goals include the possibility of extremely small code | whose design goals include the possibility of extremely small code | |||
size, fairly small message size, and extensibility without the need | size, fairly small message size, and extensibility without the need | |||
for version negotiation. These design goals make it different from | for version negotiation. These design goals make it different from | |||
earlier binary serializations such as ASN.1 and MessagePack. | earlier binary serializations such as ASN.1 and MessagePack. | |||
This document obsoletes RFC 7049. | This document is a revised edition of RFC 7049, with editorial | |||
| | improvements, added detail, and fixed errata. This revision formally | |||
| | obsoletes RFC 7049, while keeping full compatibility of the | |||
| | interchange format from RFC 7049. It does not create a new version | |||
| | of the format. | |||
Contributing | Contributing | |||
This document is being worked on in the CBOR Working Group. Please | This document is being worked on in the CBOR Working Group. Please | |||
contribute on the mailing list there, or in the GitHub repository for | contribute on the mailing list there, or in the GitHub repository for | |||
this draft: https://github.com/cbor-wg/CBORbis | this draft: https://github.com/cbor-wg/CBORbis | |||
The charter for the CBOR Working Group says that the WG will update | The charter for the CBOR Working Group says that the WG will update | |||
RFC 7049 to fix verified errata. Security issues and clarifications | RFC 7049 to fix verified errata. Security issues and clarifications | |||
may be addressed, but changes to this document will ensure backward | may be addressed, but changes to this document will ensure backward | |||
skipping to change at page 1, line 49 ¶ | skipping to change at page 2, line 7 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 3, 2020. | This Internet-Draft will expire on February 26, 2020. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 | 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 | 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 | |||
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 8 | 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 | |||
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 | 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 | |||
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 10 | 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 12 | 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13 | |||
3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 12 | 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13 | |||
3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 12 | 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 13 | |||
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 14 | 3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 15 | |||
3.3. Floating-Point Numbers and Values with No Content . . . . 15 | 3.3. Floating-Point Numbers and Values with No Content . . . . 16 | |||
3.4. Optional Tagging of Items . . . . . . . . . . . . . . . . 17 | 3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 17 | |||
3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 19 | 3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 19 | |||
3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 19 | 3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 19 | |||
3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 19 | 3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 20 | |||
3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 20 | 3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 21 | 3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 21 | |||
3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 22 | 3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 23 | |||
3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 22 | 3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 23 | |||
3.4.6.2. Expected Later Encoding for CBOR-to-JSON | 3.4.6.2. Expected Later Encoding for CBOR-to-JSON | |||
Converters . . . . . . . . . . . . . . . . . . . 23 | Converters . . . . . . . . . . . . . . . . . . . 23 | |||
3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 23 | 3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 24 | |||
3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 24 | 3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 25 | |||
4. Serialization Considerations . . . . . . . . . . . . . . . . 25 | 4. Serialization Considerations . . . . . . . . . . . . . . . . 25 | |||
4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 25 | 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 25 | |||
4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 26 | 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 26 | |||
4.2.1. Core Deterministic Encoding Requirements . . . . . . 26 | 4.2.1. Core Deterministic Encoding Requirements . . . . . . 26 | |||
4.2.2. Additional Deterministic Encoding Considerations . . 27 | 4.2.2. Additional Deterministic Encoding Considerations . . 27 | |||
4.2.3. Length-first map key ordering . . . . . . . . . . . . 28 | 4.2.3. Length-first map key ordering . . . . . . . . . . . . 28 | |||
5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 29 | 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 29 | |||
5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 30 | 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 30 | |||
5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 30 | 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 31 | |||
5.3. Invalid Items . . . . . . . . . . . . . . . . . . . . . . 31 | 5.3. Invalid Items . . . . . . . . . . . . . . . . . . . . . . 31 | |||
5.4. Handling Unknown Simple Values and Tags . . . . . . . . . 32 | 5.4. Handling Unknown Simple Values and Tags . . . . . . . . . 32 | |||
5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 32 | 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 33 | 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 33 | |||
5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 34 | 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 34 | |||
5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 35 | 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 35 | |||
5.8. Strict Decoding Mode . . . . . . . . . . . . . . . . . . 35 | 5.8. Strict Decoding Mode . . . . . . . . . . . . . . . . . . 35 | |||
6. Converting Data between CBOR and JSON . . . . . . . . . . . . 36 | 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 37 | |||
6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 36 | 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 37 | |||
6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 38 | 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 38 | |||
7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 39 | 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 39 | |||
7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 39 | 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 40 | |||
7.2. Curating the Additional Information Space . . . . . . . . 40 | 7.2. Curating the Additional Information Space . . . . . . . . 40 | |||
8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 40 | 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 41 | |||
8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 41 | 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 42 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 | |||
9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 42 | 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 43 | |||
9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 42 | 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 43 | |||
9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 43 | 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 43 | |||
9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 44 | 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 44 | |||
9.5. The +cbor Structured Syntax Suffix Registration . . . . . 44 | 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 45 | |||
10. Security Considerations . . . . . . . . . . . . . . . . . . . 45 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 45 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 47 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 47 | 11.1. Normative References . . . . . . . . . . . . . . . . . . 47 | |||
11.2. Informative References . . . . . . . . . . . . . . . . . 48 | 11.2. Informative References . . . . . . . . . . . . . . . . . 48 | |||
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 50 | Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 50 | |||
Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 54 | Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 54 | |||
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 57 | Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 57 | |||
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 59 | Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 59 | |||
Appendix E. Comparison of Other Binary Formats to CBOR's Design | Appendix E. Comparison of Other Binary Formats to CBOR's Design | |||
Objectives . . . . . . . . . . . . . . . . . . . . . 60 | Objectives . . . . . . . . . . . . . . . . . . . . . 60 | |||
skipping to change at page 4, line 20 ¶ | skipping to change at page 4, line 26 ¶ | |||
to note that this is not a proposal that the grammar in RFC 8259 be | to note that this is not a proposal that the grammar in RFC 8259 be | |||
extended in general, since doing so would cause a significant | extended in general, since doing so would cause a significant | |||
backwards incompatibility with already deployed JSON documents. | backwards incompatibility with already deployed JSON documents. | |||
Instead, this document simply defines its own data model that starts | Instead, this document simply defines its own data model that starts | |||
from JSON. | from JSON. | |||
Appendix E lists some existing binary formats and discusses how well | Appendix E lists some existing binary formats and discusses how well | |||
they do or do not fit the design objectives of the Concise Binary | they do or do not fit the design objectives of the Concise Binary | |||
Object Representation (CBOR). | Object Representation (CBOR). | |||
This document obsoletes [RFC7049]. | This document is a revised edition of [RFC7049], with editorial | |||
| | improvements, added detail, and fixed errata. This revision formally | |||
| | obsoletes RFC 7049, while keeping full compatibility of the | |||
| | interchange format from RFC 7049. It does not create a new version | |||
| | of the format. | |||
1.1. Objectives | 1.1. Objectives | |||
The objectives of CBOR, roughly in decreasing order of importance, | The objectives of CBOR, roughly in decreasing order of importance, | |||
are: | are: | |||
1. The representation must be able to unambiguously encode most | 1. The representation must be able to unambiguously encode most | |||
common data formats used in Internet standards. | common data formats used in Internet standards. | |||
* It must represent a reasonable set of basic data types and | * It must represent a reasonable set of basic data types and | |||
skipping to change at page 7, line 35 ¶ | skipping to change at page 7, line 49 ¶ | |||
many applications will provide their own application-specific | many applications will provide their own application-specific | |||
encoders and/or decoders. | encoders and/or decoders. | |||
In the basic (un-extended) generic data model, a data item is one of: | In the basic (un-extended) generic data model, a data item is one of: | |||
o an integer in the range -2**64..2**64-1 inclusive | o an integer in the range -2**64..2**64-1 inclusive | |||
o a simple value, identified by a number between 0 and 255, but | o a simple value, identified by a number between 0 and 255, but | |||
distinct from that number | distinct from that number | |||
o a floating point value, distinct from an integer, out of the set | o a floating-point value, distinct from an integer, out of the set | |||
representable by IEEE 754 binary64 (including non-finites) | representable by IEEE 754 binary64 (including non-finites) | |||
[IEEE754] | [IEEE754] | |||
o a sequence of zero or more bytes ("byte string") | o a sequence of zero or more bytes ("byte string") | |||
o a sequence of zero or more Unicode code points ("text string") | o a sequence of zero or more Unicode code points ("text string") | |||
o a sequence of zero or more data items ("array") | o a sequence of zero or more data items ("array") | |||
o a mapping (mathematical function) from zero or more data items | o a mapping (mathematical function) from zero or more data items | |||
("keys") each to a data item ("values"), ("map") | ("keys") each to a data item ("values"), ("map") | |||
o a tagged data item, comprising a tag (an integer in the range | o a tagged data item ("tag"), comprising a tag number (an integer in | |||
0..2**64-1) and a value (a data item) | the range 0..2**64-1) and a tagged value (a data item) | |||
Note that integer and floating-point values are distinct in this | Note that integer and floating-point values are distinct in this | |||
model, even if they have the same numeric value. | model, even if they have the same numeric value. | |||
Also note that serialization variants, such as number of bytes of the | Also note that serialization variants, such as number of bytes of the | |||
encoded floating value, or the choice of one of the ways in which an | encoded floating value, or the choice of one of the ways in which an | |||
integer, the length of a text or byte string, the number of elements | integer, the length of a text or byte string, the number of elements | |||
in an array or pairs in a map, or a tag value, (collectively "the | in an array or pairs in a map, or a tag number, (collectively "the | |||
argument", see Section 3) can be encoded, are not visible at the | argument", see Section 3) can be encoded, are not visible at the | |||
generic data model level. | generic data model level. | |||
2.1. Extended Generic Data Models | 2.1. Extended Generic Data Models | |||
This basic generic data model comes pre-extended by the registration | This basic generic data model comes pre-extended by the registration | |||
of a number of simple values and tags right in this document, such | of a number of simple values and tag numbers right in this document, | |||
as: | such as: | |||
o "false", "true", "null", and "undefined" (simple values identified | o "false", "true", "null", and "undefined" (simple values identified | |||
by 20..23) | by 20..23) | |||
o integer and floating point values with a larger range and | o integer and floating-point values with a larger range and | |||
precision than the above (tags 2 to 5) | precision than the above (tag numbers 2 to 5) | |||
o application data types such as a point in time or an RFC 3339 | o application data types such as a point in time or an RFC 3339 | |||
date/time string (tags 1, 0) | date/time string (tag numbers 1, 0) | |||
Further elements of the extended generic data model can be (and have | Further elements of the extended generic data model can be (and have | |||
been) defined via the IANA registries created for CBOR. Even if such | been) defined via the IANA registries created for CBOR. Even if such | |||
an extension is unknown to a generic encoder or decoder, data items | an extension is unknown to a generic encoder or decoder, data items | |||
using that extension can be passed to or from the application by | using that extension can be passed to or from the application by | |||
representing them at the interface to the application within the | representing them at the interface to the application within the | |||
basic generic data model, i.e., as generic values of a simple type or | basic generic data model, i.e., as generic values of a simple type or | |||
generic tagged items. | generic tags. | |||
In other words, the basic generic data model is stable as defined in | In other words, the basic generic data model is stable as defined in | |||
this document, while the extended generic data model expands by the | this document, while the extended generic data model expands by the | |||
registration of new simple values or tags, but never shrinks. | registration of new simple values or tag numbers, but never shrinks. | |||
While there is a strong expectation that generic encoders and | While there is a strong expectation that generic encoders and | |||
decoders can represent "false", "true", and "null" ("undefined" is | decoders can represent "false", "true", and "null" ("undefined" is | |||
intentionally omitted) in the form appropriate for their programming | intentionally omitted) in the form appropriate for their programming | |||
environment, implementation of the data model extensions created by | environment, implementation of the data model extensions created by | |||
tags is truly optional and a matter of implementation quality. | tags is truly optional and a matter of implementation quality. | |||
2.2. Specific Data Models | 2.2. Specific Data Models | |||
The specific data model for a CBOR-based protocol usually subsets the | The specific data model for a CBOR-based protocol usually subsets the | |||
skipping to change at page 9, line 15 ¶ | skipping to change at page 9, line 27 ¶ | |||
of data items, it is preferred to identify the types by the names | of data items, it is preferred to identify the types by the names | |||
they have in the generic data model ("negative integer", "array") | they have in the generic data model ("negative integer", "array") | |||
instead of by referring to aspects of their CBOR representation | instead of by referring to aspects of their CBOR representation | |||
("major type 1", "major type 4"). | ("major type 1", "major type 4"). | |||
Specific data models can also specify what values (including values | Specific data models can also specify what values (including values | |||
of different types) are equivalent for the purposes of map keys and | of different types) are equivalent for the purposes of map keys and | |||
encoder freedom. For example, in the generic data model, a valid map | encoder freedom. For example, in the generic data model, a valid map | |||
MAY have both "0" and "0.0" as keys, and an encoder MUST NOT encode | MAY have both "0" and "0.0" as keys, and an encoder MUST NOT encode | |||
"0.0" as an integer (major type 0, Section 3.1). However, if a | "0.0" as an integer (major type 0, Section 3.1). However, if a | |||
specific data model declares that floating point and integer | specific data model declares that floating-point and integer | |||
representations of integral values are equivalent, using both map | representations of integral values are equivalent, using both map | |||
keys "0" and "0.0" in a single map would be considered duplicates and | keys "0" and "0.0" in a single map would be considered duplicates and | |||
so invalid, and an encoder could encode integral-valued floats as | so invalid, and an encoder could encode integral-valued floats as | |||
integers or vice versa, perhaps to save encoded bytes. | integers or vice versa, perhaps to save encoded bytes. | |||
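As a rough illustration of why "0" and "0.0" are distinct in the generic data model, the following Python sketch hand-encodes both values. The helper names `encode_small_uint` and `encode_float16` are hypothetical and not taken from any CBOR library, and half precision is only one of the float encodings an encoder might choose.

```python
import struct


def encode_small_uint(n: int) -> bytes:
    """Encode an unsigned integer below 24 as major type 0 (a single byte)."""
    if not 0 <= n < 24:
        raise ValueError("this sketch only handles 0..23")
    return bytes([(0 << 5) | n])


def encode_float16(x: float) -> bytes:
    """Encode a float as major type 7, additional information 25 (half precision)."""
    return bytes([(7 << 5) | 25]) + struct.pack(">e", x)


int_zero = encode_small_uint(0)    # major type 0
float_zero = encode_float16(0.0)   # major type 7
print(int_zero.hex(), float_zero.hex())  # prints "00 f90000"
```

The two byte strings differ in their major type, so a generic decoder hands back two different data items; only a specific data model can declare them equivalent.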
3. Specification of the CBOR Encoding | 3. Specification of the CBOR Encoding | |||
A CBOR data item (Section 2) is encoded to or decoded from a byte | A CBOR data item (Section 2) is encoded to or decoded from a byte | |||
string carrying a well-formed encoded data item as described in this | string carrying a well-formed encoded data item as described in this | |||
section. The encoding is summarized in Table 5. An encoder MUST | section. The encoding is summarized in Table 6. An encoder MUST | |||
produce only well-formed encoded data items. A decoder MUST NOT | produce only well-formed encoded data items. A decoder MUST NOT | |||
return a decoded data item when it encounters input that is not a | return a decoded data item when it encounters input that is not a | |||
well-formed encoded CBOR data item (this does not detract from the | well-formed encoded CBOR data item (this does not detract from the | |||
usefulness of diagnostic and recovery tools that might make available | usefulness of diagnostic and recovery tools that might make available | |||
some information from a damaged encoded CBOR data item). | some information from a damaged encoded CBOR data item). | |||
The initial byte of each encoded data item contains both information | The initial byte of each encoded data item contains both information | |||
about the major type (the high-order 3 bits, described in | about the major type (the high-order 3 bits, described in | |||
Section 3.1) and additional information (the low-order 5 bits). With | Section 3.1) and additional information (the low-order 5 bits). With | |||
a few exceptions, the additional information's value describes how to | a few exceptions, the additional information's value describes how to | |||
load an unsigned integer "argument": | load an unsigned integer "argument": | |||
Less than 24: The argument's value is the value of the additional | Less than 24: The argument's value is the value of the additional | |||
information. | information. | |||
24, 25, 26, or 27: The argument's value is held in the following 1, | 24, 25, 26, or 27: The argument's value is held in the following 1, | |||
2, 4, or 8 bytes, respectively, in network byte order. For major | 2, 4, or 8 bytes, respectively, in network byte order. For major | |||
type 7 and additional information value 25, 26, 27, these bytes | type 7 and additional information value 25, 26, 27, these bytes | |||
are not used as an integer argument, but as a floating point value | are not used as an integer argument, but as a floating-point value | |||
(see Section 3.3). | (see Section 3.3). | |||
28, 29, 30: These values are reserved for future additions to the | 28, 29, 30: These values are reserved for future additions to the | |||
CBOR format. In the present version of CBOR, the encoded item is | CBOR format. In the present version of CBOR, the encoded item is | |||
not well-formed. | not well-formed. | |||
31: No argument value is derived. If the major type is 0, 1, or 6, | 31: No argument value is derived. If the major type is 0, 1, or 6, | |||
the encoded item is not well-formed. For major types 2 to 5, the | the encoded item is not well-formed. For major types 2 to 5, the | |||
item's length is indefinite, and for major type 7, the byte does | item's length is indefinite, and for major type 7, the byte does | |||
not constitute a data item at all but terminates an indefinite | not constitute a data item at all but terminates an indefinite | |||
length item; both are described in Section 3.2. | length item; both are described in Section 3.2. | |||
| | The initial byte and any additional bytes consumed to construct the | |||
| | argument are collectively referred to as the "head" of the data item. | |||
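The rules above for loading the argument can be sketched in Python roughly as follows. This is a minimal illustration rather than a complete decoder: the function name `decode_head` and its return shape are assumptions made for this example, and for major type 7 the sketch returns the raw bits of the following bytes and leaves their interpretation (simple value or float) to the caller.

```python
def decode_head(data: bytes, offset: int = 0):
    """Return (major_type, additional_info, argument, head_length).

    argument is None for additional information 31 (no argument derived).
    Raises ValueError where the encoded item is not well-formed.
    """
    if offset >= len(data):
        raise ValueError("input ends before the initial byte")
    initial = data[offset]
    major_type = initial >> 5      # high-order 3 bits
    info = initial & 0x1F          # low-order 5 bits
    if info < 24:
        # The argument is the additional information itself.
        return major_type, info, info, 1
    if info <= 27:
        size = 1 << (info - 24)    # 1, 2, 4, or 8 following bytes
        raw = data[offset + 1 : offset + 1 + size]
        if len(raw) < size:
            raise ValueError("input ends inside the head")
        # Network byte order; for major type 7 these are raw bits, not an integer.
        return major_type, info, int.from_bytes(raw, "big"), 1 + size
    if info < 31:
        raise ValueError("additional information 28..30 is reserved")
    if major_type in (0, 1, 6):
        raise ValueError("additional information 31 is not allowed for this major type")
    # Indefinite length (major types 2 to 5) or the "break" stop code (major type 7).
    return major_type, info, None, 1


print(decode_head(bytes.fromhex("1903e8")))  # prints (0, 25, 1000, 3)
print(decode_head(b"\x9f"))                  # prints (4, 31, None, 1)
```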
The meaning of this argument depends on the major type. For example, | The meaning of this argument depends on the major type. For example, | |||
in major type 0, the argument is the value of the data item itself | in major type 0, the argument is the value of the data item itself | |||
(and in major type 1 the value of the data item is computed from the | (and in major type 1 the value of the data item is computed from the | |||
argument); in major type 2 and 3 it gives the length of the string | argument); in major type 2 and 3 it gives the length of the string | |||
data in bytes that follows; and in major types 4 and 5 it is used to | data in bytes that follows; and in major types 4 and 5 it is used to | |||
determine the number of data items enclosed. | determine the number of data items enclosed. | |||
If the encoded sequence of bytes ends before the end of a data item, | If the encoded sequence of bytes ends before the end of a data item, | |||
that item is not well-formed. If the encoded sequence of bytes still | that item is not well-formed. If the encoded sequence of bytes still | |||
has bytes remaining after the outermost encoded item is decoded, that | has bytes remaining after the outermost encoded item is decoded, that | |||
encoding is not a single well-formed CBOR item; depending on the | encoding is not a single well-formed CBOR item; depending on the | |||
application, the decoder may either treat the encoding as not well- | application, the decoder may either treat the encoding as not well- | |||
formed or just identify the start of the remaining bytes to the | formed or just identify the start of the remaining bytes to the | |||
application. | application. | |||
A CBOR decoder implementation can be based on a jump table with all | A CBOR decoder implementation can be based on a jump table with all | |||
256 defined values for the initial byte (Table 5). A decoder in a | 256 defined values for the initial byte (Table 6). A decoder in a | |||
constrained implementation can instead use the structure of the | constrained implementation can instead use the structure of the | |||
initial byte and following bytes for more compact code (see | initial byte and following bytes for more compact code (see | |||
Appendix C for a rough impression of how this could look). | Appendix C for a rough impression of how this could look). | |||
3.1. Major Types | 3.1. Major Types | |||
The following lists the major types and the additional information | The following lists the major types and the additional information | |||
and other bytes associated with the type. | and other bytes associated with the type. | |||
Major type 0: an integer in the range 0..2**64-1 inclusive. The | Major type 0: an integer in the range 0..2**64-1 inclusive. The | |||
skipping to change at page 11, line 40 ¶ | skipping to change at page 12, line 5 ¶ | |||
Major type 5: a map of pairs of data items. Maps are also called | Major type 5: a map of pairs of data items. Maps are also called | |||
tables, dictionaries, hashes, or objects (in JSON). A map is | tables, dictionaries, hashes, or objects (in JSON). A map is | |||
comprised of pairs of data items, each pair consisting of a key | comprised of pairs of data items, each pair consisting of a key | |||
that is immediately followed by a value. The argument is the | that is immediately followed by a value. The argument is the | |||
number of _pairs_ of data items in the map. For example, a map | number of _pairs_ of data items in the map. For example, a map | |||
that contains 9 pairs would have an initial byte of 0b101_01001 | that contains 9 pairs would have an initial byte of 0b101_01001 | |||
(major type of 5, additional information of 9 for the number of | (major type of 5, additional information of 9 for the number of | |||
pairs) followed by the 18 remaining items. The first item is the | pairs) followed by the 18 remaining items. The first item is the | |||
first key, the second item is the first value, the third item is | first key, the second item is the first value, the third item is | |||
the second key, and so on. A map that has duplicate keys may be | the second key, and so on. Because items in a map come in pairs, | |||
their total number is always even: A map that contains an odd | ||||
number of items (no value data present after the last key data | ||||
item) is not well-formed. A map that has duplicate keys may be | ||||
well-formed, but it is not valid, and thus it causes indeterminate | well-formed, but it is not valid, and thus it causes indeterminate | |||
decoding; see also Section 5.6. | decoding; see also Section 5.6. | |||
Major type 6: a tagged data item whose tag is the argument and whose | Major type 6: a tagged data item ("tag") whose tag number is the | |||
value is the single following encoded item. See Section 3.4. | argument and whose enclosed data item is the single encoded data | |||
item that follows the head. See Section 3.4. | ||||
Major type 7: floating-point numbers and simple values, as well as | Major type 7: floating-point numbers and simple values, as well as | |||
the "break" stop code. See Section 3.3. | the "break" stop code. See Section 3.3. | |||
These eight major types lead to a simple table showing which of the | These eight major types lead to a simple table showing which of the | |||
256 possible values for the initial byte of a data item are used | 256 possible values for the initial byte of a data item are used | |||
(Table 5). | (Table 6). | |||
In major types 6 and 7, many of the possible values are reserved for | In major types 6 and 7, many of the possible values are reserved for | |||
future specification. See Section 9 for more information on these | future specification. See Section 9 for more information on these | |||
values. | values. | |||
Table 1 summarizes the major types defined by CBOR, ignoring the next | ||||
section for now. The number N in this table stands for the argument, | ||||
mt for the major type. | ||||
+----+-----------------------+---------------------------------+ | ||||
| mt | Meaning | Content | | ||||
+----+-----------------------+---------------------------------+ | ||||
| 0 | unsigned integer N | - | | ||||
| | | | | ||||
| 1 | negative integer -1-N | - | | ||||
| | | | | ||||
| 2 | byte string | N bytes | | ||||
| | | | | ||||
| 3 | text string | N bytes (UTF-8 text) | | ||||
| | | | | ||||
| 4 | array | N data items (elements) | | ||||
| | | | | ||||
| 5 | map | 2N data items (key/value pairs) | | ||||
| | | | | ||||
| 6 | tag of number N | 1 data item | | ||||
| | | | | ||||
| 7 | simple/float | - | | ||||
+----+-----------------------+---------------------------------+ | ||||
Table 1: Overview over CBOR major types (definite length encoded) | ||||
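As an illustration of the initial-byte layout used in the list and table above, the following minimal Python sketch splits an initial byte into its major type (high-order 3 bits) and additional information (low-order 5 bits). The function name split_initial_byte is hypothetical and is not part of either draft.

```python
def split_initial_byte(b):
    """Return (major type, additional information) for one initial byte."""
    if not 0 <= b <= 0xFF:
        raise ValueError("an initial byte must be in the range 0..255")
    # The major type is the high-order 3 bits, the additional
    # information is the low-order 5 bits.
    return b >> 5, b & 0x1F


# The map example from the text: 0b101_01001 is major type 5 with
# additional information 9, so 9 pairs (18 data items) follow.
mt, ai = split_initial_byte(0b101_01001)
items_following = 2 * ai
```

For additional information values below 24, the value is the argument itself, which is why the 9-pair map needs no further head bytes.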
3.2. Indefinite Lengths for Some Major Types | 3.2. Indefinite Lengths for Some Major Types | |||
Four CBOR items (arrays, maps, byte strings, and text strings) can be | Four CBOR items (arrays, maps, byte strings, and text strings) can be | |||
encoded with an indefinite length using additional information value | encoded with an indefinite length using additional information value | |||
31. This is useful if the encoding of the item needs to begin before | 31. This is useful if the encoding of the item needs to begin before | |||
the number of items inside the array or map, or the total length of | the number of items inside the array or map, or the total length of | |||
the string, is known. (The application of this is often referred to | the string, is known. (The application of this is often referred to | |||
as "streaming" within a data item.) | as "streaming" within a data item.) | |||
Indefinite-length arrays and maps are dealt with differently than | Indefinite-length arrays and maps are dealt with differently than | |||
skipping to change at page 12, line 40 | skipping to change at page 13, line 32 | |||
If the "break" stop code appears anywhere where a data item is | If the "break" stop code appears anywhere where a data item is | |||
expected, other than directly inside an indefinite-length string, | expected, other than directly inside an indefinite-length string, | |||
array, or map -- for example directly inside a definite-length array | array, or map -- for example directly inside a definite-length array | |||
or map -- the enclosing item is not well-formed. | or map -- the enclosing item is not well-formed. | |||
3.2.2. Indefinite-Length Arrays and Maps | 3.2.2. Indefinite-Length Arrays and Maps | |||
Indefinite-length arrays and maps are represented using their major | Indefinite-length arrays and maps are represented using their major | |||
type with the additional information value of 31, followed by an | type with the additional information value of 31, followed by an | |||
arbitrary-length sequence of items for an array or key/value pairs | arbitrary-length sequence of zero or more items for an array or key/ | |||
for a map, followed by the "break" stop code (Section 3.2.1). In | value pairs for a map, followed by the "break" stop code | |||
other words, indefinite-length arrays and maps look identical to | (Section 3.2.1). In other words, indefinite-length arrays and maps | |||
other arrays and maps except for beginning with the additional | look identical to other arrays and maps except for beginning with the | |||
information value of 31 and ending with the "break" stop code. | additional information value of 31 and ending with the "break" stop | |||
code. | ||||
If the break stop code appears after a key in a map, in place of that | If the break stop code appears after a key in a map, in place of that | |||
key's value, the map is not well-formed. | key's value, the map is not well-formed. | |||
There is no restriction against nesting indefinite-length array or | There is no restriction against nesting indefinite-length array or | |||
map items. A "break" only terminates a single item, so nested | map items. A "break" only terminates a single item, so nested | |||
indefinite-length items need exactly as many "break" stop codes as | indefinite-length items need exactly as many "break" stop codes as | |||
there are type bytes starting an indefinite-length item. | there are type bytes starting an indefinite-length item. | |||
For example, assume an encoder wants to represent the abstract array | For example, assume an encoder wants to represent the abstract array | |||
skipping to change at page 14, line 44 | skipping to change at page 15, line 33 | |||
F5 -- First value, true | F5 -- First value, true | |||
63 -- Second key, UTF-8 string length 3 | 63 -- Second key, UTF-8 string length 3 | |||
416d74 -- "Amt" | 416d74 -- "Amt" | |||
21 -- Second value, -2 | 21 -- Second value, -2 | |||
FF -- "break" | FF -- "break" | |||
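The byte sequence of this example can be reproduced with a minimal Python sketch. The opening byte 0xBF and the first key "Fun" are taken from the part of the example that this comparison skips over; the helper short_text is hypothetical and handles only text strings shorter than 24 bytes.

```python
def short_text(s):
    """Encode a definite-length text string of fewer than 24 bytes."""
    data = s.encode("utf-8")
    if len(data) >= 24:
        raise ValueError("this sketch handles short strings only")
    # Major type 3, with the length carried in the additional information.
    return bytes([0x60 | len(data)]) + data


MAP_INDEFINITE = b"\xbf"  # major type 5, additional information 31
TRUE = b"\xf5"            # simple value 21
MINUS_TWO = b"\x21"       # major type 1, argument 1, i.e. -1 - 1
BREAK = b"\xff"           # "break" stop code

encoded = (MAP_INDEFINITE
           + short_text("Fun") + TRUE
           + short_text("Amt") + MINUS_TWO
           + BREAK)
```

Because the number of pairs is never written, the encoder can emit each key/value pair as soon as it is available and close the map with a single "break".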
3.2.3. Indefinite-Length Byte Strings and Text Strings | 3.2.3. Indefinite-Length Byte Strings and Text Strings | |||
Indefinite-length strings are represented by a byte containing the | Indefinite-length strings are represented by a byte containing the | |||
major type and additional information value of 31, followed by a | major type and additional information value of 31, followed by a | |||
series of byte or text strings ("chunks") that have definite lengths, | series of zero or more byte or text strings ("chunks") that have | |||
followed by the "break" stop code (Section 3.2.1). The data item | definite lengths, followed by the "break" stop code (Section 3.2.1). | |||
represented by the indefinite-length string is the concatenation of | The data item represented by the indefinite-length string is the | |||
the chunks. | concatenation of the chunks (i.e., the empty byte or text string, | |||
respectively, if no chunk is present). | ||||
If any item between the indefinite-length string indicator | If any item between the indefinite-length string indicator | |||
(0b010_11111 or 0b011_11111) and the "break" stop code is not a | (0b010_11111 or 0b011_11111) and the "break" stop code is not a | |||
definite-length string item of the same major type, the string is not | definite-length string item of the same major type, the string is not | |||
well-formed. | well-formed. | |||
If any definite-length text string inside an indefinite-length text | If any definite-length text string inside an indefinite-length text | |||
string is invalid, the indefinite-length text string is invalid. | string is invalid, the indefinite-length text string is invalid. | |||
Note that this implies that the bytes of a single UTF-8 character | Note that this implies that the bytes of a single UTF-8 character | |||
cannot be spread between chunks: a new chunk can only be started at a | cannot be spread between chunks: a new chunk can only be started at a | |||
skipping to change at page 15, line 30 | skipping to change at page 16, line 19 | |||
FF -- "break" | FF -- "break" | |||
After decoding, this results in a single byte string with seven | After decoding, this results in a single byte string with seven | |||
bytes: 0xaabbccddeeff99. | bytes: 0xaabbccddeeff99. | |||
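The concatenation of chunks can be sketched in Python as follows. This is a simplified illustration, not a complete decoder: the function name is hypothetical, only chunks shorter than 24 bytes are handled, and the 4-byte plus 3-byte chunk split used in the usage line is taken from the part of the example that this comparison skips over.

```python
def decode_indefinite_bytes(buf):
    """Concatenate the chunks of an indefinite-length byte string."""
    if not buf or buf[0] != 0x5F:
        raise ValueError("not an indefinite-length byte string")
    pos, out = 1, bytearray()
    while True:
        if pos >= len(buf):
            raise ValueError("missing break stop code")
        b = buf[pos]
        if b == 0xFF:
            # "break" ends the string; with no chunks the result is empty.
            return bytes(out)
        if b >> 5 != 2:
            raise ValueError("chunk is not a byte string: not well-formed")
        n = b & 0x1F
        if n == 31:
            raise ValueError("chunk is not definite-length: not well-formed")
        if n >= 24:
            raise ValueError("this sketch handles chunks under 24 bytes only")
        chunk = buf[pos + 1:pos + 1 + n]
        if len(chunk) != n:
            raise ValueError("truncated chunk")
        out += chunk
        pos += 1 + n


result = decode_indefinite_bytes(bytes.fromhex("5f44aabbccdd43eeff99ff"))
```

Note that the 0xff byte inside the second chunk is content, not a stop code, because the decoder consumes each chunk by its declared length.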
3.3. Floating-Point Numbers and Values with No Content | 3.3. Floating-Point Numbers and Values with No Content | |||
Major type 7 is for two types of data: floating-point numbers and | Major type 7 is for two types of data: floating-point numbers and | |||
"simple values" that do not need any content. Each value of the | "simple values" that do not need any content. Each value of the | |||
5-bit additional information in the initial byte has its own separate | 5-bit additional information in the initial byte has its own separate | |||
meaning, as defined in Table 1. Like the major types for integers, | meaning, as defined in Table 2. Like the major types for integers, | |||
items of this major type do not carry content data; all the | items of this major type do not carry content data; all the | |||
information is in the initial bytes. | information is in the initial bytes. | |||
+------------+------------------------------------------------------+ | +------------+------------------------------------------------------+ | |||
| 5-Bit | Semantics | | | 5-Bit | Semantics | | |||
| Value | | | | Value | | | |||
+------------+------------------------------------------------------+ | +------------+------------------------------------------------------+ | |||
| 0..23 | Simple value (value 0..23) | | | 0..23 | Simple value (value 0..23) | | |||
| | | | | | | | |||
| 24 | Simple value (value 32..255 in following byte) | | | 24 | Simple value (value 32..255 in following byte) | | |||
| | | | | | | | |||
| 25 | IEEE 754 Half-Precision Float (16 bits follow) | | | 25 | IEEE 754 Half-Precision Float (16 bits follow) | | |||
| | | | | | | | |||
| 26 | IEEE 754 Single-Precision Float (32 bits follow) | | | 26 | IEEE 754 Single-Precision Float (32 bits follow) | | |||
| | | | | | | | |||
| 27 | IEEE 754 Double-Precision Float (64 bits follow) | | | 27 | IEEE 754 Double-Precision Float (64 bits follow) | | |||
| | | | | | | | |||
| 28-30 | Unassigned, not well-formed in the present document | | | 28-30 | Reserved, not well-formed in the present document | | |||
| | | | | | | | |||
| 31 | "break" stop code for indefinite-length items | | | 31 | "break" stop code for indefinite-length items | | |||
| | (Section 3.2.1) | | | | (Section 3.2.1) | | |||
+------------+------------------------------------------------------+ | +------------+------------------------------------------------------+ | |||
Table 1: Values for Additional Information in Major Type 7 | Table 2: Values for Additional Information in Major Type 7 | |||
As with all other major types, the 5-bit value 24 signifies a single- | As with all other major types, the 5-bit value 24 signifies a single- | |||
byte extension: it is followed by an additional byte to represent the | byte extension: it is followed by an additional byte to represent the | |||
simple value. (To minimize confusion, only the values 32 to 255 are | simple value. (To minimize confusion, only the values 32 to 255 are | |||
used.) This maintains the structure of the initial bytes: as for the | used.) This maintains the structure of the initial bytes: as for the | |||
other major types, the length of these always depends on the | other major types, the length of these always depends on the | |||
additional information in the first byte. Table 2 lists the values | additional information in the first byte. Table 3 lists the values | |||
assigned and available for simple types. | assigned and available for simple types. | |||
+---------+-----------------+ | +---------+-----------------+ | |||
| Value | Semantics | | | Value | Semantics | | |||
+---------+-----------------+ | +---------+-----------------+ | |||
| 0..19 | (Unassigned) | | | 0..19 | (Unassigned) | | |||
| | | | | | | | |||
| 20 | False | | | 20 | False | | |||
| | | | | | | | |||
| 21 | True | | | 21 | True | | |||
| | | | | | | | |||
| 22 | Null | | | 22 | Null | | |||
| | | | | | | | |||
| 23 | Undefined value | | | 23 | Undefined value | | |||
| | | | | | | | |||
| 24..31 | (Reserved) | | | 24..31 | (Reserved) | | |||
| | | | | | | | |||
| 32..255 | (Unassigned) | | | 32..255 | (Unassigned) | | |||
+---------+-----------------+ | +---------+-----------------+ | |||
Table 2: Simple Values | Table 3: Simple Values | |||
An encoder MUST NOT issue two-byte sequences that start with 0xf8 | An encoder MUST NOT issue two-byte sequences that start with 0xf8 | |||
(major type = 7, additional information = 24) and continue with a | (major type = 7, additional information = 24) and continue with a | |||
byte less than 0x20 (32 decimal). Such sequences are not well- | byte less than 0x20 (32 decimal). Such sequences are not well- | |||
formed. (This implies that an encoder cannot encode false, true, | formed. (This implies that an encoder cannot encode false, true, | |||
null, or undefined in two-byte sequences; only the one-byte variants | null, or undefined in two-byte sequences; only the one-byte variants | |||
of these are well-formed.) | of these are well-formed.) | |||
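The encoding rule for simple values can be sketched in Python as follows. The function name encode_simple is hypothetical; the sketch refuses the values 24..31 because no well-formed encoding exists for them.

```python
def encode_simple(value):
    """Encode a simple value (major type 7)."""
    if 0 <= value <= 23:
        # One-byte form: the value sits in the additional information.
        return bytes([0xE0 | value])
    if 32 <= value <= 255:
        # Two-byte form: 0xf8 (additional information 24) plus one byte.
        return bytes([0xF8, value])
    # 24..31 would need 0xf8 followed by a byte below 0x20, which is
    # not well-formed; anything outside 0..255 is not a simple value.
    raise ValueError("no well-formed encoding for simple value %d" % value)


FALSE, TRUE, NULL, UNDEFINED = (encode_simple(v) for v in (20, 21, 22, 23))
```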
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit | The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit | |||
IEEE 754 binary floating-point values [IEEE754]. These floating- | IEEE 754 binary floating-point values [IEEE754]. These floating- | |||
point values are encoded in the additional bytes of the appropriate | point values are encoded in the additional bytes of the appropriate | |||
size. (See Appendix D for some information about 16-bit floating | size. (See Appendix D for some information about 16-bit floating | |||
point.) | point.) | |||
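As a rough illustration, the three floating-point encodings can be produced with Python's standard struct module, whose "e", "f", and "d" format characters correspond to IEEE 754 binary16, binary32, and binary64. The function name encode_float and its bits parameter are hypothetical; the sketch does not attempt to choose the shortest encoding that preserves the value.

```python
import struct

# Width in bits -> (additional information, big-endian struct format).
_FLOAT_FORMATS = {16: (25, ">e"), 32: (26, ">f"), 64: (27, ">d")}


def encode_float(x, bits=64):
    """Encode x as a major type 7 floating-point item of the given width."""
    ai, fmt = _FLOAT_FORMATS[bits]
    # Initial byte: major type 7 (0b111) plus the additional information,
    # followed by the IEEE 754 value in network byte order.
    return bytes([0xE0 | ai]) + struct.pack(fmt, x)


half = encode_float(1.5, bits=16)
single = encode_float(100000.0, bits=32)
double = encode_float(1.1)
```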
3.4. Optional Tagging of Items | 3.4. Tagging of Items | |||
In CBOR, a data item can optionally be preceded by a tag to give it | In CBOR, a data item can be enclosed by a tag to give it additional | |||
additional semantics while retaining its structure. The tag is major | semantics while retaining its structure. The tag is major type 6, | |||
type 6, and represents an unsigned integer as indicated by the tag's | and represents an unsigned integer as indicated by the tag's argument | |||
argument (Section 3); the (sole) data item is carried as content | (Section 3); the (sole) enclosed data item is carried as content | |||
data. If a tag requires structured data, this structure is encoded | data. If a tag requires structured data, this structure is encoded | |||
into the nested data item. The definition of a tag usually restricts | into the nested data item. The definition of a tag number usually | |||
what kinds of nested data item or items are valid for this tag. | restricts what kinds of nested data item or items are valid for tags | |||
using this tag number. | ||||
For example, assume that a byte string of length 12 is marked with a | For example, assume that a byte string of length 12 is marked with a | |||
tag to indicate it is a positive bignum (Section 3.4.4). This would | tag of number 2 to indicate it is a positive bignum (Section 3.4.4). | |||
be marked as 0b110_00010 (major type 6, additional information 2 for | This would be marked as 0b110_00010 (major type 6, additional | |||
the tag) followed by 0b010_01100 (major type 2, additional | information 2 for the tag number) followed by 0b010_01100 (major type | |||
information of 12 for the length) followed by the 12 bytes of the | 2, additional information of 12 for the length) followed by the 12 | |||
bignum. | bytes of the bignum. | |||
Decoders do not need to understand tags, and thus tags may be of | Decoders do not need to understand tags of every tag number, and tags | |||
little value in applications where the implementation creating a | may be of little value in applications where the implementation | |||
particular CBOR data item and the implementation decoding that stream | creating a particular CBOR data item and the implementation decoding | |||
know the semantic meaning of each item in the data flow. Their | that stream know the semantic meaning of each item in the data flow. | |||
primary purpose in this specification is to define common data types | Their primary purpose in this specification is to define common data | |||
such as dates. A secondary purpose is to allow optional tagging when | types such as dates. A secondary purpose is to allow optional | |||
the decoder is a generic CBOR decoder that might be able to benefit | tagging when the decoder is a generic CBOR decoder that might be able | |||
from hints about the content of items. Understanding the semantic | to benefit from hints about the content of items. Understanding the | |||
tags is optional for a decoder; it can just jump over the initial | semantic tags is optional for a decoder; it can just jump over the | |||
bytes of the tag and interpret the tagged data item itself. | initial bytes of the tag and interpret the tagged data item itself. | |||
A tag always applies to the item that directly follows it. Thus, if | A tag applies semantics to the data item it encloses. Thus, if tag A | |||
tag A is followed by tag B, which is followed by data item C, tag A | encloses tag B, which encloses data item C, tag A applies to the | |||
applies to the result of applying tag B on data item C. That is, a | result of applying tag B on data item C. That is, a tagged item is a | |||
tagged item is a data item consisting of a tag and a value. The | data item consisting of a tag number and an enclosed value. The | |||
content of the tagged item is the data item (the value) that is being | content of the tagged item (the enclosed data item) is the data item | |||
tagged. | (the value) that is being tagged. | |||
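The following Python sketch illustrates how a decoder that does not interpret tags might step over the tag heads and reach the enclosed data item, collecting the tag numbers from outermost to innermost along the way. The function name strip_tags is hypothetical, and the sketch does not check that the remaining bytes form a well-formed data item.

```python
def strip_tags(buf):
    """Return (tag numbers, remaining bytes) for a possibly tagged item."""
    pos, tags = 0, []
    while pos < len(buf) and buf[pos] >> 5 == 6:
        ai = buf[pos] & 0x1F
        if ai < 24:
            size, number = 0, ai
        elif ai <= 27:
            # 24, 25, 26, 27 mean 1, 2, 4, 8 following argument bytes.
            size = 1 << (ai - 24)
            arg = buf[pos + 1:pos + 1 + size]
            if len(arg) != size:
                raise ValueError("truncated tag head")
            number = int.from_bytes(arg, "big")
        else:
            raise ValueError("not well-formed: additional information %d" % ai)
        tags.append(number)
        pos += 1 + size
    return tags, buf[pos:]


# Tag 2 enclosing a 9-byte byte string (the 2**64 bignum).
tags, item = strip_tags(bytes.fromhex("c249010000000000000000"))
```

For nested tags, the first number in the returned list is the outermost tag, which applies to the result of applying the inner tags.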
IANA maintains a registry of tag values as described in Section 9.2. | IANA maintains a registry of tag numbers as described in Section 9.2. | |||
Table 3 provides a list of values that were defined in [RFC7049], | Table 4 provides a list of tag numbers that were defined in | |||
with definitions in the rest of this section. Note that many other | [RFC7049], with definitions in the rest of this section. Note that | |||
tags have been defined since the publication of [RFC7049]; see the | many other tag numbers have been defined since the publication of | |||
registry described at Section 9.2 for the complete list. | [RFC7049]; see the registry described at Section 9.2 for the complete | |||
list. | ||||
+-------+-----------+-----------------------------------------------+ | +----------+----------+---------------------------------------------+ | |||
| Tag | Data Item | Semantics | | | Tag | Data | Semantics | | |||
+-------+-----------+-----------------------------------------------+ | | Number | Item | | | |||
| 0 | UTF-8 | Standard date/time string; see Section 3.4.2 | | +----------+----------+---------------------------------------------+ | |||
| | string | | | | 0 | text | Standard date/time string; see | | |||
| | | | | | | string | Section 3.4.2 | | |||
| 1 | multiple | Epoch-based date/time; see Section 3.4.3 | | | | | | | |||
| | | | | | 1 | multiple | Epoch-based date/time; see Section 3.4.3 | | |||
| 2 | byte | Positive bignum; see Section 3.4.4 | | | | | | | |||
| | string | | | | 2 | byte | Positive bignum; see Section 3.4.4 | | |||
| | | | | | | string | | | |||
| 3 | byte | Negative bignum; see Section 3.4.4 | | | | | | | |||
| | string | | | | 3 | byte | Negative bignum; see Section 3.4.4 | | |||
| | | | | | | string | | | |||
| 4 | array | Decimal fraction; see Section 3.4.5 | | | | | | | |||
| | | | | | 4 | array | Decimal fraction; see Section 3.4.5 | | |||
| 5 | array | Bigfloat; see Section 3.4.5 | | | | | | | |||
| | | | | | 5 | array | Bigfloat; see Section 3.4.5 | | |||
| 21 | multiple | Expected conversion to base64url encoding; | | | | | | | |||
| | | see Section 3.4.6.2 | | | 21 | multiple | Expected conversion to base64url encoding; | | |||
| | | | | | | | see Section 3.4.6.2 | | |||
| 22 | multiple | Expected conversion to base64 encoding; see | | | | | | | |||
| | | Section 3.4.6.2 | | | 22 | multiple | Expected conversion to base64 encoding; see | | |||
| | | | | | | | Section 3.4.6.2 | | |||
| 23 | multiple | Expected conversion to base16 encoding; see | | | | | | | |||
| | | Section 3.4.6.2 | | | 23 | multiple | Expected conversion to base16 encoding; see | | |||
| | | | | | | | Section 3.4.6.2 | | |||
| 24 | byte | Encoded CBOR data item; see Section 3.4.6.1 | | | | | | | |||
| | string | | | | 24 | byte | Encoded CBOR data item; see Section 3.4.6.1 | | |||
| | | | | | | string | | | |||
| 32 | UTF-8 | URI; see Section 3.4.6.3 | | | | | | | |||
| | string | | | | 32 | text | URI; see Section 3.4.6.3 | | |||
| | | | | | | string | | | |||
| 33 | UTF-8 | base64url; see Section 3.4.6.3 | | | | | | | |||
| | string | | | | 33 | text | base64url; see Section 3.4.6.3 | | |||
| | | | | | | string | | | |||
| 34 | UTF-8 | base64; see Section 3.4.6.3 | | | | | | | |||
| | string | | | | 34 | text | base64; see Section 3.4.6.3 | | |||
| | | | | | | string | | | |||
| 35 | UTF-8 | Regular expression; see Section 3.4.6.3 | | | | | | | |||
| | string | | | | 35 | text | Regular expression; see Section 3.4.6.3 | | |||
| | | | | | | string | | | |||
| 36 | UTF-8 | MIME message; see Section 3.4.6.3 | | | | | | | |||
| | string | | | | 36 | text | MIME message; see Section 3.4.6.3 | | |||
| | | | | | | string | | | |||
| 55799 | multiple | Self-described CBOR; see Section 3.4.7 | | | | | | | |||
+-------+-----------+-----------------------------------------------+ | | 55799 | multiple | Self-described CBOR; see Section 3.4.7 | | |||
+----------+----------+---------------------------------------------+ | ||||
Table 3: Values for Tags | Table 4: Tag numbers defined in RFC 7049 | |||
3.4.1. Date and Time | 3.4.1. Date and Time | |||
Protocols using tag values 0 and 1 extend the generic data model | Protocols using tag numbers 0 and 1 extend the generic data model | |||
(Section 2) with data items representing points in time. | (Section 2) with data items representing points in time. | |||
3.4.2. Standard Date/Time String | 3.4.2. Standard Date/Time String | |||
Tag value 0 contains a text string in the standard format described | Tag number 0 contains a text string in the standard format described | |||
by the "date-time" production in [RFC3339], as refined by Section 3.3 | by the "date-time" production in [RFC3339], as refined by Section 3.3 | |||
of [RFC4287], representing the point in time described there. A | of [RFC4287], representing the point in time described there. A | |||
nested item of another type or that doesn't match the [RFC4287] | nested item of another type or that doesn't match the [RFC4287] | |||
format is invalid. | format is invalid. | |||
3.4.3. Epoch-based Date/Time | 3.4.3. Epoch-based Date/Time | |||
Tag value 1 contains a numerical value counting the number of seconds | Tag number 1 contains a numerical value counting the number of | |||
from 1970-01-01T00:00Z in UTC time to the represented point in civil | seconds from 1970-01-01T00:00Z in UTC time to the represented point | |||
time. | in civil time. | |||
The tagged item MUST be an unsigned or negative integer (major types | The enclosed item MUST be an unsigned or negative integer (major | |||
0 and 1), or a floating-point number (major type 7 with additional | types 0 and 1), or a floating-point number (major type 7 with | |||
information 25, 26, or 27). Other contained types are invalid. | additional information 25, 26, or 27). Other contained types are | |||
invalid. | ||||
Non-negative values (major type 0 and non-negative floating-point | Non-negative values (major type 0 and non-negative floating-point | |||
numbers) stand for time values on or after 1970-01-01T00:00Z UTC and | numbers) stand for time values on or after 1970-01-01T00:00Z UTC and | |||
are interpreted according to POSIX [TIME_T]. (POSIX time is also | are interpreted according to POSIX [TIME_T]. (POSIX time is also | |||
known as UNIX Epoch time. Note that leap seconds are handled | known as UNIX Epoch time. Note that leap seconds are handled | |||
specially by POSIX time and this results in a 1 second discontinuity | specially by POSIX time and this results in a 1 second discontinuity | |||
several times per decade.) Note that applications that require the | several times per decade.) Note that applications that require the | |||
expression of times beyond early 2106 cannot leave out support of | expression of times beyond early 2106 cannot leave out support of | |||
64-bit integers for the tagged value. | 64-bit integers for the enclosed value. | |||
Negative values (major type 1 and negative floating-point numbers) | Negative values (major type 1 and negative floating-point numbers) | |||
are interpreted as determined by the application requirements as | are interpreted as determined by the application requirements as | |||
there is no universal standard for UTC count-of-seconds time before | there is no universal standard for UTC count-of-seconds time before | |||
1970-01-01T00:00Z (this is particularly true for points in time that | 1970-01-01T00:00Z (this is particularly true for points in time that | |||
precede discontinuities in national calendars). The same applies to | precede discontinuities in national calendars). The same applies to | |||
non-finite values. | non-finite values. | |||
To indicate fractional seconds, floating point values can be used | To indicate fractional seconds, floating-point values can be used | |||
within Tag 1 instead of integer values. Note that this generally | within Tag number 1 instead of integer values. Note that this | |||
requires binary64 support, as binary16 and binary32 provide non-zero | generally requires binary64 support, as binary16 and binary32 provide | |||
fractions of seconds only for a short period of time around early | non-zero fractions of seconds only for a short period of time around | |||
1970. An application that requires Tag 1 support may restrict the | early 1970. An application that requires Tag number 1 support may | |||
tagged value to be an integer (or a floating-point value) only. | restrict the enclosed value to be an integer (or a floating-point | |||
value) only. | ||||
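A minimal Python sketch of an integer-only encoder for epoch-based date/time follows. The names encode_head and encode_epoch are hypothetical, and floating-point values are deliberately left out. The last usage line shows why times beyond early 2106 need 64-bit integer support: 2**32 seconds no longer fits into a 4-byte argument.

```python
def encode_head(major, arg):
    """Encode an initial byte plus argument using the shortest form."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    if arg < 24:
        return bytes([(major << 5) | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_epoch(seconds):
    """Encode an integer count of seconds since 1970-01-01T00:00Z."""
    if seconds >= 0:
        item = encode_head(0, seconds)       # major type 0: unsigned
    else:
        item = encode_head(1, -1 - seconds)  # major type 1: -1 - N
    return encode_head(6, 1) + item          # enclosed by tag number 1


march_2013 = encode_epoch(1363896240)
before_epoch = encode_epoch(-1)
after_2106 = encode_epoch(2**32)
```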
3.4.4. Bignums | 3.4.4. Bignums | |||
Protocols using tag values 2 and 3 extend the generic data model | Protocols using tag numbers 2 and 3 extend the generic data model | |||
(Section 2) with "bignums" representing arbitrarily sized integers. | (Section 2) with "bignums" representing arbitrarily sized integers. | |||
In the generic data model, bignum values are not equal to integers | In the generic data model, bignum values are not equal to integers | |||
from the basic data model, but specific data models can define that | from the basic data model, but specific data models can define that | |||
equivalence, and preferred encoding never makes use of bignums that | equivalence, and preferred encoding never makes use of bignums that | |||
also can be expressed as basic integers (see below). | also can be expressed as basic integers (see below). | |||
Bignums are encoded as a byte string data item, which is interpreted | Bignums are encoded as a byte string data item, which is interpreted | |||
as an unsigned integer n in network byte order. Contained items of | as an unsigned integer n in network byte order. Contained items of | |||
other types are invalid. For tag value 2, the value of the bignum is | other types are invalid. For tag number 2, the value of the bignum | |||
n. For tag value 3, the value of the bignum is -1 - n. The | is n. For tag number 3, the value of the bignum is -1 - n. The | |||
preferred encoding of the byte string is to leave out any leading | preferred encoding of the byte string is to leave out any leading | |||
zeroes (note that this means the preferred encoding for n = 0 is the | zeroes (note that this means the preferred encoding for n = 0 is the | |||
empty byte string, but see below). Decoders that understand these | empty byte string, but see below). Decoders that understand these | |||
tags MUST be able to decode bignums that do have leading zeroes. The | tags MUST be able to decode bignums that do have leading zeroes. The | |||
preferred encoding of an integer that can be represented using major | preferred encoding of an integer that can be represented using major | |||
type 0 or 1 is to encode it this way instead of as a bignum (which | type 0 or 1 is to encode it this way instead of as a bignum (which | |||
means that the empty string never occurs in a bignum when using | means that the empty string never occurs in a bignum when using | |||
preferred encoding). Note that this means the non-preferred choice | preferred encoding). Note that this means the non-preferred choice | |||
of a bignum representation instead of a basic integer for encoding a | of a bignum representation instead of a basic integer for encoding a | |||
number is not intended to have application semantics (just as the | number is not intended to have application semantics (just as the | |||
choice of a longer basic integer representation than needed, such as | choice of a longer basic integer representation than needed, such as | |||
0x1800 for 0x00 does not). | 0x1800 for 0x00 does not). | |||
For example, the number 18446744073709551616 (2**64) is represented | For example, the number 18446744073709551616 (2**64) is represented | |||
as 0b110_00010 (major type 6, tag 2), followed by 0b010_01001 (major | as 0b110_00010 (major type 6, tag number 2), followed by 0b010_01001 | |||
type 2, length 9), followed by 0x010000000000000000 (one byte 0x01 | (major type 2, length 9), followed by 0x010000000000000000 (one byte | |||
and eight bytes 0x00). In hexadecimal: | 0x01 and eight bytes 0x00). In hexadecimal: | |||
C2 -- Tag 2 | C2 -- Tag 2 | |||
49 -- Byte string of length 9 | 49 -- Byte string of length 9 | |||
010000000000000000 -- Bytes content | 010000000000000000 -- Bytes content | |||
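The bignum rules above can be sketched in Python as follows. The function names are hypothetical and, for brevity, the sketch handles only byte strings shorter than 24 bytes. It always emits a bignum, even for integers that preferred encoding would express with major type 0 or 1, and the decoder accepts leading zeroes as the text requires.

```python
def encode_bignum(value):
    """Encode an integer as a tag number 2 or 3 bignum."""
    tag, n = (2, value) if value >= 0 else (3, -1 - value)
    # Leave out leading zeroes; n = 0 becomes the empty byte string.
    payload = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(payload) >= 24:
        raise ValueError("this sketch handles short byte strings only")
    return bytes([0xC0 | tag, 0x40 | len(payload)]) + payload


def decode_bignum(data):
    """Decode a bignum produced in the short form used above."""
    if len(data) < 2 or data[0] not in (0xC2, 0xC3):
        raise ValueError("not a tag number 2 or 3 item")
    length = data[1] & 0x1F
    if data[1] >> 5 != 2 or length >= 24:
        raise ValueError("expected a short definite-length byte string")
    payload = data[2:2 + length]
    if len(payload) != length:
        raise ValueError("truncated byte string")
    n = int.from_bytes(payload, "big")  # network byte order
    return n if data[0] == 0xC2 else -1 - n


two_to_64 = encode_bignum(2**64)
```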
3.4.5. Decimal Fractions and Bigfloats | 3.4.5. Decimal Fractions and Bigfloats | |||
Protocols using tag value 4 extend the generic data model with data | Protocols using tag number 4 extend the generic data model with data | |||
items representing arbitrary-length decimal fractions m*(10**e). | items representing arbitrary-length decimal fractions m*(10**e). | |||
Protocols using tag value 5 extend the generic data model with data | Protocols using tag number 5 extend the generic data model with data | |||
items representing arbitrary-length binary fractions m*(2**e). As | items representing arbitrary-length binary fractions m*(2**e). As | |||
with bignums, values of different types are not equal in the generic | with bignums, values of different types are not equal in the generic | |||
data model. | data model. | |||
Decimal fractions combine an integer mantissa with a base-10 scaling | Decimal fractions combine an integer mantissa with a base-10 scaling | |||
factor. They are most useful if an application needs the exact | factor. They are most useful if an application needs the exact | |||
representation of a decimal fraction such as 1.1 because there is no | representation of a decimal fraction such as 1.1 because there is no | |||
exact representation for many decimal fractions in binary floating | exact representation for many decimal fractions in binary floating | |||
point. | point. | |||
Bigfloats combine an integer mantissa with a base-2 scaling factor. | Bigfloats combine an integer mantissa with a base-2 scaling factor. | |||
They are binary floating-point values that can exceed the range or | They are binary floating-point values that can exceed the range or | |||
the precision of the three IEEE 754 formats supported by CBOR | the precision of the three IEEE 754 formats supported by CBOR | |||
(Section 3.3). Bigfloats may also be used by constrained | (Section 3.3). Bigfloats may also be used by constrained | |||
applications that need some basic binary floating-point capability | applications that need some basic binary floating-point capability | |||
without the need for supporting IEEE 754. | without the need for supporting IEEE 754. | |||
A decimal fraction or a bigfloat is represented as a tagged array | A decimal fraction or a bigfloat is represented as a tagged array | |||
that contains exactly two integer numbers: an exponent e and a | that contains exactly two integer numbers: an exponent e and a | |||
mantissa m. Decimal fractions (tag 4) use base-10 exponents; the | mantissa m. Decimal fractions (tag number 4) use base-10 exponents; | |||
value of a decimal fraction data item is m*(10**e). Bigfloats (tag | the value of a decimal fraction data item is m*(10**e). Bigfloats | |||
5) use base-2 exponents; the value of a bigfloat data item is | (tag number 5) use base-2 exponents; the value of a bigfloat data | |||
m*(2**e). The exponent e MUST be represented in an integer of major | item is m*(2**e). The exponent e MUST be represented in an integer | |||
type 0 or 1, while the mantissa also can be a bignum (Section 3.4.4). | of major type 0 or 1, while the mantissa also can be a bignum | |||
Contained items with other structures are invalid. | (Section 3.4.4). Contained items with other structures are invalid. | |||
An example of a decimal fraction is that the number 273.15 could be | An example of a decimal fraction is that the number 273.15 could be | |||
represented as 0b110_00100 (major type of 6 for the tag, additional | represented as 0b110_00100 (major type of 6 for the tag, additional | |||
information of 4 for the type of tag), followed by 0b100_00010 (major | information of 4 for the number of tag), followed by 0b100_00010 | |||
type of 4 for the array, additional information of 2 for the length | (major type of 4 for the array, additional information of 2 for the | |||
of the array), followed by 0b001_00001 (major type of 1 for the first | length of the array), followed by 0b001_00001 (major type of 1 for | |||
integer, additional information of 1 for the value of -2), followed | the first integer, additional information of 1 for the value of -2), | |||
by 0b000_11001 (major type of 0 for the second integer, additional | followed by 0b000_11001 (major type of 0 for the second integer, | |||
information of 25 for a two-byte value), followed by | additional information of 25 for a two-byte value), followed by | |||
0b0110101010110011 (27315 in two bytes). In hexadecimal: | 0b0110101010110011 (27315 in two bytes). In hexadecimal: | |||
C4 -- Tag 4 | C4 -- Tag 4 | |||
82 -- Array of length 2 | 82 -- Array of length 2 | |||
21 -- -2 | 21 -- -2 | |||
19 6ab3 -- 27315 | 19 6ab3 -- 27315 | |||
An example of a bigfloat is that the number 1.5 could be represented | An example of a bigfloat is that the number 1.5 could be represented | |||
as 0b110_00101 (major type of 6 for the tag, additional information | as 0b110_00101 (major type of 6 for the tag, additional information | |||
of 5 for the type of tag), followed by 0b100_00010 (major type of 4 | of 5 for the number of tag), followed by 0b100_00010 (major type of 4 | |||
for the array, additional information of 2 for the length of the | for the array, additional information of 2 for the length of the | |||
array), followed by 0b001_00000 (major type of 1 for the first | array), followed by 0b001_00000 (major type of 1 for the first | |||
integer, additional information of 0 for the value of -1), followed | integer, additional information of 0 for the value of -1), followed | |||
by 0b000_00011 (major type of 0 for the second integer, additional | by 0b000_00011 (major type of 0 for the second integer, additional | |||
information of 3 for the value of 3). In hexadecimal: | information of 3 for the value of 3). In hexadecimal: | |||
C5 -- Tag 5 | C5 -- Tag 5 | |||
82 -- Array of length 2 | 82 -- Array of length 2 | |||
20 -- -1 | 20 -- -1 | |||
03 -- 3 | 03 -- 3 | |||
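Both worked examples above (273.15 as a decimal fraction and 1.5 as a bigfloat) can be checked with a small Python sketch. The function names are hypothetical. As a simplifying assumption, the sketch only accepts exponents and mantissas that fit major types 0 and 1 and does not handle bignum mantissas.

```python
from fractions import Fraction


def encode_int(value: int) -> bytes:
    """Encode an integer in major type 0 or 1 using the shortest form."""
    major_type, argument = (0, value) if value >= 0 else (1, -1 - value)
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([(major_type << 5) | additional_info]) + argument.to_bytes(size, "big")
    raise ValueError("value needs a bignum, which this sketch does not handle")


def encode_fraction(tag_number: int, exponent: int, mantissa: int) -> bytes:
    """Encode tag 4 (decimal fraction) or tag 5 (bigfloat) as [exponent, mantissa]."""
    if tag_number not in (4, 5):
        raise ValueError("expected tag number 4 or 5")
    # 0xC0 | n is a tag head for small tag numbers; 0x82 is an array of length 2.
    return bytes([0xC0 | tag_number, 0x82]) + encode_int(exponent) + encode_int(mantissa)


def fraction_value(tag_number: int, exponent: int, mantissa: int) -> Fraction:
    """Return the exact value m*(10**e) for tag 4 or m*(2**e) for tag 5."""
    base = 10 if tag_number == 4 else 2
    return mantissa * Fraction(base) ** exponent


print(encode_fraction(4, -2, 27315).hex())  # c48221196ab3
print(encode_fraction(5, -1, 3).hex())      # c5822003
```

Using Fraction for the value keeps the arithmetic exact, which is the point of a decimal fraction such as 273.15.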
skipping to change at page 22, line 50 ¶ | skipping to change at page 23, line 15 ¶ | |||
3.4.6. Content Hints | 3.4.6. Content Hints | |||
The tags in this section are for content hints that might be used by | The tags in this section are for content hints that might be used by | |||
generic CBOR processors. These content hints do not extend the | generic CBOR processors. These content hints do not extend the | |||
generic data model. | generic data model. | |||
3.4.6.1. Encoded CBOR Data Item | 3.4.6.1. Encoded CBOR Data Item | |||
Sometimes it is beneficial to carry an embedded CBOR data item that | Sometimes it is beneficial to carry an embedded CBOR data item that | |||
is not meant to be decoded immediately at the time the enclosing data | is not meant to be decoded immediately at the time the enclosing data | |||
item is being decoded. Tag 24 (CBOR data item) can be used to tag | item is being decoded. Tag number 24 (CBOR data item) can be used to | |||
the embedded byte string as a data item encoded in CBOR format. | tag the embedded byte string as a data item encoded in CBOR format. | |||
Contained items that aren't byte strings are invalid. Any contained | Contained items that aren't byte strings are invalid. Any contained | |||
byte string is valid, even if it encodes an invalid or ill-formed | byte string is valid, even if it encodes an invalid or ill-formed | |||
CBOR item. | CBOR item. | |||
3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters | 3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters | |||
Tags 21 to 23 indicate that a byte string might require a specific | Tag numbers 21 to 23 indicate that a byte string might require a | |||
encoding when interoperating with a text-based representation. These | specific encoding when interoperating with a text-based | |||
tags are useful when an encoder knows that the byte string data it is | representation. These tags are useful when an encoder knows that the | |||
writing is likely to be later converted to a particular JSON-based | byte string data it is writing is likely to be later converted to a | |||
usage. That usage specifies that some strings are encoded as base64, | particular JSON-based usage. That usage specifies that some strings | |||
base64url, and so on. The encoder uses byte strings instead of doing | are encoded as base64, base64url, and so on. The encoder uses byte | |||
the encoding itself to reduce the message size, to reduce the code | strings instead of doing the encoding itself to reduce the message | |||
size of the encoder, or both. The encoder does not know whether or | size, to reduce the code size of the encoder, or both. The encoder | |||
not the converter will be generic, and therefore wants to say what it | does not know whether or not the converter will be generic, and | |||
believes is the proper way to convert binary strings to JSON. | therefore wants to say what it believes is the proper way to convert | |||
binary strings to JSON. | ||||
The data item tagged can be a byte string or any other data item. In | The data item tagged can be a byte string or any other data item. In | |||
the latter case, the tag applies to all of the byte string data items | the latter case, the tag applies to all of the byte string data items | |||
contained in the data item, except for those contained in a nested | contained in the data item, except for those contained in a nested | |||
data item tagged with an expected conversion. | data item tagged with an expected conversion. | |||
These three tag types suggest conversions to three of the base data | These three tag numbers suggest conversions to three of the base data | |||
encodings defined in [RFC4648]. For base64url encoding (tag 21), | encodings defined in [RFC4648]. For base64url encoding (tag number | |||
padding is not used (see Section 3.2 of RFC 4648); that is, all | 21), padding is not used (see Section 3.2 of RFC 4648); that is, all | |||
trailing equals signs ("=") are removed from the encoded string. For | trailing equals signs ("=") are removed from the encoded string. For | |||
base64 encoding (tag 22), padding is used as defined in RFC 4648. | base64 encoding (tag number 22), padding is used as defined in RFC | |||
For both base64url and base64, padding bits are set to zero (see | 4648. For both base64url and base64, padding bits are set to zero | |||
Section 3.5 of RFC 4648), and encoding is performed without the | (see Section 3.5 of RFC 4648), and encoding is performed without the | |||
inclusion of any line breaks, whitespace, or other additional | inclusion of any line breaks, whitespace, or other additional | |||
characters. Note that, for all three tags, the encoding of the empty | characters. Note that, for all three tag numbers, the encoding of | |||
byte string is the empty text string. | the empty byte string is the empty text string. | |||
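The conversions a CBOR-to-JSON converter might apply for these tag numbers can be sketched in Python as follows. The function name expected_text is hypothetical. The sketch assumes that tag number 23 denotes base16 with the uppercase alphabet of RFC 4648; that assignment is not spelled out in the text above.

```python
import base64


def expected_text(tag_number: int, data: bytes) -> str:
    """Convert a byte string to the text form suggested by tag 21, 22, or 23."""
    if tag_number == 21:
        # base64url without padding: strip all trailing "=" characters.
        return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")
    if tag_number == 22:
        # base64 with padding as defined in RFC 4648.
        return base64.b64encode(data).decode("ascii")
    if tag_number == 23:
        # base16 (assumed meaning of tag number 23), uppercase hex digits.
        return base64.b16encode(data).decode("ascii")
    raise ValueError("expected tag number 21, 22, or 23")


sample = bytes([1, 2, 3, 4])
print(expected_text(21, sample))  # AQIDBA
print(expected_text(22, sample))  # AQIDBA==
print(expected_text(23, sample))  # 01020304
```

None of the three calls inserts line breaks or whitespace, and the empty byte string maps to the empty text string in every case.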
3.4.6.3. Encoded Text | 3.4.6.3. Encoded Text | |||
Some text strings hold data that have formats widely used on the | Some text strings hold data that have formats widely used on the | |||
Internet, and sometimes those formats can be validated and presented | Internet, and sometimes those formats can be validated and presented | |||
to the application in appropriate form by the decoder. There are | to the application in appropriate form by the decoder. There are | |||
tags for some of these formats. As with tags 21 to 23, if these tags | tags for some of these formats. As with tag numbers 21 to 23, if | |||
are applied to an item other than a text string, they apply to all | these tags are applied to an item other than a text string, they | |||
text string data items it contains. | apply to all text string data items it contains. | |||
o Tag 32 is for URIs, as defined in [RFC3986]. If the text string | o Tag number 32 is for URIs, as defined in [RFC3986]. If the text | |||
doesn't match the "URI-reference" production, the string is | string doesn't match the "URI-reference" production, the string is | |||
invalid. | invalid. | |||
o Tags 33 and 34 are for base64url- and base64-encoded text strings, | o Tag numbers 33 and 34 are for base64url- and base64-encoded text | |||
as defined in [RFC4648]. If any of: | strings, as defined in [RFC4648]. If any of: | |||
* the encoded text string contains non-alphabet characters or | * the encoded text string contains non-alphabet characters or | |||
only 1 character in the last block of 4, or | only 1 character in the last block of 4, or | |||
* the padding bits in a 2- or 3-character block are not 0, or | * the padding bits in a 2- or 3-character block are not 0, or | |||
* the base64 encoding has the wrong number of padding characters, | * the base64 encoding has the wrong number of padding characters, | |||
or | or | |||
* the base64url encoding has padding characters, | * the base64url encoding has padding characters, | |||
the string is invalid. | the string is invalid. | |||
o Tag 35 is for regular expressions that are roughly in Perl | o Tag number 35 is for regular expressions that are roughly in Perl | |||
Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a | Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a | |||
version of the JavaScript regular expression syntax [ECMA262]. | version of the JavaScript regular expression syntax [ECMA262]. | |||
(Note that more specific identification may be necessary if the | (Note that more specific identification may be necessary if the | |||
actual version of the specification underlying the regular | actual version of the specification underlying the regular | |||
expression, or more than just the text of the regular expression | expression, or more than just the text of the regular expression | |||
itself, need to be conveyed.) Any contained string value is | itself, need to be conveyed.) Any contained string value is | |||
valid. | valid. | |||
o Tag 36 is for MIME messages (including all headers), as defined in | o Tag number 36 is for MIME messages (including all headers), as | |||
[RFC2045]. A text string that isn't a valid MIME message is | defined in [RFC2045]. A text string that isn't a valid MIME | |||
invalid. | message is invalid. | |||
Note that tags 33 and 34 differ from 21 and 22 in that the data is | Note that tag numbers 33 and 34 differ from 21 and 22 in that the | |||
transported in base-encoded form for the former and in raw byte | data is transported in base-encoded form for the former and in raw | |||
string form for the latter. | byte string form for the latter. | |||
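The validity conditions listed above for tag numbers 33 and 34 can be expressed as a checker. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, and the check is written by hand rather than relying on a library decoder so that each listed condition is visible.

```python
import string

_COMMON = string.ascii_uppercase + string.ascii_lowercase + string.digits
_BASE64 = _COMMON + "+/"
_BASE64URL = _COMMON + "-_"


def is_valid_base_text(tag_number: int, text: str) -> bool:
    """Check a text string against the rules for tag 33 (base64url) or 34 (base64)."""
    if tag_number == 33:
        alphabet, padded = _BASE64URL, False
    elif tag_number == 34:
        alphabet, padded = _BASE64, True
    else:
        raise ValueError("expected tag number 33 or 34")
    body = text.rstrip("=")
    padding = len(text) - len(body)
    # Non-alphabet characters (including "=" anywhere but the end) are invalid.
    if any(ch not in alphabet for ch in body):
        return False
    remainder = len(body) % 4
    # Only 1 character in the last block of 4 cannot encode a whole byte.
    if remainder == 1:
        return False
    # base64 needs exactly the right padding; base64url must have none.
    expected_padding = (4 - remainder) % 4 if padded else 0
    if padding != expected_padding:
        return False
    # Padding bits in a 2- or 3-character final block must be zero.
    if remainder:
        unused_bits = 4 if remainder == 2 else 2
        if alphabet.index(body[-1]) & ((1 << unused_bits) - 1):
            return False
    return True


print(is_valid_base_text(34, "AQIDBA=="))  # True
print(is_valid_base_text(33, "AQIDBA=="))  # False
```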
3.4.7. Self-Described CBOR | 3.4.7. Self-Described CBOR | |||
In many applications, it will be clear from the context that CBOR is | In many applications, it will be clear from the context that CBOR is | |||
being employed for encoding a data item. For instance, a specific | being employed for encoding a data item. For instance, a specific | |||
protocol might specify the use of CBOR, or a media type is indicated | protocol might specify the use of CBOR, or a media type is indicated | |||
that specifies its use. However, there may be applications where | that specifies its use. However, there may be applications where | |||
such context information is not available, such as when CBOR data is | such context information is not available, such as when CBOR data is | |||
stored in a file that does not have disambiguating metadata. Here, | stored in a file that does not have disambiguating metadata. Here, | |||
it may help to have some distinguishing characteristics for the data | it may help to have some distinguishing characteristics for the data | |||
itself. | itself. | |||
Tag 55799 is defined for this purpose. It does not impart any | Tag number 55799 is defined for this purpose. It does not impart any | |||
special semantics on the data item that follows; that is, the | special semantics on the data item that it encloses; that is, the | |||
semantics of a data item tagged with tag 55799 is exactly identical | semantics of a data item enclosed in tag number 55799 is exactly | |||
to the semantics of the data item itself. | identical to the semantics of the data item itself. | |||
The serialization of this tag is 0xd9d9f7, which does not appear to | The serialization of this tag's head is 0xd9d9f7, which does not | |||
be in use as a distinguishing mark for any frequently used file | appear to be in use as a distinguishing mark for any frequently used | |||
types. In particular, 0xd9d9f7 is not a valid start of a Unicode | file types. In particular, 0xd9d9f7 is not a valid start of a | |||
text in any Unicode encoding if it is followed by a valid CBOR data | Unicode text in any Unicode encoding if it is followed by a valid | |||
item. | CBOR data item. | |||
For instance, a decoder might be able to decode both CBOR and JSON. | For instance, a decoder might be able to decode both CBOR and JSON. | |||
Such a decoder would need to mechanically distinguish the two | Such a decoder would need to mechanically distinguish the two | |||
formats. An easy way for an encoder to help the decoder would be to | formats. An easy way for an encoder to help the decoder would be to | |||
tag the entire CBOR item with tag 55799, the serialization of which | tag the entire CBOR item with tag number 55799, the serialization of | |||
will never be found at the beginning of a JSON text. | which will never be found at the beginning of a JSON text. | |||
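As an illustration, the three-byte head can be derived from the tag number and used as a simple format check. The function names in this Python sketch are hypothetical, and the sniffing logic is only one possible approach.

```python
# Major type 6 with additional information 25 (two-byte argument), then 55799.
SELF_DESCRIBED_HEAD = bytes([(6 << 5) | 25]) + (55799).to_bytes(2, "big")


def wrap_self_described(encoded_item: bytes) -> bytes:
    """Prefix an already encoded CBOR data item with the head of tag 55799."""
    return SELF_DESCRIBED_HEAD + encoded_item


def looks_like_self_described_cbor(data: bytes) -> bool:
    """Return True if the data starts with the self-described CBOR head."""
    return data.startswith(SELF_DESCRIBED_HEAD)


print(SELF_DESCRIBED_HEAD.hex())                   # d9d9f7
print(looks_like_self_described_cbor(b'{"a": 1}'))  # False
```

A decoder that accepts both formats could try this check first and fall back to JSON parsing when it fails; data without the tag may still be CBOR, so a negative result is not conclusive.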
4. Serialization Considerations | 4. Serialization Considerations | |||
4.1. Preferred Serialization | 4.1. Preferred Serialization | |||
For some values at the data model level, CBOR provides multiple | For some values at the data model level, CBOR provides multiple | |||
serializations. For many applications, it is desirable that an | serializations. For many applications, it is desirable that an | |||
encoder always chooses a preferred serialization; however, the | encoder always chooses a preferred serialization; however, the | |||
present specification does not put the burden of enforcing this | present specification does not put the burden of enforcing this | |||
preference on either encoder or decoder. | preference on either encoder or decoder. | |||
skipping to change at page 25, line 51 ¶ | skipping to change at page 26, line 22 ¶ | |||
on only ever receiving preferred serializations ("variation-tolerant | on only ever receiving preferred serializations ("variation-tolerant | |||
decoder") can there be said to be more universally interoperable (it | decoder") can there be said to be more universally interoperable (it | |||
might very well optimize for the case of receiving preferred | might very well optimize for the case of receiving preferred | |||
serializations, though). Full implementations of CBOR decoders are | serializations, though). Full implementations of CBOR decoders are | |||
by definition variation-tolerant; the distinction is only relevant if | by definition variation-tolerant; the distinction is only relevant if | |||
a constrained implementation of a CBOR decoder meets a variant | a constrained implementation of a CBOR decoder meets a variant | |||
encoder. | encoder. | |||
The preferred serialization always uses the shortest form of | The preferred serialization always uses the shortest form of | |||
representing the argument (Section 3); it also uses the shortest | representing the argument (Section 3); it also uses the shortest | |||
floating point encoding that preserves the value being encoded (see | floating-point encoding that preserves the value being encoded (see | |||
Section 5.5). Definite length encoding is preferred whenever the | Section 5.5). Definite length encoding is preferred whenever the | |||
length is known at the time the serialization of the item starts. | length is known at the time the serialization of the item starts. | |||
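The "shortest form of the argument" rule can be checked mechanically. The following Python sketch uses a hypothetical function name and inspects a single head. It assumes major types 0 through 6, because the additional information values 25 to 27 in major type 7 carry floating-point values rather than an argument.

```python
def head_is_preferred(head: bytes) -> bool:
    """Return True if the head encodes its argument in the shortest form."""
    major_type = head[0] >> 5
    additional_info = head[0] & 0x1F
    if major_type == 7:
        raise ValueError("this sketch covers major types 0 through 6 only")
    if additional_info < 24:
        return len(head) == 1
    sizes = {24: 1, 25: 2, 26: 4, 27: 8}
    if additional_info not in sizes:
        raise ValueError("reserved or indefinite-length additional information")
    size = sizes[additional_info]
    if len(head) != 1 + size:
        raise ValueError("head has the wrong number of argument bytes")
    argument = int.from_bytes(head[1:], "big")
    # Smallest argument that actually requires this many bytes.
    minimum = 24 if size == 1 else 1 << (8 * (size // 2))
    return argument >= minimum


print(head_is_preferred(bytes.fromhex("00")))    # True
print(head_is_preferred(bytes.fromhex("1800")))  # False
```

The second call corresponds to the example of 0x1800 being a longer than necessary encoding of the integer 0.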
4.2. Deterministically Encoded CBOR | 4.2. Deterministically Encoded CBOR | |||
Some protocols may want encoders to only emit CBOR in a particular | Some protocols may want encoders to only emit CBOR in a particular | |||
deterministic format; those protocols might also have the decoders | deterministic format; those protocols might also have the decoders | |||
check that their input is in that deterministic format. Those | check that their input is in that deterministic format. Those | |||
protocols are free to define what they mean by a "deterministic | protocols are free to define what they mean by a "deterministic | |||
format" and what encoders and decoders are expected to do. This | format" and what encoders and decoders are expected to do. This | |||
skipping to change at page 27, line 20 ¶ | skipping to change at page 27, line 39 ¶ | |||
definite-length items instead. | definite-length items instead. | |||
4.2.2. Additional Deterministic Encoding Considerations | 4.2.2. Additional Deterministic Encoding Considerations | |||
If a protocol allows for IEEE floats, then additional deterministic | If a protocol allows for IEEE floats, then additional deterministic | |||
encoding rules might need to be added. One example rule might be to | encoding rules might need to be added. One example rule might be to | |||
have all floats start as a 64-bit float, then do a test conversion to | have all floats start as a 64-bit float, then do a test conversion to | |||
a 32-bit float; if the result is the same numeric value, use the | a 32-bit float; if the result is the same numeric value, use the | |||
shorter value and repeat the process with a test conversion to a | shorter value and repeat the process with a test conversion to a | |||
16-bit float. (This rule selects 16-bit float for positive and | 16-bit float. (This rule selects 16-bit float for positive and | |||
negative Infinity as well.) Also, there are many representations for | negative Infinity as well.) Although IEEE floats can represent both | |||
NaN. If NaN is an allowed value, it must always be represented as | positive and negative zero as distinct values, the application might | |||
0xf97e00. | not distinguish these and might decide to represent all zero values | |||
with a positive sign, disallowing negative zero. Also, there are | ||||
many representations for NaN. If NaN is an allowed value, it must | ||||
always be represented as 0xf97e00. | ||||
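The example rule for floats can be sketched in Python with the standard struct module, whose "e", "f", and "d" formats correspond to 16-, 32-, and 64-bit IEEE 754 values. The function name is hypothetical. The sketch keeps negative zero as is; a protocol that disallows it would normalize the value first.

```python
import math
import struct


def encode_float_deterministic(value: float) -> bytes:
    """Start at 64 bits, then shrink to 32 and 16 bits while the value survives."""
    if math.isnan(value):
        return bytes.fromhex("f97e00")  # the single NaN representation
    encoded = b"\xfb" + struct.pack(">d", value)
    for fmt, initial_byte in ((">f", 0xFA), (">e", 0xF9)):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            break  # finite value too large for the shorter format
        if struct.unpack(fmt, packed)[0] != value:
            break  # test conversion changed the numeric value
        encoded = bytes([initial_byte]) + packed
    return encoded


print(encode_float_deterministic(1.0).hex())           # f93c00
print(encode_float_deterministic(1.1).hex())           # fb3ff199999999999a
print(encode_float_deterministic(float("inf")).hex())  # f97c00
```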
CBOR tags present additional considerations for deterministic | CBOR tags present additional considerations for deterministic | |||
encoding. The absence or presence of tags in a deterministic format | encoding. The absence or presence of tags in a deterministic format | |||
is determined by the optionality of the tags in the protocol. In a | is determined by the optionality of the tags in the protocol. In a | |||
CBOR-based protocol that allows optional tagging anywhere, the | CBOR-based protocol that allows optional tagging anywhere, the | |||
deterministic format must not allow them. In a protocol that | deterministic format must not allow them. In a protocol that | |||
requires tags in certain places, the tag needs to appear in the | requires tags in certain places, the tag needs to appear in the | |||
deterministic format. A CBOR-based protocol that uses deterministic | deterministic format. A CBOR-based protocol that uses deterministic | |||
encoding might instead say that all tags that appear in a message | encoding might instead say that all tags that appear in a message | |||
must be retained regardless of whether they are optional. | must be retained regardless of whether they are optional. | |||
Protocols that include floating-point, big integer, or other complex values | Protocols that include floating-point, big integer, or other complex values | |||
need to define extra requirements on their deterministic encodings. | need to define extra requirements on their deterministic encodings. | |||
For example: | For example: | |||
o If a protocol includes a field that can express floating values | o If a protocol includes a field that can express floating-point | |||
(Section 3.3), the protocol's deterministic encoding needs to | values (Section 3.3), the protocol's deterministic encoding needs | |||
specify whether the integer 1.0 is encoded as 0x01, 0xf93c00, | to specify whether the integer 1.0 is encoded as 0x01, 0xf93c00, | |||
0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for | 0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for | |||
this are: | this are: | |||
1. Encode integral values that fit in 64 bits as values from | 1. Encode integral values that fit in 64 bits as values from | |||
major types 0 and 1, and other values as the smallest of 16-, | major types 0 and 1, and other values as the smallest of 16-, | |||
32-, or 64-bit floating point that accurately represents the | 32-, or 64-bit floating point that accurately represents the | |||
value, | value, | |||
2. Encode all values as the smallest of 16-, 32-, or 64-bit | 2. Encode all values as the smallest of 16-, 32-, or 64-bit | |||
floating point that accurately represents the value, even for | floating point that accurately represents the value, even for | |||
integral values, or | integral values, or | |||
3. Encode all values as 64-bit floating point. | 3. Encode all values as 64-bit floating point. | |||
If NaN is an allowed value, the protocol needs to pick a single | If NaN is an allowed value, the protocol needs to pick a single | |||
representation, for example 0xf97e00. | representation, for example 0xf97e00. | |||
o If a protocol includes a field that can express integers larger | o If a protocol includes a field that can express integers with an | |||
than 2^64 using tag 2 (Section 3.4.4), the protocol's | absolute value of 2^64 or larger using tag numbers 2 or 3 | |||
deterministic encoding needs to specify whether small integers are | (Section 3.4.4), the protocol's deterministic encoding needs to | |||
expressed using the tag or major types 0 and 1. | specify whether small integers are expressed using the tag or | |||
major types 0 and 1. | ||||
o A protocol might give encoders the choice of representing a URL as | o A protocol might give encoders the choice of representing a URL as | |||
either a text string or, using Section 3.4.6.3, tag 32 containing | either a text string or, using Section 3.4.6.3, tag number 32 | |||
a text string. This protocol's deterministic encoding needs to | containing a text string. This protocol's deterministic encoding | |||
either require that the tag is present or require that it's | needs to either require that the tag is present or require that | |||
absent, not allow either one. | it's absent, not allow either one. | |||
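Rule 1 from the first bullet above (integers where possible, otherwise the smallest exact float) could look like the following Python sketch. All function names are hypothetical. Mapping negative zero to the integer 0 is an assumption of this sketch, not something the rule prescribes.

```python
import math
import struct


def encode_int(value: int) -> bytes:
    """Encode an integer in major type 0 or 1 using the shortest form."""
    major_type, argument = (0, value) if value >= 0 else (1, -1 - value)
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([(major_type << 5) | additional_info]) + argument.to_bytes(size, "big")
    raise ValueError("outside the range of major types 0 and 1")


def encode_shortest_float(value: float) -> bytes:
    """Encode as the smallest of 16-, 32-, or 64-bit float that is exact."""
    if math.isnan(value):
        return bytes.fromhex("f97e00")
    for fmt, initial_byte in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # too large for this format, try the next one
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)


def encode_number_rule_1(value: float) -> bytes:
    """Integral values in the 64-bit range become integers; the rest floats."""
    if math.isfinite(value) and value == math.floor(value):
        as_int = int(value)
        if -(2**64) <= as_int < 2**64:
            return encode_int(as_int)
    return encode_shortest_float(value)


print(encode_number_rule_1(1.0).hex())  # 01
print(encode_number_rule_1(1.5).hex())  # f93e00
```

Under rule 2 the same 1.0 would be encoded as 0xf93c00, and under rule 3 as 0xfb3ff0000000000000, which is why a protocol has to pick one rule.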
4.2.3. Length-first map key ordering | 4.2.3. Length-first map key ordering | |||
The core deterministic encoding requirements sort map keys in a | The core deterministic encoding requirements sort map keys in a | |||
different order from the one suggested by Section 3.9 of [RFC7049] | different order from the one suggested by Section 3.9 of [RFC7049] | |||
(called "Canonical CBOR" there). Protocols that need to be | (called "Canonical CBOR" there). Protocols that need to be | |||
compatible with [RFC7049]'s order can instead be specified in terms | compatible with [RFC7049]'s order can instead be specified in terms | |||
of this specification's "length-first core deterministic encoding | of this specification's "length-first core deterministic encoding | |||
requirements": | requirements": | |||
skipping to change at page 30, line 36 ¶ | skipping to change at page 31, line 14 ¶ | |||
5.2. Generic Encoders and Decoders | 5.2. Generic Encoders and Decoders | |||
A generic CBOR decoder can decode all well-formed CBOR data and | A generic CBOR decoder can decode all well-formed CBOR data and | |||
present them to an application. See Appendix C. | present them to an application. See Appendix C. | |||
Even though CBOR attempts to minimize these cases, not all well- | Even though CBOR attempts to minimize these cases, not all well- | |||
formed CBOR data is valid: for example, the encoded text string | formed CBOR data is valid: for example, the encoded text string | |||
"0x62c0ae" does not contain valid UTF-8 and so is not a valid CBOR | "0x62c0ae" does not contain valid UTF-8 and so is not a valid CBOR | |||
item. Also, specific tags may make semantic constraints that may be | item. Also, specific tags may make semantic constraints that may be | |||
violated, such as a bignum tag containing another tag, or an instance | violated, such as a bignum tag enclosing another tag, or an instance | |||
of tag 0 containing a byte string or a text string with contents that | of tag number 0 containing a byte string or a text string with | |||
do not match [RFC3339]'s "date-time" production. There is no | contents that do not match [RFC3339]'s "date-time" production. There | |||
requirement that generic encoders and decoders make unnatural choices | is no requirement that generic encoders and decoders make unnatural | |||
for their application interface to enable the processing of invalid | choices for their application interface to enable the processing of | |||
data. Generic encoders and decoders are expected to forward simple | invalid data. Generic encoders and decoders are expected to forward | |||
values and tags even if their specific codepoints are not registered | simple values and tags even if their specific codepoints are not | |||
at the time the encoder/decoder is written (Section 5.4). | registered at the time the encoder/decoder is written (Section 5.4). | |||
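The difference between well-formed and valid can be seen with the 0x62c0ae example. The following Python sketch uses a hypothetical function name and, as a simplification, handles only text strings whose length fits in the initial byte.

```python
def text_string_is_valid(encoded: bytes) -> bool:
    """Check whether a short, well-formed text string item holds valid UTF-8."""
    initial_byte = encoded[0]
    if initial_byte >> 5 != 3:
        raise ValueError("not a text string (major type 3)")
    length = initial_byte & 0x1F
    if length >= 24:
        raise ValueError("this sketch handles lengths below 24 only")
    payload = encoded[1:1 + length]
    if len(payload) != length:
        raise ValueError("not well-formed: content is truncated")
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return False  # well-formed, but not valid
    return True


print(text_string_is_valid(bytes.fromhex("62c0ae")))  # False
print(text_string_is_valid(bytes.fromhex("626869")))  # True
```

The first item has a correct head and length, so a generic decoder can still skip over it or present it, even though its content is not valid UTF-8.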
Generic decoders provide ways to present well-formed CBOR values, | Generic decoders provide ways to present well-formed CBOR values, | |||
both valid and invalid, to an application. The diagnostic notation | both valid and invalid, to an application. The diagnostic notation | |||
(Section 8) may be used to present well-formed CBOR values to humans. | (Section 8) may be used to present well-formed CBOR values to humans. | |||
Generic encoders provide an application interface that allows the | Generic encoders provide an application interface that allows the | |||
application to specify any well-formed value, including simple values | application to specify any well-formed value, including simple values | |||
and tags unknown to the encoder. | and tags unknown to the encoder. | |||
5.3. Invalid Items | 5.3. Invalid Items | |||
skipping to change at page 31, line 36 ¶ | skipping to change at page 32, line 15 ¶ | |||
Duplicate keys in a map: Generic decoders (Section 5.2) make data | Duplicate keys in a map: Generic decoders (Section 5.2) make data | |||
available to applications using the native CBOR data model. That | available to applications using the native CBOR data model. That | |||
data model includes maps (key-value mappings with unique keys), | data model includes maps (key-value mappings with unique keys), | |||
not multimaps (key-value mappings where multiple entries can have | not multimaps (key-value mappings where multiple entries can have | |||
the same key). Thus, a generic decoder that gets a CBOR map item | the same key). Thus, a generic decoder that gets a CBOR map item | |||
that has duplicate keys will decode to a map with only one | that has duplicate keys will decode to a map with only one | |||
instance of that key, or it might stop processing altogether. On | instance of that key, or it might stop processing altogether. On | |||
the other hand, a "streaming decoder" may not even be able to | the other hand, a "streaming decoder" may not even be able to | |||
notice (Section 5.6). | notice (Section 5.6). | |||
Inadmissible type on the value following a tag: Tags (Section 3.4) | Inadmissible type on the value enclosed by a tag: Tags (Section 3.4) | |||
specify what type of data item is supposed to follow the tag; for | specify what type of data item is supposed to be enclosed by the | |||
example, the tags for positive or negative bignums are supposed to | tag; for example, the tags for positive or negative bignums are | |||
be put on byte strings. A decoder that decodes the tagged data | supposed to be put on byte strings. A decoder that decodes the | |||
item into a native representation (a native big integer in this | tagged data item into a native representation (a native big | |||
example) is expected to check the type of the data item being | integer in this example) is expected to check the type of the data | |||
tagged. Even decoders that don't have such native representations | item being tagged. Even decoders that don't have such native | |||
available in their environment may perform the check on those tags | representations available in their environment may perform the | |||
known to them and react appropriately. | check on those tags known to them and react appropriately. | |||
Invalid UTF-8 string: A decoder might or might not want to verify | Invalid UTF-8 string: A decoder might or might not want to verify | |||
that the sequence of bytes in a UTF-8 string (major type 3) is | that the sequence of bytes in a UTF-8 string (major type 3) is | |||
actually valid UTF-8 and react appropriately. | actually valid UTF-8 and react appropriately. | |||
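For the duplicate-key case, the two behaviors described above (keep one instance or stop processing) might be sketched as follows. This is an illustrative Python sketch with hypothetical names. It assumes the decoder has already produced a list of key-value pairs with hashable keys, and it inherits Python's key equality, under which 1 and 1.0 collide even though they are distinct values in the CBOR data model.

```python
def build_map(pairs, reject_duplicates=True):
    """Build a map from decoded (key, value) pairs, handling duplicate keys."""
    result = {}
    for key, value in pairs:
        if key in result and reject_duplicates:
            # Stop processing altogether on the first duplicate key.
            raise ValueError(f"duplicate map key: {key!r}")
        # Otherwise keep only one instance of the key (here: the last one).
        result[key] = value
    return result


print(build_map([("a", 1), ("a", 2)], reject_duplicates=False))  # {'a': 2}
```

Which instance survives in the lenient mode is a choice of this sketch; the text above only says that one instance remains.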
5.4. Handling Unknown Simple Values and Tags | 5.4. Handling Unknown Simple Values and Tags | |||
A decoder that comes across a simple value (Section 3.3) that it does | A decoder that comes across a simple value (Section 3.3) that it does | |||
not recognize, such as a value that was added to the IANA registry | not recognize, such as a value that was added to the IANA registry | |||
after the decoder was deployed or a value that the decoder chose not | after the decoder was deployed or a value that the decoder chose not | |||
to implement, might issue a warning, might stop processing | to implement, might issue a warning, might stop processing | |||
altogether, might handle the error by making the unknown value | altogether, might handle the error by making the unknown value | |||
available to the application as such (as is expected of generic | available to the application as such (as is expected of generic | |||
decoders), or take some other type of action. | decoders), or take some other type of action. | |||
A decoder that comes across a tag (Section 3.4) that it does not | A decoder that comes across a tag number (Section 3.4) that it does | |||
recognize, such as a tag that was added to the IANA registry after | not recognize, such as a tag number that was added to the IANA | |||
the decoder was deployed or a tag that the decoder chose not to | registry after the decoder was deployed or a tag number that the | |||
implement, might issue a warning, might stop processing altogether, | decoder chose not to implement, might issue a warning, might stop | |||
might handle the error and present the unknown tag value together | processing altogether, might handle the error and present the unknown | |||
with the contained data item to the application (as is expected of | tag number together with the enclosed data item to the application | |||
generic decoders), might ignore the tag and simply present the | (as is expected of generic decoders), might ignore the tag and simply | |||
contained data item only to the application, or take some other type | present the contained data item only to the application, or take some | |||
of action. | other type of action. | |||
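The options listed for an unrecognized tag number might look like the following in a generic decoder. This is a minimal sketch under stated assumptions: `KNOWN_TAGS`, `UnknownTag`, and `present_tag` are hypothetical names, and the single registered handler is a placeholder rather than a real conversion.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical table of tag numbers this decoder implements.
KNOWN_TAGS = {1: lambda content: ("epoch-time", content)}


@dataclass(frozen=True)
class UnknownTag:
    """An unrecognized tag number together with its enclosed data item."""
    number: int
    content: Any


def present_tag(number, content, ignore_unknown=False):
    handler = KNOWN_TAGS.get(number)
    if handler is not None:
        return handler(content)
    if ignore_unknown:
        # Option: drop the tag and present only the enclosed data item.
        return content
    # Option expected of generic decoders: pass both on to the application.
    return UnknownTag(number, content)
```

Raising an exception instead of returning `UnknownTag` would correspond to the "stop processing altogether" option.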
5.5. Numbers | 5.5. Numbers | |||
CBOR-based protocols should take into account that different language | CBOR-based protocols should take into account that different language | |||
environments pose different restrictions on the range and precision | environments pose different restrictions on the range and precision | |||
of numbers that are representable. For example, the JavaScript | of numbers that are representable. For example, the JavaScript | |||
number system treats all numbers as floating point, which may result | number system treats all numbers as floating point, which may result | |||
in silent loss of precision in decoding integers with more than 53 | in silent loss of precision in decoding integers with more than 53 | |||
significant bits. A protocol that uses numbers should define its | significant bits. A protocol that uses numbers should define its | |||
expectations on the handling of non-trivial numbers in decoders and | expectations on the handling of non-trivial numbers in decoders and | |||
skipping to change at page 33, line 5 ¶ | skipping to change at page 33, line 32 ¶ | |||
A CBOR-based protocol designed for compactness may want to exclude | A CBOR-based protocol designed for compactness may want to exclude | |||
specific integer encodings that are longer than necessary for the | specific integer encodings that are longer than necessary for the | |||
application, such as to save the need to implement 64-bit integers. | application, such as to save the need to implement 64-bit integers. | |||
There is an expectation that encoders will use the most compact | There is an expectation that encoders will use the most compact | |||
integer representation that can represent a given value. However, a | integer representation that can represent a given value. However, a | |||
compact application should accept values that use a longer-than- | compact application should accept values that use a longer-than- | |||
needed encoding (such as encoding "0" as 0b000_11001 followed by two | needed encoding (such as encoding "0" as 0b000_11001 followed by two | |||
bytes of 0x00) as long as the application can decode an integer of | bytes of 0x00) as long as the application can decode an integer of | |||
the given size. | the given size. | |||
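As an illustration of the two expectations above, the sketch below pairs an encoder that always picks the most compact form with a decoder that still accepts longer-than-needed forms. The function names are hypothetical, and only major type 0 (unsigned integers) is covered.

```python
def encode_uint(value):
    """Encode an unsigned integer (major type 0) in its most compact form."""
    if value < 0 or value >= 1 << 64:
        raise ValueError("value does not fit major type 0")
    if value < 24:
        return bytes([value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([info]) + value.to_bytes(size, "big")


def decode_uint(data):
    """Decode one major type 0 item; return (value, number of bytes used)."""
    initial = data[0]
    if initial >> 5 != 0:
        raise ValueError("not an unsigned integer (major type 0)")
    info = initial & 0x1F
    if info < 24:
        return info, 1
    if info > 27:
        raise ValueError("additional information not usable for an integer")
    size = 1 << (info - 24)  # 1, 2, 4, or 8 bytes follow
    if len(data) < 1 + size:
        raise ValueError("truncated input")
    # Longer-than-needed encodings are accepted, not rejected.
    return int.from_bytes(data[1:1 + size], "big"), 1 + size
```

A constrained implementation that does not support 64-bit integers would reject additional information 27 in `decode_uint` rather than decode it.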
The preferred encoding for a floating point value is the shortest | The preferred encoding for a floating-point value is the shortest | |||
floating point encoding that preserves its value, e.g., 0xf94580 for | floating-point encoding that preserves its value, e.g., 0xf94580 for | |||
the number 5.5, and 0xfa45ad9c00 for the number 5555.5, unless the | the number 5.5, and 0xfa45ad9c00 for the number 5555.5, unless the | |||
CBOR-based protocol specifically excludes the use of the shorter | CBOR-based protocol specifically excludes the use of the shorter | |||
floating point encodings. For NaN values, a shorter encoding is | floating-point encodings. For NaN values, a shorter encoding is | |||
preferred if zero-padding the shorter significand towards the right | preferred if zero-padding the shorter significand towards the right | |||
reconstitutes the original NaN value (for many applications, the | reconstitutes the original NaN value (for many applications, the | |||
single NaN encoding 0xf97e00 will suffice). | single NaN encoding 0xf97e00 will suffice). | |||
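The preferred floating-point encoding can be found by trying the shorter formats first and keeping the first one that round-trips exactly. The sketch below uses Python's `struct` formats `e`, `f`, and `d` for the 16-, 32-, and 64-bit encodings; `encode_float` is a hypothetical name, and as a simplification every NaN is emitted as 0xf97e00, so NaN payloads are not preserved.

```python
import math
import struct


def encode_float(value):
    """Encode a float using the shortest encoding that preserves its value."""
    if math.isnan(value):
        # Simplification (assumption): a single NaN encoding suffices here.
        return bytes.fromhex("f97e00")
    for initial, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except (OverflowError, struct.error):
            continue  # magnitude too large for this format
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial]) + packed
    return bytes([0xFB]) + struct.pack(">d", value)
```

A protocol that excludes the shorter encodings would skip the loop and always emit the 64-bit form.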
5.6. Specifying Keys for Maps | 5.6. Specifying Keys for Maps | |||
The encoding and decoding applications need to agree on what types of | The encoding and decoding applications need to agree on what types of | |||
keys are going to be used in maps. In applications that need to | keys are going to be used in maps. In applications that need to | |||
interwork with JSON-based applications, keys probably should be | interwork with JSON-based applications, keys probably should be | |||
limited to UTF-8 strings only; otherwise, there has to be a specified | limited to UTF-8 strings only; otherwise, there has to be a specified | |||
mapping from the other CBOR types to Unicode characters, and this | mapping from the other CBOR types to Unicode characters, and this | |||
often leads to implementation errors. In applications where keys are | often leads to implementation errors. In applications where keys are | |||
numeric in nature and numeric ordering of keys is important to the | numeric in nature and numeric ordering of keys is important to the | |||
application, directly using the numbers for the keys is useful. | application, directly using the numbers for the keys is useful. | |||
If multiple types of keys are to be used, consideration should be | If multiple types of keys are to be used, consideration should be | |||
given to how these types would be represented in the specific | given to how these types would be represented in the specific | |||
programming environments that are to be used. For example, in | programming environments that are to be used. For example, in | |||
JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished | JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished | |||
from a key of floating point 1.0. This means that, if integer keys | from a key of floating-point 1.0. This means that, if integer keys | |||
are used, the protocol needs to avoid use of floating-point keys the | are used, the protocol needs to avoid use of floating-point keys the | |||
values of which happen to be integer numbers in the same map. | values of which happen to be integer numbers in the same map. | |||
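The same hazard exists in other environments. Python's built-in dict, for example, also treats integer 1 and floating-point 1.0 as one key. The sketch below, with the hypothetical helper `keys_lost_in_native_map`, shows which decoded keys would silently collapse when key/value pairs are loaded into such a native map.

```python
def keys_lost_in_native_map(pairs):
    """Load key/value pairs into a dict and report keys that collapsed."""
    native = {}
    lost = []
    for key, value in pairs:
        if key in native:
            # The environment cannot tell this key from an earlier one.
            lost.append(key)
        native[key] = value
    return native, lost
```

For the pairs `[(1, "a"), (1.0, "b")]`, the resulting dict has a single entry and `lost` contains `1.0`.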
Decoders that deliver data items nested within a CBOR data item | Decoders that deliver data items nested within a CBOR data item | |||
immediately on decoding them ("streaming decoders") often do not keep | immediately on decoding them ("streaming decoders") often do not keep | |||
the state that is necessary to ascertain uniqueness of a key in a | the state that is necessary to ascertain uniqueness of a key in a | |||
map. Similarly, an encoder that can start encoding data items before | map. Similarly, an encoder that can start encoding data items before | |||
the enclosing data item is completely available ("streaming encoder") | the enclosing data item is completely available ("streaming encoder") | |||
may want to reduce its overhead significantly by relying on its data | may want to reduce its overhead significantly by relying on its data | |||
source to maintain uniqueness. | source to maintain uniqueness. | |||
skipping to change at page 34, line 6 ¶ | skipping to change at page 34, line 35 ¶ | |||
except that it might have a rule that having identical keys in a map | except that it might have a rule that having identical keys in a map | |||
indicates a malformed map and that the decoder has to stop with an | indicates a malformed map and that the decoder has to stop with an | |||
error. Duplicate keys are also prohibited by CBOR decoders that are | error. Duplicate keys are also prohibited by CBOR decoders that are | |||
using strict mode (Section 5.8). | using strict mode (Section 5.8). | |||
The CBOR data model for maps does not allow ascribing semantics to | The CBOR data model for maps does not allow ascribing semantics to | |||
the order of the key/value pairs in the map representation. Thus, a | the order of the key/value pairs in the map representation. Thus, a | |||
CBOR-based protocol MUST NOT specify that changing the key/value pair | CBOR-based protocol MUST NOT specify that changing the key/value pair | |||
order in a map would change the semantics, except to specify that | order in a map would change the semantics, except to specify that | |||
some orders are disallowed, for example where they would not meet | some orders are disallowed, for example where they would not meet | |||
the requirements of a deterministic encoding (Section 4.2. (Any | the requirements of a deterministic encoding (Section 4.2). (Any | |||
secondary effects of map ordering such as on timing, cache usage, and | secondary effects of map ordering such as on timing, cache usage, and | |||
other potential side channels are not considered part of the | other potential side channels are not considered part of the | |||
semantics but may be enough reason on their own for a protocol to | semantics but may be enough reason on their own for a protocol to | |||
require a deterministic encoding format.) | require a deterministic encoding format.) | |||
Applications for constrained devices that have maps with 24 or fewer | Applications for constrained devices that have maps with 24 or fewer | |||
frequently used keys should consider using small integers (and those | frequently used keys should consider using small integers (and those | |||
with up to 48 frequently used keys should consider also using small | with up to 48 frequently used keys should consider also using small | |||
negative integers) because the keys can then be encoded in a single | negative integers) because the keys can then be encoded in a single | |||
byte. | byte. | |||
5.6.1. Equivalence of Keys | 5.6.1. Equivalence of Keys | |||
The specific data model applying to a CBOR data item is used to | The specific data model applying to a CBOR data item is used to | |||
determine whether keys occurring in maps are duplicates or distinct. | determine whether keys occurring in maps are duplicates or distinct. | |||
At the generic data model level, numerically equivalent integer and | At the generic data model level, numerically equivalent integer and | |||
floating point values are distinct from each other, as they are from | floating-point values are distinct from each other, as they are from | |||
the various big numbers (Tags 2 to 5). Similarly, text strings are | the various big numbers (Tags 2 to 5). Similarly, text strings are | |||
distinct from byte strings, even if composed of the same bytes. A | distinct from byte strings, even if composed of the same bytes. A | |||
tagged value is distinct from an untagged value or from a value | tagged value is distinct from an untagged value or from a value | |||
tagged with a different tag. | tagged with a different tag. | |||
Within each of these groups, numeric values are distinct unless they | Within each of these groups, numeric values are distinct unless they | |||
are numerically equal (specifically, -0.0 is equal to 0.0); for the | are numerically equal (specifically, -0.0 is equal to 0.0); for the | |||
purpose of map key equivalence, NaN (not a number) values are | purpose of map key equivalence, NaN (not a number) values are | |||
equivalent if they have the same significand after zero-extending | equivalent if they have the same significand after zero-extending | |||
both significands at the right to 64 bits. | both significands at the right to 64 bits. | |||
(Byte and text) strings are compared byte by byte, arrays element by | (Byte and text) strings are compared byte by byte, arrays element by | |||
element, and are equal if they have the same number of bytes/elements | element, and are equal if they have the same number of bytes/elements | |||
and the same values at the same positions. Two maps are equal if | and the same values at the same positions. Two maps are equal if | |||
they have the same set of pairs regardless of their order; pairs are | they have the same set of pairs regardless of their order; pairs are | |||
equal if both the key and value are equal. | equal if both the key and value are equal. | |||
Tagged values are equal if both the tag and the value are equal. | Tagged values are equal if both the tag number and the enclosed item | |||
Simple values are equal if they simply have the same value. Nothing | are equal. Simple values are equal if they simply have the same | |||
else is equal in the generic data model; a simple value 2 is not | value. Nothing else is equal in the generic data model; a simple | |||
equivalent to an integer 2 and an array is never equivalent to a map. | value 2 is not equivalent to an integer 2 and an array is never | |||
equivalent to a map. | ||||
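One way to implement these equivalence rules is to map each key to a hashable "identity" that is equal for two keys exactly when the generic data model considers them equivalent. The sketch below is an assumption-laden illustration: `key_identity` is a hypothetical name, only integers, floats, booleans, strings, and arrays are covered, and NaN values are compared by the 52-bit significand of their 64-bit representation.

```python
import math
import struct


def key_identity(item):
    """Return a hashable identity following generic data model equivalence."""
    if isinstance(item, bool):
        # Checked before int: in Python True == 1, but a simple value is
        # never equivalent to an integer.
        return ("simple", "true" if item else "false")
    if isinstance(item, int):
        return ("int", item)
    if isinstance(item, float):
        if math.isnan(item):
            bits = struct.unpack(">Q", struct.pack(">d", item))[0]
            return ("nan", bits & ((1 << 52) - 1))
        # -0.0 and 0.0 compare and hash equal, as required.
        return ("float", item)
    if isinstance(item, str):
        return ("text", item.encode("utf-8"))
    if isinstance(item, (bytes, bytearray)):
        return ("bytes", bytes(item))
    if isinstance(item, (list, tuple)):
        return ("array", tuple(key_identity(x) for x in item))
    raise TypeError("unsupported key type in this sketch")
```

A specific data model that wants integer 1 and floating-point 1.0 to be the same key could merge the "int" and "float" groups on top of this.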
As discussed in Section 2.2, specific data models can make values | As discussed in Section 2.2, specific data models can make values | |||
equivalent for the purpose of comparing map keys that are distinct in | equivalent for the purpose of comparing map keys that are distinct in | |||
the generic data model. Note that this implies that a generic | the generic data model. Note that this implies that a generic | |||
decoder may deliver a decoded map to an application that needs to be | decoder may deliver a decoded map to an application that needs to be | |||
checked for duplicate map keys by that application (alternatively, | checked for duplicate map keys by that application (alternatively, | |||
the decoder may provide a programming interface to perform this | the decoder may provide a programming interface to perform this | |||
service for the application). Specific data models cannot | service for the application). Specific data models cannot | |||
distinguish values for map keys that are equal for this purpose at | distinguish values for map keys that are equal for this purpose at | |||
the generic data model level. | the generic data model level. | |||
skipping to change at page 35, line 50 ¶ | skipping to change at page 36, line 32 ¶ | |||
(and does not return data) for a CBOR data item that contains any of | (and does not return data) for a CBOR data item that contains any of | |||
the following: | the following: | |||
o a map (major type 5) that has more than one entry with the same | o a map (major type 5) that has more than one entry with the same | |||
key | key | |||
o a tag that is used on a data item of the incorrect type | o a tag that is used on a data item of the incorrect type | |||
o a data item that is incorrectly formatted for the type given to | o a data item that is incorrectly formatted for the type given to | |||
it, such as invalid UTF-8 or data that cannot be interpreted with | it, such as invalid UTF-8 or data that cannot be interpreted with | |||
the specific tag that it has been tagged with | the specific tag number that it has been tagged with | |||
A decoder in strict mode can do one of two things when it encounters | A decoder in strict mode can do one of two things when it encounters | |||
a tag or simple value that it does not recognize: | a tag number or simple value that it does not recognize: | |||
o It can report an error (and not return data). | o It can report an error (and not return data). | |||
o It can emit the unknown item (type, value, and, for tags, the | o It can emit the unknown item (type, value, and, for tags, the | |||
decoded tagged data item) to the application calling the decoder | decoded tagged data item) to the application calling the decoder | |||
with an indication that the decoder did not recognize that tag or | with an indication that the decoder did not recognize that tag | |||
simple value. | number or simple value. | |||
The latter approach, which is also appropriate for non-strict | The latter approach, which is also appropriate for non-strict | |||
decoders, supports forward compatibility with newly registered tags | decoders, supports forward compatibility with newly registered tags | |||
and simple values without the requirement to update the encoder at | and simple values without the requirement to update the encoder at | |||
the same time as the calling application. (For this, the API for the | the same time as the calling application. (For this, the API for the | |||
decoder needs to have a way to mark unknown items so that the calling | decoder needs to have a way to mark unknown items so that the calling | |||
application can handle them in a manner appropriate for the program.) | application can handle them in a manner appropriate for the program.) | |||
Since some of this processing may have an appreciable cost (in | Since some of this processing may have an appreciable cost (in | |||
particular with duplicate detection for maps), support of strict mode | particular with duplicate detection for maps), support of strict mode | |||
is not a requirement placed on all CBOR decoders. | is not a requirement placed on all CBOR decoders. | |||
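The cost of duplicate detection comes from the state a decoder has to keep for every map. A minimal sketch of the check, assuming the map arrives as a sequence of already decoded key/value pairs with hashable keys, is shown below. `build_map_strict` and `DuplicateKeyError` are hypothetical names, and the default identity function merely separates keys by their Python type.

```python
class DuplicateKeyError(ValueError):
    """Raised in strict mode when a map contains the same key twice."""


def build_map_strict(pairs, identity=lambda k: (type(k).__name__, k)):
    """Collect key/value pairs, reporting an error on a duplicate key."""
    seen = set()  # grows with the number of keys: the cost of strict mode
    result = []
    for key, value in pairs:
        ident = identity(key)
        if ident in seen:
            raise DuplicateKeyError(f"duplicate map key: {key!r}")
        seen.add(ident)
        result.append((key, value))
    return result
```

A streaming decoder that cannot afford the `seen` set would have to leave this check to the application.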
Some encoders will rely on their applications to provide input data | Some encoders will rely on their applications to provide input data | |||
in such a way that unambiguously decodable CBOR results. A generic | in such a way that unambiguously decodable CBOR results. A generic | |||
encoder also may want to provide a strict mode where it reliably | encoder also may want to provide a strict mode where it reliably | |||
limits its output to unambiguously decodable CBOR, independent of | limits its output to unambiguously decodable CBOR, independent of | |||
whether or not its application is providing API-conformant data. | whether or not its application is providing API-conformant data. | |||
skipping to change at page 37, line 44 ¶ | skipping to change at page 38, line 25 ¶ | |||
o A floating-point value (major type 7, additional information 25 | o A floating-point value (major type 7, additional information 25 | |||
through 27) becomes a JSON number if it is finite (that is, it can | through 27) becomes a JSON number if it is finite (that is, it can | |||
be represented in a JSON number); if the value is non-finite (NaN, | be represented in a JSON number); if the value is non-finite (NaN, | |||
or positive or negative Infinity), it is represented by the | or positive or negative Infinity), it is represented by the | |||
substitute value. | substitute value. | |||
o Any other simple value (major type 7, any additional information | o Any other simple value (major type 7, any additional information | |||
value not yet discussed) is represented by the substitute value. | value not yet discussed) is represented by the substitute value. | |||
o A bignum (major type 6, tag value 2 or 3) is represented by | o A bignum (major type 6, tag number 2 or 3) is represented by | |||
encoding its byte string in base64url without padding and becomes | encoding its byte string in base64url without padding and becomes | |||
a JSON string. For tag value 3 (negative bignum), a "~" (ASCII | a JSON string. For tag number 3 (negative bignum), a "~" (ASCII | |||
tilde) is inserted before the base-encoded value. (The conversion | tilde) is inserted before the base-encoded value. (The conversion | |||
to a binary blob instead of a number is to prevent a likely | to a binary blob instead of a number is to prevent a likely | |||
numeric overflow for the JSON decoder.) | numeric overflow for the JSON decoder.) | |||
o A byte string with an encoding hint (major type 6, tag value 21 | o A byte string with an encoding hint (major type 6, tag number 21 | |||
through 23) is encoded as described and becomes a JSON string. | through 23) is encoded as described and becomes a JSON string. | |||
o For all other tags (major type 6, any other tag value), the | o For all other tags (major type 6, any other tag number), the | |||
embedded CBOR item is represented as a JSON value; the tag value | enclosed CBOR item is represented as a JSON value; the tag number | |||
is ignored. | is ignored. | |||
o Indefinite-length items are made definite before conversion. | o Indefinite-length items are made definite before conversion. | |||
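The bignum and floating-point rules in the list above could be sketched as follows. The function names are hypothetical, and the choice of `None` (JSON null) as the default substitute value is an assumption of this sketch.

```python
import base64
import math


def bignum_to_json_string(tag_number, content):
    """Represent a bignum (tag number 2 or 3) as a JSON string."""
    if tag_number not in (2, 3):
        raise ValueError("not a bignum tag number")
    # base64url without padding
    text = base64.urlsafe_b64encode(content).rstrip(b"=").decode("ascii")
    # A "~" is inserted before the encoded value for a negative bignum.
    return ("~" if tag_number == 3 else "") + text


def float_to_json_value(value, substitute=None):
    """Finite floats stay numbers; NaN and infinities become the substitute."""
    return value if math.isfinite(value) else substitute
```

Passing the results through `json.dumps` then yields the JSON text.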
6.2. Converting from JSON to CBOR | 6.2. Converting from JSON to CBOR | |||
All JSON values, once decoded, directly map into one or more CBOR | All JSON values, once decoded, directly map into one or more CBOR | |||
values. As with any kind of CBOR generation, decisions have to be | values. As with any kind of CBOR generation, decisions have to be | |||
made with respect to number representation. In a suggested | made with respect to number representation. In a suggested | |||
conversion: | conversion: | |||
o JSON numbers without fractional parts (integer numbers) are | o JSON numbers without fractional parts (integer numbers) are | |||
represented as integers (major types 0 and 1, possibly major type | represented as integers (major types 0 and 1, possibly major type | |||
6 tag value 2 and 3), choosing the shortest form; integers longer | 6 tag number 2 and 3), choosing the shortest form; integers longer | |||
than an implementation-defined threshold (which is usually either | than an implementation-defined threshold (which is usually either | |||
32 or 64 bits) may instead be represented as floating-point | 32 or 64 bits) may instead be represented as floating-point | |||
values. (If the JSON was generated from a JavaScript | values. (If the JSON was generated from a JavaScript | |||
implementation, its precision is already limited to 53 bits | implementation, its precision is already limited to 53 bits | |||
maximum.) | maximum.) | |||
o Numbers with fractional parts are represented as floating-point | o Numbers with fractional parts are represented as floating-point | |||
values. Preferably, the shortest exact floating-point | values. Preferably, the shortest exact floating-point | |||
representation is used; for instance, 1.5 is represented in a | representation is used; for instance, 1.5 is represented in a | |||
16-bit floating-point value (not all implementations will be | 16-bit floating-point value (not all implementations will be | |||
skipping to change at page 40, line 5 ¶ | skipping to change at page 40, line 33 ¶ | |||
(and 224 slightly less efficient) values, only a small number have | (and 224 slightly less efficient) values, only a small number have | |||
been allocated. Implementations receiving an unknown simple data | been allocated. Implementations receiving an unknown simple data | |||
item may be able to process it as such, given that the structure | item may be able to process it as such, given that the structure | |||
of the value is indeed simple. The IANA registry in Section 9.1 | of the value is indeed simple. The IANA registry in Section 9.1 | |||
is the appropriate way to address the extensibility of this | is the appropriate way to address the extensibility of this | |||
codepoint space. | codepoint space. | |||
o the "tag" space (values in major type 6). Again, only a small | o the "tag" space (values in major type 6). Again, only a small | |||
part of the codepoint space has been allocated, and the space is | part of the codepoint space has been allocated, and the space is | |||
abundant (although the early numbers are more efficient than the | abundant (although the early numbers are more efficient than the | |||
later ones). Implementations receiving an unknown tag can choose | later ones). Implementations receiving an unknown tag number can | |||
to simply ignore it or to process it as an unknown tag wrapping | choose to simply ignore it or to process it as an unknown tag | |||
the following data item. The IANA registry in Section 9.2 is the | number wrapping the enclosed data item. The IANA registry in | |||
appropriate way to address the extensibility of this codepoint | Section 9.2 is the appropriate way to address the extensibility of | |||
space. | this codepoint space. | |||
o the "additional information" space. An implementation receiving | o the "additional information" space. An implementation receiving | |||
an unknown additional information value has no way to continue | an unknown additional information value has no way to continue | |||
decoding, so allocating codepoints to this space is a major step. | decoding, so allocating codepoints to this space is a major step. | |||
There are also very few codepoints left. | There are also very few codepoints left. | |||
7.2. Curating the Additional Information Space | 7.2. Curating the Additional Information Space | |||
The human mind is sometimes drawn to filling in little perceived gaps | The human mind is sometimes drawn to filling in little perceived gaps | |||
to make something neat. We expect the remaining gaps in the | to make something neat. We expect the remaining gaps in the | |||
skipping to change at page 41, line 14 ¶ | skipping to change at page 41, line 44 ¶ | |||
The notation borrows the JSON syntax for numbers (integer and | The notation borrows the JSON syntax for numbers (integer and | |||
floating point), True (>true<), False (>false<), Null (>null<), UTF-8 | floating point), True (>true<), False (>false<), Null (>null<), UTF-8 | |||
strings, arrays, and maps (maps are called objects in JSON; the | strings, arrays, and maps (maps are called objects in JSON; the | |||
diagnostic notation extends JSON here by allowing any data item in | diagnostic notation extends JSON here by allowing any data item in | |||
the key position). Undefined is written >undefined< as in | the key position). Undefined is written >undefined< as in | |||
JavaScript. The non-finite floating-point numbers Infinity, | JavaScript. The non-finite floating-point numbers Infinity, | |||
-Infinity, and NaN are written exactly as in this sentence (this is | -Infinity, and NaN are written exactly as in this sentence (this is | |||
also a way they can be written in JavaScript, although JSON does not | also a way they can be written in JavaScript, although JSON does not | |||
allow them). A tagged item is written as an integer number for the | allow them). A tagged item is written as an integer number for the | |||
tag followed by the item in parentheses; for instance, an RFC 3339 | tag, followed by the item in parentheses; for instance, an RFC 3339 | |||
(ISO 8601) date could be notated as: | (ISO 8601) date could be notated as: | |||
0("2013-03-21T20:04:00Z") | 0("2013-03-21T20:04:00Z") | |||
or the equivalent relative time as | or the equivalent relative time as | |||
1(1363896240) | 1(1363896240) | |||
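That the two notations above denote the same point in time can be checked with a few lines of code. The helper name `tag0_to_tag1` is hypothetical, and the sketch only handles date/time strings in UTC with a trailing "Z" and no fractional seconds.

```python
from datetime import datetime, timezone


def tag0_to_tag1(text):
    """Convert the content of tag 0 (date/time string) to the content of tag 1."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    # The trailing "Z" means UTC; attach that time zone explicitly.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())
```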
Byte strings are notated in one of the base encodings, without | Byte strings are notated in one of the base encodings, without | |||
padding, enclosed in single quotes, prefixed by >h< for base16, >b32< | padding, enclosed in single quotes, prefixed by >h< for base16, >b32< | |||
skipping to change at page 42, line 29 ¶ | skipping to change at page 43, line 11 ¶ | |||
IANA has created two registries for new CBOR values. The registries | IANA has created two registries for new CBOR values. The registries | |||
are separate, that is, not under an umbrella registry, and follow the | are separate, that is, not under an umbrella registry, and follow the | |||
rules in [RFC8126]. IANA has also assigned a new MIME media type and | rules in [RFC8126]. IANA has also assigned a new MIME media type and | |||
an associated Constrained Application Protocol (CoAP) Content-Format | an associated Constrained Application Protocol (CoAP) Content-Format | |||
entry. | entry. | |||
9.1. Simple Values Registry | 9.1. Simple Values Registry | |||
IANA has created the "Concise Binary Object Representation (CBOR) | IANA has created the "Concise Binary Object Representation (CBOR) | |||
Simple Values" registry at [IANA.cbor-simple-values]. The initial | Simple Values" registry at [IANA.cbor-simple-values]. The initial | |||
values are shown in Table 2. | values are shown in Table 3. | |||
New entries in the range 0 to 19 are assigned by Standards Action. | New entries in the range 0 to 19 are assigned by Standards Action. | |||
It is suggested that these Standards Actions allocate values starting | It is suggested that these Standards Actions allocate values starting | |||
with the number 16 in order to reserve the lower numbers for | with the number 16 in order to reserve the lower numbers for | |||
contiguous blocks (if any). | contiguous blocks (if any). | |||
New entries in the range 32 to 255 are assigned by Specification | New entries in the range 32 to 255 are assigned by Specification | |||
Required. | Required. | |||
9.2. Tags Registry | 9.2. Tags Registry | |||
skipping to change at page 45, line 47 ¶ | skipping to change at page 46, line 12 ¶ | |||
remotely crash a node, or even remotely execute arbitrary code on it. | remotely crash a node, or even remotely execute arbitrary code on it. | |||
CBOR attempts to narrow the opportunities for introducing such | CBOR attempts to narrow the opportunities for introducing such | |||
vulnerabilities by reducing parser complexity, by giving the entire | vulnerabilities by reducing parser complexity, by giving the entire | |||
range of encodable values a meaning where possible. | range of encodable values a meaning where possible. | |||
Because CBOR decoders are often used as a first step in processing | Because CBOR decoders are often used as a first step in processing | |||
unvalidated input, they need to be fully prepared for all types of | unvalidated input, they need to be fully prepared for all types of | |||
hostile input that may be designed to corrupt, overrun, or achieve | hostile input that may be designed to corrupt, overrun, or achieve | |||
control of the system decoding the CBOR data item. A CBOR decoder | control of the system decoding the CBOR data item. A CBOR decoder | |||
needs to assume that all input may be hostile even if it has been | needs to assume that all input may be hostile even if it has been | |||
checked by a firewall, has come over a TLS-secured channel, is | checked by a firewall, has come over a secure channel such as TLS, is | |||
encrypted or signed, or has come from some other source that is | encrypted or signed, or has come from some other source that is | |||
presumed trusted. | presumed trusted. | |||
Hostile input may be constructed to overrun buffers, overflow or | Hostile input may be constructed to overrun buffers, overflow or | |||
underflow integer arithmetic, or cause other decoding disruption. | underflow integer arithmetic, or cause other decoding disruption. | |||
CBOR data items might have lengths or sizes that are intentionally | CBOR data items might have lengths or sizes that are intentionally | |||
extremely large or too short. Resource exhaustion attacks might | extremely large or too short. Resource exhaustion attacks might | |||
attempt to lure a decoder into allocating very big data items | attempt to lure a decoder into allocating very big data items | |||
(strings, arrays, maps) or exhaust the stack depth by setting up | (strings, arrays, maps, or even arbitrary precision numbers) or | |||
deeply nested items. Decoders need to have appropriate resource | exhaust the stack depth by setting up deeply nested items. Decoders | |||
management to mitigate these attacks. (Items for which very large | need to have appropriate resource management to mitigate these | |||
sizes are given can also attempt to exploit integer overflow | attacks. (Items for which very large sizes are given can also | |||
vulnerabilities.) | attempt to exploit integer overflow vulnerabilities.) | |||
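The resource management described above can be sketched as two checks made before any allocation or recursion. This is a minimal illustration, not part of either draft: the names `check_declared_length`, `DepthGuard`, and `LimitExceeded`, and the default limits, are hypothetical and would be chosen by each implementation.

```python
class LimitExceeded(Exception):
    """Raised when input asks for more resources than the decoder allows."""


def check_declared_length(declared, remaining, max_len=1 << 20):
    """Validate a length or item count taken from a CBOR head.

    declared  -- value decoded from the head (may be as large as 2**64 - 1)
    remaining -- number of input bytes not yet consumed
    max_len   -- implementation-chosen ceiling (assumed default: 1 MiB)
    """
    if declared > max_len:
        raise LimitExceeded("declared size exceeds configured limit")
    # Every byte of a string and every item of an array needs at least
    # one byte of input, so a larger value cannot be satisfied.
    if declared > remaining:
        raise LimitExceeded("declared size exceeds remaining input")
    return declared


class DepthGuard:
    """Context manager that bounds the nesting depth of a recursive decoder."""

    def __init__(self, max_depth=32):
        self.max_depth = max_depth
        self.depth = 0

    def __enter__(self):
        # Check before incrementing so a failed entry leaves depth unchanged.
        if self.depth >= self.max_depth:
            raise LimitExceeded("nesting too deep")
        self.depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self.depth -= 1
        return False
```

Python integers do not overflow, so the comparison is safe as written; in a language with fixed-width integers, the same check has to be made before any multiplication or addition that uses the declared value.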
A CBOR decoder, by definition, only accepts well-formed CBOR; this is | A CBOR decoder, by definition, only accepts well-formed CBOR; this is | |||
the first step to its robustness. Input that is not well-formed CBOR | the first step to its robustness. Input that is not well-formed CBOR | |||
causes no further processing from the point where the lack of well- | causes no further processing from the point where the lack of well- | |||
formedness was detected. If possible, any data decoded up to this | formedness was detected. If possible, any data decoded up to this | |||
point should have no impact on the application using the CBOR | point should have no impact on the application using the CBOR | |||
decoder. | decoder. | |||
In addition to ascertaining well-formedness, a CBOR decoder might | In addition to ascertaining well-formedness, a CBOR decoder might | |||
also perform validity checks on the CBOR data. Alternatively, it can | also perform validity checks on the CBOR data. Alternatively, it can | |||
leave those checks to the application using the decoder. This choice | leave those checks to the application using the decoder. This choice | |||
needs to be clearly documented in the decoder. Beyond the validity | needs to be clearly documented in the decoder. Beyond the validity | |||
at the CBOR level, an application also needs to ascertain that the | at the CBOR level, an application also needs to ascertain that the | |||
input is in alignment with the application protocol that is | input is in alignment with the application protocol that is | |||
serialized in CBOR. | serialized in CBOR. | |||
The input check itself may consume resources. This is usually linear | ||||
in the size of the input, which means that an attacker has to spend | ||||
resources that are commensurate to the resources spent by the | ||||
defender on input validation. Processing for arbitrary-precision | ||||
numbers may exceed linear effort. Also, some hash-table | ||||
implementations that are used by decoders to build in-memory | ||||
representations of maps can be attacked to spend quadratic effort, | ||||
unless a secret key is employed (see Section 7 of [SIPHASH]). Such | ||||
superlinear efforts can be employed by an attacker to exhaust | ||||
resources at or before the input validator; they therefore need to be | ||||
avoided in a CBOR decoder implementation. Note that Tag number | ||||
definitions and their implementations can add security considerations | ||||
of this kind; this should then be discussed in the security | ||||
considerations of the Tag number definition. | ||||
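As an illustration of the secret-key approach, the sketch below derives a hash-table bucket index from an encoded map key with a keyed hash. Python's standard library does not expose SipHash directly, so this example substitutes BLAKE2b in keyed mode; that substitution, the class name `KeyedBucketIndex`, and the bucket count are assumptions made for this example only.

```python
import hashlib
import secrets


class KeyedBucketIndex:
    """Maps encoded map keys to buckets using a per-process secret key."""

    def __init__(self, buckets=64, key=None):
        # A fresh random key per process keeps an attacker from
        # precomputing many keys that land in the same bucket.
        self.key = key if key is not None else secrets.token_bytes(16)
        self.buckets = buckets

    def index(self, encoded_key):
        # BLAKE2b in keyed mode stands in for SipHash here (assumption).
        digest = hashlib.blake2b(
            encoded_key, key=self.key, digest_size=8
        ).digest()
        return int.from_bytes(digest, "big") % self.buckets
```

With the key unknown to the sender, constructing a large set of colliding map keys is no longer a matter of offline computation against a published hash function.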
CBOR encoders do not receive input directly from the network and are | CBOR encoders do not receive input directly from the network and are | |||
thus not directly attackable in the same way as CBOR decoders. | thus not directly attackable in the same way as CBOR decoders. | |||
However, CBOR encoders often have an API that takes input from | However, CBOR encoders often have an API that takes input from | |||
another level in the implementation and can be attacked through that | another level in the implementation and can be attacked through that | |||
API. The design and implementation of that API should assume the | API. The design and implementation of that API should assume the | |||
behavior of its caller may be based on hostile input or on coding | behavior of its caller may be based on hostile input or on coding | |||
mistakes. It should check inputs for buffer overruns, overflow and | mistakes. It should check inputs for buffer overruns, overflow and | |||
underflow of integer arithmetic, and other such errors that are aimed | underflow of integer arithmetic, and other such errors that are aimed | |||
to disrupt the encoder. | to disrupt the encoder. | |||
skipping to change at page 49, line 15 ¶ | skipping to change at page 49, line 43 ¶ | |||
[RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for | [RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for | |||
Constrained-Node Networks", RFC 7228, | Constrained-Node Networks", RFC 7228, | |||
DOI 10.17487/RFC7228, May 2014, | DOI 10.17487/RFC7228, May 2014, | |||
<https://www.rfc-editor.org/info/rfc7228>. | <https://www.rfc-editor.org/info/rfc7228>. | |||
[RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data | [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data | |||
Interchange Format", STD 90, RFC 8259, | Interchange Format", STD 90, RFC 8259, | |||
DOI 10.17487/RFC8259, December 2017, | DOI 10.17487/RFC8259, December 2017, | |||
<https://www.rfc-editor.org/info/rfc8259>. | <https://www.rfc-editor.org/info/rfc8259>. | |||
[SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- | ||||
Input PRF", Lecture Notes in Computer Science pp. 489-508, | ||||
DOI 10.1007/978-3-642-34931-7_28, 2012. | ||||
[YAML] Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup | [YAML] Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup | |||
Language (YAML[TM]) Version 1.2", 3rd Edition, October | Language (YAML[TM]) Version 1.2", 3rd Edition, October | |||
2009, <http://www.yaml.org/spec/1.2/spec.html>. | 2009, <http://www.yaml.org/spec/1.2/spec.html>. | |||
Appendix A. Examples | Appendix A. Examples | |||
The following table provides some CBOR-encoded values in hexadecimal | The following table provides some CBOR-encoded values in hexadecimal | |||
(right column), together with diagnostic notation for these values | (right column), together with diagnostic notation for these values | |||
(left column). Note that the string "\u00fc" is one form of | (left column). Note that the string "\u00fc" is one form of | |||
diagnostic notation for a UTF-8 string containing the single Unicode | diagnostic notation for a UTF-8 string containing the single Unicode | |||
skipping to change at page 54, line 6 ¶ | skipping to change at page 54, line 6 ¶ | |||
| 16, 17, 18, 19, 20, 21, 22, | | | | 16, 17, 18, 19, 20, 21, 22, | | | |||
| 23, 24, 25] | | | | 23, 24, 25] | | | |||
| | | | | | | | |||
| {_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff | | | {_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff | | |||
| | | | | | | | |||
| ["a", {_ "b": "c"}] | 0x826161bf61626163ff | | | ["a", {_ "b": "c"}] | 0x826161bf61626163ff | | |||
| | | | | | | | |||
| {_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff | | | {_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff | | |||
+------------------------------+------------------------------------+ | +------------------------------+------------------------------------+ | |||
Table 4: Examples of Encoded CBOR Data Items | Table 5: Examples of Encoded CBOR Data Items | |||
Appendix B. Jump Table | Appendix B. Jump Table | |||
For brevity, this jump table does not show initial bytes that are | For brevity, this jump table does not show initial bytes that are | |||
reserved for future extension. It also only shows a selection of the | reserved for future extension. It also only shows a selection of the | |||
initial bytes that can be used for optional features. (All unsigned | initial bytes that can be used for optional features. (All unsigned | |||
integers are in network byte order.) | integers are in network byte order.) | |||
+------------+------------------------------------------------------+ | +------------+------------------------------------------------------+ | |||
| Byte | Structure/Semantics | | | Byte | Structure/Semantics | | |||
skipping to change at page 57, line 10 ¶ | skipping to change at page 57, line 10 ¶ | |||
| | | | | | | | |||
| 0xf9 | Half-Precision Float (two-byte IEEE 754) | | | 0xf9 | Half-Precision Float (two-byte IEEE 754) | | |||
| | | | | | | | |||
| 0xfa | Single-Precision Float (four-byte IEEE 754) | | | 0xfa | Single-Precision Float (four-byte IEEE 754) | | |||
| | | | | | | | |||
| 0xfb | Double-Precision Float (eight-byte IEEE 754) | | | 0xfb | Double-Precision Float (eight-byte IEEE 754) | | |||
| | | | | | | | |||
| 0xff | "break" stop code | | | 0xff | "break" stop code | | |||
+------------+------------------------------------------------------+ | +------------+------------------------------------------------------+ | |||
Table 5: Jump Table for Initial Byte | Table 6: Jump Table for Initial Byte | |||
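The jump table is indexed by the initial byte, whose upper three bits carry the major type and whose lower five bits carry the additional information. A minimal sketch of that split, together with decoding of the three floating-point rows shown above, might look as follows. The function names are hypothetical; the `struct` format codes `e`, `f`, and `d` are Python's IEEE 754 half, single, and double precision formats.

```python
import struct

# Initial byte -> big-endian struct format for the float rows of the table.
FLOAT_FORMATS = {0xF9: ">e", 0xFA: ">f", 0xFB: ">d"}


def split_initial_byte(ib):
    """Return (major type, additional information) for an initial byte."""
    if not 0 <= ib <= 0xFF:
        raise ValueError("initial byte out of range")
    return ib >> 5, ib & 0x1F


def decode_float(data):
    """Decode one float item (initial byte 0xf9, 0xfa, or 0xfb)."""
    fmt = FLOAT_FORMATS[data[0]]
    size = struct.calcsize(fmt)
    if len(data) < 1 + size:
        raise ValueError("input ends early")
    return struct.unpack(fmt, data[1:1 + size])[0]
```

For example, `split_initial_byte(0xff)` yields major type 7 with additional information 31, which is the "break" stop code row.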
Appendix C. Pseudocode | Appendix C. Pseudocode | |||
The well-formedness of a CBOR item can be checked by the pseudocode | The well-formedness of a CBOR item can be checked by the pseudocode | |||
in Figure 1. The data is well-formed if and only if: | in Figure 1. The data is well-formed if and only if: | |||
o the pseudocode does not "fail"; | o the pseudocode does not "fail"; | |||
o after execution of the pseudocode, no bytes are left in the input | o after execution of the pseudocode, no bytes are left in the input | |||
(except in streaming applications) | (except in streaming applications) | |||
skipping to change at page 58, line 5 ¶ | skipping to change at page 57, line 34 ¶ | |||
o take(n) reads n bytes from the input data and returns them as a | o take(n) reads n bytes from the input data and returns them as a | |||
byte string. If n bytes are no longer available, take(n) fails. | byte string. If n bytes are no longer available, take(n) fails. | |||
o uint() converts a byte string into an unsigned integer by | o uint() converts a byte string into an unsigned integer by | |||
interpreting the byte string in network byte order. | interpreting the byte string in network byte order. | |||
o Arithmetic works as in C. | o Arithmetic works as in C. | |||
o All variables are unsigned integers of sufficient range. | o All variables are unsigned integers of sufficient range. | |||
Note that "well_formed" returns the major type for well-formed | ||||
definite length items, but 0 for an indefinite length item (or -1 for | ||||
a break stop code, only if "breakable" is set). This is used in | ||||
"well_formed_indefinite" to ascertain that indefinite length strings | ||||
only contain definite length strings as chunks. | ||||
well_formed (breakable = false) { | well_formed (breakable = false) { | |||
// process initial bytes | // process initial bytes | |||
ib = uint(take(1)); | ib = uint(take(1)); | |||
mt = ib >> 5; | mt = ib >> 5; | |||
val = ai = ib & 0x1f; | val = ai = ib & 0x1f; | |||
switch (ai) { | switch (ai) { | |||
case 24: val = uint(take(1)); break; | case 24: val = uint(take(1)); break; | |||
case 25: val = uint(take(2)); break; | case 25: val = uint(take(2)); break; | |||
case 26: val = uint(take(4)); break; | case 26: val = uint(take(4)); break; | |||
case 27: val = uint(take(8)); break; | case 27: val = uint(take(8)); break; | |||
skipping to change at page 58, line 35 ¶ | skipping to change at page 58, line 35 ¶ | |||
case 6: well_formed(); break; // 1 embedded data item | case 6: well_formed(); break; // 1 embedded data item | |||
case 7: if (ai == 24 && val < 32) fail(); // bad simple | case 7: if (ai == 24 && val < 32) fail(); // bad simple | |||
} | } | |||
return mt; // finite data item | return mt; // finite data item | |||
} | } | |||
well_formed_indefinite(mt, breakable) { | well_formed_indefinite(mt, breakable) { | |||
switch (mt) { | switch (mt) { | |||
case 2: case 3: | case 2: case 3: | |||
while ((it = well_formed(true)) != -1) | while ((it = well_formed(true)) != -1) | |||
if (it != mt) // need finite embedded | if (it != mt) // need finite-length chunk | |||
fail(); // of same type | fail(); // of same type | |||
break; | break; | |||
case 4: while (well_formed(true) != -1); break; | case 4: while (well_formed(true) != -1); break; | |||
case 5: while (well_formed(true) != -1) well_formed(); break; | case 5: while (well_formed(true) != -1) well_formed(); break; | |||
case 7: | case 7: | |||
if (breakable) | if (breakable) | |||
return -1; // signal break out | return -1; // signal break out | |||
else fail(); // no enclosing indefinite | else fail(); // no enclosing indefinite | |||
default: fail(); // wrong mt | default: fail(); // wrong mt | |||
} | } | |||
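For readers who want to experiment, the pseudocode can be transliterated into Python roughly as follows. This is a sketch, not normative text. The parts of the pseudocode that this diff skips over (rejecting additional information values 28 to 30, and the content handling for major types 2 to 5) are filled in here as assumptions, and the names `Checker`, `NotWellFormed`, and `is_well_formed` are hypothetical.

```python
class NotWellFormed(Exception):
    """Raised wherever the pseudocode calls fail()."""


BREAK = -1  # returned for a "break" stop code when breakable is set


class Checker:
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def take(self, n):
        # Fails if fewer than n bytes remain, as take(n) does in the text.
        if n > len(self.data) - self.pos:
            raise NotWellFormed("input ends early")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def well_formed(self, breakable=False):
        ib = self.take(1)[0]
        mt, ai = ib >> 5, ib & 0x1F
        val = ai
        if 24 <= ai <= 27:
            # 1, 2, 4, or 8 following bytes in network byte order
            val = int.from_bytes(self.take(1 << (ai - 24)), "big")
        elif 28 <= ai <= 30:
            raise NotWellFormed("reserved additional information")
        elif ai == 31:
            return self.well_formed_indefinite(mt, breakable)
        if mt in (2, 3):
            self.take(val)                  # byte or text string content
        elif mt == 4:
            for _ in range(val):            # val array elements
                self.well_formed()
        elif mt == 5:
            for _ in range(val * 2):        # val key/value pairs
                self.well_formed()
        elif mt == 6:
            self.well_formed()              # one embedded data item
        elif mt == 7 and ai == 24 and val < 32:
            raise NotWellFormed("bad simple value")
        return mt                           # definite-length item

    def well_formed_indefinite(self, mt, breakable):
        if mt in (2, 3):
            while (it := self.well_formed(True)) != BREAK:
                if it != mt:                # need definite chunk, same type
                    raise NotWellFormed("bad chunk in indefinite string")
        elif mt == 4:
            while self.well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while self.well_formed(True) != BREAK:
                self.well_formed()          # value must follow each key
        elif mt == 7:
            if breakable:
                return BREAK                # signal break out
            raise NotWellFormed("break without enclosing indefinite item")
        else:
            raise NotWellFormed("wrong major type for indefinite length")
        return 0                            # indefinite-length item


def is_well_formed(data):
    """True if data is exactly one well-formed item with no bytes left over."""
    checker = Checker(data)
    try:
        checker.well_formed()
    except NotWellFormed:
        return False
    return checker.pos == len(data)
```

The recursion mirrors the pseudocode and therefore has no nesting limit of its own; a deployed decoder would add one, since deeply nested input otherwise exhausts the stack.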
skipping to change at page 62, line 11 ¶ | skipping to change at page 62, line 11 ¶ | |||
the years from the MessagePack user community to separate out binary | the years from the MessagePack user community to separate out binary | |||
and text strings in the encoding recently have led to an extension | and text strings in the encoding recently have led to an extension | |||
proposal that would leave MessagePack's "raw" data ambiguous between | proposal that would leave MessagePack's "raw" data ambiguous between | |||
its usages for binary and text data. The extension mechanism for | its usages for binary and text data. The extension mechanism for | |||
MessagePack remains unclear. | MessagePack remains unclear. | |||
E.3. BSON | E.3. BSON | |||
[BSON] is a data format that was developed for the storage of JSON- | [BSON] is a data format that was developed for the storage of JSON- | |||
like maps (JSON objects) in the MongoDB database. Its major | like maps (JSON objects) in the MongoDB database. Its major | |||
distinguishing feature is the capability for in-place update, | distinguishing feature is the capability for in-place update, which | |||
foregoing a compact representation. BSON uses a counted | prevents a compact representation. BSON uses a counted | |||
representation except for map keys, which are null-byte terminated. | representation except for map keys, which are null-byte terminated. | |||
While BSON can be used for the representation of JSON-like objects on | While BSON can be used for the representation of JSON-like objects on | |||
the wire, its specification is dominated by the requirements of the | the wire, its specification is dominated by the requirements of the | |||
database application and has become somewhat baroque. The status of | database application and has become somewhat baroque. The status of | |||
how BSON extensions will be implemented remains unclear. | how BSON extensions will be implemented remains unclear. | |||
E.4. MSDTP: RFC 713 | E.4. MSDTP: RFC 713 | |||
Message Services Data Transmission (MSDTP) is a very early example of | Message Services Data Transmission (MSDTP) is a very early example of | |||
a compact message format; it is described in [RFC0713], written in | a compact message format; it is described in [RFC0713], written in | |||
1976. It is included here for its historical value, not because it | 1976. It is included here for its historical value, not because it | |||
was ever widely used. | was ever widely used. | |||
E.5. Conciseness on the Wire | E.5. Conciseness on the Wire | |||
While CBOR's design objective of code compactness for encoders and | While CBOR's design objective of code compactness for encoders and | |||
decoders is a higher priority than its objective of conciseness on | decoders is a higher priority than its objective of conciseness on | |||
the wire, many people focus on the wire size. Table 6 shows some | the wire, many people focus on the wire size. Table 7 shows some | |||
encoding examples for the simple nested array [1, [2, 3]]; where some | encoding examples for the simple nested array [1, [2, 3]]; where some | |||
form of indefinite-length encoding is supported by the encoding, | form of indefinite-length encoding is supported by the encoding, | |||
[_ 1, [2, 3]] (indefinite length on the outer array) is also shown. | [_ 1, [2, 3]] (indefinite length on the outer array) is also shown. | |||
+-------------+--------------------------+--------------------------+ | +-------------+--------------------------+--------------------------+ | |||
| Format | [1, [2, 3]] | [_ 1, [2, 3]] | | | Format | [1, [2, 3]] | [_ 1, [2, 3]] | | |||
+-------------+--------------------------+--------------------------+ | +-------------+--------------------------+--------------------------+ | |||
| RFC 713 | c2 05 81 c2 02 82 83 | | | | RFC 713 | c2 05 81 c2 02 82 83 | | | |||
| | | | | | | | | | |||
| ASN.1 BER | 30 0b 02 01 01 30 06 02 | 30 80 02 01 01 30 06 02 | | | ASN.1 BER | 30 0b 02 01 01 30 06 02 | 30 80 02 01 01 30 06 02 | | |||
skipping to change at page 63, line 24 ¶ | skipping to change at page 63, line 24 ¶ | |||
| | | | | | | | | | |||
| BSON | 22 00 00 00 10 30 00 01 | | | | BSON | 22 00 00 00 10 30 00 01 | | | |||
| | 00 00 00 04 31 00 13 00 | | | | | 00 00 00 04 31 00 13 00 | | | |||
| | 00 00 10 30 00 02 00 00 | | | | | 00 00 10 30 00 02 00 00 | | | |||
| | 00 10 31 00 03 00 00 00 | | | | | 00 10 31 00 03 00 00 00 | | | |||
| | 00 00 | | | | | 00 00 | | | |||
| | | | | | | | | | |||
| CBOR | 82 01 82 02 03 | 9f 01 82 02 03 ff | | | CBOR | 82 01 82 02 03 | 9f 01 82 02 03 ff | | |||
+-------------+--------------------------+--------------------------+ | +-------------+--------------------------+--------------------------+ | |||
Table 6: Examples for Different Levels of Conciseness | Table 7: Examples for Different Levels of Conciseness | |||
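The CBOR row of the table can be reproduced with a very small encoder. The sketch below handles only unsigned integers and arrays, which is all the example needs; the function names are hypothetical, and it is not a general CBOR encoder.

```python
def encode_head(major_type, value):
    """Encode major type and argument using the shortest form."""
    if value < 24:
        return bytes([major_type << 5 | value])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([major_type << 5 | ai]) + value.to_bytes(size, "big")
    raise ValueError("value does not fit in 64 bits")


def encode(item):
    """Encode a non-negative int or a (nested) list with definite length."""
    if isinstance(item, int) and not isinstance(item, bool) and item >= 0:
        return encode_head(0, item)
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    raise TypeError("only unsigned integers and lists are supported here")


def encode_indefinite_array(items):
    """Encode the outer array with indefinite length: 0x9f ... 0xff."""
    return b"\x9f" + b"".join(encode(x) for x in items) + b"\xff"


print(encode([1, [2, 3]]).hex(" "))                   # 82 01 82 02 03
print(encode_indefinite_array([1, [2, 3]]).hex(" "))  # 9f 01 82 02 03 ff
```

The definite-length form takes five bytes and the indefinite-length form six, matching the two CBOR cells in the table.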
Appendix F. Changes from RFC 7049 | Appendix F. Changes from RFC 7049 | |||
The following is a list of known changes from RFC 7049. This list is | The following is a list of known changes from RFC 7049. This list is | |||
non-authoritative. It is meant to help reviewers see the significant | non-authoritative. It is meant to help reviewers see the significant | |||
differences. | differences. | |||
o Updated reference for [RFC4627] to [RFC8259] in many places | o Updated reference for [RFC4627] to [RFC8259] in many places | |||
o Updated reference for [CNN-TERMS] to [RFC7228] | o Updated reference for [CNN-TERMS] to [RFC7228] | |||
skipping to change at page 64, line 20 ¶ | skipping to change at page 64, line 20 ¶ | |||
for his msgpack-js and msgpack-js-browser projects. Many people have | for his msgpack-js and msgpack-js-browser projects. Many people have | |||
contributed to the discussion about extending MessagePack to separate | contributed to the discussion about extending MessagePack to separate | |||
text string representation from byte string representation. | text string representation from byte string representation. | |||
The encoding of the additional information in CBOR was inspired by | The encoding of the additional information in CBOR was inspired by | |||
the encoding of length information designed by Klaus Hartke for CoAP. | the encoding of length information designed by Klaus Hartke for CoAP. | |||
This document also incorporates suggestions made by many people, | This document also incorporates suggestions made by many people, | |||
notably Dan Frost, James Manger, Jeffrey Yaskin, Joe Hildebrand, | notably Dan Frost, James Manger, Jeffrey Yaskin, Joe Hildebrand, | |||
Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael | Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael | |||
Richardson, Nico Williams, Phillip Hallam-Baker, Ray Polk, Tim Bray, | Richardson, Nico Williams, Peter Occil, Phillip Hallam-Baker, Ray | |||
Tony Finch, Tony Hansen, and Yaron Sheffer. | Polk, Tim Bray, Tony Finch, Tony Hansen, and Yaron Sheffer. | |||
Authors' Addresses | Authors' Addresses | |||
Carsten Bormann | Carsten Bormann | |||
Universitaet Bremen TZI | Universitaet Bremen TZI | |||
Postfach 330440 | Postfach 330440 | |||
D-28359 Bremen | D-28359 Bremen | |||
Germany | Germany | |||
Phone: +49-421-218-63921 | Phone: +49-421-218-63921 | |||
End of changes. 119 change blocks. | ||||
311 lines changed or deleted | 387 lines changed or added | |||