draft-ietf-cbor-7049bis-13.txt | draft-ietf-cbor-7049bis-14.txt | |||
---|---|---|---|---|
Network Working Group C. Bormann | Network Working Group C. Bormann | |||
Internet-Draft Universitaet Bremen TZI | Internet-Draft Universitaet Bremen TZI | |||
Obsoletes: 7049 (if approved) P. Hoffman | Obsoletes: 7049 (if approved) P. Hoffman | |||
Intended status: Standards Track ICANN | Intended status: Standards Track ICANN | |||
Expires: 9 September 2020 8 March 2020 | Expires: 19 December 2020 17 June 2020 | |||
Concise Binary Object Representation (CBOR) | Concise Binary Object Representation (CBOR) | |||
draft-ietf-cbor-7049bis-13 | draft-ietf-cbor-7049bis-14 | |||
Abstract | Abstract | |||
The Concise Binary Object Representation (CBOR) is a data format | The Concise Binary Object Representation (CBOR) is a data format | |||
whose design goals include the possibility of extremely small code | whose design goals include the possibility of extremely small code | |||
size, fairly small message size, and extensibility without the need | size, fairly small message size, and extensibility without the need | |||
for version negotiation. These design goals make it different from | for version negotiation. These design goals make it different from | |||
earlier binary serializations such as ASN.1 and MessagePack. | earlier binary serializations such as ASN.1 and MessagePack. | |||
This document is a revised edition of RFC 7049, with editorial | This document is a revised edition of RFC 7049, with editorial | |||
improvements, added detail, and fixed errata. This revision formally | improvements, added detail, and fixed errata. This revision formally | |||
obsoletes RFC 7049, while keeping full compatibility of the | obsoletes RFC 7049, while keeping full compatibility of the | |||
interchange format from RFC 7049. It does not create a new version | interchange format from RFC 7049. It does not create a new version | |||
of the format. | of the format. | |||
Contributing | Contributing | |||
This note is to be removed before publishing as an RFC. | ||||
This document is being worked on in the CBOR Working Group. Please | This document is being worked on in the CBOR Working Group. Please | |||
contribute on the mailing list there, or in the GitHub repository for | contribute on the mailing list there, or in the GitHub repository for | |||
this draft: https://github.com/cbor-wg/CBORbis | this draft: https://github.com/cbor-wg/CBORbis | |||
The charter for the CBOR Working Group says that the WG will update | The charter for the CBOR Working Group says that the WG will update | |||
RFC 7049 to fix verified errata. Security issues and clarifications | RFC 7049 to fix verified errata. Security issues and clarifications | |||
may be addressed, but changes to this document will ensure backward | may be addressed, but changes to this document will ensure backward | |||
compatibility for popular deployed codebases. This document will be | compatibility for popular deployed codebases. This document will be | |||
targeted at becoming an Internet Standard. | targeted at becoming an Internet Standard. | |||
[RFC editor: please remove this note.] | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on 9 September 2020. | This Internet-Draft will expire on 19 December 2020. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2020 IETF Trust and the persons identified as the | Copyright (c) 2020 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
extracted from this document must include Simplified BSD License text | extracted from this document must include Simplified BSD License text | |||
as described in Section 4.e of the Trust Legal Provisions and are | as described in Section 4.e of the Trust Legal Provisions and are | |||
provided without warranty as described in the Simplified BSD License. | provided without warranty as described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 | 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 8 | |||
2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 | 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 9 | |||
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 | 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 | |||
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 10 | 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 10 | |||
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11 | 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13 | 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 14 | |||
3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13 | 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 14 | |||
3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14 | 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14 | |||
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16 | 3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16 | |||
3.2.4. Summary of indefinite-length use of major types . . . 17 | 3.2.4. Summary of indefinite-length use of major types . . . 17 | |||
3.3. Floating-Point Numbers and Values with No Content . . . . 17 | 3.3. Floating-Point Numbers and Values with No Content . . . . 18 | |||
3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 19 | 3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 19 | |||
3.4.1. Standard Date/Time String . . . . . . . . . . . . . . 22 | 3.4.1. Standard Date/Time String . . . . . . . . . . . . . . 22 | |||
3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 22 | 3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 23 | |||
3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 23 | 3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
3.4.4. Decimal Fractions and Bigfloats . . . . . . . . . . . 24 | 3.4.4. Decimal Fractions and Bigfloats . . . . . . . . . . . 24 | |||
3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 25 | 3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 26 | |||
3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 25 | 3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 26 | |||
3.4.5.2. Expected Later Encoding for CBOR-to-JSON | 3.4.5.2. Expected Later Encoding for CBOR-to-JSON | |||
Converters . . . . . . . . . . . . . . . . . . . . 25 | Converters . . . . . . . . . . . . . . . . . . . . 26 | |||
3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 26 | 3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 27 | |||
3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 27 | 3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 28 | |||
4. Serialization Considerations . . . . . . . . . . . . . . . . 28 | 4. Serialization Considerations . . . . . . . . . . . . . . . . 29 | |||
4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 28 | 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 29 | |||
4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 29 | 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 30 | |||
4.2.1. Core Deterministic Encoding Requirements . . . . . . 29 | 4.2.1. Core Deterministic Encoding Requirements . . . . . . 30 | |||
4.2.2. Additional Deterministic Encoding Considerations . . 30 | 4.2.2. Additional Deterministic Encoding Considerations . . 31 | |||
4.2.3. Length-first Map Key Ordering . . . . . . . . . . . . 32 | 4.2.3. Length-first Map Key Ordering . . . . . . . . . . . . 33 | |||
5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 33 | 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 34 | |||
5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 33 | 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 35 | |||
5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 34 | 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 35 | |||
5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 35 | 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 36 | |||
5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 35 | 5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 36 | |||
5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 35 | 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 37 | |||
5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 36 | 5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 37 | |||
5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 37 | 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 38 | |||
5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 38 | 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 39 | |||
5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 39 | 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 41 | |||
5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 40 | 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 42 | |||
6. Converting Data between CBOR and JSON . . . . . . . . . . . . 40 | 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 42 | |||
6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 41 | 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 42 | |||
6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 42 | 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 43 | |||
7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 43 | 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 44 | |||
7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 43 | 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 45 | |||
7.2. Curating the Additional Information Space . . . . . . . . 44 | 7.2. Curating the Additional Information Space . . . . . . . . 46 | |||
8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 45 | 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 46 | |||
8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 46 | 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 47 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 46 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 48 | |||
9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 47 | 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 48 | |||
9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 47 | 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 48 | |||
9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 47 | 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 49 | |||
9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 48 | 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 50 | |||
9.5. The +cbor Structured Syntax Suffix Registration . . . . . 49 | 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 50 | |||
10. Security Considerations . . . . . . . . . . . . . . . . . . . 50 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 51 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 52 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 53 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 52 | 11.1. Normative References . . . . . . . . . . . . . . . . . . 53 | |||
11.2. Informative References . . . . . . . . . . . . . . . . . 53 | 11.2. Informative References . . . . . . . . . . . . . . . . . 54 | |||
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 55 | Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 57 | |||
Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 59 | Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 61 | |||
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 62 | Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 64 | |||
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 65 | Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 66 | |||
Appendix E. Comparison of Other Binary Formats to CBOR's Design | Appendix E. Comparison of Other Binary Formats to CBOR's Design | |||
Objectives . . . . . . . . . . . . . . . . . . . . . . . 66 | Objectives . . . . . . . . . . . . . . . . . . . . . . . 67 | |||
E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 67 | E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 68 | |||
E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 67 | E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 68 | |||
E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 68 | E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 69 | |||
E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 68 | E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 69 | |||
E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 68 | E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 69 | |||
Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 69 | Appendix F. Well-formedness errors and examples . . . . . . . . 70 | |||
Appendix G. Well-formedness errors and examples . . . . . . . . 70 | F.1. Examples for CBOR data items that are not well-formed . . 71 | |||
G.1. Examples for CBOR data items that are not well-formed . . 71 | ||||
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 73 | Appendix G. Changes from RFC 7049 . . . . . . . . . . . . . . . 73 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 74 | G.1. Errata processing, clerical changes . . . . . . . . . . . 73 | |||
G.2. Changes in IANA considerations . . . . . . . . . . . . . 74 | ||||
G.3. Changes in suggestions and other informational | ||||
components . . . . . . . . . . . . . . . . . . . . . . . 74 | ||||
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 76 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 76 | ||||
1. Introduction | 1. Introduction | |||
There are hundreds of standardized formats for binary representation | There are hundreds of standardized formats for binary representation | |||
of structured data (also known as binary serialization formats). Of | of structured data (also known as binary serialization formats). Of | |||
those, some are for specific domains of information, while others are | those, some are for specific domains of information, while others are | |||
generalized for arbitrary data. In the IETF, probably the best-known | generalized for arbitrary data. In the IETF, probably the best-known | |||
formats in the latter category are ASN.1's BER and DER [ASN.1]. | formats in the latter category are ASN.1's BER and DER [ASN.1]. | |||
The format defined here follows some specific design goals that are | The format defined here follows some specific design goals that are | |||
skipping to change at page 6, line 44 | skipping to change at page 7, line 4 | |||
Decoder: A process that decodes a well-formed encoded CBOR data item | Decoder: A process that decodes a well-formed encoded CBOR data item | |||
and makes it available to an application. Formally speaking, a | and makes it available to an application. Formally speaking, a | |||
decoder contains a parser to break up the input using the syntax | decoder contains a parser to break up the input using the syntax | |||
rules of CBOR, as well as a semantic processor to prepare the data | rules of CBOR, as well as a semantic processor to prepare the data | |||
in a form suitable to the application. | in a form suitable to the application. | |||
Encoder: A process that generates the (well-formed) representation | Encoder: A process that generates the (well-formed) representation | |||
format of a CBOR data item from application information. | format of a CBOR data item from application information. | |||
Data Stream: A sequence of zero or more data items, not further | Data Stream: A sequence of zero or more data items, not further | |||
assembled into a larger containing data item. The independent | assembled into a larger containing data item (see [RFC8742] for | |||
data items that make up a data stream are sometimes also referred | one application). The independent data items that make up a data | |||
to as "top-level data items". | stream are sometimes also referred to as "top-level data items". | |||
Well-formed: A data item that follows the syntactic structure of | Well-formed: A data item that follows the syntactic structure of | |||
CBOR. A well-formed data item uses the initial bytes and the byte | CBOR. A well-formed data item uses the initial bytes and the byte | |||
strings and/or data items that are implied by their values as | strings and/or data items that are implied by their values as | |||
defined in CBOR and does not include following extraneous data. | defined in CBOR and does not include following extraneous data. | |||
CBOR decoders by definition only return contents from well-formed | CBOR decoders by definition only return contents from well-formed | |||
data items. | data items. | |||
Valid: A data item that is well-formed and also follows the semantic | Valid: A data item that is well-formed and also follows the semantic | |||
restrictions that apply to CBOR data items (Section 5.3). | restrictions that apply to CBOR data items (Section 5.3). | |||
Expected: Besides its normal English meaning, the term "expected" is | Expected: Besides its normal English meaning, the term "expected" is | |||
used to describe requirements beyond CBOR validity that an | used to describe requirements beyond CBOR validity that an | |||
application has on its input data. Well-formed (processable at | application has on its input data. Well-formed (processable at | |||
all), valid (checked by a validity-checking generic decoder), and | all), valid (checked by a validity-checking generic decoder), and | |||
skipping to change at page 8, line 40 | skipping to change at page 9, line 5 | |||
* a mapping (mathematical function) from zero or more data items | * a mapping (mathematical function) from zero or more data items | |||
("keys") each to a data item ("values"), ("map") | ("keys") each to a data item ("values"), ("map") | |||
* a tagged data item ("tag"), comprising a tag number (an integer in | * a tagged data item ("tag"), comprising a tag number (an integer in | |||
the range 0..2**64-1) and the tag content (a data item) | the range 0..2**64-1) and the tag content (a data item) | |||
Note that integer and floating-point values are distinct in this | Note that integer and floating-point values are distinct in this | |||
model, even if they have the same numeric value. | model, even if they have the same numeric value. | |||
Also note that serialization variants, such as the number of bytes of | Also note that serialization variants are not visible at the generic | |||
the encoded floating-point value, or the choice of one of the ways in | data model level, including the number of bytes of the encoded | |||
which an integer, the length of a text or byte string, the number of | floating-point value or the choice of one of the ways in which an | |||
elements in an array or pairs in a map, or a tag number, | integer, the length of a text or byte string, the number of elements | |||
(collectively "the argument", see Section 3) can be encoded, are not | in an array or pairs in a map, or a tag number, (collectively "the | |||
visible at the generic data model level. | argument", see Section 3) can be encoded. | |||
2.1. Extended Generic Data Models | 2.1. Extended Generic Data Models | |||
This basic generic data model comes pre-extended by the registration | This basic generic data model comes pre-extended by the registration | |||
of a number of simple values and tag numbers right in this document, | of a number of simple values and tag numbers right in this document, | |||
such as: | such as: | |||
* "false", "true", "null", and "undefined" (simple values identified | * "false", "true", "null", and "undefined" (simple values identified | |||
by 20..23) | by 20..23) | |||
skipping to change at page 11, line 51 | skipping to change at page 12, line 15 | |||
5 would have an initial byte of 0b010_00101 (major type 2, | 5 would have an initial byte of 0b010_00101 (major type 2, | |||
additional information 5 for the length), followed by 5 bytes of | additional information 5 for the length), followed by 5 bytes of | |||
binary content. A byte string whose length is 500 would have 3 | binary content. A byte string whose length is 500 would have 3 | |||
initial bytes of 0b010_11001 (major type 2, additional information | initial bytes of 0b010_11001 (major type 2, additional information | |||
25 to indicate a two-byte length) followed by the two bytes 0x01f4 | 25 to indicate a two-byte length) followed by the two bytes 0x01f4 | |||
for a length of 500, followed by 500 bytes of binary content. | for a length of 500, followed by 500 bytes of binary content. | |||
Major type 3: a text string (Section 2), encoded as UTF-8 | Major type 3: a text string (Section 2), encoded as UTF-8 | |||
([RFC3629]). The number of bytes in the string is equal to the | ([RFC3629]). The number of bytes in the string is equal to the | |||
argument. A string containing an invalid UTF-8 sequence is well- | argument. A string containing an invalid UTF-8 sequence is well- | |||
formed but invalid. This type is provided for systems that need | formed but invalid (Section 1.2). This type is provided for | |||
to interpret or display human-readable text, and allows the | systems that need to interpret or display human-readable text, and | |||
differentiation between unstructured bytes and text that has a | allows the differentiation between unstructured bytes and text | |||
specified repertoire and encoding. In contrast to formats such as | that has a specified repertoire and encoding. In contrast to | |||
JSON, the Unicode characters in this type are never escaped. | formats such as JSON, the Unicode characters in this type are | |||
Thus, a newline character (U+000A) is always represented in a | never escaped. Thus, a newline character (U+000A) is always | |||
string as the byte 0x0a, and never as the bytes 0x5c6e (the | represented in a string as the byte 0x0a, and never as the bytes | |||
characters "\" and "n") or as 0x5c7530303061 (the characters "\", | 0x5c6e (the characters "\" and "n") or as 0x5c7530303061 (the | |||
"u", "0", "0", "0", and "a"). | characters "\", "u", "0", "0", "0", and "a"). | |||
Major type 4: an array of data items. In other formats, arrays are | Major type 4: an array of data items. In other formats, arrays are | |||
also called lists, sequences, or tuples (a "CBOR sequence" is | also called lists, sequences, or tuples (a "CBOR sequence" is | |||
something slightly different, though [RFC8742]). The argument is | something slightly different, though [RFC8742]). The argument is | |||
the number of data items in the array. Items in an array do not | the number of data items in the array. Items in an array do not | |||
need to all be of the same type. For example, an array that | need to all be of the same type. For example, an array that | |||
contains 10 items of any type would have an initial byte of | contains 10 items of any type would have an initial byte of | |||
0b100_01010 (major type of 4, additional information of 10 for the | 0b100_01010 (major type of 4, additional information of 10 for the | |||
length) followed by the 10 remaining items. | length) followed by the 10 remaining items. | |||
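The initial-byte examples above (a byte string of length 5, a byte string of length 500, an array of 10 items) can be reproduced with a short sketch. The function names `encode_head`, `encode_byte_string`, and `encode_text_string` are hypothetical and chosen for illustration only. The sketch assumes the shortest form of the argument is wanted and covers major types 0 through 6 only.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode the initial byte and argument of a data item (shortest form).

    Hypothetical helper for illustration; handles major types 0..6 only.
    """
    if not 0 <= major_type <= 6:
        raise ValueError("this sketch handles major types 0..6 only")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must be in the range 0..2**64-1")
    high_bits = major_type << 5  # major type occupies the top 3 bits
    if argument < 24:
        # The argument fits directly into the 5-bit additional information.
        return bytes([high_bits | argument])
    # Additional information 24, 25, 26, 27: argument follows in 1, 2, 4, 8 bytes.
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([high_bits | additional_info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def encode_byte_string(data: bytes) -> bytes:
    """Major type 2: the argument is the number of bytes of content."""
    return encode_head(2, len(data)) + data


def encode_text_string(text: str) -> bytes:
    """Major type 3: the argument is the number of UTF-8 bytes, with no escaping."""
    utf8 = text.encode("utf-8")
    return encode_head(3, len(utf8)) + utf8
```

For instance, `encode_head(2, 500)` yields the three bytes 0x5901f4, and `encode_text_string("\n")` yields 0x610a, with the newline carried as the single unescaped byte 0x0a.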
skipping to change at page 14, line 21 | skipping to change at page 14, line 40 | |||
Indefinite-length arrays and maps are represented using their major | Indefinite-length arrays and maps are represented using their major | |||
type with the additional information value of 31, followed by an | type with the additional information value of 31, followed by an | |||
arbitrary-length sequence of zero or more items for an array or key/ | arbitrary-length sequence of zero or more items for an array or key/ | |||
value pairs for a map, followed by the "break" stop code | value pairs for a map, followed by the "break" stop code | |||
(Section 3.2.1). In other words, indefinite-length arrays and maps | (Section 3.2.1). In other words, indefinite-length arrays and maps | |||
look identical to other arrays and maps except for beginning with the | look identical to other arrays and maps except for beginning with the | |||
additional information value of 31 and ending with the "break" stop | additional information value of 31 and ending with the "break" stop | |||
code. | code. | |||
If the break stop code appears after a key in a map, in place of that | If the "break" stop code appears after a key in a map, in place of | |||
key's value, the map is not well-formed. | that key's value, the map is not well-formed. | |||
There is no restriction against nesting indefinite-length array or | There is no restriction against nesting indefinite-length array or | |||
map items. A "break" only terminates a single item, so nested | map items. A "break" only terminates a single item, so nested | |||
indefinite-length items need exactly as many "break" stop codes as | indefinite-length items need exactly as many "break" stop codes as | |||
there are type bytes starting an indefinite-length item. | there are type bytes starting an indefinite-length item. | |||
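The nesting rule can be illustrated with a minimal sketch that encodes every array in indefinite-length form. The function name `encode_indefinite` is hypothetical, and the sketch assumes that the only leaf items are unsigned integers in the range 0..23, so that each integer occupies exactly one byte.

```python
BREAK = b"\xff"  # "break" stop code: major type 7, additional information 31


def encode_indefinite(item) -> bytes:
    """Encode nested lists of small integers with indefinite-length arrays.

    Hypothetical helper for illustration; integers must be in 0..23.
    """
    if isinstance(item, int) and not isinstance(item, bool) and 0 <= item < 24:
        # Major type 0 with the value carried in the initial byte.
        return bytes([item])
    if isinstance(item, list):
        body = b"".join(encode_indefinite(element) for element in item)
        # 0x9f is major type 4 with additional information 31; each opened
        # array is closed by exactly one "break" stop code.
        return b"\x9f" + body + BREAK
    raise TypeError("this sketch only handles integers 0..23 and lists")
```

Applied to a nested array such as [1, [2, 3], [4, 5]], the sketch produces 0x9f019f0203ff9f0405ffff: three 0x9f bytes open the three arrays, and three 0xff bytes close them.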
For example, assume an encoder wants to represent the abstract array | For example, assume an encoder wants to represent the abstract array | |||
[1, [2, 3], [4, 5]]. The definite-length encoding would be | [1, [2, 3], [4, 5]]. The definite-length encoding would be | |||
0x8301820203820405: | 0x8301820203820405: | |||
skipping to change at page 16, line 33 | skipping to change at page 16, line 47 | |||
respectively, if no chunk is present). (Note that zero-length | respectively, if no chunk is present). (Note that zero-length | |||
chunks, while not particularly useful, are permitted.) | chunks, while not particularly useful, are permitted.) | |||
If any item between the indefinite-length string indicator | If any item between the indefinite-length string indicator | |||
(0b010_11111 or 0b011_11111) and the "break" stop code is not a | (0b010_11111 or 0b011_11111) and the "break" stop code is not a | |||
definite-length string item of the same major type, the string is not | definite-length string item of the same major type, the string is not | |||
well-formed. | well-formed. | |||
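The well-formedness rule for chunks can be sketched as a small decoder for indefinite-length byte strings. The function name `decode_indefinite_byte_string` is hypothetical. The sketch assumes the input holds exactly one encoded data item, and it reports every violation as a `ValueError`.

```python
def decode_indefinite_byte_string(encoded: bytes) -> bytes:
    """Concatenate the chunks of an indefinite-length byte string.

    Hypothetical helper for illustration; raises ValueError if the item
    is not well-formed.
    """
    if not encoded or encoded[0] != 0b010_11111:
        raise ValueError("not an indefinite-length byte string")
    pos = 1
    chunks = []
    while True:
        if pos >= len(encoded):
            raise ValueError("not well-formed: missing break stop code")
        initial = encoded[pos]
        pos += 1
        if initial == 0b111_11111:  # "break" stop code
            break
        major_type, additional_info = initial >> 5, initial & 0x1F
        # Every chunk must be a definite-length string of the same major type.
        if major_type != 2 or additional_info > 27:
            raise ValueError("not well-formed: chunk is not a definite-length byte string")
        if additional_info < 24:
            length = additional_info
        else:
            size = 1 << (additional_info - 24)  # 1, 2, 4, or 8 length bytes
            if pos + size > len(encoded):
                raise ValueError("not well-formed: truncated length")
            length = int.from_bytes(encoded[pos:pos + size], "big")
            pos += size
        if pos + length > len(encoded):
            raise ValueError("not well-formed: truncated chunk")
        chunks.append(encoded[pos:pos + length])
        pos += length
    if pos != len(encoded):
        raise ValueError("not well-formed: extraneous data after the item")
    return b"".join(chunks)
```

Note that a 0xff byte inside chunk content is not a stop code; only a 0xff found where the next chunk's initial byte is expected terminates the string.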
If any definite-length text string inside an indefinite-length text | If any definite-length text string inside an indefinite-length text | |||
string is invalid, the indefinite-length text string is invalid. | string is invalid, the indefinite-length text string is invalid. | |||
Note that this implies that the bytes of a single UTF-8 character | Note that this implies that the UTF-8 bytes of a single Unicode code | |||
cannot be split up between chunks: a new chunk of a text string can | point (scalar value) cannot be spread between chunks: a new chunk of | |||
only be started at a character boundary. | a text string can only be started at a code point boundary. | |||
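The consequence for chunking text can be sketched as follows. Both function names are hypothetical, and the sketch assumes that Python's strict UTF-8 decoder is an acceptable stand-in for the validity check on a text string chunk.

```python
from typing import List


def chunk_is_valid_text(chunk: bytes) -> bool:
    """Return True if the chunk on its own is valid UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def split_text_into_chunks(text: str, max_bytes: int) -> List[bytes]:
    """Split text into UTF-8 chunks of at most max_bytes bytes each.

    Hypothetical helper for illustration; a chunk boundary is only ever
    placed between code points, never inside one.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4, the longest UTF-8 sequence")
    chunks: List[bytes] = []
    current = b""
    for code_point in text:
        encoded = code_point.encode("utf-8")
        if current and len(current) + len(encoded) > max_bytes:
            chunks.append(current)
            current = b""
        current += encoded
    if current:
        chunks.append(current)
    return chunks
```

A chunk that ends after the first byte of a two-byte sequence, such as the single byte 0xc3, fails the check, which is why a splitter has to work on code points rather than on raw byte offsets.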
For example, assume an encoded data item consisting of the bytes: | For example, assume an encoded data item consisting of the bytes: | |||
0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111 | 0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111 | |||
5F -- Start indefinite-length byte string | 5F -- Start indefinite-length byte string | |||
44 -- Byte string of length 4 | 44 -- Byte string of length 4 | |||
aabbccdd -- Bytes content | aabbccdd -- Bytes content | |||
43 -- Byte string of length 3 | 43 -- Byte string of length 3 | |||
eeff99 -- Bytes content | eeff99 -- Bytes content | |||
skipping to change at page 19, line 10 | skipping to change at page 19, line 30 | |||
| 32..255 | (Unassigned) | | | 32..255 | (Unassigned) | | |||
+---------+-----------------+ | +---------+-----------------+ | |||
Table 4: Simple Values | Table 4: Simple Values | |||
An encoder MUST NOT issue two-byte sequences that start with 0xf8 | An encoder MUST NOT issue two-byte sequences that start with 0xf8 | |||
(major type = 7, additional information = 24) and continue with a | (major type = 7, additional information = 24) and continue with a | |||
byte less than 0x20 (32 decimal). Such sequences are not well- | byte less than 0x20 (32 decimal). Such sequences are not well- | |||
formed. (This implies that an encoder cannot encode false, true, | formed. (This implies that an encoder cannot encode false, true, | |||
null, or undefined in two-byte sequences, only the one-byte variants | null, or undefined in two-byte sequences, only the one-byte variants | |||
of these are well-formed.) | of these are well-formed; more generally speaking, each simple value | |||
only has a single representation variant.) | ||||
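The restriction on two-byte simple values can be sketched as an encoder paired with a check. The function names `encode_simple` and `is_well_formed_two_byte_simple` are hypothetical, and the value ranges are taken from Table 4.

```python
def encode_simple(value: int) -> bytes:
    """Encode a simple value in its only well-formed representation.

    Hypothetical helper for illustration.
    """
    if 0 <= value <= 23:
        # One-byte form: major type 7, value in the additional information.
        return bytes([0b111_00000 | value])
    if 32 <= value <= 255:
        # Two-byte form: 0xf8 (major type 7, additional information 24),
        # followed by the value.
        return bytes([0xF8, value])
    raise ValueError("simple values 24..31 are reserved; valid range is 0..23 or 32..255")


def is_well_formed_two_byte_simple(encoded: bytes) -> bool:
    """Check a two-byte sequence that starts with 0xf8."""
    if len(encoded) != 2 or encoded[0] != 0xF8:
        raise ValueError("expected a two-byte sequence starting with 0xf8")
    return encoded[1] >= 0x20
```

Under these assumptions, false, true, null, and undefined encode as 0xf4, 0xf5, 0xf6, and 0xf7, while a sequence such as 0xf814 is rejected as not well-formed.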
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit | The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit | |||
IEEE 754 binary floating-point values [IEEE754]. These floating- | IEEE 754 binary floating-point values [IEEE754]. These floating- | |||
point values are encoded in the additional bytes of the appropriate | point values are encoded in the additional bytes of the appropriate | |||
size. (See Appendix D for some information about 16-bit floating- | size. (See Appendix D for some information about 16-bit floating- | |||
point numbers.) | point numbers.) | |||
3.4. Tagging of Items | 3.4. Tagging of Items | |||
In CBOR, a data item can be enclosed by a tag to give it some | In CBOR, a data item can be enclosed by a tag to give it some | |||
skipping to change at page 20, line 6 ¶ | skipping to change at page 20, line 30 ¶ | |||
for instance as 0x01, 0x1801, or 0x190001. The tag definition may | for instance as 0x01, 0x1801, or 0x190001. The tag definition may | |||
include the definition of a preferred serialization (Section 4.1) | include the definition of a preferred serialization (Section 4.1) | |||
that is recommended for generic encoders; this may prefer basic | that is recommended for generic encoders; this may prefer basic | |||
generic data model representations over ones that employ a tag. | generic data model representations over ones that employ a tag. | |||
The tag definition usually restricts what kinds of nested data item | The tag definition usually restricts what kinds of nested data item | |||
or items are valid for such tags. Tag definitions may restrict their | or items are valid for such tags. Tag definitions may restrict their | |||
content to a very specific syntactic structure, as the tags defined | content to a very specific syntactic structure, as the tags defined | |||
in this document do, or they may aim at a more semantically defined | in this document do, or they may aim at a more semantically defined | |||
definition of their content, as for instance tags 40 and 1040 do | definition of their content, as for instance tags 40 and 1040 do | |||
[rfc8746]: These accept a number of different ways of representing | [RFC8746]: These accept a number of different ways of representing | |||
arrays. | arrays. | |||
As a matter of convention, many tags do not accept null or undefined | As a matter of convention, many tags do not accept null or undefined | |||
values as tag content; instead, the expectation is that a null or | values as tag content; instead, the expectation is that a null or | |||
undefined value can be used in place of the entire tag; Section 3.4.2 | undefined value can be used in place of the entire tag; Section 3.4.2 | |||
provides some further considerations for one specific tag about the | provides some further considerations for one specific tag about the | |||
handling of this convention in application protocols and in mapping | handling of this convention in application protocols and in mapping | |||
to platform types. | to platform types. | |||
Decoders do not need to understand tags of every tag number, and tags | Decoders do not need to understand tags of every tag number, and tags | |||
skipping to change at page 21, line 47 ¶ | skipping to change at page 22, line 22 ¶ | |||
| | | Section 3.4.6 | | | | | Section 3.4.6 | | |||
+------------+-------------+----------------------------------+ | +------------+-------------+----------------------------------+ | |||
Table 5: Tag numbers defined in RFC 7049 | Table 5: Tag numbers defined in RFC 7049 | |||
Conceptually, tags are interpreted in the generic data model, not at | Conceptually, tags are interpreted in the generic data model, not at | |||
(de-)serialization time. A small number of tags (specifically, tag | (de-)serialization time. A small number of tags (specifically, tag | |||
number 25 and tag number 29) have been registered with semantics that | number 25 and tag number 29) have been registered with semantics that | |||
may require processing at (de-)serialization time: The decoder needs | may require processing at (de-)serialization time: The decoder needs | |||
to be aware and the encoder needs to be in control of the exact | to be aware and the encoder needs to be in control of the exact | |||
sequence in which data items are encoded into the CBOR data stream. | sequence in which data items are encoded into the CBOR data item. | |||
This means these tags cannot be implemented on top of every generic | This means these tags cannot be implemented on top of every generic | |||
CBOR encoder/decoder (which might not reflect the serialization order | CBOR encoder/decoder (which might not reflect the serialization order | |||
for entries in a map at the data model level and vice versa); their | for entries in a map at the data model level and vice versa); their | |||
implementation therefore typically needs to be integrated into the | implementation therefore typically needs to be integrated into the | |||
generic encoder/decoder. The definition of new tags with this | generic encoder/decoder. The definition of new tags with this | |||
property is NOT RECOMMENDED. | property is NOT RECOMMENDED. | |||
IANA allocated tag numbers 65535, 4294967295, and | ||||
18446744073709551615 (binary all-ones in 16-bit, 32-bit, and 64-bit). | ||||
These can be used as a convenience for implementers that want a | ||||
single integer to indicate either that a specific tag is present, or | ||||
the absence of a tag. That allocation is described in Section 10 of | ||||
[I-D.bormann-cbor-notable-tags]. These tags are not intended to | ||||
occur in actual CBOR data items; implementations may flag such an | ||||
occurrence as an error. | ||||
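One way an implementation might use these reserved tag numbers is sketched below: an all-ones value serves as an in-memory "no tag" sentinel, and the same values are flagged if they appear in decoded data. The names `NO_TAG` and `check_decoded_tag_number` are hypothetical, and treating the occurrence as an error is only one of the permitted choices.

```python
# Binary all-ones in 16, 32, and 64 bits.
RESERVED_TAG_NUMBERS = frozenset({2**16 - 1, 2**32 - 1, 2**64 - 1})

# Hypothetical sentinel for "this item carries no tag" in a 64-bit field.
NO_TAG = 2**64 - 1


def check_decoded_tag_number(tag_number: int) -> int:
    """Return the tag number, or raise if it is one of the reserved values."""
    if tag_number in RESERVED_TAG_NUMBERS:
        raise ValueError(f"reserved tag number {tag_number} found in data")
    return tag_number
```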
Protocols using tag numbers 0 and 1 extend the generic data model | Protocols using tag numbers 0 and 1 extend the generic data model | |||
(Section 2) with data items representing points in time; tag numbers | (Section 2) with data items representing points in time; tag numbers | |||
2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5, | 2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5, | |||
with floating-point values of arbitrary size and precision. | with floating-point values of arbitrary size and precision. | |||
3.4.1. Standard Date/Time String | 3.4.1. Standard Date/Time String | |||
Tag number 0 contains a text string in the standard format described | Tag number 0 contains a text string in the standard format described | |||
by the "date-time" production in [RFC3339], as refined by Section 3.3 | by the "date-time" production in [RFC3339], as refined by Section 3.3 | |||
of [RFC4287], representing the point in time described there. A | of [RFC4287], representing the point in time described there. A | |||
skipping to change at page 25, line 32 ¶ | skipping to change at page 26, line 26 ¶ | |||
The tags in this section are for content hints that might be used by | The tags in this section are for content hints that might be used by | |||
generic CBOR processors. These content hints do not extend the | generic CBOR processors. These content hints do not extend the | |||
generic data model. | generic data model. | |||
3.4.5.1. Encoded CBOR Data Item | 3.4.5.1. Encoded CBOR Data Item | |||
Sometimes it is beneficial to carry an embedded CBOR data item that | Sometimes it is beneficial to carry an embedded CBOR data item that | |||
is not meant to be decoded immediately at the time the enclosing data | is not meant to be decoded immediately at the time the enclosing data | |||
item is being decoded. Tag number 24 (CBOR data item) can be used to | item is being decoded. Tag number 24 (CBOR data item) can be used to | |||
tag the embedded byte string as a data item encoded in CBOR format. | tag the embedded byte string as a single data item encoded in CBOR | |||
Contained items that aren't byte strings are invalid. A contained | format. Contained items that aren't byte strings are invalid. A | |||
byte string is valid if it encodes a well-formed CBOR item; validity | contained byte string is valid if it encodes a well-formed CBOR data | |||
checking of the decoded CBOR item is not required for tag validity | item; validity checking of the decoded CBOR item is not required for | |||
(but could be offered by a generic decoder as a special option). | tag validity (but could be offered by a generic decoder as a special | |||
option). | ||||
3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters | 3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters | |||
Tags number 21 to 23 indicate that a byte string might require a | Tags number 21 to 23 indicate that a byte string might require a | |||
specific encoding when interoperating with a text-based | specific encoding when interoperating with a text-based | |||
representation. These tags are useful when an encoder knows that the | representation. These tags are useful when an encoder knows that the | |||
byte string data it is writing is likely to be later converted to a | byte string data it is writing is likely to be later converted to a | |||
particular JSON-based usage. That usage specifies that some strings | particular JSON-based usage. That usage specifies that some strings | |||
are encoded as base64, base64url, and so on. The encoder uses byte | are encoded as base64, base64url, and so on. The encoder uses byte | |||
strings instead of doing the encoding itself to reduce the message | strings instead of doing the encoding itself to reduce the message | |||
skipping to change at page 26, line 29 ¶ | skipping to change at page 27, line 24 ¶ | |||
whitespace, or other additional characters. Tag number 23 suggests | whitespace, or other additional characters. Tag number 23 suggests | |||
conversion to base16 (hex) encoding, with uppercase alphabetics (see | conversion to base16 (hex) encoding, with uppercase alphabetics (see | |||
Section 8 of RFC 4648). Note that, for all three tag numbers, the | Section 8 of RFC 4648). Note that, for all three tag numbers, the | |||
encoding of the empty byte string is the empty text string. | encoding of the empty byte string is the empty text string. | |||
3.4.5.3. Encoded Text | 3.4.5.3. Encoded Text | |||
Some text strings hold data that have formats widely used on the | Some text strings hold data that have formats widely used on the | |||
Internet, and sometimes those formats can be validated and presented | Internet, and sometimes those formats can be validated and presented | |||
to the application in appropriate form by the decoder. There are | to the application in appropriate form by the decoder. There are | |||
tags for some of these formats. As with tag numbers 21 to 23, if | tags for some of these formats. | |||
these tags are applied to an item other than a text string, they | ||||
apply to all text string data items it contains. | ||||
* Tag number 32 is for URIs, as defined in [RFC3986]. If the text | * Tag number 32 is for URIs, as defined in [RFC3986]. If the text | |||
string doesn't match the "URI-reference" production, the string is | string doesn't match the "URI-reference" production, the string is | |||
invalid. | invalid. | |||
* Tag numbers 33 and 34 are for base64url- and base64-encoded text | * Tag numbers 33 and 34 are for base64url- and base64-encoded text | |||
strings, respectively, as defined in [RFC4648]. If any of: | strings, respectively, as defined in [RFC4648]. If any of: | |||
- the encoded text string contains non-alphabet characters or | - the encoded text string contains non-alphabet characters or | |||
only 1 character in the last block of 4, or | only 1 alphabet character in the last block of 4 (where | |||
alphabet is defined by Section 5 of [RFC4648] for tag number 33 | ||||
and Section 4 of [RFC4648] for tag number 34), or | ||||
- the padding bits in a 2- or 3-character block are not 0, or | - the padding bits in a 2- or 3-character block are not 0, or | |||
- the base64 encoding has the wrong number of padding characters, | - the base64 encoding has the wrong number of padding characters, | |||
or | or | |||
- the base64url encoding has padding characters, | - the base64url encoding has padding characters, | |||
the string is invalid. | the string is invalid. | |||
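The four conditions listed above can be sketched as a single validity check. This is an illustration rather than a reference implementation; the function name `is_valid_encoded_text` is hypothetical, and the alphabets are written out from Sections 4 and 5 of RFC 4648.

```python
import string

_COMMON = string.ascii_uppercase + string.ascii_lowercase + string.digits
B64_ALPHABET = _COMMON + "+/"     # RFC 4648, Section 4 (tag number 34)
B64URL_ALPHABET = _COMMON + "-_"  # RFC 4648, Section 5 (tag number 33)


def is_valid_encoded_text(text: str, tag_number: int) -> bool:
    """Apply the validity conditions for tag numbers 33 and 34."""
    if tag_number == 33:
        alphabet, padded = B64URL_ALPHABET, False
    elif tag_number == 34:
        alphabet, padded = B64_ALPHABET, True
    else:
        raise ValueError("only tag numbers 33 and 34 are handled")

    body = text.rstrip("=")
    padding = len(text) - len(body)

    # Non-alphabet characters (including "=" anywhere but the end).
    if any(ch not in alphabet for ch in body):
        return False

    # Only one alphabet character in the last block of 4.
    tail = len(body) % 4
    if tail == 1:
        return False

    # base64 needs exactly the right padding; base64url needs none.
    expected_padding = (4 - tail) % 4 if padded else 0
    if padding != expected_padding:
        return False

    # Padding bits in a 2- or 3-character final block must be zero.
    if tail:
        unused_bits = 4 if tail == 2 else 2
        if alphabet.index(body[-1]) & ((1 << unused_bits) - 1):
            return False
    return True
```

For example, "AQ==" passes for tag number 34 but fails for tag number 33 because of its padding characters, and "AR==" fails for both because its final character leaves nonzero padding bits.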
skipping to change at page 27, line 39 ¶ | skipping to change at page 28, line 39 ¶ | |||
In many applications, it will be clear from the context that CBOR is | In many applications, it will be clear from the context that CBOR is | |||
being employed for encoding a data item. For instance, a specific | being employed for encoding a data item. For instance, a specific | |||
protocol might specify the use of CBOR, or a media type is indicated | protocol might specify the use of CBOR, or a media type is indicated | |||
that specifies its use. However, there may be applications where | that specifies its use. However, there may be applications where | |||
such context information is not available, such as when CBOR data is | such context information is not available, such as when CBOR data is | |||
stored in a file that does not have disambiguating metadata. Here, | stored in a file that does not have disambiguating metadata. Here, | |||
it may help to have some distinguishing characteristics for the data | it may help to have some distinguishing characteristics for the data | |||
itself. | itself. | |||
Tag number 55799 is defined for this purpose. It does not impart any | Tag number 55799 is defined for this purpose, specifically for use at | |||
special semantics on the data item that it encloses; that is, the | the start of a stored encoded CBOR data item as specified by an | |||
semantics of the tag content enclosed in tag number 55799 is exactly | application. It does not impart any special semantics on the data | |||
identical to the semantics of the tag content itself. | item that it encloses; that is, the semantics of the tag content | |||
enclosed in tag number 55799 is exactly identical to the semantics of | ||||
the tag content itself. | ||||
The serialization of this tag's head is 0xd9d9f7, which does not | The serialization of this tag's head is 0xd9d9f7, which does not | |||
appear to be in use as a distinguishing mark for any frequently used | appear to be in use as a distinguishing mark for any frequently used | |||
file types. In particular, 0xd9d9f7 is not a valid start of a | file types. In particular, 0xd9d9f7 is not a valid start of a | |||
Unicode text in any Unicode encoding if it is followed by a valid | Unicode text in any Unicode encoding if it is followed by a valid | |||
CBOR data item. | CBOR data item. | |||
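The head bytes 0xd9d9f7 follow directly from the tag number: major type 6 with additional information 25 gives the initial byte 0xd9, and 55799 is 0xd9f7 as a 16-bit value. A minimal sketch of prefixing and detecting this marker, using hypothetical function names, might look as follows.

```python
SELF_DESCRIBED_TAG_NUMBER = 55799

# Initial byte: major type 6 (tag) with additional information 25
# (two-byte argument), followed by the tag number in network byte order.
SELF_DESCRIBED_HEAD = bytes([(6 << 5) | 25]) + SELF_DESCRIBED_TAG_NUMBER.to_bytes(2, "big")


def add_self_described_tag(encoded_item: bytes) -> bytes:
    """Prefix an already encoded CBOR data item with tag number 55799."""
    return SELF_DESCRIBED_HEAD + encoded_item


def looks_like_self_described_cbor(data: bytes) -> bool:
    """Heuristic check for the marker at the start of stored data."""
    return data.startswith(SELF_DESCRIBED_HEAD)


stored = add_self_described_tag(b"\xf6")  # 0xf6 is the encoding of null
```

The check is only a hint: data without the marker may still be CBOR, so an application has to define what it does in that case.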
For instance, a decoder might be able to decode both CBOR and JSON. | For instance, a decoder might be able to decode both CBOR and JSON. | |||
Such a decoder would need to mechanically distinguish the two | Such a decoder would need to mechanically distinguish the two | |||
formats. An easy way for an encoder to help the decoder would be to | formats. An easy way for an encoder to help the decoder would be to | |||
skipping to change at page 34, line 22 ¶ | skipping to change at page 35, line 37 ¶ | |||
Note that some applications and protocols will not want to use | Note that some applications and protocols will not want to use | |||
indefinite-length encoding. Using indefinite-length encoding allows | indefinite-length encoding. Using indefinite-length encoding allows | |||
an encoder to not need to marshal all the data for counting, but it | an encoder to not need to marshal all the data for counting, but it | |||
requires a decoder to allocate increasing amounts of memory while | requires a decoder to allocate increasing amounts of memory while | |||
waiting for the end of the item. This might be fine for some | waiting for the end of the item. This might be fine for some | |||
applications but not others. | applications but not others. | |||
5.2. Generic Encoders and Decoders | 5.2. Generic Encoders and Decoders | |||
A generic CBOR decoder can decode all well-formed CBOR data and | A generic CBOR decoder can decode all well-formed encoded CBOR data | |||
present them to an application. See Appendix C. | items and present the data items to an application. See Appendix C. | |||
(The diagnostic notation, Section 8, may be used to present well- | ||||
formed CBOR values to humans.) | ||||
Generic CBOR encoders provide an application interface that allows | ||||
the application to specify any well-formed value to be encoded as a | ||||
CBOR data item, including simple values and tags unknown to the | ||||
encoder. | ||||
Even though CBOR attempts to minimize these cases, not all well- | Even though CBOR attempts to minimize these cases, not all well- | |||
formed CBOR data is valid: for example, the encoded text string | formed CBOR data is valid: for example, the encoded text string | |||
"0x62c0ae" does not contain valid UTF-8 and so is not a valid CBOR | "0x62c0ae" does not contain valid UTF-8 (because [RFC3629] requires | |||
item. Also, specific tags may make semantic constraints that may be | always using the shortest form) and so is not a valid CBOR item. | |||
violated, such as a bignum tag enclosing another tag, or an instance | Also, specific tags may make semantic constraints that may be | |||
of tag number 0 containing a byte string, or containing a text string | violated, for instance by a bignum tag enclosing another tag, or by | |||
with contents that do not match [RFC3339]'s "date-time" production. | an instance of tag number 0 containing a byte string, or containing a | |||
There is no requirement that generic encoders and decoders make | text string with contents that do not match [RFC3339]'s "date-time" | |||
unnatural choices for their application interface to enable the | production. There is no requirement that generic encoders and | |||
processing of invalid data. Generic encoders and decoders are | decoders make unnatural choices for their application interface to | |||
expected to forward simple values and tags even if their specific | enable the processing of invalid data. Generic encoders and decoders | |||
are expected to forward simple values and tags even if their specific | ||||
codepoints are not registered at the time the encoder/decoder is | codepoints are not registered at the time the encoder/decoder is | |||
written (Section 5.4). | written (Section 5.4). | |||
Generic decoders provide ways to present well-formed CBOR values, | ||||
both valid and invalid, to an application. The diagnostic notation | ||||
(Section 8) may be used to present well-formed CBOR values to humans. | ||||
Generic encoders provide an application interface that allows the | ||||
application to specify any well-formed value, including simple values | ||||
and tags unknown to the encoder. | ||||
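The distinction between well-formed and valid can be illustrated with the text string 0x62c0ae mentioned above. The sketch below, with the hypothetical name `text_string_is_valid`, assumes a short text string whose length fits in the initial byte, and relies on Python's strict UTF-8 decoder, which rejects non-shortest forms.

```python
def text_string_is_valid(encoded: bytes) -> bool:
    """Check the UTF-8 validity of a short, well-formed CBOR text string.

    Sketch only: the length is assumed to be 0..23, carried in the
    initial byte.
    """
    head = encoded[0]
    if head >> 5 != 3 or (head & 0x1F) > 23:
        raise ValueError("not a short definite-length text string")
    length = head & 0x1F
    payload = encoded[1:1 + length]
    if len(payload) != length:
        raise ValueError("not well-formed: truncated text string")
    try:
        payload.decode("utf-8")  # strict decoding rejects overlong forms
    except UnicodeDecodeError:
        return False
    return True


overlong = bytes.fromhex("62c0ae")  # well-formed head, invalid UTF-8 content
```

A generic decoder could expose such a check as an option, leaving the choice between an error marker and stopping to the application protocol.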
5.3. Validity of Items | 5.3. Validity of Items | |||
A well-formed but invalid CBOR data item presents a problem with | A well-formed but invalid CBOR data item (Section 1.2) presents a | |||
interpreting the data encoded in it in the CBOR data model. A CBOR- | problem with interpreting the data encoded in it in the CBOR data | |||
based protocol could be specified in several layers, in which the | model. A CBOR-based protocol could be specified in several layers, | |||
lower layers don't process the semantics of some of the CBOR data | in which the lower layers don't process the semantics of some of the | |||
they forward. These layers can't notice any validity errors in data | CBOR data they forward. These layers can't notice any validity | |||
they don't process and MUST forward that data as-is. The first layer | errors in data they don't process and MUST forward that data as-is. | |||
that does process the semantics of an invalid CBOR item MUST take one | The first layer that does process the semantics of an invalid CBOR | |||
of two choices: | item MUST take one of two choices: | |||
1. Replace the problematic item with an error marker and continue | 1. Replace the problematic item with an error marker and continue | |||
with the next item, or | with the next item, or | |||
2. Issue an error and stop processing altogether. | 2. Issue an error and stop processing altogether. | |||
A CBOR-based protocol MUST specify which of these options its | A CBOR-based protocol MUST specify which of these options its | |||
decoders take, for each kind of invalid item they might encounter. | decoders take, for each kind of invalid item they might encounter. | |||
Such problems might occur at the basic validity level of CBOR or in | Such problems might occur at the basic validity level of CBOR or in | |||
skipping to change at page 35, line 40 ¶ | skipping to change at page 36, line 48 ¶ | |||
model: | model: | |||
Duplicate keys in a map: Generic decoders (Section 5.2) make data | Duplicate keys in a map: Generic decoders (Section 5.2) make data | |||
available to applications using the native CBOR data model. That | available to applications using the native CBOR data model. That | |||
data model includes maps (key-value mappings with unique keys), | data model includes maps (key-value mappings with unique keys), | |||
not multimaps (key-value mappings where multiple entries can have | not multimaps (key-value mappings where multiple entries can have | |||
the same key). Thus, a generic decoder that gets a CBOR map item | the same key). Thus, a generic decoder that gets a CBOR map item | |||
that has duplicate keys will decode to a map with only one | that has duplicate keys will decode to a map with only one | |||
instance of that key, or it might stop processing altogether. On | instance of that key, or it might stop processing altogether. On | |||
the other hand, a "streaming decoder" may not even be able to | the other hand, a "streaming decoder" may not even be able to | |||
notice (Section 5.6). | notice. See Section 5.6 for more discussion of keys in maps. | |||
Invalid UTF-8 string: A decoder might or might not want to verify | Invalid UTF-8 string: A decoder might or might not want to verify | |||
that the sequence of bytes in a UTF-8 string (major type 3) is | that the sequence of bytes in a UTF-8 string (major type 3) is | |||
actually valid UTF-8 and react appropriately. | actually valid UTF-8 and react appropriately. | |||
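The two behaviors described for duplicate map keys can be sketched at the data model level, after the key/value pairs have been decoded. The function `build_map` and its `reject_duplicates` parameter are hypothetical, and keeping the last entry in the lenient mode is an assumption of this sketch, not something the text prescribes.

```python
def build_map(pairs, reject_duplicates=True):
    """Build a map from decoded (key, value) pairs.

    Keys are assumed to be hashable Python values. Note that Python's
    dict treats 1, 1.0, and True as the same key, which may differ from
    an application's notion of key equivalence.
    """
    result = {}
    for key, value in pairs:
        if key in result and reject_duplicates:
            # Stop processing altogether.
            raise ValueError(f"duplicate map key: {key!r}")
        # Lenient mode: only one instance of the key survives (the last).
        result[key] = value
    return result
```

A streaming decoder that hands pairs to the application one at a time never builds such a table, which is why it may not notice the duplicate at all.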
5.3.2. Tag validity | 5.3.2. Tag validity | |||
Two additional kinds of validity errors are introduced by adding tags | Two additional kinds of validity errors are introduced by adding tags | |||
to the basic generic data model: | to the basic generic data model: | |||
skipping to change at page 43, line 51 ¶ | skipping to change at page 45, line 23 ¶ | |||
7.1. Extension Points | 7.1. Extension Points | |||
In a protocol design, opportunities for evolution are often included | In a protocol design, opportunities for evolution are often included | |||
in the form of extension points. For example, there may be a | in the form of extension points. For example, there may be a | |||
codepoint space that is not fully allocated from the outset, and the | codepoint space that is not fully allocated from the outset, and the | |||
protocol is designed to tolerate and embrace implementations that | protocol is designed to tolerate and embrace implementations that | |||
start using more codepoints than initially allocated. | start using more codepoints than initially allocated. | |||
Sizing the codepoint space may be difficult because the range | Sizing the codepoint space may be difficult because the range | |||
required may be hard to predict. An attempt should be made to make | required may be hard to predict. Protocol designs should attempt to | |||
the codepoint space large enough so that it can slowly be filled over | make the codepoint space large enough so that it can slowly be filled | |||
the intended lifetime of the protocol. | over the intended lifetime of the protocol. | |||
CBOR has three major extension points: | CBOR has three major extension points: | |||
* the "simple" space (values in major type 7). Of the 24 efficient | * the "simple" space (values in major type 7). Of the 24 efficient | |||
(and 224 slightly less efficient) values, only a small number have | (and 224 slightly less efficient) values, only a small number have | |||
been allocated. Implementations receiving an unknown simple data | been allocated. Implementations receiving an unknown simple data | |||
item may be able to process it as such, given that the structure | item may easily be able to process it as such, given that the | |||
of the value is indeed simple. The IANA registry in Section 9.1 | structure of the value is indeed simple. The IANA registry in | |||
is the appropriate way to address the extensibility of this | Section 9.1 is the appropriate way to address the extensibility of | |||
codepoint space. | this codepoint space. | |||
* the "tag" space (values in major type 6). Again, only a small | * the "tag" space (values in major type 6). The total codepoint | |||
part of the codepoint space has been allocated, and the space is | space is abundant; only a tiny part of it has been allocated. | |||
abundant (although the early numbers are more efficient than the | However, not all of these codepoints are equally efficient: the | |||
later ones). Implementations receiving an unknown tag number can | first 24 only consume a single ("1+0") byte, and half of them have | |||
choose to simply ignore it (process just the enclosed tag content) | already been allocated. The next 232 values only consume two | |||
or to process it as an unknown tag number wrapping the tag | ("1+1") bytes, with nearly a quarter already allocated. These | |||
content. The IANA registry in Section 9.2 is the appropriate way | subspaces need some curation to last for a few more decades. | |||
to address the extensibility of this codepoint space. | Implementations receiving an unknown tag number can choose to | |||
process just the enclosed tag content or, preferably, to process | ||||
the tag as an unknown tag number wrapping the tag content. The | ||||
IANA registry in Section 9.2 is the appropriate way to address the | ||||
extensibility of this codepoint space. | ||||
* the "additional information" space. An implementation receiving | * the "additional information" space. An implementation receiving | |||
an unknown additional information value has no way to continue | an unknown additional information value has no way to continue | |||
decoding, so allocating codepoints to this space is a major step. | decoding, so allocating codepoints in this space is a major step | |||
There are also very few codepoints left. See also Section 7.2. | beyond just exercising an extension point. There are also very | |||
few codepoints left. See also Section 7.2. | ||||
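The efficiency differences within the tag space follow from how the head of a tag is encoded. The sketch below, with the hypothetical name `encode_tag_head`, picks the shortest head for a given tag number and makes the "1+0", "1+1", "1+2", "1+4", and "1+8" sizes visible.

```python
def encode_tag_head(tag_number: int) -> bytes:
    """Encode the head of a tag (major type 6) in its shortest form."""
    if tag_number < 0:
        raise ValueError("tag numbers are unsigned")
    major = 6 << 5
    if tag_number <= 23:
        # "1+0": the tag number sits in the initial byte.
        return bytes([major | tag_number])
    # "1+1", "1+2", "1+4", "1+8": additional information 24..27.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if tag_number < 1 << (8 * size):
            return bytes([major | info]) + tag_number.to_bytes(size, "big")
    raise ValueError("tag number does not fit in 64 bits")


head_sizes = {n: len(encode_tag_head(n)) for n in (23, 24, 255, 256, 65536, 2**32)}
```

The first 24 tag numbers take one byte, and the next 232 (24 to 255) take two, which is why those two subspaces are the ones that need curation.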
7.2. Curating the Additional Information Space | 7.2. Curating the Additional Information Space | |||
The human mind is sometimes drawn to filling in little perceived gaps | The human mind is sometimes drawn to filling in little perceived gaps | |||
to make something neat. We expect the remaining gaps in the | to make something neat. We expect the remaining gaps in the | |||
codepoint space for the additional information values to be an | codepoint space for the additional information values to be an | |||
attractor for new ideas, just because they are there. | attractor for new ideas, just because they are there. | |||
The present specification does not manage the additional information | The present specification does not manage the additional information | |||
codepoint space by an IANA registry. Instead, allocations out of | codepoint space by an IANA registry. Instead, allocations out of | |||
skipping to change at page 47, line 26 ¶ | skipping to change at page 49, line 5 ¶ | |||
New entries in the range 32 to 255 are assigned by Specification | New entries in the range 32 to 255 are assigned by Specification | |||
Required. | Required. | |||
9.2. Tags Registry | 9.2. Tags Registry | |||
IANA has created the "Concise Binary Object Representation (CBOR) | IANA has created the "Concise Binary Object Representation (CBOR) | |||
Tags" registry at [IANA.cbor-tags]. The tags that were defined in | Tags" registry at [IANA.cbor-tags]. The tags that were defined in | |||
[RFC7049] are described in detail in Section 3.4, and other tags have | [RFC7049] are described in detail in Section 3.4, and other tags have | |||
already been defined. | already been defined. | |||
New entries in the range 0 to 23 are assigned by Standards Action. | New entries in the range 0 to 23 ("1+0") are assigned by Standards | |||
New entries in the range 24 to 255 are assigned by Specification | Action. New entries in the ranges 24 to 255 ("1+1") and 256 to 32767 | |||
Required. New entries in the range 256 to 18446744073709551615 are | (lower half of "1+2") are assigned by Specification Required. New | |||
assigned by First Come First Served. The template for registration | entries in the range 32768 to 18446744073709551615 (upper half of | |||
requests is: | "1+2", "1+4", and "1+8") are assigned by First Come First Served. | |||
The template for registration requests is: | ||||
* Data item | * Data item | |||
* Semantics (short form) | * Semantics (short form) | |||
In addition, First Come First Served requests should include: | In addition, First Come First Served requests should include: | |||
* Point of contact | * Point of contact | |||
* Description of semantics (URL) - This description is optional; the | * Description of semantics (URL) - This description is optional; the | |||
URL can point to something like an Internet-Draft or a web page. | URL can point to something like an Internet-Draft or a web page. | |||
Applicants exercising the First Come First Served range and making a | ||||
suggestion for a tag number that is not representable in 32 bits | ||||
(i.e., larger than 4294967295) should be aware that this could reduce | ||||
interoperability with implementations that do not support 64-bit | ||||
numbers. | ||||
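The range boundaries in the revised (-14) text can be summarized in a small lookup. This is a sketch of the wording in the right-hand column only; the function name `registration_policy` is hypothetical, and the -13 text on the left draws the First Come First Served boundary at 256 instead.

```python
def registration_policy(tag_number: int) -> str:
    """Return the registration policy for a tag number per the -14 text."""
    if not 0 <= tag_number <= 2**64 - 1:
        raise ValueError("tag number out of range")
    if tag_number <= 23:          # "1+0"
        return "Standards Action"
    if tag_number <= 32767:       # "1+1" and lower half of "1+2"
        return "Specification Required"
    return "First Come First Served"  # upper half of "1+2", "1+4", "1+8"


def needs_64_bit_support(tag_number: int) -> bool:
    """True if the tag number is not representable in 32 bits."""
    return tag_number > 4294967295
```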
9.3. Media Type ("MIME Type") | 9.3. Media Type ("MIME Type") | |||
The Internet media type [RFC6838] for a single encoded CBOR data item | The Internet media type [RFC6838] for a single encoded CBOR data item | |||
is application/cbor, as defined in [IANA.media-types]: | is application/cbor, as defined in [IANA.media-types]: | |||
Type name: application | Type name: application | |||
Subtype name: cbor | Subtype name: cbor | |||
Required parameters: n/a | Required parameters: n/a | |||
skipping to change at page 49, line 49 ¶ | skipping to change at page 51, line 31 ¶ | |||
"xxx/yyy+cbor". | "xxx/yyy+cbor". | |||
Security Considerations: See Section 10 of this document | Security Considerations: See Section 10 of this document | |||
Contact: IETF CBOR Working Group cbor@ietf.org | Contact: IETF CBOR Working Group cbor@ietf.org | |||
(mailto:cbor@ietf.org) or IETF Applications and Real-Time Area | (mailto:cbor@ietf.org) or IETF Applications and Real-Time Area | |||
art@ietf.org (mailto:art@ietf.org) | art@ietf.org (mailto:art@ietf.org) | |||
Author/Change Controller: The IESG iesg@ietf.org | Author/Change Controller: The IESG iesg@ietf.org | |||
(mailto:iesg@ietf.org) | (mailto:iesg@ietf.org) | |||
// Editors' note: RFC 6838 has a template field Author/Change | // Editors' note: RFC 6838 has a template field Author/Change | |||
// controller, the descriptive text of which makes clear that this is | // controller, the descriptive text of which makes clear that this is | |||
// the change controller, not the author. Go figure. There is no | // the change controller, not the author. Go figure. There is no | |||
// separate author entry as in the media types registry. (RFC | // separate author entry as in the media types registry. (RFC | |||
// editor: Please remove this note before publication.) | // editor: Please remove this note before publication.) | |||
10. Security Considerations | 10. Security Considerations | |||
A network-facing application can exhibit vulnerabilities in its | A network-facing application can exhibit vulnerabilities in its | |||
processing logic for incoming data. Complex parsers are well known | processing logic for incoming data. Complex parsers are well known | |||
as a likely source of such vulnerabilities, such as the ability to | as a likely source of such vulnerabilities, such as the ability to | |||
remotely crash a node, or even remotely execute arbitrary code on it. | remotely crash a node, or even remotely execute arbitrary code on it. | |||
CBOR attempts to narrow the opportunities for introducing such | CBOR attempts to narrow the opportunities for introducing such | |||
vulnerabilities by reducing parser complexity, by giving the entire | vulnerabilities by reducing parser complexity, by giving the entire | |||
range of encodable values a meaning where possible. | range of encodable values a meaning where possible. | |||
skipping to change at page 51, line 15 ¶ | skipping to change at page 52, line 42 ¶ | |||
input is in alignment with the application protocol that is | input is in alignment with the application protocol that is | |||
serialized in CBOR. | serialized in CBOR. | |||
The input check itself may consume resources. This is usually linear | The input check itself may consume resources. This is usually linear | |||
in the size of the input, which means that an attacker has to spend | in the size of the input, which means that an attacker has to spend | |||
resources that are commensurate to the resources spent by the | resources that are commensurate to the resources spent by the | |||
defender on input validation. Processing for arbitrary-precision | defender on input validation. Processing for arbitrary-precision | |||
numbers may exceed linear effort. Also, some hash-table | numbers may exceed linear effort. Also, some hash-table | |||
implementations that are used by decoders to build in-memory | implementations that are used by decoders to build in-memory | |||
representations of maps can be attacked to spend quadratic effort, | representations of maps can be attacked to spend quadratic effort, | |||
unless a secret key is employed (see Section 7 of [SIPHASH]). Such | unless a secret key (see Section 7 of [SIPHASH]) or some other | |||
superlinear efforts can be employed by an attacker to exhaust | mitigation is employed. Such superlinear efforts can be exploited by | |||
resources at or before the input validator; they therefore need to be | an attacker to exhaust resources at or before the input validator; | |||
avoided in a CBOR decoder implementation. Note that tag number | they therefore need to be avoided in a CBOR decoder implementation. | |||
definitions and their implementations can add security considerations | Note that tag number definitions and their implementations can add | |||
of this kind; this should then be discussed in the security | security considerations of this kind; this should then be discussed | |||
considerations of the tag number definition. | in the security considerations of the tag number definition. | |||
CBOR encoders do not receive input directly from the network and are | CBOR encoders do not receive input directly from the network and are | |||
thus not directly attackable in the same way as CBOR decoders. | thus not directly attackable in the same way as CBOR decoders. | |||
However, CBOR encoders often have an API that takes input from | However, CBOR encoders often have an API that takes input from | |||
another level in the implementation and can be attacked through that | another level in the implementation and can be attacked through that | |||
API. The design and implementation of that API should assume the | API. The design and implementation of that API should assume the | |||
behavior of its caller may be based on hostile input or on coding | behavior of its caller may be based on hostile input or on coding | |||
mistakes. It should check inputs for buffer overruns, overflow and | mistakes. It should check inputs for buffer overruns, overflow and | |||
underflow of integer arithmetic, and other such errors that are aimed | underflow of integer arithmetic, and other such errors that are aimed | |||
to disrupt the encoder. | to disrupt the encoder. | |||
skipping to change at page 53, line 26 ¶ | skipping to change at page 55, line 8 ¶ | |||
[ASN.1] International Telecommunication Union, "Information | [ASN.1] International Telecommunication Union, "Information | |||
Technology -- ASN.1 encoding rules: Specification of Basic | Technology -- ASN.1 encoding rules: Specification of Basic | |||
Encoding Rules (BER), Canonical Encoding Rules (CER) and | Encoding Rules (BER), Canonical Encoding Rules (CER) and | |||
Distinguished Encoding Rules (DER)", ITU-T Recommendation | Distinguished Encoding Rules (DER)", ITU-T Recommendation | |||
X.690, 1994. | X.690, 1994. | |||
[BSON] Various, "BSON - Binary JSON", 2013, | [BSON] Various, "BSON - Binary JSON", 2013, | |||
<http://bsonspec.org/>. | <http://bsonspec.org/>. | |||
[I-D.bormann-cbor-notable-tags] | ||||
Bormann, C., "Notable CBOR Tags", Work in Progress, | ||||
Internet-Draft, draft-bormann-cbor-notable-tags-01, 15 May | ||||
2020, <http://www.ietf.org/internet-drafts/draft-bormann- | ||||
cbor-notable-tags-01.txt>. | ||||
[IANA.cbor-simple-values] | [IANA.cbor-simple-values] | |||
IANA, "Concise Binary Object Representation (CBOR) Simple | IANA, "Concise Binary Object Representation (CBOR) Simple | |||
Values", | Values", | |||
<http://www.iana.org/assignments/cbor-simple-values>. | <http://www.iana.org/assignments/cbor-simple-values>. | |||
[IANA.cbor-tags] | [IANA.cbor-tags] | |||
IANA, "Concise Binary Object Representation (CBOR) Tags", | IANA, "Concise Binary Object Representation (CBOR) Tags", | |||
<http://www.iana.org/assignments/cbor-tags>. | <http://www.iana.org/assignments/cbor-tags>. | |||
[IANA.core-parameters] | [IANA.core-parameters] | |||
skipping to change at page 55, line 10 ¶ | skipping to change at page 56, line 47 ¶ | |||
[RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) | [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) | |||
Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, | Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, | |||
<https://www.rfc-editor.org/info/rfc8742>. | <https://www.rfc-editor.org/info/rfc8742>. | |||
[RFC8746] Bormann, C., Ed., "Concise Binary Object Representation | [RFC8746] Bormann, C., Ed., "Concise Binary Object Representation | |||
(CBOR) Tags for Typed Arrays", RFC 8746, | (CBOR) Tags for Typed Arrays", RFC 8746, | |||
DOI 10.17487/RFC8746, February 2020, | DOI 10.17487/RFC8746, February 2020, | |||
<https://www.rfc-editor.org/info/rfc8746>. | <https://www.rfc-editor.org/info/rfc8746>. | |||
[rfc8746] Bormann, C., Ed., "Concise Binary Object Representation | ||||
(CBOR) Tags for Typed Arrays", RFC 8746, | ||||
DOI 10.17487/RFC8746, February 2020, | ||||
<https://www.rfc-editor.org/info/rfc8746>. | ||||
[SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- | [SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- | |||
Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture | Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture | |||
Notes in Computer Science pp. 489-508, 2012, | Notes in Computer Science pp. 489-508, 2012, | |||
<https://doi.org/10.1007/978-3-642-34931-7_28>. | <https://doi.org/10.1007/978-3-642-34931-7_28>. | |||
[YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup | [YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup | |||
Language (YAML[TM]) Version 1.2", 3rd Edition, October | Language (YAML[TM]) Version 1.2", 3rd Edition, October | |||
2009, <http://www.yaml.org/spec/1.2/spec.html>. | 2009, <http://www.yaml.org/spec/1.2/spec.html>. | |||
Appendix A. Examples | Appendix A. Examples | |||
skipping to change at page 63, line 11 ¶ | skipping to change at page 64, line 43 ¶ | |||
* uint() converts a byte string into an unsigned integer by | * uint() converts a byte string into an unsigned integer by | |||
interpreting the byte string in network byte order. | interpreting the byte string in network byte order. | |||
* Arithmetic works as in C. | * Arithmetic works as in C. | |||
* All variables are unsigned integers of sufficient range. | * All variables are unsigned integers of sufficient range. | |||
Note that "well_formed" returns the major type for well-formed | Note that "well_formed" returns the major type for well-formed | |||
definite length items, but 0 for an indefinite length item (or -1 for | definite length items, but 0 for an indefinite length item (or -1 for | |||
a break stop code, only if "breakable" is set). This is used in | a "break" stop code, only if "breakable" is set). This is used in | |||
"well_formed_indefinite" to ascertain that indefinite length strings | "well_formed_indefinite" to ascertain that indefinite length strings | |||
only contain definite length strings as chunks. | only contain definite length strings as chunks. | |||
well_formed (breakable = false) { | well_formed (breakable = false) { | |||
// process initial bytes | // process initial bytes | |||
ib = uint(take(1)); | ib = uint(take(1)); | |||
mt = ib >> 5; | mt = ib >> 5; | |||
val = ai = ib & 0x1f; | val = ai = ib & 0x1f; | |||
switch (ai) { | switch (ai) { | |||
case 24: val = uint(take(1)); break; | case 24: val = uint(take(1)); break; | |||
skipping to change at page 67, line 36 ¶ | skipping to change at page 68, line 39 ¶ | |||
E.1. ASN.1 DER, BER, and PER | E.1. ASN.1 DER, BER, and PER | |||
[ASN.1] has many serializations. In the IETF, DER and BER are the | [ASN.1] has many serializations. In the IETF, DER and BER are the | |||
most common. The serialized output is not particularly compact for | most common. The serialized output is not particularly compact for | |||
many items, and the code needed to decode numeric items can be | many items, and the code needed to decode numeric items can be | |||
complex on a constrained device. | complex on a constrained device. | |||
Few (if any) IETF protocols have adopted one of the several variants | Few (if any) IETF protocols have adopted one of the several variants | |||
of Packed Encoding Rules (PER). There could be many reasons for | of Packed Encoding Rules (PER). There could be many reasons for | |||
this, but one that is commonly stated is that PER makes use of the | this, but one that is commonly stated is that PER makes use of the | |||
schema even for parsing the surface structure of the data stream, | schema even for parsing the surface structure of the data item, | |||
requiring significant tool support. There are different versions of | requiring significant tool support. There are different versions of | |||
the ASN.1 schema language in use, which has also hampered adoption. | the ASN.1 schema language in use, which has also hampered adoption. | |||
E.2. MessagePack | E.2. MessagePack | |||
[MessagePack] is a concise, widely implemented counted binary | [MessagePack] is a concise, widely implemented counted binary | |||
serialization format, similar in many properties to CBOR, although | serialization format, similar in many properties to CBOR, although | |||
somewhat less regular. While the data model can be used to represent | somewhat less regular. While the data model can be used to represent | |||
JSON data, MessagePack has also been used in many remote procedure | JSON data, MessagePack has also been used in many remote procedure | |||
call (RPC) applications and for long-term storage of data. | call (RPC) applications and for long-term storage of data. | |||
skipping to change at page 69, line 27 ¶ | skipping to change at page 70, line 27 ¶ | |||
| | 00 00 04 31 00 13 00 00 00 | | | | | 00 00 04 31 00 13 00 00 00 | | | |||
| | 10 30 00 02 00 00 00 10 31 | | | | | 10 30 00 02 00 00 00 10 31 | | | |||
| | 00 03 00 00 00 00 00 | | | | | 00 03 00 00 00 00 00 | | | |||
+-------------+----------------------------+----------------+ | +-------------+----------------------------+----------------+ | |||
| CBOR | 82 01 82 02 03 | 9f 01 82 02 03 | | | CBOR | 82 01 82 02 03 | 9f 01 82 02 03 | | |||
| | | ff | | | | | ff | | |||
+-------------+----------------------------+----------------+ | +-------------+----------------------------+----------------+ | |||
Table 8: Examples for Different Levels of Conciseness | Table 8: Examples for Different Levels of Conciseness | |||
Appendix F. Changes from RFC 7049 | Appendix F. Well-formedness errors and examples | |||
The following is a list of known changes from RFC 7049. This list is | ||||
non-authoritative. It is meant to help reviewers see the significant | ||||
differences. | ||||
* Made some use of new RFCXML functionality [RFC7991] | ||||
* Updated references, e.g. for [RFC4627] to [RFC8259] in many | ||||
places, for [CNN-TERMS] to [RFC7228]; added missing reference to | ||||
[IEEE754] and updated to [ECMA262] | ||||
* Fixed errata: in the example in Section 2.4.2 ("29" -> "49"), and | ||||
in the last paragraph of Section 3.6 ("0b000_11101" -> | ||||
"0b000_11001") | ||||
* Added a comment to the last example in Section 3.2.2 (added | ||||
"Second value") | ||||
* Applied numerous small editorial changes | ||||
* Added a few tables for illustration | ||||
* More stringently used terminology for well-formed and valid data, | ||||
avoiding less well-defined alternative terms such as "syntax | ||||
error", "decoding error" and "strict mode" outside examples | ||||
* Streamlined terminology to talk about tags, tag numbers, and tag | ||||
content | ||||
* Clarified the restrictions on tag content, in general and | ||||
specifically for tag 1 | ||||
* Added text about the CBOR data model and its small variations | ||||
(basic generic, extended generic, specific) | ||||
* More clearly separated integers from floating-point values; | ||||
provided a suggestion (based on I-JSON [RFC7493]) for handling | ||||
these types when converting JSON to CBOR | ||||
* Added term "preferred serialization" and defined it for various | ||||
kinds of data items | ||||
* Added comment about tags with semantics that depend on | ||||
serialization order | ||||
* Defined "deterministic encoding", making use of "preferred | ||||
serialization", and simplified the suggested map ordering for the | ||||
"Core Deterministic Encoding Requirements", easing implementation, | ||||
while keeping RFC 7049 map ordering as an alternative "length- | ||||
first map key ordering"; now avoiding the terms "canonical" and | ||||
"canonicalization" | ||||
* Clarified map validity (handling of duplicate keys) and explained | ||||
the domain of applicability of certain implementation choices | ||||
* Updated IANA considerations | ||||
* Added security considerations | ||||
* Clarified handling of non-well-formed simple values in text and | ||||
pseudocode | ||||
* Added Appendix G, well-formedness errors and examples | ||||
* Removed UBJSON from Appendix E, as that format has completely | ||||
changed since RFC 7049; added reference to [RFC8618] | ||||
Appendix G. Well-formedness errors and examples | ||||
There are three basic kinds of well-formedness errors that can occur | There are three basic kinds of well-formedness errors that can occur | |||
in decoding a CBOR data item: | in decoding a CBOR data item: | |||
* Too much data: There are input bytes left that were not consumed. | * Too much data: There are input bytes left that were not consumed. | |||
This is only an error if the application assumed that the input | This is only an error if the application assumed that the input | |||
bytes would span exactly one data item. Where the application | bytes would span exactly one data item. Where the application | |||
uses the self-delimiting nature of CBOR encoding to permit | uses the self-delimiting nature of CBOR encoding to permit | |||
additional data after the data item, as is for example done in | additional data after the data item, as is for example done in | |||
CBOR sequences [RFC8742], the CBOR decoder can simply indicate | CBOR sequences [RFC8742], the CBOR decoder can simply indicate | |||
skipping to change at page 71, line 40 ¶ | skipping to change at page 71, line 24 ¶ | |||
calling fail(), in order: | calling fail(), in order: | |||
* a reserved value is used for additional information (28, 29, 30) | * a reserved value is used for additional information (28, 29, 30) | |||
* major type 7, additional information 24, value < 32 (incorrect or | * major type 7, additional information 24, value < 32 (incorrect or | |||
incorrectly encoded simple type) | incorrectly encoded simple type) | |||
* incorrect substructure of indefinite length byte/text string (may | * incorrect substructure of indefinite length byte/text string (may | |||
only contain definite length strings of the same major type) | only contain definite length strings of the same major type) | |||
* break stop code (mt=7, ai=31) occurs in a value position of a map | * "break" stop code (mt=7, ai=31) occurs in a value position of a | |||
or except at a position directly in an indefinite length item | map or except at a position directly in an indefinite length item | |||
where also another enclosed data item could occur | where also another enclosed data item could occur | |||
* additional information 31 used with major type 0, 1, or 6 | * additional information 31 used with major type 0, 1, or 6 | |||
G.1. Examples for CBOR data items that are not well-formed | F.1. Examples for CBOR data items that are not well-formed | |||
This subsection shows a few examples for CBOR data items that are not | This subsection shows a few examples for CBOR data items that are not | |||
well-formed. Each example is a sequence of bytes each shown in | well-formed. Each example is a sequence of bytes each shown in | |||
hexadecimal; multiple examples in a list are separated by commas. | hexadecimal; multiple examples in a list are separated by commas. | |||
Examples for well-formedness error kind 1 (too much data) can easily | Examples for well-formedness error kind 1 (too much data) can easily | |||
be formed by adding data to a well-formed encoded CBOR data item. | be formed by adding data to a well-formed encoded CBOR data item. | |||
Similarly, examples for well-formedness error kind 2 (too little | Similarly, examples for well-formedness error kind 2 (too little | |||
data) can be formed by truncating a well-formed encoded CBOR data | data) can be formed by truncating a well-formed encoded CBOR data | |||
item. In test suites, it may be beneficial to specifically test with | item. In test suites, it may be beneficial to specifically test with | |||
incomplete data items that would require large amounts of addition to | incomplete data items that would require large amounts of addition to | |||
be completed (for instance by starting the encoding of a string of a | be completed (for instance by starting the encoding of a string of a | |||
very large size). | very large size). | |||
A premature end of the input can occur in a head or within the | A premature end of the input can occur in a head or within the | |||
enclosed data, which may be bare strings or enclosed data items that | enclosed data, which may be bare strings or enclosed data items that | |||
are either counted or should have been ended by a break stop code. | are either counted or should have been ended by a "break" stop code. | |||
* End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 | * End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 | |||
03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa | 03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa | |||
00 00, fb 00 00 00 | 00 00, fb 00 00 00 | |||
* Definite length strings with short data: 41, 61, 5a ff ff ff ff | * Definite length strings with short data: 41, 61, 5a ff ff ff ff | |||
00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f | 00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f | |||
ff ff ff ff ff ff ff 01 02 03 | ff ff ff ff ff ff ff 01 02 03 | |||
* Definite length maps and arrays not closed with enough items: 81, | * Definite length maps and arrays not closed with enough items: 81, | |||
81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 | 81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 | |||
00 | 00 | |||
* Tag number not followed by tag content: c0 | * Tag number not followed by tag content: c0 | |||
* Indefinite length strings not closed by a break stop code: 5f 41 | * Indefinite length strings not closed by a "break" stop code: 5f 41 | |||
00, 7f 61 00 | 00, 7f 61 00 | |||
* Indefinite length maps and arrays not closed by a break stop code: | * Indefinite length maps and arrays not closed by a "break" stop | |||
9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f | code: 9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f | |||
ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff | 9f 9f ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff | |||
A few examples for the five subkinds of well-formedness error kind 3 | A few examples for the five subkinds of well-formedness error kind 3 | |||
(syntax error) are shown below. | (syntax error) are shown below. | |||
Subkind 1: | Subkind 1: | |||
* Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e, | * Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e, | |||
5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc, | 5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc, | |||
fd, fe, | fd, fe, | |||
skipping to change at page 73, line 30 ¶ | skipping to change at page 73, line 12 ¶ | |||
82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82 | 82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82 | |||
9f 81 9f 9f ff ff ff ff | 9f 81 9f 9f ff ff ff ff | |||
* Break in indefinite length map would lead to odd number of items | * Break in indefinite length map would lead to odd number of items | |||
(break in a value position): bf 00 ff, bf 00 00 00 ff | (break in a value position): bf 00 ff, bf 00 00 00 ff | |||
Subkind 5: | Subkind 5: | |||
* Major type 0, 1, 6 with additional information 31: 1f, 3f, df | * Major type 0, 1, 6 with additional information 31: 1f, 3f, df | |||
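The three error kinds and the example byte sequences above can be exercised with a small checker. The following Python sketch is modelled on the "well_formed" pseudocode only as far as it is described in this document (major type returned for definite length items, 0 for indefinite length items, -1 for a "break" stop code). The names NotWellFormed, check, and classify are hypothetical and chosen for illustration; this is not the normative pseudocode, and it recurses per nesting level, so it is not hardened against deeply nested input.

```python
class NotWellFormed(Exception):
    """Message is 'too much data', 'too little data', or 'syntax error'."""

BREAK = -1  # assumed signal value for a "break" stop code

def _take(data, pos, n):
    # Consume n bytes; running out of input is error kind 2.
    if pos + n > len(data):
        raise NotWellFormed("too little data")
    return data[pos:pos + n], pos + n

def _item(data, pos, breakable=False):
    # Returns (kind, new_pos): kind is the major type for a definite
    # length item, 0 for an indefinite length item, BREAK for a break.
    head, pos = _take(data, pos, 1)
    mt, ai = head[0] >> 5, head[0] & 0x1F
    val = ai
    if 24 <= ai <= 27:
        raw, pos = _take(data, pos, 1 << (ai - 24))  # 1, 2, 4, or 8 bytes
        val = int.from_bytes(raw, "big")
    elif ai in (28, 29, 30):
        raise NotWellFormed("syntax error")          # subkind 1
    elif ai == 31:
        return _indefinite(data, pos, mt, breakable)
    if mt in (2, 3):
        _, pos = _take(data, pos, val)               # string payload
    elif mt in (4, 5):
        for _ in range(val * (2 if mt == 5 else 1)):
            _, pos = _item(data, pos)
    elif mt == 6:
        _, pos = _item(data, pos)                    # tag content
    elif mt == 7 and ai == 24 and val < 32:
        raise NotWellFormed("syntax error")          # subkind 2
    return mt, pos

def _indefinite(data, pos, mt, breakable):
    if mt in (2, 3, 4, 5):
        while True:
            kind, pos = _item(data, pos, breakable=True)
            if kind == BREAK:
                break
            if mt in (2, 3) and kind != mt:
                raise NotWellFormed("syntax error")  # subkind 3
            if mt == 5:
                _, pos = _item(data, pos)            # map value, no break here
        return 0, pos
    if mt == 7 and breakable:
        return BREAK, pos
    raise NotWellFormed("syntax error")              # subkinds 4 and 5

def check(data, exact=True):
    # exact=True treats leftover bytes as error kind 1.
    _, pos = _item(data, 0)
    if exact and pos != len(data):
        raise NotWellFormed("too much data")

def classify(hex_bytes):
    try:
        check(bytes.fromhex(hex_bytes))
        return "well-formed"
    except NotWellFormed as err:
        return str(err)

for example in ("82 01 82 02 03", "9f 01 82 02 03 ff", "00 00",
                "5a ff ff ff ff 00", "c0", "1c", "bf 00 ff", "1f"):
    print(example, "->", classify(example))
```

Passing exact=False to check corresponds to an application that permits additional data after the data item, as in CBOR sequences. Note that a declared string length is only compared against the remaining input, never allocated, so the "very large size" test inputs fail quickly.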
Appendix G. Changes from RFC 7049 | ||||
As discussed in the introduction, this document is a revised edition | ||||
of RFC 7049, with editorial improvements, added detail, and fixed | ||||
errata. This document formally obsoletes RFC 7049, while keeping | ||||
full compatibility of the interchange format from RFC 7049. This | ||||
document does not create a new version of the format. | ||||
G.1. Errata processing, clerical changes | ||||
The two verified errata on RFC 7049, EID 3764 and EID 3770, concerned | ||||
two encoding examples in the text that have been corrected | ||||
(Section 3.4.3: "29" -> "49", Section 5.5: "0b000_11101" -> | ||||
"0b000_11001"). Also, RFC 7049 contained an example using the simple | ||||
type value 24 (EID 5917), which is not well-formed; this example has | ||||
been removed. Errata report 5763 pointed to an accident in the | ||||
wording of the definition of tags; this was resolved during a re- | ||||
write of Section 3.4. Errata report 5434 pointed out that the UBJSON | ||||
example in Appendix E no longer complied with the version of UBJSON | ||||
current at the time of submitting the report. It turned out that the | ||||
UBJSON specification had completely changed since 2013; this example | ||||
therefore also was removed. Further errata reports (4409, 4963, | ||||
4964) complained that the map key sorting rules for canonical | ||||
encoding were onerous; these led to a reconsideration of the | ||||
canonical encoding suggestions and replacement by the deterministic | ||||
encoding suggestions (described below). An editorial suggestion in | ||||
errata report 4294 was also implemented (improved symmetry by adding | ||||
"Second value" to a comment to the last example in Section 3.2.2). | ||||
Other more clerical changes include: | ||||
* use of new RFCXML functionality [RFC7991]; | ||||
* explain some more of the notation used; | ||||
* updated references, e.g. for RFC4627 to [RFC8259] in many places, | ||||
for CNN-TERMS to [RFC7228]; added missing reference to [IEEE754] | ||||
(importing required definitions) and updated to [ECMA262]; added a | ||||
reference to [RFC8618] that further illustrates the discussion in | ||||
Appendix E; | ||||
* the discussion of diagnostic notation mentions the "Extended | ||||
Diagnostic Notation" (EDN) defined in [RFC8610]; | ||||
* the addition of this appendix. | ||||
G.2. Changes in IANA considerations | ||||
The IANA considerations were generally updated (clerical changes, | ||||
e.g., now pointing to the CBOR working group as the author of the | ||||
specification). References to the respective IANA registries have | ||||
been added to the informative references. | ||||
Tags in the space from 256 to 32767 (lower half of "1+2") are no | ||||
longer assigned by First Come First Served; this range is now | ||||
Specification Required. | ||||
G.3. Changes in suggestions and other informational components | ||||
In revising the document, beyond processing errata reports, the WG | ||||
could use nearly seven years of experience with the use of CBOR in a | ||||
diverse set of applications. This led to a number of editorial | ||||
changes, including adding tables for illustration, but also to | ||||
emphasizing some aspects and de-emphasizing others. | ||||
A significant addition in this revision is Section 2, which discusses | ||||
the CBOR data model and its small variations involved in the | ||||
processing of CBOR. Introducing terms for those (basic generic, | ||||
extended generic, specific) enables more concise language in other | ||||
places of the document, but also helps in clarifying expectations on | ||||
implementations and on the extensibility features of the format. | ||||
RFC 7049, as a format derived from the JSON ecosystem, was influenced | ||||
by the JSON number system that was in turn inherited from JavaScript | ||||
at the time. JSON does not provide distinct integers and floating | ||||
point values (and the latter are decimal in the format). CBOR | ||||
provides binary representations of numbers, which do differ between | ||||
integers and floating point values. Experience from implementation | ||||
and use now suggested that the separation between these two number | ||||
domains should be more clearly drawn in the document; language that | ||||
suggested an integer could seamlessly stand in for a floating point | ||||
value was removed. Also, a suggestion (based on I-JSON [RFC7493]) | ||||
was added for handling these types when converting JSON to CBOR. | ||||
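To illustrate how the two number domains differ on the wire, the following Python sketch encodes the integer 1 and the floating-point value 1.0. The helper names encode_int and encode_float64 are hypothetical; the integer helper picks the shortest head, and the float helper always emits a 64-bit value for simplicity (it makes no attempt to choose a shorter floating-point form).

```python
import struct

def encode_int(n):
    """Encode an integer in -2**64 .. 2**64-1 as major type 0 or 1."""
    mt = 0 if n >= 0 else 1
    arg = n if n >= 0 else -1 - n        # negative n is carried as -1 - n
    if arg < 24:
        return bytes([(mt << 5) | arg])  # argument fits in the initial byte
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(mt << 5) | ai]) + arg.to_bytes(size, "big")
    raise ValueError("integer out of range for major types 0 and 1")

def encode_float64(x):
    """Encode a float as major type 7, additional information 27."""
    return b"\xfb" + struct.pack(">d", x)

print(encode_int(1).hex())        # 01
print(encode_float64(1.0).hex())  # fb3ff0000000000000
```

The two encodings share no bytes, which is one way to see why a decoder cannot treat an integer as an interchangeable spelling of a floating-point value.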
For a single value in the data model, CBOR often provides multiple | ||||
encoding options. The revision adds a new section, Section 4, which | ||||
first introduces the term "preferred serialization" (Section 4.1) and | ||||
defines it for various kinds of data items. On the basis of this | ||||
terminology, the section goes on to discuss how a CBOR-based protocol | ||||
can define "deterministic encoding" (Section 4.2), which now avoids | ||||
the RFC 7049 terms "canonical" and "canonicalization". The | ||||
suggestion of "Core Deterministic Encoding Requirements" | ||||
(Section 4.2.1) enables generic support for such protocol-defined | ||||
encoding requirements. The present revision further eases the | ||||
implementation of deterministic encoding by simplifying the map | ||||
ordering suggested in RFC 7049 to simple lexicographic ordering of | ||||
encoded keys. A description of the older suggestion is kept as an | ||||
alternative, now termed "length-first map key ordering" | ||||
(Section 4.2.3). | ||||
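The difference between the two map key orderings can be seen on a handful of already-encoded keys. The following Python sketch assumes the keys are available as their encoded byte strings; the function names bytewise_order and length_first_order and the sample key set are chosen for illustration only.

```python
def bytewise_order(encoded_keys):
    # Simple lexicographic ordering of the encoded key bytes.
    return sorted(encoded_keys)

def length_first_order(encoded_keys):
    # Older suggestion: shorter encodings first, then bytewise.
    return sorted(encoded_keys, key=lambda k: (len(k), k))

# Encoded forms of the keys false, [100], "aa", 100, [-1], "z", -1, 10
KEYS = [bytes.fromhex(h) for h in
        ("f4", "811864", "626161", "1864", "8120", "617a", "20", "0a")]

print([k.hex() for k in bytewise_order(KEYS)])
# ['0a', '1864', '20', '617a', '626161', '811864', '8120', 'f4']
print([k.hex() for k in length_first_order(KEYS)])
# ['0a', '20', 'f4', '1864', '617a', '8120', '626161', '811864']
```

The bytewise variant needs no knowledge of where a key's head ends, which is part of what makes it easier to implement.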
The terminology for well-formed and valid data was sharpened and more | ||||
stringently used, avoiding less well-defined alternative terms such | ||||
as "syntax error", "decoding error" and "strict mode" outside | ||||
examples. Also, a third level of requirements beyond CBOR-level | ||||
validity that an application has on its input data is now explicitly | ||||
called out. Well-formed (processable at all), valid (checked by a | ||||
validity-checking generic decoder), and expected input (as checked by | ||||
the application) are treated as a hierarchy of layers of | ||||
acceptability. | ||||
The handling of non-well-formed simple values was clarified in text | ||||
and pseudocode. Appendix F was added to discuss well-formedness | ||||
errors and provide examples for them. | ||||
The discussion of validity has been sharpened in two areas. Map | ||||
validity (handling of duplicate keys) was clarified and the domain of | ||||
applicability of certain implementation choices explained. Also, | ||||
while streamlining the terminology for tags, tag numbers, and tag | ||||
content, discussion was added on tag validity, and the restrictions | ||||
were clarified on tag content, in general and specifically for tag | ||||
1. | ||||
An implementation note (and note for future tag definitions) was | ||||
added to Section 3.4 about defining tags with semantics that depend | ||||
on serialization order. | ||||
Terminology was introduced in Section 3 for "argument" and "head", | ||||
simplifying further discussion. | ||||
The security considerations were mostly rewritten and significantly | ||||
expanded; in multiple other places, the document is now more explicit | ||||
that a decoder cannot simply condone well-formedness errors. | ||||
Acknowledgements | Acknowledgements | |||
CBOR was inspired by MessagePack. MessagePack was developed and | CBOR was inspired by MessagePack. MessagePack was developed and | |||
promoted by Sadayuki Furuhashi ("frsyuki"). This reference to | promoted by Sadayuki Furuhashi ("frsyuki"). This reference to | |||
MessagePack is solely for attribution; CBOR is not intended as a | MessagePack is solely for attribution; CBOR is not intended as a | |||
version of or replacement for MessagePack, as it has different design | version of or replacement for MessagePack, as it has different design | |||
goals and requirements. | goals and requirements. | |||
The need for functionality beyond the original MessagePack | The need for functionality beyond the original MessagePack | |||
Specification became obvious to many people at about the same time | Specification became obvious to many people at about the same time | |||
End of changes. 51 change blocks. | ||||
258 lines changed or deleted | 349 lines changed or added | |||