draft-ietf-cbor-7049bis-13.txt   draft-ietf-cbor-7049bis-14.txt 
Network Working Group C. Bormann Network Working Group C. Bormann
Internet-Draft Universitaet Bremen TZI Internet-Draft Universitaet Bremen TZI
Obsoletes: 7049 (if approved) P. Hoffman Obsoletes: 7049 (if approved) P. Hoffman
Intended status: Standards Track ICANN Intended status: Standards Track ICANN
Expires: 9 September 2020 8 March 2020 Expires: 19 December 2020 17 June 2020
Concise Binary Object Representation (CBOR) Concise Binary Object Representation (CBOR)
draft-ietf-cbor-7049bis-13 draft-ietf-cbor-7049bis-14
Abstract Abstract
The Concise Binary Object Representation (CBOR) is a data format The Concise Binary Object Representation (CBOR) is a data format
whose design goals include the possibility of extremely small code whose design goals include the possibility of extremely small code
size, fairly small message size, and extensibility without the need size, fairly small message size, and extensibility without the need
for version negotiation. These design goals make it different from for version negotiation. These design goals make it different from
earlier binary serializations such as ASN.1 and MessagePack. earlier binary serializations such as ASN.1 and MessagePack.
This document is a revised edition of RFC 7049, with editorial This document is a revised edition of RFC 7049, with editorial
improvements, added detail, and fixed errata. This revision formally improvements, added detail, and fixed errata. This revision formally
obsoletes RFC 7049, while keeping full compatibility of the obsoletes RFC 7049, while keeping full compatibility of the
interchange format from RFC 7049. It does not create a new version interchange format from RFC 7049. It does not create a new version
of the format. of the format.
Contributing Contributing
This note is to be removed before publishing as an RFC.
This document is being worked on in the CBOR Working Group. Please This document is being worked on in the CBOR Working Group. Please
contribute on the mailing list there, or in the GitHub repository for contribute on the mailing list there, or in the GitHub repository for
this draft: https://github.com/cbor-wg/CBORbis this draft: https://github.com/cbor-wg/CBORbis
The charter for the CBOR Working Group says that the WG will update The charter for the CBOR Working Group says that the WG will update
RFC 7049 to fix verified errata. Security issues and clarifications RFC 7049 to fix verified errata. Security issues and clarifications
may be addressed, but changes to this document will ensure backward may be addressed, but changes to this document will ensure backward
compatibility for popular deployed codebases. This document will be compatibility for popular deployed codebases. This document will be
targeted at becoming an Internet Standard. targeted at becoming an Internet Standard.
[RFC editor: please remove this note.]
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 9 September 2020. This Internet-Draft will expire on 19 December 2020.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License. provided without warranty as described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 8
2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 9
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 10 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 10
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11
3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 14
3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 14
3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16 3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16
3.2.4. Summary of indefinite-length use of major types . . . 17 3.2.4. Summary of indefinite-length use of major types . . . 17
3.3. Floating-Point Numbers and Values with No Content . . . . 17 3.3. Floating-Point Numbers and Values with No Content . . . . 18
3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 19 3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 19
3.4.1. Standard Date/Time String . . . . . . . . . . . . . . 22 3.4.1. Standard Date/Time String . . . . . . . . . . . . . . 22
3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 22 3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 23
3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 23 3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 24
3.4.4. Decimal Fractions and Bigfloats . . . . . . . . . . . 24 3.4.4. Decimal Fractions and Bigfloats . . . . . . . . . . . 24
3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 25 3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 26
3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 25 3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 26
3.4.5.2. Expected Later Encoding for CBOR-to-JSON 3.4.5.2. Expected Later Encoding for CBOR-to-JSON
Converters . . . . . . . . . . . . . . . . . . . . 25 Converters . . . . . . . . . . . . . . . . . . . . 26
3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 26 3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 27
3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 27 3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 28
4. Serialization Considerations . . . . . . . . . . . . . . . . 28 4. Serialization Considerations . . . . . . . . . . . . . . . . 29
4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 28 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 29
4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 29 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 30
4.2.1. Core Deterministic Encoding Requirements . . . . . . 29 4.2.1. Core Deterministic Encoding Requirements . . . . . . 30
4.2.2. Additional Deterministic Encoding Considerations . . 30 4.2.2. Additional Deterministic Encoding Considerations . . 31
4.2.3. Length-first Map Key Ordering . . . . . . . . . . . . 32 4.2.3. Length-first Map Key Ordering . . . . . . . . . . . . 33
5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 33 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 34
5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 33 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 35
5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 34 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 35
5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 35 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 36
5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 35 5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 36
5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 35 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 37
5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 36 5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 37
5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 37 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 38 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 39
5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 39 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 41
5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 40 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 42
6. Converting Data between CBOR and JSON . . . . . . . . . . . . 40 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 42
6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 41 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 42
6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 42 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 43
7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 43 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 44
7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 43 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 45
7.2. Curating the Additional Information Space . . . . . . . . 44 7.2. Curating the Additional Information Space . . . . . . . . 46
8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 45 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 46
8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 46 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 47
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 46 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 48
9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 47 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 48
9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 47 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 48
9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 47 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 49
9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 48 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 50
9.5. The +cbor Structured Syntax Suffix Registration . . . . . 49 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 50
10. Security Considerations . . . . . . . . . . . . . . . . . . . 50 10. Security Considerations . . . . . . . . . . . . . . . . . . . 51
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 52 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 53
11.1. Normative References . . . . . . . . . . . . . . . . . . 52 11.1. Normative References . . . . . . . . . . . . . . . . . . 53
11.2. Informative References . . . . . . . . . . . . . . . . . 53 11.2. Informative References . . . . . . . . . . . . . . . . . 54
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 55 Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 57
Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 59 Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 61
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 62 Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 64
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 65 Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 66
Appendix E. Comparison of Other Binary Formats to CBOR's Design Appendix E. Comparison of Other Binary Formats to CBOR's Design
Objectives . . . . . . . . . . . . . . . . . . . . . . . 66 Objectives . . . . . . . . . . . . . . . . . . . . . . . 67
E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 67 E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 68
E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 67 E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 68
E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 68 E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 69
E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 68 E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 69
E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 68 E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 69
Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 69 Appendix F. Well-formedness errors and examples . . . . . . . . 70
Appendix G. Well-formedness errors and examples . . . . . . . . 70 F.1. Examples for CBOR data items that are not well-formed . . 71
G.1. Examples for CBOR data items that are not well-formed . . 71
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 73 Appendix G. Changes from RFC 7049 . . . . . . . . . . . . . . . 73
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 74 G.1. Errata processing, clerical changes . . . . . . . . . . . 73
G.2. Changes in IANA considerations . . . . . . . . . . . . . 74
G.3. Changes in suggestions and other informational
components . . . . . . . . . . . . . . . . . . . . . . . 74
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 76
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 76
1. Introduction 1. Introduction
There are hundreds of standardized formats for binary representation There are hundreds of standardized formats for binary representation
of structured data (also known as binary serialization formats). Of of structured data (also known as binary serialization formats). Of
those, some are for specific domains of information, while others are those, some are for specific domains of information, while others are
generalized for arbitrary data. In the IETF, probably the best-known generalized for arbitrary data. In the IETF, probably the best-known
formats in the latter category are ASN.1's BER and DER [ASN.1]. formats in the latter category are ASN.1's BER and DER [ASN.1].
The format defined here follows some specific design goals that are The format defined here follows some specific design goals that are
skipping to change at page 6, line 44 skipping to change at page 7, line 4
Decoder: A process that decodes a well-formed encoded CBOR data item Decoder: A process that decodes a well-formed encoded CBOR data item
and makes it available to an application. Formally speaking, a and makes it available to an application. Formally speaking, a
decoder contains a parser to break up the input using the syntax decoder contains a parser to break up the input using the syntax
rules of CBOR, as well as a semantic processor to prepare the data rules of CBOR, as well as a semantic processor to prepare the data
in a form suitable to the application. in a form suitable to the application.
Encoder: A process that generates the (well-formed) representation Encoder: A process that generates the (well-formed) representation
format of a CBOR data item from application information. format of a CBOR data item from application information.
Data Stream: A sequence of zero or more data items, not further Data Stream: A sequence of zero or more data items, not further
assembled into a larger containing data item. The independent assembled into a larger containing data item (see [RFC8742] for
data items that make up a data stream are sometimes also referred one application). The independent data items that make up a data
to as "top-level data items". stream are sometimes also referred to as "top-level data items".
Well-formed: A data item that follows the syntactic structure of Well-formed: A data item that follows the syntactic structure of
CBOR. A well-formed data item uses the initial bytes and the byte CBOR. A well-formed data item uses the initial bytes and the byte
strings and/or data items that are implied by their values as strings and/or data items that are implied by their values as
defined in CBOR and does not include following extraneous data. defined in CBOR and does not include following extraneous data.
CBOR decoders by definition only return contents from well-formed CBOR decoders by definition only return contents from well-formed
data items. data items.
Valid: A data item that is well-formed and also follows the semantic Valid: A data item that is well-formed and also follows the semantic
restrictions that apply to CBOR data items (Section 5.3). restrictions that apply to CBOR data items (Section 5.3).
Expected: Besides its normal English meaning, the term "expected" is Expected: Besides its normal English meaning, the term "expected" is
used to describe requirements beyond CBOR validity that an used to describe requirements beyond CBOR validity that an
application has on its input data. Well-formed (processable at application has on its input data. Well-formed (processable at
all), valid (checked by a validity-checking generic decoder), and all), valid (checked by a validity-checking generic decoder), and
skipping to change at page 8, line 40 skipping to change at page 9, line 5
* a mapping (mathematical function) from zero or more data items * a mapping (mathematical function) from zero or more data items
("keys") each to a data item ("values"), ("map") ("keys") each to a data item ("values"), ("map")
* a tagged data item ("tag"), comprising a tag number (an integer in * a tagged data item ("tag"), comprising a tag number (an integer in
the range 0..2**64-1) and the tag content (a data item) the range 0..2**64-1) and the tag content (a data item)
Note that integer and floating-point values are distinct in this Note that integer and floating-point values are distinct in this
model, even if they have the same numeric value. model, even if they have the same numeric value.
Also note that serialization variants, such as the number of bytes of Also note that serialization variants are not visible at the generic
the encoded floating-point value, or the choice of one of the ways in data model level, including the number of bytes of the encoded
which an integer, the length of a text or byte string, the number of floating-point value or the choice of one of the ways in which an
elements in an array or pairs in a map, or a tag number, integer, the length of a text or byte string, the number of elements
(collectively "the argument", see Section 3) can be encoded, are not in an array or pairs in a map, or a tag number, (collectively "the
visible at the generic data model level. argument", see Section 3) can be encoded.
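The point about serialization variants can be illustrated with a small sketch. The helper name `decode_head` is hypothetical (it does not come from the draft), and the sketch only handles heads whose argument is carried as an unsigned integer (additional information 0..27). It shows that several encodings of the same argument collapse to one value at the data model level, while an integer and a floating-point number with the same numeric value stay distinct because they use different major types.

```python
def decode_head(data: bytes):
    """Return (major_type, argument, head_length) for the item at data[0].

    Illustrative only: additional information 28..31 is rejected here,
    and for major type 7 the argument is returned as raw bits.
    """
    initial = data[0]
    major = initial >> 5          # upper 3 bits: major type
    info = initial & 0x1F         # lower 5 bits: additional information
    if info < 24:
        return major, info, 1     # argument held directly in the initial byte
    if info <= 27:
        size = 1 << (info - 24)   # 24 -> 1, 25 -> 2, 26 -> 4, 27 -> 8 bytes
        if len(data) < 1 + size:
            raise ValueError("truncated head")
        return major, int.from_bytes(data[1:1 + size], "big"), 1 + size
    raise ValueError("additional information 28..31 not handled in this sketch")


# Three serializations of the unsigned integer 1 give the same argument.
variants = [bytes.fromhex(h) for h in ("01", "1801", "190001")]
arguments = {decode_head(v)[1] for v in variants}
```

Only the head length differs between the three variants, which is exactly the kind of detail the generic data model does not expose.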
2.1. Extended Generic Data Models 2.1. Extended Generic Data Models
This basic generic data model comes pre-extended by the registration This basic generic data model comes pre-extended by the registration
of a number of simple values and tag numbers right in this document, of a number of simple values and tag numbers right in this document,
such as: such as:
* "false", "true", "null", and "undefined" (simple values identified * "false", "true", "null", and "undefined" (simple values identified
by 20..23) by 20..23)
skipping to change at page 11, line 51 skipping to change at page 12, line 15
5 would have an initial byte of 0b010_00101 (major type 2, 5 would have an initial byte of 0b010_00101 (major type 2,
additional information 5 for the length), followed by 5 bytes of additional information 5 for the length), followed by 5 bytes of
binary content. A byte string whose length is 500 would have 3 binary content. A byte string whose length is 500 would have 3
initial bytes of 0b010_11001 (major type 2, additional information initial bytes of 0b010_11001 (major type 2, additional information
25 to indicate a two-byte length) followed by the two bytes 0x01f4 25 to indicate a two-byte length) followed by the two bytes 0x01f4
for a length of 500, followed by 500 bytes of binary content. for a length of 500, followed by 500 bytes of binary content.
Major type 3: a text string (Section 2), encoded as UTF-8 Major type 3: a text string (Section 2), encoded as UTF-8
([RFC3629]). The number of bytes in the string is equal to the ([RFC3629]). The number of bytes in the string is equal to the
argument. A string containing an invalid UTF-8 sequence is well- argument. A string containing an invalid UTF-8 sequence is well-
formed but invalid. This type is provided for systems that need formed but invalid (Section 1.2). This type is provided for
to interpret or display human-readable text, and allows the systems that need to interpret or display human-readable text, and
differentiation between unstructured bytes and text that has a allows the differentiation between unstructured bytes and text
specified repertoire and encoding. In contrast to formats such as that has a specified repertoire and encoding. In contrast to
JSON, the Unicode characters in this type are never escaped. formats such as JSON, the Unicode characters in this type are
Thus, a newline character (U+000A) is always represented in a never escaped. Thus, a newline character (U+000A) is always
string as the byte 0x0a, and never as the bytes 0x5c6e (the represented in a string as the byte 0x0a, and never as the bytes
characters "\" and "n") or as 0x5c7530303061 (the characters "\", 0x5c6e (the characters "\" and "n") or as 0x5c7530303061 (the
"u", "0", "0", "0", and "a"). characters "\", "u", "0", "0", "0", and "a").
Major type 4: an array of data items. In other formats, arrays are Major type 4: an array of data items. In other formats, arrays are
also called lists, sequences, or tuples (a "CBOR sequence" is also called lists, sequences, or tuples (a "CBOR sequence" is
something slightly different, though [RFC8742]). The argument is something slightly different, though [RFC8742]). The argument is
the number of data items in the array. Items in an array do not the number of data items in the array. Items in an array do not
need to all be of the same type. For example, an array that need to all be of the same type. For example, an array that
contains 10 items of any type would have an initial byte of contains 10 items of any type would have an initial byte of
0b100_01010 (major type of 4, additional information of 10 for the 0b100_01010 (major type of 4, additional information of 10 for the
length) followed by the 10 remaining items. length) followed by the 10 remaining items.
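The initial-byte examples given for major types 2, 3, and 4 can be reproduced with a minimal sketch. The function names `encode_head`, `encode_bytes`, and `encode_text` are hypothetical illustrations, not part of the draft, and the sketch always picks the shortest argument form, which is an assumption rather than a requirement stated in this passage.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Encode the initial byte plus any following argument bytes."""
    if not 0 <= major <= 7:
        raise ValueError("major type must be 0..7")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must fit in 64 bits")
    if argument < 24:
        return bytes([(major << 5) | argument])
    # Additional information 24..27 announces 1, 2, 4, or 8 argument bytes.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument already range-checked")


def encode_bytes(content: bytes) -> bytes:
    """Major type 2: the argument is the number of content bytes."""
    return encode_head(2, len(content)) + content


def encode_text(text: str) -> bytes:
    """Major type 3: the argument counts UTF-8 bytes, not characters."""
    utf8 = text.encode("utf-8")
    return encode_head(3, len(utf8)) + utf8
```

With these helpers, `encode_head(2, 500)` yields the three bytes 0x59 0x01 0xf4 described above, and `encode_text("\n")` yields 0x61 0x0a, with the newline stored unescaped as the single byte 0x0a.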
skipping to change at page 14, line 21 skipping to change at page 14, line 40
Indefinite-length arrays and maps are represented using their major Indefinite-length arrays and maps are represented using their major
type with the additional information value of 31, followed by an type with the additional information value of 31, followed by an
arbitrary-length sequence of zero or more items for an array or key/ arbitrary-length sequence of zero or more items for an array or key/
value pairs for a map, followed by the "break" stop code value pairs for a map, followed by the "break" stop code
(Section 3.2.1). In other words, indefinite-length arrays and maps (Section 3.2.1). In other words, indefinite-length arrays and maps
look identical to other arrays and maps except for beginning with the look identical to other arrays and maps except for beginning with the
additional information value of 31 and ending with the "break" stop additional information value of 31 and ending with the "break" stop
code. code.
If the break stop code appears after a key in a map, in place of that If the "break" stop code appears after a key in a map, in place of
key's value, the map is not well-formed. that key's value, the map is not well-formed.
There is no restriction against nesting indefinite-length array or There is no restriction against nesting indefinite-length array or
map items. A "break" only terminates a single item, so nested map items. A "break" only terminates a single item, so nested
indefinite-length items need exactly as many "break" stop codes as indefinite-length items need exactly as many "break" stop codes as
there are type bytes starting an indefinite-length item. there are type bytes starting an indefinite-length item.
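The pairing between indefinite-length openers and "break" stop codes can be checked with a short sketch. The function `encode_indefinite` is a hypothetical helper restricted to nested lists of integers in the range 0..23, so that every integer occupies exactly one byte and the byte counts at the end are meaningful. It is not a general encoder.

```python
BREAK = b"\xff"            # the "break" stop code (major type 7, info 31)
ARRAY_INDEFINITE = b"\x9f"  # major type 4, additional information 31


def encode_indefinite(item) -> bytes:
    """Encode nested lists of small integers using only indefinite-length arrays."""
    if isinstance(item, int):
        if not 0 <= item <= 23:
            raise ValueError("sketch only supports integers 0..23")
        return bytes([item])  # major type 0, argument in the initial byte
    if isinstance(item, list):
        body = b"".join(encode_indefinite(x) for x in item)
        # Each array contributes exactly one opener and one "break".
        return ARRAY_INDEFINITE + body + BREAK
    raise TypeError("sketch only supports int and list")


encoded = encode_indefinite([1, [2, 3], [4, 5]])
openers = encoded.count(ARRAY_INDEFINITE)
breaks = encoded.count(BREAK)
```

For this input the result is the 11 bytes 9f 01 9f 02 03 ff 9f 04 05 ff ff, with three openers matched by three "break" stop codes.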
For example, assume an encoder wants to represent the abstract array For example, assume an encoder wants to represent the abstract array
[1, [2, 3], [4, 5]]. The definite-length encoding would be [1, [2, 3], [4, 5]]. The definite-length encoding would be
0x8301820203820405: 0x8301820203820405:
skipping to change at page 16, line 33 skipping to change at page 16, line 47
respectively, if no chunk is present). (Note that zero-length respectively, if no chunk is present). (Note that zero-length
chunks, while not particularly useful, are permitted.) chunks, while not particularly useful, are permitted.)
If any item between the indefinite-length string indicator If any item between the indefinite-length string indicator
(0b010_11111 or 0b011_11111) and the "break" stop code is not a (0b010_11111 or 0b011_11111) and the "break" stop code is not a
definite-length string item of the same major type, the string is not definite-length string item of the same major type, the string is not
well-formed. well-formed.
If any definite-length text string inside an indefinite-length text If any definite-length text string inside an indefinite-length text
string is invalid, the indefinite-length text string is invalid. string is invalid, the indefinite-length text string is invalid.
Note that this implies that the bytes of a single UTF-8 character Note that this implies that the UTF-8 bytes of a single Unicode code
cannot be split up between chunks: a new chunk of a text string can point (scalar value) cannot be spread between chunks: a new chunk of
only be started at a character boundary. a text string can only be started at a code point boundary.
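The two rules above (chunks must be definite-length strings of the same major type, and each text chunk must be valid by itself) can be sketched as a small decoder. The name `decode_indefinite_string` is hypothetical, and the sketch ignores any data that follows the "break" stop code. It distinguishes the two failure cases only through the error message.

```python
def decode_indefinite_string(data: bytes):
    """Decode one indefinite-length byte or text string into bytes or str."""
    major, info = data[0] >> 5, data[0] & 0x1F
    if major not in (2, 3) or info != 31:
        raise ValueError("not an indefinite-length string")
    pos, chunks = 1, []
    while True:
        if pos >= len(data):
            raise ValueError("not well-formed: missing break")
        head = data[pos]
        if head == 0xFF:                      # "break" stop code ends the string
            break
        chunk_info = head & 0x1F
        if head >> 5 != major or chunk_info > 27:
            raise ValueError("not well-formed: chunk is not a definite-length "
                             "string of the same major type")
        if chunk_info < 24:
            length, head_len = chunk_info, 1
        else:
            size = 1 << (chunk_info - 24)     # 1, 2, 4, or 8 length bytes
            if pos + 1 + size > len(data):
                raise ValueError("not well-formed: truncated chunk head")
            length = int.from_bytes(data[pos + 1:pos + 1 + size], "big")
            head_len = 1 + size
        chunk = data[pos + head_len:pos + head_len + length]
        if len(chunk) < length:
            raise ValueError("not well-formed: truncated chunk")
        if major == 3:
            try:
                chunk.decode("utf-8")         # every chunk must be valid alone
            except UnicodeDecodeError:
                raise ValueError("well-formed but invalid: bad UTF-8 in chunk")
        chunks.append(chunk)
        pos += head_len + length
    joined = b"".join(chunks)
    return joined if major == 2 else joined.decode("utf-8")
```

A text string that splits the two UTF-8 bytes of U+00E9 (0xc3 0xa9) across chunks, such as 7f 61 c3 61 a9 ff, is rejected as invalid by this sketch, while 7f 62 c3 a9 ff decodes to the single character.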
For example, assume an encoded data item consisting of the bytes: For example, assume an encoded data item consisting of the bytes:
0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111 0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111
5F -- Start indefinite-length byte string 5F -- Start indefinite-length byte string
44 -- Byte string of length 4 44 -- Byte string of length 4
aabbccdd -- Bytes content aabbccdd -- Bytes content
43 -- Byte string of length 3 43 -- Byte string of length 3
eeff99 -- Bytes content eeff99 -- Bytes content
skipping to change at page 19, line 10 skipping to change at page 19, line 30
| 32..255 | (Unassigned) | | 32..255 | (Unassigned) |
+---------+-----------------+ +---------+-----------------+
Table 4: Simple Values Table 4: Simple Values
An encoder MUST NOT issue two-byte sequences that start with 0xf8 An encoder MUST NOT issue two-byte sequences that start with 0xf8
(major type = 7, additional information = 24) and continue with a (major type = 7, additional information = 24) and continue with a
byte less than 0x20 (32 decimal). Such sequences are not well- byte less than 0x20 (32 decimal). Such sequences are not well-
formed. (This implies that an encoder cannot encode false, true, formed. (This implies that an encoder cannot encode false, true,
null, or undefined in two-byte sequences, only the one-byte variants null, or undefined in two-byte sequences, only the one-byte variants
of these are well-formed.) of these are well-formed; more generally speaking, each simple value
only has a single representation variant).
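The single-representation rule for simple values can be sketched as an encoder that refuses the forbidden forms. The function name `encode_simple` is hypothetical. The sketch assumes that simple values 24..31 cannot be encoded at all: the one-byte form has no room for them, and the two-byte form with a second byte below 0x20 is not well-formed.

```python
def encode_simple(value: int) -> bytes:
    """Encode a simple value in its only well-formed representation."""
    if 0 <= value <= 23:
        # One-byte form: major type 7 (0b111) with the value as
        # additional information, e.g. false (20) becomes 0xf4.
        return bytes([0xE0 | value])
    if 32 <= value <= 255:
        # Two-byte form: 0xf8 followed by the value itself.
        return bytes([0xF8, value])
    raise ValueError("simple value has no well-formed encoding: %r" % value)
```

Here `encode_simple(20)` through `encode_simple(23)` produce 0xf4 to 0xf7 for false, true, null, and undefined, and no input produces a sequence such as 0xf8 0x14.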
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
IEEE 754 binary floating-point values [IEEE754]. These floating- IEEE 754 binary floating-point values [IEEE754]. These floating-
point values are encoded in the additional bytes of the appropriate point values are encoded in the additional bytes of the appropriate
size. (See Appendix D for some information about 16-bit floating- size. (See Appendix D for some information about 16-bit floating-
point numbers.) point numbers.)
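As a rough illustration of the three floating-point widths, the following sketch uses Python's standard `struct` module, whose `e`, `f`, and `d` format codes correspond to IEEE 754 binary16, binary32, and binary64. The function name `encode_float` is hypothetical, and the caller chooses the width explicitly; the sketch does not attempt to select the shortest width that preserves the value.

```python
import struct

# Initial byte (major type 7 with additional information 25, 26, 27)
# and the matching big-endian struct format for each width.
_FLOAT_FORMATS = {
    16: (0xF9, ">e"),
    32: (0xFA, ">f"),
    64: (0xFB, ">d"),
}


def encode_float(value: float, bits: int) -> bytes:
    """Encode value as a 16-, 32-, or 64-bit floating-point data item."""
    if bits not in _FLOAT_FORMATS:
        raise ValueError("bits must be 16, 32, or 64")
    initial, fmt = _FLOAT_FORMATS[bits]
    return bytes([initial]) + struct.pack(fmt, value)
```

For the value 1.5, which is exactly representable at every width, the three encodings are f9 3e00, fa 3fc00000, and fb 3ff8000000000000.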
3.4. Tagging of Items 3.4. Tagging of Items
In CBOR, a data item can be enclosed by a tag to give it some In CBOR, a data item can be enclosed by a tag to give it some
skipping to change at page 20, line 6 skipping to change at page 20, line 30
for instance as 0x01, 0x1801, or 0x190001. The tag definition may for instance as 0x01, 0x1801, or 0x190001. The tag definition may
include the definition of a preferred serialization (Section 4.1) include the definition of a preferred serialization (Section 4.1)
that is recommended for generic encoders; this may prefer basic that is recommended for generic encoders; this may prefer basic
generic data model representations over ones that employ a tag. generic data model representations over ones that employ a tag.
The tag definition usually restricts what kinds of nested data item The tag definition usually restricts what kinds of nested data item
or items are valid for such tags. Tag definitions may restrict their or items are valid for such tags. Tag definitions may restrict their
content to a very specific syntactic structure, as the tags defined content to a very specific syntactic structure, as the tags defined
in this document do, or they may aim at a more semantically defined in this document do, or they may aim at a more semantically defined
definition of their content, as for instance tags 40 and 1040 do definition of their content, as for instance tags 40 and 1040 do
[rfc8746]: These accept a number of different ways of representing [RFC8746]: These accept a number of different ways of representing
arrays. arrays.
As a matter of convention, many tags do not accept null or undefined As a matter of convention, many tags do not accept null or undefined
values as tag content; instead, the expectation is that a null or values as tag content; instead, the expectation is that a null or
undefined value can be used in place of the entire tag; Section 3.4.2 undefined value can be used in place of the entire tag; Section 3.4.2
provides some further considerations for one specific tag about the provides some further considerations for one specific tag about the
handling of this convention in application protocols and in mapping handling of this convention in application protocols and in mapping
to platform types. to platform types.
Decoders do not need to understand tags of every tag number, and tags Decoders do not need to understand tags of every tag number, and tags
skipping to change at page 21, line 47 skipping to change at page 22, line 22
| | | Section 3.4.6 | | | | Section 3.4.6 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
Table 5: Tag numbers defined in RFC 7049 Table 5: Tag numbers defined in RFC 7049
Conceptually, tags are interpreted in the generic data model, not at Conceptually, tags are interpreted in the generic data model, not at
(de-)serialization time. A small number of tags (specifically, tag (de-)serialization time. A small number of tags (specifically, tag
number 25 and tag number 29) have been registered with semantics that number 25 and tag number 29) have been registered with semantics that
may require processing at (de-)serialization time: The decoder needs may require processing at (de-)serialization time: The decoder needs
to be aware and the encoder needs to be in control of the exact to be aware and the encoder needs to be in control of the exact
sequence in which data items are encoded into the CBOR data stream. sequence in which data items are encoded into the CBOR data item.
This means these tags cannot be implemented on top of every generic This means these tags cannot be implemented on top of every generic
CBOR encoder/decoder (which might not reflect the serialization order CBOR encoder/decoder (which might not reflect the serialization order
for entries in a map at the data model level and vice versa); their for entries in a map at the data model level and vice versa); their
implementation therefore typically needs to be integrated into the implementation therefore typically needs to be integrated into the
generic encoder/decoder. The definition of new tags with this generic encoder/decoder. The definition of new tags with this
property is NOT RECOMMENDED. property is NOT RECOMMENDED.
IANA allocated tag numbers 65535, 4294967295, and
18446744073709551615 (binary all-ones in 16-bit, 32-bit, and 64-bit).
These can be used as a convenience for implementers that want a
single integer to indicate either that a specific tag is present, or
the absence of a tag. That allocation is described in Section 10 of
[I-D.bormann-cbor-notable-tags]. These tags are not intended to
occur in actual CBOR data items; implementations may flag such an
occurrence as an error.
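
The three reserved tag numbers above are simply the largest values representable in 16, 32, and 64 bits. A minimal Python sketch of how an implementation might use one of them as an "absent tag" sentinel follows; the names RESERVED_ALL_ONES_TAGS, NO_TAG, and is_reserved_tag_number are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; the names below are assumptions, not part of the draft.

# All-ones values in 16, 32 and 64 bits: 65535, 4294967295, 18446744073709551615.
RESERVED_ALL_ONES_TAGS = frozenset((1 << bits) - 1 for bits in (16, 32, 64))

# Hypothetical sentinel for an implementation that stores tag numbers in a
# 64-bit unsigned field and wants one value meaning "no tag present".
NO_TAG = (1 << 64) - 1


def is_reserved_tag_number(tag_number: int) -> bool:
    """Return True if tag_number is one of the reserved all-ones values.

    A decoder may choose to flag such a tag in received data as an error.
    """
    return tag_number in RESERVED_ALL_ONES_TAGS
```

An implementation using 32-bit tag fields would pick 4294967295 as its sentinel instead.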
Protocols using tag numbers 0 and 1 extend the generic data model Protocols using tag numbers 0 and 1 extend the generic data model
(Section 2) with data items representing points in time; tag numbers (Section 2) with data items representing points in time; tag numbers
2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5, 2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5,
with floating-point values of arbitrary size and precision. with floating-point values of arbitrary size and precision.
3.4.1. Standard Date/Time String 3.4.1. Standard Date/Time String
Tag number 0 contains a text string in the standard format described Tag number 0 contains a text string in the standard format described
by the "date-time" production in [RFC3339], as refined by Section 3.3 by the "date-time" production in [RFC3339], as refined by Section 3.3
of [RFC4287], representing the point in time described there. A of [RFC4287], representing the point in time described there. A
skipping to change at page 25, line 32 skipping to change at page 26, line 26
The tags in this section are for content hints that might be used by The tags in this section are for content hints that might be used by
generic CBOR processors. These content hints do not extend the generic CBOR processors. These content hints do not extend the
generic data model. generic data model.
3.4.5.1. Encoded CBOR Data Item 3.4.5.1. Encoded CBOR Data Item
Sometimes it is beneficial to carry an embedded CBOR data item that Sometimes it is beneficial to carry an embedded CBOR data item that
is not meant to be decoded immediately at the time the enclosing data is not meant to be decoded immediately at the time the enclosing data
item is being decoded. Tag number 24 (CBOR data item) can be used to item is being decoded. Tag number 24 (CBOR data item) can be used to
tag the embedded byte string as a data item encoded in CBOR format. tag the embedded byte string as a single data item encoded in CBOR
Contained items that aren't byte strings are invalid. A contained format. Contained items that aren't byte strings are invalid. A
byte string is valid if it encodes a well-formed CBOR item; validity contained byte string is valid if it encodes a well-formed CBOR data
checking of the decoded CBOR item is not required for tag validity item; validity checking of the decoded CBOR item is not required for
(but could be offered by a generic decoder as a special option). tag validity (but could be offered by a generic decoder as a special
option).
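
As an illustration of tag number 24, the following Python sketch wraps an already encoded CBOR data item in a byte string and tags it. The helper names encode_head and wrap_as_encoded_cbor are hypothetical; the sketch assumes the caller supplies bytes that are a single well-formed encoded data item and does not check this.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR head (initial byte plus argument) in its shortest form."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be in 0..7")
    if argument < 0:
        raise ValueError("argument must be non-negative")
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    # Additional information 24..27 selects a 1-, 2-, 4- or 8-byte argument.
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            initial = (major_type << 5) | additional_info
            return bytes([initial]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def wrap_as_encoded_cbor(encoded_item: bytes) -> bytes:
    """Tag 24 (major type 6) around a byte string (major type 2) holding encoded_item."""
    return encode_head(6, 24) + encode_head(2, len(encoded_item)) + encoded_item
```

For example, wrapping the four bytes 0x83010203 (the array [1, 2, 3]) yields 0xd8184483010203: the tag head 0xd818, the byte string head 0x44, then the embedded item unchanged.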
3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters 3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters
Tag numbers 21 to 23 indicate that a byte string might require a Tag numbers 21 to 23 indicate that a byte string might require a
specific encoding when interoperating with a text-based specific encoding when interoperating with a text-based
representation. These tags are useful when an encoder knows that the representation. These tags are useful when an encoder knows that the
byte string data it is writing is likely to be later converted to a byte string data it is writing is likely to be later converted to a
particular JSON-based usage. That usage specifies that some strings particular JSON-based usage. That usage specifies that some strings
are encoded as base64, base64url, and so on. The encoder uses byte are encoded as base64, base64url, and so on. The encoder uses byte
strings instead of doing the encoding itself to reduce the message strings instead of doing the encoding itself to reduce the message
skipping to change at page 26, line 29 skipping to change at page 27, line 24
whitespace, or other additional characters. Tag number 23 suggests whitespace, or other additional characters. Tag number 23 suggests
conversion to base16 (hex) encoding, with uppercase alphabetics (see conversion to base16 (hex) encoding, with uppercase alphabetics (see
Section 8 of RFC 4648). Note that, for all three tag numbers, the Section 8 of RFC 4648). Note that, for all three tag numbers, the
encoding of the empty byte string is the empty text string. encoding of the empty byte string is the empty text string.
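
The conversions suggested by tag numbers 21 to 23 can be sketched in Python as follows. The function name expected_text_encoding is hypothetical. The sketch assumes the conventions of RFC 4648 as referenced by the specification: base64url without padding for tag number 21, classical base64 with padding for tag number 22, and uppercase base16 for tag number 23.

```python
import base64


def expected_text_encoding(tag_number: int, data: bytes) -> str:
    """Convert a tagged byte string to the text form its tag suggests."""
    if tag_number == 21:
        # base64url, with the "=" padding characters removed.
        return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")
    if tag_number == 22:
        # Classical base64, padding kept.
        return base64.b64encode(data).decode("ascii")
    if tag_number == 23:
        # base16 (hex); b16encode produces uppercase alphabetics.
        return base64.b16encode(data).decode("ascii")
    raise ValueError("not an expected-conversion tag number (21 to 23)")
```

For all three tag numbers the empty byte string maps to the empty text string, matching the note above.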
3.4.5.3. Encoded Text 3.4.5.3. Encoded Text
Some text strings hold data that have formats widely used on the Some text strings hold data that have formats widely used on the
Internet, and sometimes those formats can be validated and presented Internet, and sometimes those formats can be validated and presented
to the application in appropriate form by the decoder. There are to the application in appropriate form by the decoder. There are
tags for some of these formats. As with tag numbers 21 to 23, if tags for some of these formats.
these tags are applied to an item other than a text string, they
apply to all text string data items it contains.
* Tag number 32 is for URIs, as defined in [RFC3986]. If the text * Tag number 32 is for URIs, as defined in [RFC3986]. If the text
string doesn't match the "URI-reference" production, the string is string doesn't match the "URI-reference" production, the string is
invalid. invalid.
* Tag numbers 33 and 34 are for base64url- and base64-encoded text * Tag numbers 33 and 34 are for base64url- and base64-encoded text
strings, respectively, as defined in [RFC4648]. If any of: strings, respectively, as defined in [RFC4648]. If any of:
- the encoded text string contains non-alphabet characters or - the encoded text string contains non-alphabet characters or
only 1 character in the last block of 4, or only 1 alphabet character in the last block of 4 (where
alphabet is defined by Section 5 of [RFC4648] for tag number 33
and Section 4 of [RFC4648] for tag number 34), or
- the padding bits in a 2- or 3-character block are not 0, or - the padding bits in a 2- or 3-character block are not 0, or
- the base64 encoding has the wrong number of padding characters, - the base64 encoding has the wrong number of padding characters,
or or
- the base64url encoding has padding characters, - the base64url encoding has padding characters,
the string is invalid. the string is invalid.
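
The four conditions listed above can be checked mechanically. The following Python sketch is one possible reading of them, not a normative validator; the names ALPHABETS and is_valid_base64_text are hypothetical.

```python
import string

_COMMON = string.ascii_uppercase + string.ascii_lowercase + string.digits
# Tag 33 uses the base64url alphabet (RFC 4648, Section 5),
# tag 34 the classical base64 alphabet (RFC 4648, Section 4).
ALPHABETS = {33: _COMMON + "-_", 34: _COMMON + "+/"}


def is_valid_base64_text(tag_number: int, text: str) -> bool:
    """Apply the validity conditions for the content of tag numbers 33 and 34."""
    alphabet = ALPHABETS[tag_number]
    body = text.rstrip("=")
    padding = len(text) - len(body)
    # Non-alphabet characters (this also catches "=" in the middle).
    if any(ch not in alphabet for ch in body):
        return False
    tail = len(body) % 4  # alphabet characters in the last, partial block
    if tail == 1:
        return False  # only 1 alphabet character in the last block of 4
    # A 2-character block carries 4 padding bits, a 3-character block 2.
    if tail == 2 and alphabet.index(body[-1]) & 0x0F:
        return False
    if tail == 3 and alphabet.index(body[-1]) & 0x03:
        return False
    # base64url must have no padding; base64 must pad to a multiple of 4.
    expected_padding = 0 if tag_number == 33 else (4 - tail) % 4
    return padding == expected_padding
```

Under this sketch, "AQI=" is accepted for tag number 34 but rejected for tag number 33, and "AQJ=" is rejected for both because its padding bits are not 0.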
skipping to change at page 27, line 39 skipping to change at page 28, line 39
In many applications, it will be clear from the context that CBOR is In many applications, it will be clear from the context that CBOR is
being employed for encoding a data item. For instance, a specific being employed for encoding a data item. For instance, a specific
protocol might specify the use of CBOR, or a media type is indicated protocol might specify the use of CBOR, or a media type is indicated
that specifies its use. However, there may be applications where that specifies its use. However, there may be applications where
such context information is not available, such as when CBOR data is such context information is not available, such as when CBOR data is
stored in a file that does not have disambiguating metadata. Here, stored in a file that does not have disambiguating metadata. Here,
it may help to have some distinguishing characteristics for the data it may help to have some distinguishing characteristics for the data
itself. itself.
Tag number 55799 is defined for this purpose. It does not impart any Tag number 55799 is defined for this purpose, specifically for use at
special semantics on the data item that it encloses; that is, the the start of a stored encoded CBOR data item as specified by an
semantics of the tag content enclosed in tag number 55799 is exactly application. It does not impart any special semantics on the data
identical to the semantics of the tag content itself. item that it encloses; that is, the semantics of the tag content
enclosed in tag number 55799 is exactly identical to the semantics of
the tag content itself.
The serialization of this tag's head is 0xd9d9f7, which does not The serialization of this tag's head is 0xd9d9f7, which does not
appear to be in use as a distinguishing mark for any frequently used appear to be in use as a distinguishing mark for any frequently used
file types. In particular, 0xd9d9f7 is not a valid start of a file types. In particular, 0xd9d9f7 is not a valid start of a
Unicode text in any Unicode encoding if it is followed by a valid Unicode text in any Unicode encoding if it is followed by a valid
CBOR data item. CBOR data item.
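
The value 0xd9d9f7 follows directly from the head encoding: major type 6 with additional information 25 (a two-byte argument) gives the initial byte 0xd9, and 55799 is 0xd9f7. A small Python sketch, with the hypothetical names SELF_DESCRIBED_TAG, SELF_DESCRIBED_HEAD, and starts_with_self_described_tag:

```python
SELF_DESCRIBED_TAG = 55799

# Initial byte: major type 6 (tag) in the top three bits, additional
# information 25 (two-byte argument follows) in the low five bits.
SELF_DESCRIBED_HEAD = bytes([(6 << 5) | 25]) + SELF_DESCRIBED_TAG.to_bytes(2, "big")


def starts_with_self_described_tag(data: bytes) -> bool:
    """Return True if data begins with the head of tag number 55799."""
    return data.startswith(SELF_DESCRIBED_HEAD)
```

A prefix match is only a hint that the data may be CBOR; the enclosed item still has to be decoded and checked.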
For instance, a decoder might be able to decode both CBOR and JSON. For instance, a decoder might be able to decode both CBOR and JSON.
Such a decoder would need to mechanically distinguish the two Such a decoder would need to mechanically distinguish the two
formats. An easy way for an encoder to help the decoder would be to formats. An easy way for an encoder to help the decoder would be to
skipping to change at page 34, line 22 skipping to change at page 35, line 37
Note that some applications and protocols will not want to use Note that some applications and protocols will not want to use
indefinite-length encoding. Using indefinite-length encoding allows indefinite-length encoding. Using indefinite-length encoding allows
an encoder to not need to marshal all the data for counting, but it an encoder to not need to marshal all the data for counting, but it
requires a decoder to allocate increasing amounts of memory while requires a decoder to allocate increasing amounts of memory while
waiting for the end of the item. This might be fine for some waiting for the end of the item. This might be fine for some
applications but not others. applications but not others.
5.2. Generic Encoders and Decoders 5.2. Generic Encoders and Decoders
A generic CBOR decoder can decode all well-formed CBOR data and A generic CBOR decoder can decode all well-formed encoded CBOR data
present them to an application. See Appendix C. items and present the data items to an application. See Appendix C.
(The diagnostic notation, Section 8, may be used to present well-
formed CBOR values to humans.)
Generic CBOR encoders provide an application interface that allows
the application to specify any well-formed value to be encoded as a
CBOR data item, including simple values and tags unknown to the
encoder.
Even though CBOR attempts to minimize these cases, not all well- Even though CBOR attempts to minimize these cases, not all well-
formed CBOR data is valid: for example, the encoded text string formed CBOR data is valid: for example, the encoded text string
"0x62c0ae" does not contain valid UTF-8 and so is not a valid CBOR "0x62c0ae" does not contain valid UTF-8 (because [RFC3629] requires
item. Also, specific tags may make semantic constraints that may be always using the shortest form) and so is not a valid CBOR item.
violated, such as a bignum tag enclosing another tag, or an instance Also, specific tags may make semantic constraints that may be
of tag number 0 containing a byte string, or containing a text string violated, for instance by a bignum tag enclosing another tag, or by
with contents that do not match [RFC3339]'s "date-time" production. an instance of tag number 0 containing a byte string, or containing a
There is no requirement that generic encoders and decoders make text string with contents that do not match [RFC3339]'s "date-time"
unnatural choices for their application interface to enable the production. There is no requirement that generic encoders and
processing of invalid data. Generic encoders and decoders are decoders make unnatural choices for their application interface to
expected to forward simple values and tags even if their specific enable the processing of invalid data. Generic encoders and decoders
are expected to forward simple values and tags even if their specific
codepoints are not registered at the time the encoder/decoder is codepoints are not registered at the time the encoder/decoder is
written (Section 5.4). written (Section 5.4).
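
The 0x62c0ae example can be reproduced with a strict UTF-8 decoder. The Python sketch below handles only the simplest case, a definite-length text string with a length below 24, and the function name short_text_string_is_valid is hypothetical.

```python
def short_text_string_is_valid(encoded: bytes) -> bool:
    """Check UTF-8 validity of a text string whose head is a single byte.

    Sketch only: assumes major type 3 with a length argument below 24.
    """
    if not encoded:
        raise ValueError("empty input")
    head = encoded[0]
    if head >> 5 != 3 or (head & 0x1F) >= 24:
        raise ValueError("not a short definite-length text string")
    length = head & 0x1F
    payload = encoded[1:1 + length]
    if len(payload) != length:
        raise ValueError("not well-formed: payload is truncated")
    try:
        # Python's strict decoder rejects non-shortest forms such as 0xc0ae.
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Here 0x62c0ae is well-formed (a head announcing two bytes, followed by two bytes) but the function returns False for it, because 0xc0ae is an overlong encoding.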
Generic decoders provide ways to present well-formed CBOR values,
both valid and invalid, to an application. The diagnostic notation
(Section 8) may be used to present well-formed CBOR values to humans.
Generic encoders provide an application interface that allows the
application to specify any well-formed value, including simple values
and tags unknown to the encoder.
5.3. Validity of Items 5.3. Validity of Items
A well-formed but invalid CBOR data item presents a problem with A well-formed but invalid CBOR data item (Section 1.2) presents a
interpreting the data encoded in it in the CBOR data model. A CBOR- problem with interpreting the data encoded in it in the CBOR data
based protocol could be specified in several layers, in which the model. A CBOR-based protocol could be specified in several layers,
lower layers don't process the semantics of some of the CBOR data in which the lower layers don't process the semantics of some of the
they forward. These layers can't notice any validity errors in data CBOR data they forward. These layers can't notice any validity
they don't process and MUST forward that data as-is. The first layer errors in data they don't process and MUST forward that data as-is.
that does process the semantics of an invalid CBOR item MUST take one The first layer that does process the semantics of an invalid CBOR
of two choices: item MUST take one of two choices:
1. Replace the problematic item with an error marker and continue 1. Replace the problematic item with an error marker and continue
with the next item, or with the next item, or
2. Issue an error and stop processing altogether. 2. Issue an error and stop processing altogether.
A CBOR-based protocol MUST specify which of these options its A CBOR-based protocol MUST specify which of these options its
decoders take, for each kind of invalid item they might encounter. decoders take, for each kind of invalid item they might encounter.
Such problems might occur at the basic validity level of CBOR or in Such problems might occur at the basic validity level of CBOR or in
skipping to change at page 35, line 40 skipping to change at page 36, line 48
model: model:
Duplicate keys in a map: Generic decoders (Section 5.2) make data Duplicate keys in a map: Generic decoders (Section 5.2) make data
available to applications using the native CBOR data model. That available to applications using the native CBOR data model. That
data model includes maps (key-value mappings with unique keys), data model includes maps (key-value mappings with unique keys),
not multimaps (key-value mappings where multiple entries can have not multimaps (key-value mappings where multiple entries can have
the same key). Thus, a generic decoder that gets a CBOR map item the same key). Thus, a generic decoder that gets a CBOR map item
that has duplicate keys will decode to a map with only one that has duplicate keys will decode to a map with only one
instance of that key, or it might stop processing altogether. On instance of that key, or it might stop processing altogether. On
the other hand, a "streaming decoder" may not even be able to the other hand, a "streaming decoder" may not even be able to
notice (Section 5.6). notice. See Section 5.6 for more discussion of keys in maps.
Invalid UTF-8 string: A decoder might or might not want to verify Invalid UTF-8 string: A decoder might or might not want to verify
that the sequence of bytes in a UTF-8 string (major type 3) is that the sequence of bytes in a UTF-8 string (major type 3) is
actually valid UTF-8 and react appropriately. actually valid UTF-8 and react appropriately.
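
The behaviors a generic decoder might choose for duplicate map keys can be sketched as a policy applied while building the in-memory map. This Python sketch is illustrative only: build_map and its on_duplicate parameter are hypothetical, keys are assumed to be hashable Python values, and Python's key equality (for example 1 == 1.0) does not exactly match key equivalence in the CBOR data model.

```python
def build_map(pairs, on_duplicate="error"):
    """Build a dict from decoded (key, value) pairs.

    on_duplicate: "error" stops processing; "keep_first" and "keep_last"
    retain a single instance of the key.
    """
    if on_duplicate not in ("error", "keep_first", "keep_last"):
        raise ValueError("unknown duplicate-key policy")
    result = {}
    for key, value in pairs:
        if key in result:
            if on_duplicate == "error":
                raise ValueError(f"duplicate map key: {key!r}")
            if on_duplicate == "keep_first":
                continue
        result[key] = value
    return result
```

A streaming decoder that hands entries to the application one at a time never builds such a map and so has no natural place to perform this check.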
5.3.2. Tag validity 5.3.2. Tag validity
Two additional kinds of validity errors are introduced by adding tags Two additional kinds of validity errors are introduced by adding tags
to the basic generic data model: to the basic generic data model:
skipping to change at page 43, line 51 skipping to change at page 45, line 23
7.1. Extension Points 7.1. Extension Points
In a protocol design, opportunities for evolution are often included In a protocol design, opportunities for evolution are often included
in the form of extension points. For example, there may be a in the form of extension points. For example, there may be a
codepoint space that is not fully allocated from the outset, and the codepoint space that is not fully allocated from the outset, and the
protocol is designed to tolerate and embrace implementations that protocol is designed to tolerate and embrace implementations that
start using more codepoints than initially allocated. start using more codepoints than initially allocated.
Sizing the codepoint space may be difficult because the range Sizing the codepoint space may be difficult because the range
required may be hard to predict. An attempt should be made to make required may be hard to predict. Protocol designs should attempt to
the codepoint space large enough so that it can slowly be filled over make the codepoint space large enough so that it can slowly be filled
the intended lifetime of the protocol. over the intended lifetime of the protocol.
CBOR has three major extension points: CBOR has three major extension points:
* the "simple" space (values in major type 7). Of the 24 efficient * the "simple" space (values in major type 7). Of the 24 efficient
(and 224 slightly less efficient) values, only a small number have (and 224 slightly less efficient) values, only a small number have
been allocated. Implementations receiving an unknown simple data been allocated. Implementations receiving an unknown simple data
item may be able to process it as such, given that the structure item may easily be able to process it as such, given that the
of the value is indeed simple. The IANA registry in Section 9.1 structure of the value is indeed simple. The IANA registry in
is the appropriate way to address the extensibility of this Section 9.1 is the appropriate way to address the extensibility of
codepoint space. this codepoint space.
* the "tag" space (values in major type 6). Again, only a small * the "tag" space (values in major type 6). The total codepoint
part of the codepoint space has been allocated, and the space is space is abundant; only a tiny part of it has been allocated.
abundant (although the early numbers are more efficient than the However, not all of these codepoints are equally efficient: the
later ones). Implementations receiving an unknown tag number can first 24 only consume a single ("1+0") byte, and half of them have
choose to simply ignore it (process just the enclosed tag content) already been allocated. The next 232 values only consume two
or to process it as an unknown tag number wrapping the tag ("1+1") bytes, with nearly a quarter already allocated. These
content. The IANA registry in Section 9.2 is the appropriate way subspaces need some curation to last for a few more decades.
to address the extensibility of this codepoint space. Implementations receiving an unknown tag number can choose to
process just the enclosed tag content or, preferably, to process
the tag as an unknown tag number wrapping the tag content. The
IANA registry in Section 9.2 is the appropriate way to address the
extensibility of this codepoint space.
* the "additional information" space. An implementation receiving * the "additional information" space. An implementation receiving
an unknown additional information value has no way to continue an unknown additional information value has no way to continue
decoding, so allocating codepoints to this space is a major step. decoding, so allocating codepoints in this space is a major step
There are also very few codepoints left. See also Section 7.2. beyond just exercising an extension point. There are also very
few codepoints left. See also Section 7.2.
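
The efficiency classes mentioned for the tag space ("1+0", "1+1", and so on) correspond to the size of the encoded tag head. A minimal Python sketch, using the hypothetical function name tag_head_size:

```python
def tag_head_size(tag_number: int) -> int:
    """Return the number of bytes in the shortest head for tag_number."""
    if not 0 <= tag_number < (1 << 64):
        raise ValueError("tag number must fit in 64 bits")
    if tag_number < 24:
        return 1  # "1+0": argument carried in the initial byte
    for extra in (1, 2, 4, 8):  # "1+1", "1+2", "1+4", "1+8"
        if tag_number < (1 << (8 * extra)):
            return 1 + extra
    raise AssertionError("unreachable: range was checked above")
```

The 24 values 0 to 23 take one byte and the 232 values 24 to 255 take two, which is why those subspaces are the ones said to need curation.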
7.2. Curating the Additional Information Space 7.2. Curating the Additional Information Space
The human mind is sometimes drawn to filling in little perceived gaps The human mind is sometimes drawn to filling in little perceived gaps
to make something neat. We expect the remaining gaps in the to make something neat. We expect the remaining gaps in the
codepoint space for the additional information values to be an codepoint space for the additional information values to be an
attractor for new ideas, just because they are there. attractor for new ideas, just because they are there.
The present specification does not manage the additional information The present specification does not manage the additional information
codepoint space by an IANA registry. Instead, allocations out of codepoint space by an IANA registry. Instead, allocations out of
skipping to change at page 47, line 26 skipping to change at page 49, line 5
New entries in the range 32 to 255 are assigned by Specification New entries in the range 32 to 255 are assigned by Specification
Required. Required.
9.2. Tags Registry 9.2. Tags Registry
IANA has created the "Concise Binary Object Representation (CBOR) IANA has created the "Concise Binary Object Representation (CBOR)
Tags" registry at [IANA.cbor-tags]. The tags that were defined in Tags" registry at [IANA.cbor-tags]. The tags that were defined in
[RFC7049] are described in detail in Section 3.4, and other tags have [RFC7049] are described in detail in Section 3.4, and other tags have
already been defined. already been defined.
New entries in the range 0 to 23 are assigned by Standards Action. New entries in the range 0 to 23 ("1+0") are assigned by Standards
New entries in the range 24 to 255 are assigned by Specification Action. New entries in the ranges 24 to 255 ("1+1") and 256 to 32767
Required. New entries in the range 256 to 18446744073709551615 are (lower half of "1+2") are assigned by Specification Required. New
assigned by First Come First Served. The template for registration entries in the range 32768 to 18446744073709551615 (upper half of
requests is: "1+2", "1+4", and "1+8") are assigned by First Come First Served.
The template for registration requests is:
* Data item * Data item
* Semantics (short form) * Semantics (short form)
In addition, First Come First Served requests should include: In addition, First Come First Served requests should include:
* Point of contact * Point of contact
* Description of semantics (URL) - This description is optional; the * Description of semantics (URL) - This description is optional; the
URL can point to something like an Internet-Draft or a web page. URL can point to something like an Internet-Draft or a web page.
Applicants exercising the First Come First Served range and making a
suggestion for a tag number that is not representable in 32 bits
(i.e., larger than 4294967295) should be aware that this could reduce
interoperability with implementations that do not support 64-bit
numbers.
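
The registration ranges can be summarized as a lookup. The Python sketch below follows the ranges given in the -14 column of this comparison (the -13 text instead assigned everything from 256 upward by First Come First Served); the function name registration_policy is hypothetical.

```python
def registration_policy(tag_number: int) -> str:
    """Return the registration policy for a new tag number (-14 ranges)."""
    if not 0 <= tag_number <= 18446744073709551615:
        raise ValueError("tag number out of range")
    if tag_number <= 23:
        return "Standards Action"  # "1+0"
    if tag_number <= 32767:
        return "Specification Required"  # "1+1" and lower half of "1+2"
    return "First Come First Served"  # upper half of "1+2", "1+4", "1+8"


def needs_64_bit_support(tag_number: int) -> bool:
    """True if the tag number is not representable in 32 bits."""
    return tag_number > 4294967295
```

The second helper reflects the interoperability caution above for tag numbers larger than 4294967295.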
9.3. Media Type ("MIME Type") 9.3. Media Type ("MIME Type")
The Internet media type [RFC6838] for a single encoded CBOR data item The Internet media type [RFC6838] for a single encoded CBOR data item
is application/cbor, as defined in [IANA.media-types]: is application/cbor, as defined in [IANA.media-types]:
Type name: application Type name: application
Subtype name: cbor Subtype name: cbor
Required parameters: n/a Required parameters: n/a
skipping to change at page 49, line 49 skipping to change at page 51, line 31
"xxx/yyy+cbor". "xxx/yyy+cbor".
Security Considerations: See Section 10 of this document Security Considerations: See Section 10 of this document
Contact: IETF CBOR Working Group cbor@ietf.org Contact: IETF CBOR Working Group cbor@ietf.org
(mailto:cbor@ietf.org) or IETF Applications and Real-Time Area (mailto:cbor@ietf.org) or IETF Applications and Real-Time Area
art@ietf.org (mailto:art@ietf.org) art@ietf.org (mailto:art@ietf.org)
Author/Change Controller: The IESG iesg@ietf.org Author/Change Controller: The IESG iesg@ietf.org
(mailto:iesg@ietf.org) (mailto:iesg@ietf.org)
// Editors' note: RFC 6838 has a template field Author/Change // Editors' note: RFC 6838 has a template field Author/Change
// controller, the descriptive text of which makes clear that this is // controller, the descriptive text of which makes clear that this is
// the change controller, not the author. Go figure. There is no // the change controller, not the author. Go figure. There is no
// separate author entry as in the media types registry. (RFC // separate author entry as in the media types registry. (RFC
// editor: Please remove this note before publication.) // editor: Please remove this note before publication.)
10. Security Considerations 10. Security Considerations
A network-facing application can exhibit vulnerabilities in its A network-facing application can exhibit vulnerabilities in its
processing logic for incoming data. Complex parsers are well known processing logic for incoming data. Complex parsers are well known
as a likely source of such vulnerabilities, such as the ability to as a likely source of such vulnerabilities, such as the ability to
remotely crash a node, or even remotely execute arbitrary code on it. remotely crash a node, or even remotely execute arbitrary code on it.
CBOR attempts to narrow the opportunities for introducing such CBOR attempts to narrow the opportunities for introducing such
vulnerabilities by reducing parser complexity, by giving the entire vulnerabilities by reducing parser complexity, by giving the entire
range of encodable values a meaning where possible. range of encodable values a meaning where possible.
skipping to change at page 51, line 15 skipping to change at page 52, line 42
input is in alignment with the application protocol that is input is in alignment with the application protocol that is
serialized in CBOR. serialized in CBOR.
The input check itself may consume resources. This is usually linear The input check itself may consume resources. This is usually linear
in the size of the input, which means that an attacker has to spend in the size of the input, which means that an attacker has to spend
resources that are commensurate to the resources spent by the resources that are commensurate to the resources spent by the
defender on input validation. Processing for arbitrary-precision defender on input validation. Processing for arbitrary-precision
numbers may exceed linear effort. Also, some hash-table numbers may exceed linear effort. Also, some hash-table
implementations that are used by decoders to build in-memory implementations that are used by decoders to build in-memory
representations of maps can be attacked to spend quadratic effort, representations of maps can be attacked to spend quadratic effort,
unless a secret key is employed (see Section 7 of [SIPHASH]). Such unless a secret key (see Section 7 of [SIPHASH]) or some other
superlinear efforts can be employed by an attacker to exhaust mitigation is employed. Such superlinear efforts can be exploited by
resources at or before the input validator; they therefore need to be an attacker to exhaust resources at or before the input validator;
avoided in a CBOR decoder implementation. Note that tag number they therefore need to be avoided in a CBOR decoder implementation.
definitions and their implementations can add security considerations Note that tag number definitions and their implementations can add
of this kind; this should then be discussed in the security security considerations of this kind; this should then be discussed
considerations of the tag number definition. in the security considerations of the tag number definition.
CBOR encoders do not receive input directly from the network and are CBOR encoders do not receive input directly from the network and are
thus not directly attackable in the same way as CBOR decoders. thus not directly attackable in the same way as CBOR decoders.
However, CBOR encoders often have an API that takes input from However, CBOR encoders often have an API that takes input from
another level in the implementation and can be attacked through that another level in the implementation and can be attacked through that
API. The design and implementation of that API should assume the API. The design and implementation of that API should assume the
behavior of its caller may be based on hostile input or on coding behavior of its caller may be based on hostile input or on coding
mistakes. It should check inputs for buffer overruns, overflow and mistakes. It should check inputs for buffer overruns, overflow and
underflow of integer arithmetic, and other such errors that are aimed underflow of integer arithmetic, and other such errors that are aimed
to disrupt the encoder. to disrupt the encoder.
skipping to change at page 53, line 26 skipping to change at page 55, line 8
[ASN.1] International Telecommunication Union, "Information [ASN.1] International Telecommunication Union, "Information
Technology -- ASN.1 encoding rules: Specification of Basic Technology -- ASN.1 encoding rules: Specification of Basic
Encoding Rules (BER), Canonical Encoding Rules (CER) and Encoding Rules (BER), Canonical Encoding Rules (CER) and
Distinguished Encoding Rules (DER)", ITU-T Recommendation Distinguished Encoding Rules (DER)", ITU-T Recommendation
X.690, 1994. X.690, 1994.
[BSON] Various, "BSON - Binary JSON", 2013, [BSON] Various, "BSON - Binary JSON", 2013,
<http://bsonspec.org/>. <http://bsonspec.org/>.
[I-D.bormann-cbor-notable-tags]
Bormann, C., "Notable CBOR Tags", Work in Progress,
Internet-Draft, draft-bormann-cbor-notable-tags-01, 15 May
2020, <http://www.ietf.org/internet-drafts/draft-bormann-
cbor-notable-tags-01.txt>.
[IANA.cbor-simple-values] [IANA.cbor-simple-values]
IANA, "Concise Binary Object Representation (CBOR) Simple IANA, "Concise Binary Object Representation (CBOR) Simple
Values", Values",
<http://www.iana.org/assignments/cbor-simple-values>. <http://www.iana.org/assignments/cbor-simple-values>.
[IANA.cbor-tags] [IANA.cbor-tags]
IANA, "Concise Binary Object Representation (CBOR) Tags", IANA, "Concise Binary Object Representation (CBOR) Tags",
<http://www.iana.org/assignments/cbor-tags>. <http://www.iana.org/assignments/cbor-tags>.
[IANA.core-parameters] [IANA.core-parameters]
skipping to change at page 55, line 10 skipping to change at page 56, line 47
[RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR)
Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020,
<https://www.rfc-editor.org/info/rfc8742>. <https://www.rfc-editor.org/info/rfc8742>.
[RFC8746] Bormann, C., Ed., "Concise Binary Object Representation [RFC8746] Bormann, C., Ed., "Concise Binary Object Representation
(CBOR) Tags for Typed Arrays", RFC 8746, (CBOR) Tags for Typed Arrays", RFC 8746,
DOI 10.17487/RFC8746, February 2020, DOI 10.17487/RFC8746, February 2020,
<https://www.rfc-editor.org/info/rfc8746>. <https://www.rfc-editor.org/info/rfc8746>.
[rfc8746] Bormann, C., Ed., "Concise Binary Object Representation
(CBOR) Tags for Typed Arrays", RFC 8746,
DOI 10.17487/RFC8746, February 2020,
<https://www.rfc-editor.org/info/rfc8746>.
[SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- [SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short-
Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture
Notes in Computer Science pp. 489-508, 2012, Notes in Computer Science pp. 489-508, 2012,
<https://doi.org/10.1007/978-3-642-34931-7_28>. <https://doi.org/10.1007/978-3-642-34931-7_28>.
[YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup [YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup
Language (YAML[TM]) Version 1.2", 3rd Edition, October Language (YAML[TM]) Version 1.2", 3rd Edition, October
2009, <http://www.yaml.org/spec/1.2/spec.html>. 2009, <http://www.yaml.org/spec/1.2/spec.html>.
Appendix A. Examples Appendix A. Examples
skipping to change at page 63, line 11 skipping to change at page 64, line 43
* uint() converts a byte string into an unsigned integer by * uint() converts a byte string into an unsigned integer by
interpreting the byte string in network byte order. interpreting the byte string in network byte order.
* Arithmetic works as in C. * Arithmetic works as in C.
* All variables are unsigned integers of sufficient range. * All variables are unsigned integers of sufficient range.
Note that "well_formed" returns the major type for well-formed Note that "well_formed" returns the major type for well-formed
definite length items, but 0 for an indefinite length item (or -1 for definite length items, but 0 for an indefinite length item (or -1 for
a break stop code, only if "breakable" is set). This is used in a "break" stop code, only if "breakable" is set). This is used in
"well_formed_indefinite" to ascertain that indefinite length strings "well_formed_indefinite" to ascertain that indefinite length strings
only contain definite length strings as chunks. only contain definite length strings as chunks.
well_formed (breakable = false) { well_formed (breakable = false) {
// process initial bytes // process initial bytes
ib = uint(take(1)); ib = uint(take(1));
mt = ib >> 5; mt = ib >> 5;
val = ai = ib & 0x1f; val = ai = ib & 0x1f;
switch (ai) { switch (ai) {
case 24: val = uint(take(1)); break; case 24: val = uint(take(1)); break;
skipping to change at page 67, line 36 skipping to change at page 68, line 39
E.1. ASN.1 DER, BER, and PER E.1. ASN.1 DER, BER, and PER
[ASN.1] has many serializations. In the IETF, DER and BER are the [ASN.1] has many serializations. In the IETF, DER and BER are the
most common. The serialized output is not particularly compact for most common. The serialized output is not particularly compact for
many items, and the code needed to decode numeric items can be many items, and the code needed to decode numeric items can be
complex on a constrained device. complex on a constrained device.
Few (if any) IETF protocols have adopted one of the several variants Few (if any) IETF protocols have adopted one of the several variants
of Packed Encoding Rules (PER). There could be many reasons for of Packed Encoding Rules (PER). There could be many reasons for
this, but one that is commonly stated is that PER makes use of the this, but one that is commonly stated is that PER makes use of the
schema even for parsing the surface structure of the data stream, schema even for parsing the surface structure of the data item,
requiring significant tool support. There are different versions of requiring significant tool support. There are different versions of
the ASN.1 schema language in use, which has also hampered adoption. the ASN.1 schema language in use, which has also hampered adoption.
E.2. MessagePack E.2. MessagePack
[MessagePack] is a concise, widely implemented counted binary [MessagePack] is a concise, widely implemented counted binary
serialization format, similar in many properties to CBOR, although serialization format, similar in many properties to CBOR, although
somewhat less regular. While the data model can be used to represent somewhat less regular. While the data model can be used to represent
JSON data, MessagePack has also been used in many remote procedure JSON data, MessagePack has also been used in many remote procedure
call (RPC) applications and for long-term storage of data. call (RPC) applications and for long-term storage of data.
skipping to change at page 69, line 27 skipping to change at page 70, line 27
| | 00 00 04 31 00 13 00 00 00 | | | | 00 00 04 31 00 13 00 00 00 | |
| | 10 30 00 02 00 00 00 10 31 | | | | 10 30 00 02 00 00 00 10 31 | |
| | 00 03 00 00 00 00 00 | | | | 00 03 00 00 00 00 00 | |
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
| CBOR | 82 01 82 02 03 | 9f 01 82 02 03 | | CBOR | 82 01 82 02 03 | 9f 01 82 02 03 |
| | | ff | | | | ff |
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
Table 8: Examples for Different Levels of Conciseness Table 8: Examples for Different Levels of Conciseness
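
The two CBOR encodings in the last row of the table differ only in how the outer array is delimited: a definite length head (82) versus an indefinite length head (9f) closed by a "break" stop code (ff). The following Python sketch, which is illustrative only, decodes just enough of CBOR (unsigned integers below 24 and arrays) to show that both byte strings represent the same data. The function name "decode" is an assumption of this example, and truncated input is not handled gracefully.

```python
def decode(data, pos=0):
    """Return (value, next_pos); handles only small unsigned ints and arrays."""
    ib = data[pos]
    mt, ai = ib >> 5, ib & 0x1f
    pos += 1
    if mt == 0 and ai < 24:
        return ai, pos
    if mt == 4 and ai < 24:
        # Definite length array: the count is known up front.
        items = []
        for _ in range(ai):
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos
    if mt == 4 and ai == 31:
        # Indefinite length array: read items until the "break" stop code.
        items = []
        while data[pos] != 0xff:
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos + 1
    raise ValueError("outside the subset handled by this sketch")


definite, _ = decode(bytes.fromhex("8201820203"))
indefinite, _ = decode(bytes.fromhex("9f01820203ff"))
```

Both "definite" and "indefinite" hold the nested list [1, [2, 3]]; the indefinite length form costs one additional byte.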
Appendix F. Changes from RFC 7049 Appendix F. Well-formedness errors and examples
The following is a list of known changes from RFC 7049. This list is
non-authoritative. It is meant to help reviewers see the significant
differences.
* Made some use of new RFCXML functionality [RFC7991]
* Updated references, e.g. for [RFC4627] to [RFC8259] in many
places, for [CNN-TERMS] to [RFC7228]; added missing reference to
[IEEE754] and updated to [ECMA262]
* Fixed errata: in the example in Section 2.4.2 ("29" -> "49"), and
in the last paragraph of Section 3.6 ("0b000_11101" ->
"0b000_11001")
* Added a comment to the last example in Section 3.2.2 (added
"Second value")
* Applied numerous small editorial changes
* Added a few tables for illustration
* More stringently used terminology for well-formed and valid data,
avoiding less well-defined alternative terms such as "syntax
error", "decoding error" and "strict mode" outside examples
* Streamlined terminology to talk about tags, tag numbers, and tag
content
* Clarified the restrictions on tag content, in general and
specifically for tag 1
* Added text about the CBOR data model and its small variations
(basic generic, extended generic, specific)
* More clearly separated integers from floating-point values;
provided a suggestion (based on I-JSON [RFC7493]) for handling
these types when converting JSON to CBOR
* Added term "preferred serialization" and defined it for various
kinds of data items
* Added comment about tags with semantics that depend on
serialization order
* Defined "deterministic encoding", making use of "preferred
serialization", and simplified the suggested map ordering for the
"Core Deterministic Encoding Requirements", easing implementation,
while keeping RFC 7049 map ordering as an alternative "length-
first map key ordering"; now avoiding the terms "canonical" and
"canonicalization"
* Clarified map validity (handling of duplicate keys) and explained
the domain of applicability of certain implementation choices
* Updated IANA considerations
* Added security considerations
* Clarified handling of non-well-formed simple values in text and
pseudocode
* Added Appendix G, well-formedness errors and examples
* Removed UBJSON from Appendix E, as that format has completely
changed since RFC 7049; added reference to [RFC8618]
Appendix G. Well-formedness errors and examples
There are three basic kinds of well-formedness errors that can occur There are three basic kinds of well-formedness errors that can occur
in decoding a CBOR data item: in decoding a CBOR data item:
* Too much data: There are input bytes left that were not consumed. * Too much data: There are input bytes left that were not consumed.
This is only an error if the application assumed that the input This is only an error if the application assumed that the input
bytes would span exactly one data item. Where the application bytes would span exactly one data item. Where the application
uses the self-delimiting nature of CBOR encoding to permit uses the self-delimiting nature of CBOR encoding to permit
additional data after the data item, as is for example done in additional data after the data item, as is for example done in
CBOR sequences [RFC8742], the CBOR decoder can simply indicate CBOR sequences [RFC8742], the CBOR decoder can simply indicate
skipping to change at page 71, line 40 skipping to change at page 71, line 24
calling fail(), in order: calling fail(), in order:
* a reserved value is used for additional information (28, 29, 30) * a reserved value is used for additional information (28, 29, 30)
* major type 7, additional information 24, value < 32 (incorrect or * major type 7, additional information 24, value < 32 (incorrect or
incorrectly encoded simple type) incorrectly encoded simple type)
* incorrect substructure of indefinite length byte/text string (may * incorrect substructure of indefinite length byte/text string (may
only contain definite length strings of the same major type) only contain definite length strings of the same major type)
* break stop code (mt=7, ai=31) occurs in a value position of a map * "break" stop code (mt=7, ai=31) occurs in a value position of a
or except at a position directly in an indefinite length item map or except at a position directly in an indefinite length item
where also another enclosed data item could occur where also another enclosed data item could occur
* additional information 31 used with major type 0, 1, or 6 * additional information 31 used with major type 0, 1, or 6
G.1. Examples for CBOR data items that are not well-formed F.1. Examples for CBOR data items that are not well-formed
This subsection shows a few examples for CBOR data items that are not This subsection shows a few examples for CBOR data items that are not
well-formed. Each example is a sequence of bytes each shown in well-formed. Each example is a sequence of bytes each shown in
hexadecimal; multiple examples in a list are separated by commas. hexadecimal; multiple examples in a list are separated by commas.
Examples for well-formedness error kind 1 (too much data) can easily Examples for well-formedness error kind 1 (too much data) can easily
be formed by adding data to a well-formed encoded CBOR data item. be formed by adding data to a well-formed encoded CBOR data item.
Similarly, examples for well-formedness error kind 2 (too little Similarly, examples for well-formedness error kind 2 (too little
data) can be formed by truncating a well-formed encoded CBOR data data) can be formed by truncating a well-formed encoded CBOR data
item. In test suites, it may be beneficial to specifically test with item. In test suites, it may be beneficial to specifically test with
incomplete data items that would require large amounts of addition to incomplete data items that would require large amounts of addition to
be completed (for instance by starting the encoding of a string of a be completed (for instance by starting the encoding of a string of a
very large size). very large size).
A premature end of the input can occur in a head or within the A premature end of the input can occur in a head or within the
enclosed data, which may be bare strings or enclosed data items that enclosed data, which may be bare strings or enclosed data items that
are either counted or should have been ended by a break stop code. are either counted or should have been ended by a "break" stop code.
* End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 * End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02
03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa 03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa
00 00, fb 00 00 00 00 00, fb 00 00 00
* Definite length strings with short data: 41, 61, 5a ff ff ff ff * Definite length strings with short data: 41, 61, 5a ff ff ff ff
00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f 00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f
ff ff ff ff ff ff ff 01 02 03 ff ff ff ff ff ff ff 01 02 03
* Definite length maps and arrays not closed with enough items: 81, * Definite length maps and arrays not closed with enough items: 81,
81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00
00 00
* Tag number not followed by tag content: c0 * Tag number not followed by tag content: c0
* Indefinite length strings not closed by a break stop code: 5f 41 * Indefinite length strings not closed by a "break" stop code: 5f 41
00, 7f 61 00 00, 7f 61 00
* Indefinite length maps and arrays not closed by a break stop code: * Indefinite length maps and arrays not closed by a "break" stop
9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f code: 9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f
ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff 9f 9f ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff
A few examples for the five subkinds of well-formedness error kind 3 A few examples for the five subkinds of well-formedness error kind 3
(syntax error) are shown below. (syntax error) are shown below.
Subkind 1: Subkind 1:
* Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e, * Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e,
5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc, 5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
fd, fe, fd, fe,
skipping to change at page 73, line 30 skipping to change at page 73, line 12
82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82 82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82
9f 81 9f 9f ff ff ff ff 9f 81 9f 9f ff ff ff ff
* Break in indefinite length map would lead to odd number of items * Break in indefinite length map would lead to odd number of items
(break in a value position): bf 00 ff, bf 00 00 00 ff (break in a value position): bf 00 ff, bf 00 00 00 ff
Subkind 5: Subkind 5:
* Major type 0, 1, 6 with additional information 31: 1f, 3f, df * Major type 0, 1, 6 with additional information 31: 1f, 3f, df
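
The examples above can be used as test vectors. The following Python sketch is a transcription of the "well_formed" pseudocode into runnable form, offered as an aid for experimentation rather than as a reference implementation. The names "NotWellFormed", "is_well_formed" and the helper class are assumptions of this example. As in the pseudocode, 0 is returned for an indefinite length item and -1 for a "break" stop code. Unlike the pseudocode, the wrapper also treats leftover input bytes (error kind 1) as a failure.

```python
BREAK, INDEFINITE = -1, 0


class NotWellFormed(Exception):
    pass


class _Reader:
    def __init__(self, data):
        self.data, self.pos = data, 0

    def take(self, n):
        if self.pos + n > len(self.data):
            raise NotWellFormed("too little data")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def _item(r, breakable=False):
    ib = r.take(1)[0]
    mt, ai = ib >> 5, ib & 0x1f
    val = ai
    if 24 <= ai <= 27:
        val = int.from_bytes(r.take(1 << (ai - 24)), "big")
    elif 28 <= ai <= 30:
        raise NotWellFormed("reserved additional information")
    elif ai == 31:
        return _indefinite(r, mt, breakable)
    if mt in (2, 3):
        r.take(val)                      # string content
    elif mt == 4:
        for _ in range(val):
            _item(r)
    elif mt == 5:
        for _ in range(val * 2):         # keys and values
            _item(r)
    elif mt == 6:
        _item(r)                         # tag content
    elif mt == 7 and ai == 24 and val < 32:
        raise NotWellFormed("incorrectly encoded simple value")
    return mt


def _indefinite(r, mt, breakable):
    if mt in (2, 3):
        while True:
            it = _item(r, True)
            if it == BREAK:
                break
            if it != mt:                 # chunks must be definite, same type
                raise NotWellFormed("bad chunk in indefinite length string")
    elif mt == 4:
        while _item(r, True) != BREAK:
            pass
    elif mt == 5:
        while _item(r, True) != BREAK:
            _item(r)                     # a value must follow each key
    elif mt == 7:
        if breakable:
            return BREAK
        raise NotWellFormed("break outside an indefinite length item")
    else:
        raise NotWellFormed("additional information 31 with major type 0, 1, or 6")
    return INDEFINITE


def is_well_formed(data):
    r = _Reader(data)
    try:
        _item(r)
    except NotWellFormed:
        return False
    return r.pos == len(data)            # reject trailing bytes
```

For instance, is_well_formed(bytes.fromhex("bf00ff")) is False because the "break" stop code appears in a value position of the map, while the same call on "9f01820203ff" is True.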
Appendix G. Changes from RFC 7049
As discussed in the introduction, this document is a revised edition
of RFC 7049, with editorial improvements, added detail, and fixed
errata. This document formally obsoletes RFC 7049, while keeping
full compatibility of the interchange format from RFC 7049. This
document does not create a new version of the format.
G.1. Errata processing, clerical changes
The two verified errata on RFC 7049, EID 3764 and EID 3770, concerned
two encoding examples in the text that have been corrected
(Section 3.4.3: "29" -> "49", Section 5.5: "0b000_11101" ->
"0b000_11001"). Also, RFC 7049 contained an example using the simple
type value 24 (EID 5917), which is not well-formed; this example has
been removed. Errata report 5763 pointed to an accident in the
wording of the definition of tags; this was resolved during a re-
write of Section 3.4. Errata report 5434 pointed out that the UBJSON
example in Appendix E no longer complied with the version of UBJSON
current at the time of submitting the report. It turned out that the
UBJSON specification had completely changed since 2013; this example
therefore also was removed. Further errata reports (4409, 4963,
4964) complained that the map key sorting rules for canonical
encoding were onerous; these led to a reconsideration of the
canonical encoding suggestions and replacement by the deterministic
encoding suggestions (described below). An editorial suggestion in
errata report 4294 was also implemented (improved symmetry by adding
"Second value" to a comment to the last example in Section 3.2.2).
Other more clerical changes include:
* use of new RFCXML functionality [RFC7991];
* explanation of some more of the notation used;
* updated references, e.g. for RFC4627 to [RFC8259] in many places,
for CNN-TERMS to [RFC7228]; added missing reference to [IEEE754]
(importing required definitions) and updated to [ECMA262]; added a
reference to [RFC8618] that further illustrates the discussion in
Appendix E;
* the discussion of diagnostic notation mentions the "Extended
Diagnostic Notation" (EDN) defined in [RFC8610];
* the addition of this appendix.
G.2. Changes in IANA considerations
The IANA considerations were generally updated (clerical changes,
e.g., now pointing to the CBOR working group as the author of the
specification). References to the respective IANA registries have
been added to the informative references.
Tags in the space from 256 to 32767 (lower half of "1+2") are no
longer assigned by First Come First Served; this range is now
Specification Required.
G.3. Changes in suggestions and other informational components
In revising the document, beyond processing errata reports, the WG
could draw on nearly seven years of experience with the use of CBOR
in a diverse set of applications. This led to a number of editorial
changes, including adding tables for illustration, but also to
emphasizing some aspects and de-emphasizing others.
A significant addition in this revision is Section 2, which discusses
the CBOR data model and its small variations involved in the
processing of CBOR. Introducing terms for those (basic generic,
extended generic, specific) enables more concise language in other
places of the document, but also helps in clarifying expectations on
implementations and on the extensibility features of the format.
RFC 7049, as a format derived from the JSON ecosystem, was influenced
by the JSON number system that was in turn inherited from JavaScript
at the time. JSON does not provide distinct integers and
floating-point values (and the latter are decimal in the format).
CBOR provides binary representations of numbers, which do differ
between integers and floating-point values. Experience from
implementation and use suggested that the separation between these
two number domains should be more clearly drawn in the document;
language that suggested an integer could seamlessly stand in for a
floating-point value was removed. Also, a suggestion (based on
I-JSON [RFC7493]) was added for handling these types when converting
JSON to CBOR.
For a single value in the data model, CBOR often provides multiple
encoding options. This revision adds a new Section 4, which
first introduces the term "preferred serialization" (Section 4.1) and
defines it for various kinds of data items. On the basis of this
terminology, the section goes on to discuss how a CBOR-based protocol
can define "deterministic encoding" (Section 4.2), which now avoids
the RFC 7049 terms "canonical" and "canonicalization". The
suggestion of "Core Deterministic Encoding Requirements"
(Section 4.2.1) enables generic support for such protocol-defined
encoding requirements. The present revision further eases the
implementation of deterministic encoding by simplifying the map
ordering suggested in RFC 7049 to simple lexicographic ordering of
encoded keys. A description of the older suggestion is kept as an
alternative, now termed "length-first map key ordering"
(Section 4.2.3).
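
The difference between the two map key orderings can be sketched in a few lines of Python. This is an illustration only: the function names are assumptions of this example, and the sample keys are the encoded forms of 10, 100, -1, "z", "aa", [100], [-1], and false, chosen to make the two orders diverge.

```python
def core_deterministic_order(encoded_keys):
    # Bytewise lexicographic order of the encoded keys.
    return sorted(encoded_keys)


def length_first_order(encoded_keys):
    # Shorter encodings first; ties broken by bytewise lexicographic order.
    return sorted(encoded_keys, key=lambda k: (len(k), k))


keys = [bytes.fromhex(h) for h in
        ("f4", "8120", "811864", "626161", "617a", "20", "1864", "0a")]

core = [k.hex() for k in core_deterministic_order(keys)]
length_first = [k.hex() for k in length_first_order(keys)]
```

Here "core" comes out as 0a, 1864, 20, 617a, 626161, 811864, 8120, f4, while "length_first" comes out as 0a, 20, f4, 1864, 617a, 8120, 626161, 811864. The first ordering needs only a plain byte string comparison, which is what makes it easier to implement.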
The terminology for well-formed and valid data was sharpened and more
stringently used, avoiding less well-defined alternative terms such
as "syntax error", "decoding error" and "strict mode" outside
examples. Also, a third level of requirements beyond CBOR-level
validity that an application has on its input data is now explicitly
called out. Well-formed (processable at all), valid (checked by a
validity-checking generic decoder), and expected input (as checked by
the application) are treated as a hierarchy of layers of
acceptability.
The handling of non-well-formed simple values was clarified in text
and pseudocode. Appendix F was added to discuss well-formedness
errors and provide examples for them.
The discussion of validity has been sharpened in two areas. Map
validity (handling of duplicate keys) was clarified and the domain of
applicability of certain implementation choices explained. Also,
while streamlining the terminology for tags, tag numbers, and tag
content, discussion was added on tag validity, and the restrictions
on tag content were clarified, in general and specifically for tag
1.
An implementation note (and note for future tag definitions) was
added to Section 3.4 about defining tags with semantics that depend
on serialization order.
Terminology was introduced in Section 3 for "argument" and "head",
simplifying further discussion.
The security considerations were mostly rewritten and significantly
expanded; in multiple other places, the document is now more explicit
that a decoder cannot simply condone well-formedness errors.
Acknowledgements Acknowledgements
CBOR was inspired by MessagePack. MessagePack was developed and CBOR was inspired by MessagePack. MessagePack was developed and
promoted by Sadayuki Furuhashi ("frsyuki"). This reference to promoted by Sadayuki Furuhashi ("frsyuki"). This reference to
MessagePack is solely for attribution; CBOR is not intended as a MessagePack is solely for attribution; CBOR is not intended as a
version of or replacement for MessagePack, as it has different design version of or replacement for MessagePack, as it has different design
goals and requirements. goals and requirements.
The need for functionality beyond the original MessagePack The need for functionality beyond the original MessagePack
Specification became obvious to many people at about the same time Specification became obvious to many people at about the same time
 End of changes. 51 change blocks. 
258 lines changed or deleted 349 lines changed or added
