--- 1/draft-ietf-cbor-7049bis-13.txt 2020-06-16 17:13:09.673940163 -0700
+++ 2/draft-ietf-cbor-7049bis-14.txt 2020-06-16 17:13:09.829944134 -0700
@@ -1,158 +1,163 @@
Network Working Group C. Bormann
Internet-Draft Universitaet Bremen TZI
Obsoletes: 7049 (if approved) P. Hoffman
Intended status: Standards Track ICANN
-Expires: 9 September 2020 8 March 2020
+Expires: 19 December 2020 17 June 2020
Concise Binary Object Representation (CBOR)
- draft-ietf-cbor-7049bis-13
+ draft-ietf-cbor-7049bis-14
Abstract
The Concise Binary Object Representation (CBOR) is a data format
whose design goals include the possibility of extremely small code
size, fairly small message size, and extensibility without the need
for version negotiation. These design goals make it different from
earlier binary serializations such as ASN.1 and MessagePack.
This document is a revised edition of RFC 7049, with editorial
improvements, added detail, and fixed errata. This revision formally
obsoletes RFC 7049, while keeping full compatibility of the
interchange format from RFC 7049. It does not create a new version
of the format.
Contributing
+ This note is to be removed before publishing as an RFC.
+
This document is being worked on in the CBOR Working Group. Please
contribute on the mailing list there, or in the GitHub repository for
this draft: https://github.com/cbor-wg/CBORbis
The charter for the CBOR Working Group says that the WG will update
RFC 7049 to fix verified errata. Security issues and clarifications
may be addressed, but changes to this document will ensure backward
compatibility for popular deployed codebases. This document will be
targeted at becoming an Internet Standard.
- [RFC editor: please remove this note.]
-
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
- This Internet-Draft will expire on 9 September 2020.
+ This Internet-Draft will expire on 19 December 2020.
Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
- 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7
- 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8
+ 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 8
+ 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 9
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 10
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11
- 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13
- 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13
+ 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 14
+ 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 14
3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16
3.2.4. Summary of indefinite-length use of major types . . . 17
- 3.3. Floating-Point Numbers and Values with No Content . . . . 17
+ 3.3. Floating-Point Numbers and Values with No Content . . . . 18
3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 19
3.4.1. Standard Date/Time String . . . . . . . . . . . . . . 22
- 3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 22
- 3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 23
+ 3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 23
+ 3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 24
3.4.4. Decimal Fractions and Bigfloats . . . . . . . . . . . 24
- 3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 25
- 3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 25
+ 3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 26
+ 3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 26
3.4.5.2. Expected Later Encoding for CBOR-to-JSON
- Converters . . . . . . . . . . . . . . . . . . . . 25
- 3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 26
- 3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 27
+ Converters . . . . . . . . . . . . . . . . . . . . 26
+ 3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 27
+ 3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 28
- 4. Serialization Considerations . . . . . . . . . . . . . . . . 28
- 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 28
- 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 29
- 4.2.1. Core Deterministic Encoding Requirements . . . . . . 29
- 4.2.2. Additional Deterministic Encoding Considerations . . 30
- 4.2.3. Length-first Map Key Ordering . . . . . . . . . . . . 32
- 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 33
- 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 33
- 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 34
- 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 35
- 5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 35
- 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 35
- 5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 36
- 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 37
- 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 38
- 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 39
- 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 40
- 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 40
- 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 41
- 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 42
- 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 43
- 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 43
- 7.2. Curating the Additional Information Space . . . . . . . . 44
- 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 45
- 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 46
- 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 46
- 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 47
- 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 47
- 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 47
- 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 48
- 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 49
- 10. Security Considerations . . . . . . . . . . . . . . . . . . . 50
- 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 52
- 11.1. Normative References . . . . . . . . . . . . . . . . . . 52
- 11.2. Informative References . . . . . . . . . . . . . . . . . 53
- Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 55
- Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 59
- Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 62
- Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 65
+ 4. Serialization Considerations . . . . . . . . . . . . . . . . 29
+ 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 29
+ 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 30
+ 4.2.1. Core Deterministic Encoding Requirements . . . . . . 30
+ 4.2.2. Additional Deterministic Encoding Considerations . . 31
+ 4.2.3. Length-first Map Key Ordering . . . . . . . . . . . . 33
+ 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 34
+ 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 35
+ 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 35
+ 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 36
+ 5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 36
+ 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 37
+ 5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 37
+ 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 38
+ 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 39
+ 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 41
+ 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 42
+ 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 42
+ 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 42
+ 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 43
+ 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 44
+ 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 45
+ 7.2. Curating the Additional Information Space . . . . . . . . 46
+ 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 46
+ 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 47
+ 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 48
+ 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 48
+ 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 48
+ 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 49
+ 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 50
+ 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 50
+ 10. Security Considerations . . . . . . . . . . . . . . . . . . . 51
+ 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 53
+ 11.1. Normative References . . . . . . . . . . . . . . . . . . 53
+ 11.2. Informative References . . . . . . . . . . . . . . . . . 54
+ Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 57
+ Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 61
+ Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 64
+ Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 66
Appendix E. Comparison of Other Binary Formats to CBOR's Design
- Objectives . . . . . . . . . . . . . . . . . . . . . . . 66
- E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 67
- E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 67
- E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 68
- E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 68
- E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 68
- Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 69
- Appendix G. Well-formedness errors and examples . . . . . . . . 70
- G.1. Examples for CBOR data items that are not well-formed . . 71
- Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 73
- Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 74
+ Objectives . . . . . . . . . . . . . . . . . . . . . . . 67
+ E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 68
+ E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 68
+ E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 69
+ E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 69
+ E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 69
+ Appendix F. Well-formedness errors and examples . . . . . . . . 70
+ F.1. Examples for CBOR data items that are not well-formed . . 71
+
+ Appendix G. Changes from RFC 7049 . . . . . . . . . . . . . . . 73
+ G.1. Errata processing, clerical changes . . . . . . . . . . . 73
+ G.2. Changes in IANA considerations . . . . . . . . . . . . . 74
+ G.3. Changes in suggestions and other informational
+ components . . . . . . . . . . . . . . . . . . . . . . . 74
+ Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 76
+ Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 76
1. Introduction
There are hundreds of standardized formats for binary representation
of structured data (also known as binary serialization formats). Of
those, some are for specific domains of information, while others are
generalized for arbitrary data. In the IETF, probably the best-known
formats in the latter category are ASN.1's BER and DER [ASN.1].
The format defined here follows some specific design goals that are
@@ -271,29 +276,28 @@
Decoder: A process that decodes a well-formed encoded CBOR data item
and makes it available to an application. Formally speaking, a
decoder contains a parser to break up the input using the syntax
rules of CBOR, as well as a semantic processor to prepare the data
in a form suitable to the application.
Encoder: A process that generates the (well-formed) representation
format of a CBOR data item from application information.
Data Stream: A sequence of zero or more data items, not further
- assembled into a larger containing data item. The independent
- data items that make up a data stream are sometimes also referred
- to as "top-level data items".
+ assembled into a larger containing data item (see [RFC8742] for
+ one application). The independent data items that make up a data
+ stream are sometimes also referred to as "top-level data items".
Well-formed: A data item that follows the syntactic structure of
CBOR. A well-formed data item uses the initial bytes and the byte
strings and/or data items that are implied by their values as
defined in CBOR and does not include following extraneous data.
-
CBOR decoders by definition only return contents from well-formed
data items.
Valid: A data item that is well-formed and also follows the semantic
restrictions that apply to CBOR data items (Section 5.3).
Expected: Besides its normal English meaning, the term "expected" is
used to describe requirements beyond CBOR validity that an
application has on its input data. Well-formed (processable at
all), valid (checked by a validity-checking generic decoder), and
@@ -364,26 +368,26 @@
* a mapping (mathematical function) from zero or more data items
("keys") each to a data item ("values"), ("map")
* a tagged data item ("tag"), comprising a tag number (an integer in
the range 0..2**64-1) and the tag content (a data item)
Note that integer and floating-point values are distinct in this
model, even if they have the same numeric value.
- Also note that serialization variants, such as the number of bytes of
- the encoded floating-point value, or the choice of one of the ways in
- which an integer, the length of a text or byte string, the number of
- elements in an array or pairs in a map, or a tag number,
- (collectively "the argument", see Section 3) can be encoded, are not
- visible at the generic data model level.
+ Also note that serialization variants are not visible at the generic
+ data model level, including the number of bytes of the encoded
+ floating-point value or the choice of one of the ways in which an
+ integer, the length of a text or byte string, the number of elements
+ in an array or pairs in a map, or a tag number, (collectively "the
+ argument", see Section 3) can be encoded.
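As a minimal illustration of why serialization variants are invisible at the data model level, the Python sketch below reads the argument of a head. The helper name read_argument is hypothetical and not part of the draft. It shows that the unsigned integer 1, written in any of its five head encodings, decodes to the same (major type, value) pair.

```python
# Sketch only: read_argument is a hypothetical helper, not an API from the draft.
def read_argument(data, pos=0):
    """Return (major_type, argument, next_pos) for the head at data[pos].

    Additional information 24..27 means 1, 2, 4, or 8 argument bytes follow.
    For major type 7 those bytes are float bits rather than an integer
    argument; additional information 28..31 is not handled in this sketch.
    """
    initial = data[pos]
    major_type, ai = initial >> 5, initial & 0x1F
    if ai < 24:
        return major_type, ai, pos + 1
    if ai <= 27:
        n = 1 << (ai - 24)
        end = pos + 1 + n
        if end > len(data):
            raise ValueError("truncated argument")
        return major_type, int.from_bytes(data[pos + 1:end], "big"), end
    raise ValueError("additional information %d not handled here" % ai)


# Five serialization variants of the unsigned integer 1 (major type 0).
variants = ["01", "1801", "190001", "1a00000001", "1b0000000000000001"]
decoded = [read_argument(bytes.fromhex(v))[:2] for v in variants]
print(decoded)  # [(0, 1), (0, 1), (0, 1), (0, 1), (0, 1)]
```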
2.1. Extended Generic Data Models
This basic generic data model comes pre-extended by the registration
of a number of simple values and tag numbers right in this document,
such as:
* "false", "true", "null", and "undefined" (simple values identified
by 20..23)
@@ -518,29 +522,29 @@
5 would have an initial byte of 0b010_00101 (major type 2,
additional information 5 for the length), followed by 5 bytes of
binary content. A byte string whose length is 500 would have 3
initial bytes of 0b010_11001 (major type 2, additional information
25 to indicate a two-byte length) followed by the two bytes 0x01f4
for a length of 500, followed by 500 bytes of binary content.
Major type 3: a text string (Section 2), encoded as UTF-8
([RFC3629]). The number of bytes in the string is equal to the
argument. A string containing an invalid UTF-8 sequence is well-
- formed but invalid. This type is provided for systems that need
- to interpret or display human-readable text, and allows the
- differentiation between unstructured bytes and text that has a
- specified repertoire and encoding. In contrast to formats such as
- JSON, the Unicode characters in this type are never escaped.
- Thus, a newline character (U+000A) is always represented in a
- string as the byte 0x0a, and never as the bytes 0x5c6e (the
- characters "\" and "n") or as 0x5c7530303061 (the characters "\",
- "u", "0", "0", "0", and "a").
+ formed but invalid (Section 1.2). This type is provided for
+ systems that need to interpret or display human-readable text, and
+ allows the differentiation between unstructured bytes and text
+ that has a specified repertoire and encoding. In contrast to
+ formats such as JSON, the Unicode characters in this type are
+ never escaped. Thus, a newline character (U+000A) is always
+ represented in a string as the byte 0x0a, and never as the bytes
+ 0x5c6e (the characters "\" and "n") or as 0x5c7530303061 (the
+ characters "\", "u", "0", "0", "0", and "a").
Major type 4: an array of data items. In other formats, arrays are
also called lists, sequences, or tuples (a "CBOR sequence" is
something slightly different, though [RFC8742]). The argument is
the number of data items in the array. Items in an array do not
need to all be of the same type. For example, an array that
contains 10 items of any type would have an initial byte of
0b100_01010 (major type of 4, additional information of 10 for the
length) followed by the 10 remaining items.
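The head layouts in the byte string and array examples above can be reproduced with a short encoder. This is a sketch assuming shortest-form (preferred) serialization of the argument; encode_head and encode_byte_string are hypothetical names chosen for illustration.

```python
# Sketch: shortest-form head encoding; encode_head is a hypothetical name.
def encode_head(major_type, argument):
    """Encode a CBOR head (initial byte plus argument bytes), shortest form."""
    if not 0 <= major_type <= 7 or not 0 <= argument < 2**64:
        raise ValueError("major type or argument out of range")
    mt_bits = major_type << 5
    if argument < 24:
        return bytes([mt_bits | argument])   # argument fits in the initial byte
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([mt_bits | ai]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument < 2**64 was checked above")


def encode_byte_string(content):
    """Major type 2: the head carries the length, then the raw bytes follow."""
    return encode_head(2, len(content)) + content


print(encode_head(2, 5).hex())      # 45     (0b010_00101)
print(encode_head(2, 500).hex())    # 5901f4 (0b010_11001, then 0x01f4)
print(encode_head(4, 10).hex())     # 8a     (0b100_01010)
```

A byte string of length 500 thus costs 3 head bytes plus 500 content bytes.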
@@ -631,22 +635,22 @@
Indefinite-length arrays and maps are represented using their major
type with the additional information value of 31, followed by an
arbitrary-length sequence of zero or more items for an array or key/
value pairs for a map, followed by the "break" stop code
(Section 3.2.1). In other words, indefinite-length arrays and maps
look identical to other arrays and maps except for beginning with the
additional information value of 31 and ending with the "break" stop
code.
- If the break stop code appears after a key in a map, in place of that
- key's value, the map is not well-formed.
+ If the "break" stop code appears after a key in a map, in place of
+ that key's value, the map is not well-formed.
There is no restriction against nesting indefinite-length array or
map items. A "break" only terminates a single item, so nested
indefinite-length items need exactly as many "break" stop codes as
there are type bytes starting an indefinite-length item.
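To make the nesting rule concrete, the sketch below encodes small arrays either with a definite length or with an indefinite length closed by "break". It only covers unsigned integers below 24 and arrays; the Indef marker class is a hypothetical device for this illustration, not part of any CBOR library.

```python
# Sketch covering only unsigned integers below 24 and arrays; Indef is a
# hypothetical marker type, not part of any CBOR library.
class Indef(list):
    """A list that should be encoded as an indefinite-length array."""


def encode(item):
    if type(item) is int and 0 <= item < 24:
        return bytes([item])                 # major type 0, value in initial byte
    if isinstance(item, Indef):              # checked before list: Indef is a list
        inner = b"".join(encode(x) for x in item)
        return b"\x9f" + inner + b"\xff"     # 0b100_11111 ... "break" (0xff)
    if isinstance(item, list) and len(item) < 24:
        return bytes([0x80 | len(item)]) + b"".join(encode(x) for x in item)
    raise ValueError("outside the scope of this sketch")


print(encode([1, [2, 3], [4, 5]]).hex())                 # 8301820203820405
print(encode(Indef([1, [2, 3], Indef([4, 5])])).hex())   # 9f018202039f0405ffff
```

The second result starts two indefinite-length arrays (two 0x9f bytes) and therefore needs exactly two "break" stop codes.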
For example, assume an encoder wants to represent the abstract array
[1, [2, 3], [4, 5]]. The definite-length encoding would be
0x8301820203820405:
@@ -732,23 +736,23 @@
respectively, if no chunk is present). (Note that zero-length
chunks, while not particularly useful, are permitted.)
If any item between the indefinite-length string indicator
(0b010_11111 or 0b011_11111) and the "break" stop code is not a
definite-length string item of the same major type, the string is not
well-formed.
If any definite-length text string inside an indefinite-length text
string is invalid, the indefinite-length text string is invalid.
- Note that this implies that the bytes of a single UTF-8 character
- cannot be split up between chunks: a new chunk of a text string can
- only be started at a character boundary.
+ Note that this implies that the UTF-8 bytes of a single Unicode code
+ point (scalar value) cannot be spread between chunks: a new chunk of
+ a text string can only be started at a code point boundary.
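The per-chunk rule can be checked by decoding each chunk separately, as in this sketch; text_chunks_valid is a hypothetical helper that receives the payloads of the definite-length text strings found between 0x7f (0b011_11111) and the "break".

```python
# Sketch: text_chunks_valid is a hypothetical helper; chunks are the payloads
# of the definite-length text strings between 0x7f and the "break".
def text_chunks_valid(chunks):
    for chunk in chunks:
        try:
            chunk.decode("utf-8")   # each chunk must be valid UTF-8 on its own
        except UnicodeDecodeError:
            return False
    return True


u_umlaut = "\u00fc".encode("utf-8")                  # b"\xc3\xbc": one code point
print(text_chunks_valid([b"abc", u_umlaut]))         # True
print(text_chunks_valid([b"abc\xc3", b"\xbc"]))      # False: code point split
print(b"".join([b"abc\xc3", b"\xbc"]).decode("utf-8") == "abc\u00fc")  # True
```

The last line shows why checking only the concatenation would be insufficient: the joined bytes decode cleanly even though the chunked form is invalid.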
For example, assume an encoded data item consisting of the bytes:
0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111
5F -- Start indefinite-length byte string
44 -- Byte string of length 4
aabbccdd -- Bytes content
43 -- Byte string of length 3
eeff99 -- Bytes content
@@ -843,21 +847,22 @@
| 32..255 | (Unassigned) |
+---------+-----------------+
Table 4: Simple Values
An encoder MUST NOT issue two-byte sequences that start with 0xf8
(major type = 7, additional information = 24) and continue with a
byte less than 0x20 (32 decimal). Such sequences are not well-
formed. (This implies that an encoder cannot encode false, true,
null, or undefined in two-byte sequences, only the one-byte variants
- of these are well-formed.)
+ of these are well-formed; more generally speaking, each simple value
+ has only a single representation variant.)
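A decoder can enforce this rule directly when it sees additional information 24 in major type 7. The sketch below is illustrative only; decode_simple is a hypothetical name, and it deliberately rejects floats and "break", which share major type 7.

```python
# Sketch: decode_simple is a hypothetical helper for major type 7 simple values.
def decode_simple(data):
    """Return (simple value, bytes consumed) or raise ValueError."""
    initial = data[0]
    if initial >> 5 != 7:
        raise ValueError("not major type 7")
    ai = initial & 0x1F
    if ai < 24:
        return ai, 1          # e.g. 0xf4 -> 20 (false), 0xf5 -> 21 (true)
    if ai == 24:
        if len(data) < 2:
            raise ValueError("truncated")
        if data[1] < 0x20:    # 0xf8 followed by a byte below 32
            raise ValueError("not well-formed: two-byte simple value below 32")
        return data[1], 2
    raise ValueError("not a simple value (float, reserved, or break)")
```

For example, 0xf4 yields false (20) and 0xf820 yields simple(32), while 0xf814 is rejected because false has only the one-byte form.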
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
IEEE 754 binary floating-point values [IEEE754]. These floating-
point values are encoded in the additional bytes of the appropriate
size. (See Appendix D for some information about 16-bit floating-
point numbers.)
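In Python, the standard struct module can unpack all three widths, since its "e" format character denotes IEEE 754 binary16. The sketch below assumes big-endian (network) byte order as CBOR uses; decode_float is a hypothetical helper name.

```python
import struct

# Additional information -> struct format (big-endian half, single, double).
FLOAT_FORMATS = {25: ">e", 26: ">f", 27: ">d"}


def decode_float(data):
    """Decode a major type 7 float item; decode_float is a hypothetical helper."""
    initial = data[0]
    ai = initial & 0x1F
    if initial >> 5 != 7 or ai not in FLOAT_FORMATS:
        raise ValueError("not a CBOR floating-point item")
    fmt = FLOAT_FORMATS[ai]
    size = struct.calcsize(fmt)             # 2, 4, or 8 bytes
    return struct.unpack(fmt, data[1:1 + size])[0]


print(decode_float(bytes.fromhex("f93c00")))             # 1.0
print(decode_float(bytes.fromhex("fa47c35000")))         # 100000.0
print(decode_float(bytes.fromhex("fb3ff199999999999a"))) # 1.1
```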
3.4. Tagging of Items
In CBOR, a data item can be enclosed by a tag to give it some
@@ -888,21 +893,21 @@
for instance as 0x01, 0x1801, or 0x190001. The tag definition may
include the definition of a preferred serialization (Section 4.1)
that is recommended for generic encoders; this may prefer basic
generic data model representations over ones that employ a tag.
The tag definition usually restricts what kinds of nested data item
or items are valid for such tags. Tag definitions may restrict their
content to a very specific syntactic structure, as the tags defined
in this document do, or they may aim at a more semantically defined
definition of their content, as for instance tags 40 and 1040 do
- [rfc8746]: These accept a number of different ways of representing
+ [RFC8746]: These accept a number of different ways of representing
arrays.
As a matter of convention, many tags do not accept null or undefined
values as tag content; instead, the expectation is that a null or
undefined value can be used in place of the entire tag; Section 3.4.2
provides some further considerations for one specific tag about the
handling of this convention in application protocols and in mapping
to platform types.
Decoders do not need to understand tags of every tag number, and tags
@@ -977,28 +982,37 @@
| | | Section 3.4.6 |
+------------+-------------+----------------------------------+
Table 5: Tag numbers defined in RFC 7049
Conceptually, tags are interpreted in the generic data model, not at
(de-)serialization time. A small number of tags (specifically, tag
number 25 and tag number 29) have been registered with semantics that
may require processing at (de-)serialization time: The decoder needs
to be aware and the encoder needs to be in control of the exact
- sequence in which data items are encoded into the CBOR data stream.
+ sequence in which data items are encoded into the CBOR data item.
This means these tags cannot be implemented on top of every generic
CBOR encoder/decoder (which might not reflect the serialization order
for entries in a map at the data model level and vice versa); their
implementation therefore typically needs to be integrated into the
generic encoder/decoder. The definition of new tags with this
property is NOT RECOMMENDED.
+ IANA allocated tag numbers 65535, 4294967295, and
+ 18446744073709551615 (binary all-ones in 16-bit, 32-bit, and 64-bit).
+ These can be used as a convenience for implementers that want a
+ single integer to indicate either that a specific tag is present, or
+ the absence of a tag. That allocation is described in Section 10 of
+ [I-D.bormann-cbor-notable-tags]. These tags are not intended to
+ occur in actual CBOR data items; implementations may flag such an
+ occurrence as an error.
+
Protocols using tag numbers 0 and 1 extend the generic data model
(Section 2) with data items representing points in time; tag numbers
2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5,
with floating-point values of arbitrary size and precision.
3.4.1. Standard Date/Time String
Tag number 0 contains a text string in the standard format described
by the "date-time" production in [RFC3339], as refined by Section 3.3
of [RFC4287], representing the point in time described there. A
@@ -1154,25 +1168,26 @@
The tags in this section are for content hints that might be used by
generic CBOR processors. These content hints do not extend the
generic data model.
3.4.5.1. Encoded CBOR Data Item
Sometimes it is beneficial to carry an embedded CBOR data item that
is not meant to be decoded immediately at the time the enclosing data
item is being decoded. Tag number 24 (CBOR data item) can be used to
- tag the embedded byte string as a data item encoded in CBOR format.
- Contained items that aren't byte strings are invalid. A contained
- byte string is valid if it encodes a well-formed CBOR item; validity
- checking of the decoded CBOR item is not required for tag validity
- (but could be offered by a generic decoder as a special option).
+ tag the embedded byte string as a single data item encoded in CBOR
+ format. Contained items that aren't byte strings are invalid. A
+ contained byte string is valid if it encodes a well-formed CBOR data
+ item; validity checking of the decoded CBOR item is not required for
+ tag validity (but could be offered by a generic decoder as a special
+ option).
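Checking tag 24 content therefore needs a well-formedness check that also confirms the byte string holds exactly one data item. The following recursive Python sketch is our own illustration (not the draft's Appendix C pseudocode); all names are hypothetical, and very deep nesting would hit Python's recursion limit.

```python
# Sketch of a well-formedness check; _skip and the other names are hypothetical.
def _skip(data, pos):
    """Return the position just after the item at data[pos], or raise ValueError."""
    if pos >= len(data):
        raise ValueError("truncated")
    initial = data[pos]
    mt, ai = initial >> 5, initial & 0x1F
    pos += 1
    if ai < 24:
        arg = ai
    elif ai <= 27:
        n = 1 << (ai - 24)
        if pos + n > len(data):
            raise ValueError("truncated argument")
        arg = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    elif ai <= 30:
        raise ValueError("reserved additional information")
    else:  # ai == 31: indefinite length, or "break" where none is allowed
        if mt in (2, 3):
            while True:  # chunks: definite-length strings of the same major type
                if pos >= len(data):
                    raise ValueError("missing break")
                if data[pos] == 0xFF:
                    return pos + 1
                if (data[pos] >> 5) != mt or (data[pos] & 0x1F) == 31:
                    raise ValueError("bad chunk")
                pos = _skip(data, pos)
        if mt in (4, 5):
            count = 0
            while True:
                if pos >= len(data):
                    raise ValueError("missing break")
                if data[pos] == 0xFF:
                    if mt == 5 and count % 2:
                        raise ValueError("break in place of a map value")
                    return pos + 1
                pos = _skip(data, pos)
                count += 1
        raise ValueError("indefinite length or break not allowed here")
    if mt == 7 and ai == 24 and arg < 32:
        raise ValueError("two-byte simple value below 32")
    if mt in (2, 3):
        if pos + arg > len(data):
            raise ValueError("truncated string")
        return pos + arg
    if mt in (4, 5):
        for _ in range(arg if mt == 4 else 2 * arg):
            pos = _skip(data, pos)
    elif mt == 6:
        pos = _skip(data, pos)    # tag content
    return pos


def is_single_well_formed_item(data):
    try:
        return _skip(data, 0) == len(data)
    except ValueError:
        return False


def tag24_content_is_valid(content):
    """Tag 24: a byte string encoding exactly one well-formed data item."""
    return isinstance(content, bytes) and is_single_well_formed_item(content)
```

Note that tag24_content_is_valid does not check validity of the embedded item (for instance, UTF-8 inside its text strings), matching the text above.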
3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters
Tags number 21 to 23 indicate that a byte string might require a
specific encoding when interoperating with a text-based
representation. These tags are useful when an encoder knows that the
byte string data it is writing is likely to be later converted to a
particular JSON-based usage. That usage specifies that some strings
are encoded as base64, base64url, and so on. The encoder uses byte
strings instead of doing the encoding itself to reduce the message
@@ -1198,33 +1213,33 @@
whitespace, or other additional characters. Tag number 23 suggests
conversion to base16 (hex) encoding, with uppercase alphabetics (see
Section 8 of RFC 4648). Note that, for all three tag numbers, the
encoding of the empty byte string is the empty text string.
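A CBOR-to-JSON converter could honor these hints with Python's standard base64 module, as sketched below. This assumes the full draft text for tags 21 and 22 (base64url without padding, classic base64 with padding), which is only partly visible in this excerpt; expected_json_text is a hypothetical name.

```python
import base64


def expected_json_text(tag_number, content):
    """Hypothetical converter hook for tags 21..23 around a byte string."""
    if tag_number == 21:   # base64url, trailing "=" padding removed
        return base64.urlsafe_b64encode(content).rstrip(b"=").decode("ascii")
    if tag_number == 22:   # classic base64, with padding
        return base64.b64encode(content).decode("ascii")
    if tag_number == 23:   # base16, uppercase alphabetics
        return base64.b16encode(content).decode("ascii")
    raise ValueError("not an expected-later-encoding tag")


print(expected_json_text(21, b"\xfb\xff"))   # -_8
print(expected_json_text(22, b"\xfb\xff"))   # +/8=
print(expected_json_text(23, b"\xfb\xff"))   # FBFF
```

For all three tags the empty byte string maps to the empty text string.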
3.4.5.3. Encoded Text
Some text strings hold data that have formats widely used on the
Internet, and sometimes those formats can be validated and presented
to the application in appropriate form by the decoder. There are
- tags for some of these formats. As with tag numbers 21 to 23, if
- these tags are applied to an item other than a text string, they
- apply to all text string data items it contains.
+ tags for some of these formats.
* Tag number 32 is for URIs, as defined in [RFC3986]. If the text
string doesn't match the "URI-reference" production, the string is
invalid.
* Tag numbers 33 and 34 are for base64url- and base64-encoded text
strings, respectively, as defined in [RFC4648]. If any of:
- the encoded text string contains non-alphabet characters or
- only 1 character in the last block of 4, or
+ only 1 alphabet character in the last block of 4 (where
+ alphabet is defined by Section 5 of [RFC4648] for tag number 33
+ and Section 4 of [RFC4648] for tag number 34), or
- the padding bits in a 2- or 3-character block are not 0, or
- the base64 encoding has the wrong number of padding characters,
or
- the base64url encoding has padding characters,
the string is invalid.
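The invalidity conditions for tag numbers 33 and 34 can be checked without decoding, as in this sketch. The function valid_base64_text is hypothetical; url=True corresponds to tag number 33 (base64url) and url=False to tag number 34 (base64).

```python
import string

_COMMON = string.ascii_uppercase + string.ascii_lowercase + string.digits
B64_STD = _COMMON + "+/"   # RFC 4648, Section 4 alphabet
B64_URL = _COMMON + "-_"   # RFC 4648, Section 5 alphabet


def valid_base64_text(s, url):
    """Sketch of the tag 33 (url=True) / tag 34 (url=False) validity rules."""
    alphabet = B64_URL if url else B64_STD
    if url:
        if "=" in s:                  # base64url must not have padding
            return False
        body, pad = s, 0
    else:
        if len(s) % 4:                # base64 must be padded to full blocks
            return False
        body = s.rstrip("=")
        pad = len(s) - len(body)
    if any(c not in alphabet for c in body):
        return False                  # non-alphabet characters
    rem = len(body) % 4
    if rem == 1:                      # only 1 alphabet character in last block
        return False
    if not url and pad != {0: 0, 2: 2, 3: 1}[rem]:
        return False                  # wrong number of padding characters
    if rem:
        # 2 chars carry 12 bits for 1 byte (4 spare bits);
        # 3 chars carry 18 bits for 2 bytes (2 spare bits). Spare bits must be 0.
        mask = 0x0F if rem == 2 else 0x03
        if alphabet.index(body[-1]) & mask:
            return False
    return True
```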
@@ -1255,24 +1270,26 @@
In many applications, it will be clear from the context that CBOR is
being employed for encoding a data item. For instance, a specific
protocol might specify the use of CBOR, or a media type is indicated
that specifies its use. However, there may be applications where
such context information is not available, such as when CBOR data is
stored in a file that does not have disambiguating metadata. Here,
it may help to have some distinguishing characteristics for the data
itself.
- Tag number 55799 is defined for this purpose. It does not impart any
- special semantics on the data item that it encloses; that is, the
- semantics of the tag content enclosed in tag number 55799 is exactly
- identical to the semantics of the tag content itself.
+ Tag number 55799 is defined for this purpose, specifically for use at
+ the start of a stored encoded CBOR data item as specified by an
+ application. It does not impart any special semantics on the data
+ item that it encloses; that is, the semantics of the tag content
+ enclosed in tag number 55799 is exactly identical to the semantics of
+ the tag content itself.
The serialization of this tag's head is 0xd9d9f7, which does not
appear to be in use as a distinguishing mark for any frequently used
file types. In particular, 0xd9d9f7 is not a valid start of a
Unicode text in any Unicode encoding if it is followed by a valid
CBOR data item.
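Because the tag carries no semantics, an application can add or strip its head as a plain 3-byte prefix. The sketch below uses hypothetical helper names and also confirms the arithmetic behind 0xd9d9f7: 0xd9 is major type 6 with additional information 25, and 0xd9f7 is 55799.

```python
SELF_DESCRIBE = bytes.fromhex("d9d9f7")   # head of tag number 55799


def add_self_describe(encoded_item):
    """Hypothetical helper: prefix a stored encoded CBOR data item."""
    return SELF_DESCRIBE + encoded_item


def strip_self_describe(data):
    """Hypothetical helper: remove the prefix if present; content is unchanged."""
    return data[len(SELF_DESCRIBE):] if data.startswith(SELF_DESCRIBE) else data


# 0xd9 = 0b110_11001: major type 6 (tag), two-byte tag number follows.
assert SELF_DESCRIBE[0] >> 5 == 6 and SELF_DESCRIBE[0] & 0x1F == 25
assert int.from_bytes(SELF_DESCRIBE[1:], "big") == 55799
```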
For instance, a decoder might be able to decode both CBOR and JSON.
Such a decoder would need to mechanically distinguish the two
formats. An easy way for an encoder to help the decoder would be to
@@ -1576,55 +1593,55 @@
Note that some applications and protocols will not want to use
indefinite-length encoding. Using indefinite-length encoding allows
an encoder to not need to marshal all the data for counting, but it
requires a decoder to allocate increasing amounts of memory while
waiting for the end of the item. This might be fine for some
applications but not others.
5.2. Generic Encoders and Decoders
- A generic CBOR decoder can decode all well-formed CBOR data and
- present them to an application. See Appendix C.
+ A generic CBOR decoder can decode all well-formed encoded CBOR data
+ items and present the data items to an application. See Appendix C.
+ (The diagnostic notation, Section 8, may be used to present well-
+ formed CBOR values to humans.)
+
+ Generic CBOR encoders provide an application interface that allows
+ the application to specify any well-formed value to be encoded as a
+ CBOR data item, including simple values and tags unknown to the
+ encoder.
Even though CBOR attempts to minimize these cases, not all well-
formed CBOR data is valid: for example, the encoded text string
- "0x62c0ae" does not contain valid UTF-8 and so is not a valid CBOR
- item. Also, specific tags may make semantic constraints that may be
- violated, such as a bignum tag enclosing another tag, or an instance
- of tag number 0 containing a byte string, or containing a text string
- with contents that do not match [RFC3339]'s "date-time" production.
- There is no requirement that generic encoders and decoders make
- unnatural choices for their application interface to enable the
- processing of invalid data. Generic encoders and decoders are
- expected to forward simple values and tags even if their specific
+ "0x62c0ae" does not contain valid UTF-8 (because [RFC3629] requires
+ always using the shortest form) and so is not a valid CBOR item.
+ Also, specific tags may make semantic constraints that may be
+ violated, for instance by a bignum tag enclosing another tag, or by
+ an instance of tag number 0 containing a byte string, or containing a
+ text string with contents that do not match [RFC3339]'s "date-time"
+ production. There is no requirement that generic encoders and
+ decoders make unnatural choices for their application interface to
+ enable the processing of invalid data. Generic encoders and decoders
+ are expected to forward simple values and tags even if their specific
codepoints are not registered at the time the encoder/decoder is
written (Section 5.4).
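The "0x62c0ae" example can be reproduced with Python's strict UTF-8 decoder, which rejects the overlong two-byte form 0xc0 0xae of "." (U+002E). This sketch handles only short definite-length text strings; text_string_is_valid is a hypothetical name.

```python
def text_string_is_valid(encoded):
    """Sketch: validity of a text string item with a one-byte head (length < 24)."""
    initial = encoded[0]
    length = initial & 0x1F
    if initial >> 5 != 3 or length >= 24:
        raise ValueError("sketch handles only short definite-length text strings")
    content = encoded[1:1 + length]
    if len(content) != length:
        raise ValueError("truncated: not well-formed")
    try:
        content.decode("utf-8")   # strict: overlong forms are rejected
    except UnicodeDecodeError:
        return False              # well-formed but invalid
    return True


print(text_string_is_valid(bytes.fromhex("62c0ae")))       # False
print(text_string_is_valid(bytes.fromhex("6449455446")))   # True ("IETF")
```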
- Generic decoders provide ways to present well-formed CBOR values,
- both valid and invalid, to an application. The diagnostic notation
- (Section 8) may be used to present well-formed CBOR values to humans.
-
- Generic encoders provide an application interface that allows the
- application to specify any well-formed value, including simple values
- and tags unknown to the encoder.
-
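The following Python sketch shows the difference between well-formed and valid. The helper decode_short_text and its exception classes are hypothetical and are not part of any specified API. It handles only definite-length text strings shorter than 24 bytes. A length mismatch is reported as a well-formedness error. A failed strict UTF-8 decode is reported as a validity error. For the input 62 c0 ae, the head announces two bytes and two bytes follow, so the item is well-formed. However, c0 ae is an overlong encoding of "." and a strict decoder rejects it.

```python
class NotWellFormed(Exception):
    """The bytes cannot be processed as CBOR at all."""


class Invalid(Exception):
    """The item is well-formed but violates a validity rule."""


def decode_short_text(item: bytes) -> str:
    """Decode one definite-length text string of fewer than 24 bytes."""
    if not item:
        raise NotWellFormed("too little data")
    major_type, ai = item[0] >> 5, item[0] & 0x1F
    if major_type != 3 or ai >= 24:
        raise NotImplementedError("sketch only handles short text strings")
    payload = item[1:]
    if len(payload) < ai:
        raise NotWellFormed("too little data")
    if len(payload) > ai:
        raise NotWellFormed("too much data")
    # The item is well-formed; validity additionally requires valid UTF-8.
    try:
        return payload.decode("utf-8")  # strict decoding rejects overlong forms
    except UnicodeDecodeError as exc:
        raise Invalid("text string is not valid UTF-8") from exc
```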
5.3. Validity of Items
- A well-formed but invalid CBOR data item presents a problem with
- interpreting the data encoded in it in the CBOR data model. A CBOR-
- based protocol could be specified in several layers, in which the
- lower layers don't process the semantics of some of the CBOR data
- they forward. These layers can't notice any validity errors in data
- they don't process and MUST forward that data as-is. The first layer
- that does process the semantics of an invalid CBOR item MUST take one
- of two choices:
+ A well-formed but invalid CBOR data item (Section 1.2) presents a
+ problem with interpreting the data encoded in it in the CBOR data
+ model. A CBOR-based protocol could be specified in several layers,
+ in which the lower layers don't process the semantics of some of the
+ CBOR data they forward. These layers can't notice any validity
+ errors in data they don't process and MUST forward that data as-is.
+ The first layer that does process the semantics of an invalid CBOR
+ item MUST make one of two choices:
1. Replace the problematic item with an error marker and continue
with the next item, or
2. Issue an error and stop processing altogether.
A CBOR-based protocol MUST specify which of these options its
decoders take, for each kind of invalid item they might encounter.
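The two choices can be sketched in Python as follows. The sketch assumes that a generic decoder has already produced (tag number, content) pairs. The names OnInvalid, ErrorMarker, InvalidItem, and process_tagged are hypothetical. The only check shown is that tag number 0 must enclose a text string. A real implementation would also check the content against the [RFC3339] "date-time" production.

```python
from enum import Enum


class OnInvalid(Enum):
    MARKER = "marker"  # choice 1: replace the item with an error marker and continue
    STOP = "stop"      # choice 2: issue an error and stop processing


class ErrorMarker:
    """Hypothetical application-level stand-in for an invalid item."""

    def __init__(self, reason: str):
        self.reason = reason

    def __repr__(self) -> str:
        return f"ErrorMarker({self.reason!r})"


class InvalidItem(Exception):
    pass


def process_tagged(items, policy: OnInvalid):
    """items: iterable of (tag_number, content) pairs from a generic decoder."""
    results = []
    for tag_number, content in items:
        if tag_number == 0 and not isinstance(content, str):
            reason = "tag number 0 requires a text string as tag content"
            if policy is OnInvalid.STOP:
                raise InvalidItem(reason)
            results.append(ErrorMarker(reason))
            continue
        results.append((tag_number, content))
    return results
```

The protocol specification, not the generic decoder, determines which policy value is passed in for each kind of invalid item.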
Such problems might occur at the basic validity level of CBOR or in
@@ -1636,21 +1653,21 @@
model:
Duplicate keys in a map: Generic decoders (Section 5.2) make data
available to applications using the native CBOR data model. That
data model includes maps (key-value mappings with unique keys),
not multimaps (key-value mappings where multiple entries can have
the same key). Thus, a generic decoder that gets a CBOR map item
that has duplicate keys will decode to a map with only one
instance of that key, or it might stop processing altogether. On
the other hand, a "streaming decoder" may not even be able to
- notice (Section 5.6).
+ notice. See Section 5.6 for more discussion of keys in maps.
Invalid UTF-8 string: A decoder might or might not want to verify
that the sequence of bytes in a UTF-8 string (major type 3) is
actually valid UTF-8 and react appropriately.
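The following Python sketch shows a decoder that builds a map from key/value pairs in encoded order. By default it stops on a duplicate key. Alternatively, it can keep one instance of the key, and this sketch keeps the last one. The function name build_map is hypothetical. The sketch assumes keys have already been decoded to hashable Python values. One pitfall it guards against: Python's dict treats 1, 1.0, and True as the same key, while the CBOR data model treats them as distinct.

```python
def build_map(pairs, reject_duplicates=True):
    """Return a list of (key, value) pairs with unique keys.

    pairs: (key, value) tuples in encoded order, keys already decoded to
    hashable Python values (for example, arrays as tuples).
    """
    entries = {}
    for key, value in pairs:
        # Index by type as well, so that integer 1, float 1.0, and true
        # are not merged. Nested containers, NaN, and 0.0 vs -0.0 are
        # not handled by this sketch.
        slot = (type(key), key)
        if slot in entries and reject_duplicates:
            raise ValueError(f"duplicate map key: {key!r}")
        entries[slot] = (key, value)  # without rejection, the last entry wins
    return list(entries.values())
```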
5.3.2. Tag validity
Two additional kinds of validity errors are introduced by adding tags
to the basic generic data model:
@@ -2029,47 +2047,52 @@
7.1. Extension Points
In a protocol design, opportunities for evolution are often included
in the form of extension points. For example, there may be a
codepoint space that is not fully allocated from the outset, and the
protocol is designed to tolerate and embrace implementations that
start using more codepoints than initially allocated.
Sizing the codepoint space may be difficult because the range
- required may be hard to predict. An attempt should be made to make
- the codepoint space large enough so that it can slowly be filled over
- the intended lifetime of the protocol.
+ required may be hard to predict. Protocol designs should attempt to
+ make the codepoint space large enough so that it can slowly be filled
+ over the intended lifetime of the protocol.
CBOR has three major extension points:
* the "simple" space (values in major type 7). Of the 24 efficient
(and 224 slightly less efficient) values, only a small number have
been allocated. Implementations receiving an unknown simple data
- item may be able to process it as such, given that the structure
- of the value is indeed simple. The IANA registry in Section 9.1
- is the appropriate way to address the extensibility of this
- codepoint space.
+ item may easily be able to process it as such, given that the
+ structure of the value is indeed simple. The IANA registry in
+ Section 9.1 is the appropriate way to address the extensibility of
+ this codepoint space.
- * the "tag" space (values in major type 6). Again, only a small
- part of the codepoint space has been allocated, and the space is
- abundant (although the early numbers are more efficient than the
- later ones). Implementations receiving an unknown tag number can
- choose to simply ignore it (process just the enclosed tag content)
- or to process it as an unknown tag number wrapping the tag
- content. The IANA registry in Section 9.2 is the appropriate way
- to address the extensibility of this codepoint space.
+ * the "tag" space (values in major type 6). The total codepoint
+ space is abundant; only a tiny part of it has been allocated.
+ However, not all of these codepoints are equally efficient: the
+ first 24 only consume a single ("1+0") byte, and half of them have
+ already been allocated. The next 232 values only consume two
+ ("1+1") bytes, with nearly a quarter already allocated. These
+ subspaces need some curation to last for a few more decades.
+ Implementations receiving an unknown tag number can choose to
+ process just the enclosed tag content or, preferably, to process
+ the tag as an unknown tag number wrapping the tag content. The
+ IANA registry in Section 9.2 is the appropriate way to address the
+ extensibility of this codepoint space.
* the "additional information" space. An implementation receiving
an unknown additional information value has no way to continue
- decoding, so allocating codepoints to this space is a major step.
- There are also very few codepoints left. See also Section 7.2.
+ decoding, so allocating codepoints in this space is a major step
+ beyond just exercising an extension point. There are also very
+ few codepoints left. See also Section 7.2.
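The "1+0", "1+1", "1+2", "1+4", and "1+8" notation counts the initial byte plus the bytes that follow it to carry the argument. The following Python sketch encodes a head in its shortest form. It shows why tag numbers 0 to 23 cost one byte and 24 to 255 cost two. It also refuses simple values 24 to 31, which have no well-formed encoding. The function names are hypothetical.

```python
import struct

# Additional information values and argument formats for "1+1" .. "1+8".
_FORMATS = ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q"))


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR head (initial byte plus argument) in its shortest form."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= argument <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("argument must fit in 64 bits")
    if argument < 24:
        return bytes([major_type << 5 | argument])  # "1+0"
    for ai, fmt in _FORMATS:
        if argument < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major_type << 5 | ai]) + struct.pack(fmt, argument)
    raise AssertionError("unreachable: argument was range-checked above")


def encode_tag_head(tag_number: int) -> bytes:
    """Head of a tag (major type 6); the tag content follows separately."""
    return encode_head(6, tag_number)


def encode_simple(value: int) -> bytes:
    """Encode a simple value (major type 7) as "1+0" or "1+1"."""
    if not 0 <= value <= 255 or 24 <= value <= 31:
        raise ValueError("simple value must be in 0..23 or 32..255")
    return encode_head(7, value)
```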
7.2. Curating the Additional Information Space
The human mind is sometimes drawn to filling in little perceived gaps
to make something neat. We expect the remaining gaps in the
codepoint space for the additional information values to be an
attractor for new ideas, just because they are there.
The present specification does not manage the additional information
codepoint space by an IANA registry. Instead, allocations out of
@@ -2190,37 +2213,44 @@
New entries in the range 32 to 255 are assigned by Specification
Required.
9.2. Tags Registry
IANA has created the "Concise Binary Object Representation (CBOR)
Tags" registry at [IANA.cbor-tags]. The tags that were defined in
[RFC7049] are described in detail in Section 3.4, and other tags have
already been defined.
- New entries in the range 0 to 23 are assigned by Standards Action.
- New entries in the range 24 to 255 are assigned by Specification
- Required. New entries in the range 256 to 18446744073709551615 are
- assigned by First Come First Served. The template for registration
- requests is:
+ New entries in the range 0 to 23 ("1+0") are assigned by Standards
+ Action. New entries in the ranges 24 to 255 ("1+1") and 256 to 32767
+ (lower half of "1+2") are assigned by Specification Required. New
+ entries in the range 32768 to 18446744073709551615 (upper half of
+ "1+2", "1+4", and "1+8") are assigned by First Come First Served.
+ The template for registration requests is:
* Data item
* Semantics (short form)
In addition, First Come First Served requests should include:
* Point of contact
* Description of semantics (URL) - This description is optional; the
URL can point to something like an Internet-Draft or a web page.
+ Applicants exercising the First Come First Served range and making a
+ suggestion for a tag number that is not representable in 32 bits
+ (i.e., larger than 4294967295) should be aware that this could reduce
+ interoperability with implementations that do not support 64-bit
+ numbers.
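The registration ranges above map directly onto the head sizes. The following Python sketch is a hypothetical lookup helper that summarizes them. It also includes a check that an applicant might use for the 32-bit interoperability concern.

```python
MAX_TAG_NUMBER = 18446744073709551615  # 2**64 - 1


def tag_registration_policy(tag_number: int) -> str:
    """Return the registration policy for a tag number, per the ranges above."""
    if not 0 <= tag_number <= MAX_TAG_NUMBER:
        raise ValueError("tag numbers are unsigned 64-bit integers")
    if tag_number <= 23:
        return "Standards Action"            # "1+0"
    if tag_number <= 32767:
        return "Specification Required"      # "1+1" and lower half of "1+2"
    return "First Come First Served"         # upper half of "1+2", "1+4", "1+8"


def fits_in_32_bits(tag_number: int) -> bool:
    """False means some implementations without 64-bit support may reject it."""
    return tag_number <= 0xFFFFFFFF
```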
+
9.3. Media Type ("MIME Type")
The Internet media type [RFC6838] for a single encoded CBOR data item
is application/cbor, as defined in [IANA.media-types]:
Type name: application
Subtype name: cbor
Required parameters: n/a
@@ -2305,30 +2336,26 @@
"xxx/yyy+cbor".
Security Considerations: See Section 10 of this document
Contact: IETF CBOR Working Group cbor@ietf.org
(mailto:cbor@ietf.org) or IETF Applications and Real-Time Area
art@ietf.org (mailto:art@ietf.org)
Author/Change Controller: The IESG iesg@ietf.org
(mailto:iesg@ietf.org)
- // Editors' note: RFC 6838 has a template
- field Author/Change
- // controller, the descriptive text of
- which makes clear that this is
- // the change controller, not the author.
- Go figure. There is no
- // separate author entry as in the media
- types registry. (RFC
- // editor: Please remove this note before
- publication.)
+ // Editors' note: RFC 6838 has a template field Author/Change
+ // controller, the descriptive text of which makes clear that this is
+ // the change controller, not the author. Go figure. There is no
+ // separate author entry as in the media types registry. (RFC
+ // editor: Please remove this note before publication.)
10. Security Considerations
A network-facing application can exhibit vulnerabilities in its
processing logic for incoming data. Complex parsers are well known
as a likely source of such vulnerabilities, such as the ability to
remotely crash a node, or even remotely execute arbitrary code on it.
CBOR attempts to narrow the opportunities for introducing such
vulnerabilities by reducing parser complexity, by giving the entire
range of encodable values a meaning where possible.
@@ -2368,27 +2395,27 @@
input is in alignment with the application protocol that is
serialized in CBOR.
The input check itself may consume resources. This is usually linear
in the size of the input, which means that an attacker has to spend
resources that are commensurate to the resources spent by the
defender on input validation. Processing for arbitrary-precision
numbers may exceed linear effort. Also, some hash-table
implementations that are used by decoders to build in-memory
representations of maps can be attacked to spend quadratic effort,
- unless a secret key is employed (see Section 7 of [SIPHASH]). Such
- superlinear efforts can be employed by an attacker to exhaust
- resources at or before the input validator; they therefore need to be
- avoided in a CBOR decoder implementation. Note that tag number
- definitions and their implementations can add security considerations
- of this kind; this should then be discussed in the security
- considerations of the tag number definition.
+ unless a secret key (see Section 7 of [SIPHASH]) or some other
+ mitigation is employed. Such superlinear efforts can be exploited by
+ an attacker to exhaust resources at or before the input validator;
+ they therefore need to be avoided in a CBOR decoder implementation.
+ Note that tag number definitions and their implementations can add
+ security considerations of this kind; this should then be discussed
+ in the security considerations of the tag number definition.
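The following Python sketch shows the secret-key mitigation for map hash tables. Python's standard library does not expose SipHash directly, so the sketch substitutes BLAKE2b in keyed mode (hashlib.blake2b with key=) as the keyed function. This is a different PRF from the one [SIPHASH] describes. The class KeyedBuckets is hypothetical. Because the key is chosen at random and never revealed, an attacker cannot precompute many encoded map keys that all land in the same bucket.

```python
import hashlib
import os
from typing import Optional


class KeyedBuckets:
    """Assign encoded CBOR map keys to hash-table buckets using a secret key."""

    def __init__(self, n_buckets: int, secret: Optional[bytes] = None):
        if n_buckets <= 0:
            raise ValueError("n_buckets must be positive")
        self.n_buckets = n_buckets
        # 16 random bytes chosen at startup and never sent to peers.
        self.secret = secret if secret is not None else os.urandom(16)

    def bucket(self, encoded_key: bytes) -> int:
        digest = hashlib.blake2b(encoded_key, key=self.secret, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.n_buckets
```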
CBOR encoders do not receive input directly from the network and are
thus not directly attackable in the same way as CBOR decoders.
However, CBOR encoders often have an API that takes input from
another level in the implementation and can be attacked through that
API. The design and implementation of that API should assume the
behavior of its caller may be based on hostile input or on coding
mistakes. It should check inputs for buffer overruns, overflow and
underflow of integer arithmetic, and other such errors aimed at
disrupting the encoder.
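As a sketch of checking caller input at the encoder API, the following hypothetical Python function encode_int validates its argument before encoding it as major type 0 or 1. Python integers do not overflow. However, a CBOR argument must fit in 64 bits, so out-of-range values are rejected explicitly rather than silently truncated. Booleans are also rejected: Python's bool is a subclass of int, but CBOR encodes true and false as simple values.

```python
def encode_int(value) -> bytes:
    """Encode a caller-supplied integer as CBOR major type 0 or 1."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    major_type, argument = (0, value) if value >= 0 else (1, -1 - value)
    if argument > 0xFFFFFFFFFFFFFFFF:
        raise OverflowError("integer outside the range of major types 0 and 1")
    if argument < 24:
        return bytes([major_type << 5 | argument])
    # Smallest of 1, 2, 4, or 8 bytes that holds the argument.
    size = next(n for n in (1, 2, 4, 8) if argument < 1 << (8 * n))
    ai = {1: 24, 2: 25, 4: 26, 8: 27}[size]
    return bytes([major_type << 5 | ai]) + argument.to_bytes(size, "big")
```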
@@ -2475,20 +2502,26 @@
[ASN.1] International Telecommunication Union, "Information
Technology -- ASN.1 encoding rules: Specification of Basic
Encoding Rules (BER), Canonical Encoding Rules (CER) and
Distinguished Encoding Rules (DER)", ITU-T Recommendation
X.690, 1994.
[BSON] Various, "BSON - Binary JSON", 2013,
.
+ [I-D.bormann-cbor-notable-tags]
+ Bormann, C., "Notable CBOR Tags", Work in Progress,
+ Internet-Draft, draft-bormann-cbor-notable-tags-01, 15 May
+ 2020, .
+
[IANA.cbor-simple-values]
IANA, "Concise Binary Object Representation (CBOR) Simple
Values",
.
[IANA.cbor-tags]
IANA, "Concise Binary Object Representation (CBOR) Tags",
.
[IANA.core-parameters]
@@ -2555,25 +2588,20 @@
[RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR)
Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020,
.
[RFC8746] Bormann, C., Ed., "Concise Binary Object Representation
(CBOR) Tags for Typed Arrays", RFC 8746,
DOI 10.17487/RFC8746, February 2020,
.
- [rfc8746] Bormann, C., Ed., "Concise Binary Object Representation
- (CBOR) Tags for Typed Arrays", RFC 8746,
- DOI 10.17487/RFC8746, February 2020,
- .
-
[SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short-
Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture
Notes in Computer Science pp. 489-508, 2012,
.
[YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup
Language (YAML[TM]) Version 1.2", 3rd Edition, October
2009, .
Appendix A. Examples
@@ -2940,21 +2968,21 @@
* uint() converts a byte string into an unsigned integer by
interpreting the byte string in network byte order.
* Arithmetic works as in C.
* All variables are unsigned integers of sufficient range.
Note that "well_formed" returns the major type for well-formed
definite length items, but 0 for an indefinite length item (or -1 for
- a break stop code, only if "breakable" is set). This is used in
+ a "break" stop code, only if "breakable" is set). This is used in
"well_formed_indefinite" to ascertain that indefinite length strings
only contain definite length strings as chunks.
well_formed (breakable = false) {
// process initial bytes
ib = uint(take(1));
mt = ib >> 5;
val = ai = ib & 0x1f;
switch (ai) {
case 24: val = uint(take(1)); break;
@@ -3111,21 +3139,21 @@
E.1. ASN.1 DER, BER, and PER
[ASN.1] has many serializations. In the IETF, DER and BER are the
most common. The serialized output is not particularly compact for
many items, and the code needed to decode numeric items can be
complex on a constrained device.
Few (if any) IETF protocols have adopted one of the several variants
of Packed Encoding Rules (PER). There could be many reasons for
this, but one that is commonly stated is that PER makes use of the
- schema even for parsing the surface structure of the data stream,
+ schema even for parsing the surface structure of the data item,
requiring significant tool support. There are different versions of
the ASN.1 schema language in use, which has also hampered adoption.
E.2. MessagePack
[MessagePack] is a concise, widely implemented counted binary
serialization format, similar in many properties to CBOR, although
somewhat less regular. While the data model can be used to represent
JSON data, MessagePack has also been used in many remote procedure
call (RPC) applications and for long-term storage of data.
@@ -3184,89 +3212,21 @@
| | 00 00 04 31 00 13 00 00 00 | |
| | 10 30 00 02 00 00 00 10 31 | |
| | 00 03 00 00 00 00 00 | |
+-------------+----------------------------+----------------+
| CBOR | 82 01 82 02 03 | 9f 01 82 02 03 |
| | | ff |
+-------------+----------------------------+----------------+
Table 8: Examples for Different Levels of Conciseness
-Appendix F. Changes from RFC 7049
-
- The following is a list of known changes from RFC 7049. This list is
- non-authoritative. It is meant to help reviewers see the significant
- differences.
-
- * Made some use of new RFCXML functionality [RFC7991]
-
- * Updated references, e.g. for [RFC4627] to [RFC8259] in many
- places, for [CNN-TERMS] to [RFC7228]; added missing reference to
- [IEEE754] and updated to [ECMA262]
-
- * Fixed errata: in the example in Section 2.4.2 ("29" -> "49"), and
- in the last paragraph of Section 3.6 ("0b000_11101" ->
- "0b000_11001")
-
- * Added a comment to the last example in Section 3.2.2 (added
- "Second value")
-
- * Applied numerous small editorial changes
-
- * Added a few tables for illustration
-
- * More stringently used terminology for well-formed and valid data,
- avoiding less well-defined alternative terms such as "syntax
- error", "decoding error" and "strict mode" outside examples
-
- * Streamlined terminology to talk about tags, tag numbers, and tag
- content
-
- * Clarified the restrictions on tag content, in general and
- specifically for tag 1
-
- * Added text about the CBOR data model and its small variations
- (basic generic, extended generic, specific)
-
- * More clearly separated integers from floating-point values;
- provided a suggestion (based on I-JSON [RFC7493]) for handling
- these types when converting JSON to CBOR
-
- * Added term "preferred serialization" and defined it for various
- kinds of data items
-
- * Added comment about tags with semantics that depend on
- serialization order
-
- * Defined "deterministic encoding", making use of "preferred
- serialization", and simplified the suggested map ordering for the
- "Core Deterministic Encoding Requirements", easing implementation,
- while keeping RFC 7049 map ordering as an alternative "length-
- first map key ordering"; now avoiding the terms "canonical" and
- "canonicalization"
-
- * Clarified map validity (handling of duplicate keys) and explained
- the domain of applicability of certain implementation choices
-
- * Updated IANA considerations
-
- * Added security considerations
-
- * Clarified handling of non-well-formed simple values in text and
- pseudocode
-
- * Added Appendix G, well-formedness errors and examples
-
- * Removed UBJSON from Appendix E, as that format has completely
- changed since RFC 7049; added reference to [RFC8618]
-
-Appendix G. Well-formedness errors and examples
+Appendix F. Well-formedness errors and examples
There are three basic kinds of well-formedness errors that can occur
in decoding a CBOR data item:
* Too much data: There are input bytes left that were not consumed.
This is only an error if the application assumed that the input
bytes would span exactly one data item. Where the application
uses the self-delimiting nature of CBOR encoding to permit
additional data after the data item, as is for example done in
CBOR sequences [RFC8742], the CBOR decoder can simply indicate
@@ -3295,66 +3255,66 @@
calling fail(), in order:
* a reserved value is used for additional information (28, 29, 30)
* major type 7, additional information 24, value < 32 (incorrect or
incorrectly encoded simple type)
* incorrect substructure of indefinite length byte/text string (may
only contain definite length strings of the same major type)
- * break stop code (mt=7, ai=31) occurs in a value position of a map
- or except at a position directly in an indefinite length item
+ * "break" stop code (mt=7, ai=31) occurs in a value position of a
+ map, or anywhere except directly in an indefinite length item at a
position where another enclosed data item could also occur
* additional information 31 used with major type 0, 1, or 6
-G.1. Examples for CBOR data items that are not well-formed
+F.1. Examples for CBOR data items that are not well-formed
This subsection shows a few examples for CBOR data items that are not
well-formed. Each example is a sequence of bytes each shown in
hexadecimal; multiple examples in a list are separated by commas.
Examples for well-formedness error kind 1 (too much data) can easily
be formed by adding data to a well-formed encoded CBOR data item.
Similarly, examples for well-formedness error kind 2 (too little
data) can be formed by truncating a well-formed encoded CBOR data
item. In test suites, it may be beneficial to specifically test with
incomplete data items that would require large amounts of addition to
be completed (for instance by starting the encoding of a string of a
very large size).
A premature end of the input can occur in a head or within the
enclosed data, which may be bare strings or enclosed data items that
- are either counted or should have been ended by a break stop code.
+ are either counted or should have been ended by a "break" stop code.
* End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02
03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa
00 00, fb 00 00 00
* Definite length strings with short data: 41, 61, 5a ff ff ff ff
00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f
ff ff ff ff ff ff ff 01 02 03
* Definite length maps and arrays not closed with enough items: 81,
81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00
00
* Tag number not followed by tag content: c0
- * Indefinite length strings not closed by a break stop code: 5f 41
+ * Indefinite length strings not closed by a "break" stop code: 5f 41
00, 7f 61 00
- * Indefinite length maps and arrays not closed by a break stop code:
- 9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f
- ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff
+ * Indefinite length maps and arrays not closed by a "break" stop
+ code: 9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f
+ 9f 9f ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff
A few examples for the five subkinds of well-formedness error kind 3
(syntax error) are shown below.
Subkind 1:
* Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e,
5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
fd, fe,
@@ -3381,20 +3341,153 @@
82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82
9f 81 9f 9f ff ff ff ff
* Break in indefinite length map would lead to odd number of items
(break in a value position): bf 00 ff, bf 00 00 00 ff
Subkind 5:
* Major type 0, 1, 6 with additional information 31: 1f, 3f, df
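The following Python sketch transcribes the well_formed and well_formed_indefinite pseudocode from Appendix C, for illustration only. The class and function names are hypothetical. It reports "too little data", the syntax errors listed above, and, at the top level, "too much data" when the input must hold exactly one data item. The test inputs are taken from the example lists in this appendix.

```python
class NotWellFormed(Exception):
    """Raised when the input is not well-formed CBOR."""


class _Reader:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def take(self, n: int) -> bytes:
        if n > len(self.data) - self.pos:  # input ends too early
            raise NotWellFormed("too little data")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def well_formed(self, breakable: bool = False) -> int:
        """Major type for definite items, 0 for indefinite, -1 for "break"."""
        ib = self.take(1)[0]
        mt, ai = ib >> 5, ib & 0x1F
        if ai in (28, 29, 30):
            raise NotWellFormed("reserved additional information value")
        if ai == 31:
            return self.well_formed_indefinite(mt, breakable)
        val = ai
        if ai >= 24:  # 24..27: argument in 1, 2, 4, or 8 following bytes
            val = int.from_bytes(self.take(1 << (ai - 24)), "big")
        if mt in (2, 3):
            self.take(val)                      # string payload
        elif mt == 4:
            for _ in range(val):
                self.well_formed()              # array elements
        elif mt == 5:
            for _ in range(2 * val):
                self.well_formed()              # map keys and values
        elif mt == 6:
            self.well_formed()                  # tag content
        elif mt == 7 and ai == 24 and val < 32:
            raise NotWellFormed("two-byte simple value below 32")
        return mt

    def well_formed_indefinite(self, mt: int, breakable: bool) -> int:
        if mt in (2, 3):
            # Chunks must be definite-length strings of the same major type.
            while (it := self.well_formed(breakable=True)) != -1:
                if it != mt:
                    raise NotWellFormed("bad chunk in indefinite-length string")
        elif mt == 4:
            while self.well_formed(breakable=True) != -1:
                pass
        elif mt == 5:
            # "break" is only allowed where a key could start.
            while self.well_formed(breakable=True) != -1:
                self.well_formed()
        elif mt == 7:
            if breakable:
                return -1
            raise NotWellFormed('"break" outside an indefinite-length item')
        else:
            raise NotWellFormed("additional information 31 with major type 0, 1, or 6")
        return 0


def check_well_formed(data: bytes) -> None:
    """Raise NotWellFormed unless data holds exactly one well-formed item."""
    reader = _Reader(data)
    reader.well_formed()
    if reader.pos != len(data):
        raise NotWellFormed("too much data")
```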
+Appendix G. Changes from RFC 7049
+
+ As discussed in the introduction, this document is a revised edition
+ of RFC 7049, with editorial improvements, added detail, and fixed
+ errata. This document formally obsoletes RFC 7049, while keeping
+ full compatibility of the interchange format from RFC 7049. This
+ document does not create a new version of the format.
+
+G.1. Errata processing, clerical changes
+
+ The two verified errata on RFC 7049, EID 3764 and EID 3770, concerned
+ two encoding examples in the text that have been corrected
+ (Section 3.4.3: "29" -> "49", Section 5.5: "0b000_11101" ->
+ "0b000_11001"). Also, RFC 7049 contained an example using the simple
+ type value 24 (EID 5917), which is not well-formed; this example has
+ been removed. Errata report 5763 pointed to an accident in the
+ wording of the definition of tags; this was resolved during a re-
+ write of Section 3.4. Errata report 5434 pointed out that the UBJSON
+ example in Appendix E no longer complied with the version of UBJSON
+ current at the time of submitting the report. It turned out that the
+ UBJSON specification had completely changed since 2013; this example
+ therefore also was removed. Further errata reports (4409, 4963,
+ 4964) complained that the map key sorting rules for canonical
+ encoding were onerous; these led to a reconsideration of the
+ canonical encoding suggestions and replacement by the deterministic
+ encoding suggestions (described below). An editorial suggestion in
+ errata report 4294 was also implemented (improved symmetry by adding
+ "Second value" to a comment to the last example in Section 3.2.2).
+
+ Other more clerical changes include:
+
+ * use of new RFCXML functionality [RFC7991];
+
+ * explanation of more of the notation used;
+
+ * updated references, e.g. for RFC4627 to [RFC8259] in many places,
+ for CNN-TERMS to [RFC7228]; added missing reference to [IEEE754]
+ (importing required definitions) and updated to [ECMA262]; added a
+ reference to [RFC8618] that further illustrates the discussion in
+ Appendix E;
+
+ * the discussion of diagnostic notation mentions the "Extended
+ Diagnostic Notation" (EDN) defined in [RFC8610];
+
+ * the addition of this appendix.
+
+G.2. Changes in IANA considerations
+
+ The IANA considerations were generally updated (clerical changes,
+ e.g., now pointing to the CBOR working group as the author of the
+ specification). References to the respective IANA registries have
+ been added to the informative references.
+
+ Tags in the space from 256 to 32767 (lower half of "1+2") are no
+ longer assigned by First Come First Served; this range is now
+ Specification Required.
+
+G.3. Changes in suggestions and other informational components
+
+ In revising the document, beyond processing errata reports, the WG
+ could use nearly seven years of experience with the use of CBOR in a
+ diverse set of applications. This led to a number of editorial
+ changes, including adding tables for illustration, but also to
+ emphasizing some aspects and de-emphasizing others.
+
+ A significant addition in this revision is Section 2, which discusses
+ the CBOR data model and its small variations involved in the
+ processing of CBOR. Introducing terms for those (basic generic,
+ extended generic, specific) enables more concise language in other
+ places of the document, but also helps in clarifying expectations on
+ implementations and on the extensibility features of the format.
+
+ RFC 7049, as a format derived from the JSON ecosystem, was influenced
+ by the JSON number system that was in turn inherited from JavaScript
+ at the time. JSON does not provide distinct integers and floating
+ point values (and the latter are decimal in the format). CBOR
+ provides binary representations of numbers, which do differ between
+ integers and floating point values. Experience from implementation
+ and use suggested that the separation between these two number
+ domains should be more clearly drawn in the document; language that
+ suggested an integer could seamlessly stand in for a floating point
+ value was removed. Also, a suggestion (based on I-JSON [RFC7493])
+ was added for handling these types when converting JSON to CBOR.
+
+ For a single value in the data model, CBOR often provides multiple
+ encoding options. The revision adds a new section, Section 4, which
+ first introduces the term "preferred serialization" (Section 4.1) and
+ defines it for various kinds of data items. On the basis of this
+ terminology, the section goes on to discuss how a CBOR-based protocol
+ can define "deterministic encoding" (Section 4.2), which now avoids
+ the RFC 7049 terms "canonical" and "canonicalization". The
+ suggestion of "Core Deterministic Encoding Requirements"
+ (Section 4.2.1) enables generic support for such protocol-defined
+ encoding requirements. The present revision further eases the
+ implementation of deterministic encoding by simplifying the map
+ ordering suggested in RFC 7049 to simple lexicographic ordering of
+ encoded keys. A description of the older suggestion is kept as an
+ alternative, now termed "length-first map key ordering"
+ (Section 4.2.3).
+
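The two map key orderings can be compared with a short Python sketch. Python compares bytes objects bytewise lexicographically, with a shorter prefix sorting first. The function names and the sample keys are illustrative only, and the keys are given in their encoded form.

```python
def lexicographic_order(encoded_keys):
    """Sort encoded map keys bytewise (the simplified suggestion)."""
    return sorted(encoded_keys)


def length_first_order(encoded_keys):
    """Sort shorter encoded keys first, then bytewise (RFC 7049 style)."""
    return sorted(encoded_keys, key=lambda k: (len(k), k))


KEYS = {  # diagnostic notation -> encoded key
    "10": bytes.fromhex("0a"),
    "100": bytes.fromhex("1864"),
    "-1": bytes.fromhex("20"),
    '"z"': bytes.fromhex("617a"),
    '"aa"': bytes.fromhex("626161"),
    "[100]": bytes.fromhex("811864"),
    "[-1]": bytes.fromhex("8120"),
    "false": bytes.fromhex("f4"),
}
NAMES = {encoded: name for name, encoded in KEYS.items()}
```

The two orderings differ wherever a longer key has a smaller first byte. For example, 100 (18 64) precedes -1 (20) in lexicographic order but follows it in length-first order.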
+ The terminology for well-formed and valid data was sharpened and more
+ stringently used, avoiding less well-defined alternative terms such
+ as "syntax error", "decoding error" and "strict mode" outside
+ examples. Also, a third level of requirements beyond CBOR-level
+ validity that an application has on its input data is now explicitly
+ called out. Well-formed (processable at all), valid (checked by a
+ validity-checking generic decoder), and expected input (as checked by
+ the application) are treated as a hierarchy of layers of
+ acceptability.
+
+ The handling of non-well-formed simple values was clarified in text
+ and pseudocode. Appendix F was added to discuss well-formedness
+ errors and provide examples for them.
+
+ The discussion of validity has been sharpened in two areas. Map
+ validity (handling of duplicate keys) was clarified and the domain of
+ applicability of certain implementation choices explained. Also,
+ while streamlining the terminology for tags, tag numbers, and tag
+ content, discussion was added on tag validity, and the restrictions
+ on tag content were clarified, in general and specifically for tag
+ 1.
+
+ An implementation note (and note for future tag definitions) was
+ added to Section 3.4 about defining tags with semantics that depend
+ on serialization order.
+
+ Terminology was introduced in Section 3 for "argument" and "head",
+ simplifying further discussion.
+
+ The security considerations were mostly rewritten and significantly
+ expanded; in multiple other places, the document is now more explicit
+ that a decoder cannot simply condone well-formedness errors.
+
Acknowledgements
CBOR was inspired by MessagePack. MessagePack was developed and
promoted by Sadayuki Furuhashi ("frsyuki"). This reference to
MessagePack is solely for attribution; CBOR is not intended as a
version of or replacement for MessagePack, as it has different design
goals and requirements.
The need for functionality beyond the original MessagePack
Specification became obvious to many people at about the same time