draft-ietf-cbor-7049bis-14.txt   draft-ietf-cbor-7049bis-15.txt 
Network Working Group C. Bormann Network Working Group C. Bormann
Internet-Draft Universitaet Bremen TZI Internet-Draft Universitaet Bremen TZI
Obsoletes: 7049 (if approved) P. Hoffman Obsoletes: 7049 (if approved) P. Hoffman
Intended status: Standards Track ICANN Intended status: Standards Track ICANN
Expires: 19 December 2020 17 June 2020 Expires: 28 March 2021 24 September 2020
Concise Binary Object Representation (CBOR) Concise Binary Object Representation (CBOR)
draft-ietf-cbor-7049bis-14 draft-ietf-cbor-7049bis-15
Abstract Abstract
The Concise Binary Object Representation (CBOR) is a data format The Concise Binary Object Representation (CBOR) is a data format
whose design goals include the possibility of extremely small code whose design goals include the possibility of extremely small code
size, fairly small message size, and extensibility without the need size, fairly small message size, and extensibility without the need
for version negotiation. These design goals make it different from for version negotiation. These design goals make it different from
earlier binary serializations such as ASN.1 and MessagePack. earlier binary serializations such as ASN.1 and MessagePack.
This document is a revised edition of RFC 7049, with editorial This document is a revised edition of RFC 7049, with editorial
skipping to change at page 2, line 10 skipping to change at page 2, line 10
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 19 December 2020. This Internet-Draft will expire on 28 March 2021.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 33 skipping to change at page 2, line 33
as described in Section 4.e of the Trust Legal Provisions and are as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License. provided without warranty as described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 8 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 8
2.1. Extended Generic Data Models . . . . . . . . . . . . . . 9 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 9
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 10
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 10 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 10
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11
3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 14 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 14
3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 14 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 14
3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 15
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16 3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 17
3.2.4. Summary of indefinite-length use of major types . . . 17 3.2.4. Summary of indefinite-length use of major types . . . 18
3.3. Floating-Point Numbers and Values with No Content . . . . 18 3.3. Floating-Point Numbers and Values with No Content . . . . 18
3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 19 3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 20
3.4.1. Standard Date/Time String . . . . . . . . . . . . . . 22 3.4.1. Standard Date/Time String . . . . . . . . . . . . . . 23
3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 23 3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 23
3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 24 3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 24
3.4.4. Decimal Fractions and Bigfloats . . . . . . . . . . . 24 3.4.4. Decimal Fractions and Bigfloats . . . . . . . . . . . 25
3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 26 3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 26
3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 26 3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 27
3.4.5.2. Expected Later Encoding for CBOR-to-JSON 3.4.5.2. Expected Later Encoding for CBOR-to-JSON
Converters . . . . . . . . . . . . . . . . . . . . 26 Converters . . . . . . . . . . . . . . . . . . . . 27
3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 27 3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 28
3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 28 3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 29
4. Serialization Considerations . . . . . . . . . . . . . . . . 29 4. Serialization Considerations . . . . . . . . . . . . . . . . 29
4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 29 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 29
4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 30 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 31
4.2.1. Core Deterministic Encoding Requirements . . . . . . 30 4.2.1. Core Deterministic Encoding Requirements . . . . . . 31
4.2.2. Additional Deterministic Encoding Considerations . . 31 4.2.2. Additional Deterministic Encoding Considerations . . 32
4.2.3. Length-first Map Key Ordering . . . . . . . . . . . . 33 4.2.3. Length-first Map Key Ordering . . . . . . . . . . . . 34
5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 34 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 35
5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 35 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 35
5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 35 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 36
5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 36 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 37
5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 36 5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 37
5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 37 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 37
5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 37 5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 38
5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 38 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 39 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 40
5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 41 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 42
5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 42 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 43
6. Converting Data between CBOR and JSON . . . . . . . . . . . . 42 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 43
6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 42 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 43
6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 43 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 44
7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 44 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 46
7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 45 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 46
7.2. Curating the Additional Information Space . . . . . . . . 46 7.2. Curating the Additional Information Space . . . . . . . . 47
8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 46 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 47
8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 47 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 49
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 48 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 49
9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 48 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 50
9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 48 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 50
9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 49 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 51
9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 50 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 51
9.5. The +cbor Structured Syntax Suffix Registration . . . . . 50 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 52
10. Security Considerations . . . . . . . . . . . . . . . . . . . 51 10. Security Considerations . . . . . . . . . . . . . . . . . . . 53
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 53 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 56
11.1. Normative References . . . . . . . . . . . . . . . . . . 53 11.1. Normative References . . . . . . . . . . . . . . . . . . 56
11.2. Informative References . . . . . . . . . . . . . . . . . 54 11.2. Informative References . . . . . . . . . . . . . . . . . 57
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 57 Appendix A. Examples of Encoded CBOR Data Items . . . . . . . . 60
Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 61 Appendix B. Jump Table for Initial Byte . . . . . . . . . . . . 64
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 64 Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 67
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 66 Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 69
Appendix E. Comparison of Other Binary Formats to CBOR's Design Appendix E. Comparison of Other Binary Formats to CBOR's Design
Objectives . . . . . . . . . . . . . . . . . . . . . . . 67 Objectives . . . . . . . . . . . . . . . . . . . . . . . 70
E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 68 E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 71
E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 68 E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 71
E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 69 E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 72
E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 69 E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 72
E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 69 E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 72
Appendix F. Well-formedness errors and examples . . . . . . . . 70 Appendix F. Well-formedness errors and examples . . . . . . . . 73
F.1. Examples for CBOR data items that are not well-formed . . 71 F.1. Examples for CBOR data items that are not well-formed . . 74
Appendix G. Changes from RFC 7049 . . . . . . . . . . . . . . . 73 Appendix G. Changes from RFC 7049 . . . . . . . . . . . . . . . 76
G.1. Errata processing, clerical changes . . . . . . . . . . . 73 G.1. Errata processing, clerical changes . . . . . . . . . . . 76
G.2. Changes in IANA considerations . . . . . . . . . . . . . 74 G.2. Changes in IANA considerations . . . . . . . . . . . . . 77
G.3. Changes in suggestions and other informational G.3. Changes in suggestions and other informational
components . . . . . . . . . . . . . . . . . . . . . . . 74 components . . . . . . . . . . . . . . . . . . . . . . . 77
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 76 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 79
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 76 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 79
1. Introduction 1. Introduction
There are hundreds of standardized formats for binary representation There are hundreds of standardized formats for binary representation
of structured data (also known as binary serialization formats). Of of structured data (also known as binary serialization formats). Of
those, some are for specific domains of information, while others are those, some are for specific domains of information, while others are
generalized for arbitrary data. In the IETF, probably the best-known generalized for arbitrary data. In the IETF, probably the best-known
formats in the latter category are ASN.1's BER and DER [ASN.1]. formats in the latter category are ASN.1's BER and DER [ASN.1].
The format defined here follows some specific design goals that are The format defined here follows some specific design goals that are
skipping to change at page 7, line 34 skipping to change at page 7, line 34
Stream decoder: A process that decodes a data stream and makes each Stream decoder: A process that decodes a data stream and makes each
of the data items in the sequence available to an application as of the data items in the sequence available to an application as
they are received. they are received.
Terms and concepts for floating-point values such as Infinity, NaN Terms and concepts for floating-point values such as Infinity, NaN
(not a number), negative zero, and subnormal are defined in (not a number), negative zero, and subnormal are defined in
[IEEE754]. [IEEE754].
Where bit arithmetic or data types are explained, this document uses Where bit arithmetic or data types are explained, this document uses
the notation familiar from the programming language C, except that the notation familiar from the programming language C [C], except
"**" denotes exponentiation. Similar to the "0x" notation for that "**" denotes exponentiation and ".." denotes a range that
hexadecimal numbers, numbers in binary notation are prefixed with includes both ends given. Examples and pseudocode assume that signed
"0b". Underscores can be added to a number solely for readability, integers use two's complement representation and that right shifts of
so 0b00100001 (0x21) might be written 0b001_00001 to emphasize the signed integers perform sign extension; these assumptions are also
desired interpretation of the bits in the byte; in this case, it is specified in Sections 6.8.2 and 7.6.7 of the 2020 version of C++,
split into three bits and five bits. Encoded CBOR data items are successor of [Cplusplus17].
sometimes given in the "0x" or "0b" notation; these values are first
interpreted as numbers as in C and are then interpreted as byte Similar to the "0x" notation for hexadecimal numbers, numbers in
strings in network byte order, including any leading zero bytes binary notation are prefixed with "0b". Underscores can be added to
expressed in the notation. a number solely for readability, so 0b00100001 (0x21) might be
written 0b001_00001 to emphasize the desired interpretation of the
bits in the byte; in this case, it is split into three bits and five
bits. Encoded CBOR data items are sometimes given in the "0x" or
"0b" notation; these values are first interpreted as numbers as in C
and are then interpreted as byte strings in network byte order,
including any leading zero bytes expressed in the notation.
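Python is used here only for illustration. Its integer literals follow the same "0b"/"0x"/underscore conventions, and its ">>" on negative integers sign-extends, matching the stated assumption. The helper `hex_notation_to_bytes` is a hypothetical name. It sketches the rule that a value written in "0x" or "0b" notation is read as a byte string in network byte order, keeping any leading zero bytes the notation expresses.

```python
def hex_notation_to_bytes(notation: str) -> bytes:
    """Interpret a "0x..." or "0b..." value, as written in this document, as a
    byte string in network byte order, keeping leading zero bytes.
    Hypothetical helper for illustration only."""
    digits = notation[2:].replace("_", "")  # underscores are readability only
    if notation.startswith("0x"):
        # Two hex digits per byte; a leading "00" becomes a leading zero byte.
        return bytes.fromhex(digits)
    if notation.startswith("0b"):
        # Eight binary digits per byte; the digit count fixes the byte count.
        if len(digits) % 8 != 0:
            raise ValueError("binary notation must cover whole bytes")
        return int(digits, 2).to_bytes(len(digits) // 8, "big")
    raise ValueError("expected a 0x or 0b prefix")

# 0b001_00001 is the same byte as 0b00100001 and 0x21.
assert 0b001_00001 == 0b00100001 == 0x21
# Leading zero bytes written in the notation are kept.
assert hex_notation_to_bytes("0x01f4") == b"\x01\xf4"
assert hex_notation_to_bytes("0x0001f4") == b"\x00\x01\xf4"
assert hex_notation_to_bytes("0b001_00001") == b"\x21"
# Python's ">>" on a negative integer performs sign extension.
assert -8 >> 1 == -4
```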
Words may be _italicized_ for emphasis; in the plain text form of Words may be _italicized_ for emphasis; in the plain text form of
this specification this is indicated by surrounding words with this specification this is indicated by surrounding words with
underscore characters. Verbatim text (e.g., names from a programming underscore characters. Verbatim text (e.g., names from a programming
language) may be set in "monospace" type; in plain text this is language) may be set in "monospace" type; in plain text this is
approximated somewhat ambiguously by surrounding the text in double approximated somewhat ambiguously by surrounding the text in double
quotes (which also retain their usual meaning). quotes (which also retain their usual meaning).
2. CBOR Data Models 2. CBOR Data Models
CBOR is explicit about its generic data model, which defines the set CBOR is explicit about its generic data model, which defines the set
of all data items that can be represented in CBOR. Its basic generic of all data items that can be represented in CBOR. Its basic generic
data model is extensible by the registration of simple type values data model is extensible by the registration of "simple values" and
and tags. Applications can then subset the resulting extended tags. Applications can then subset the resulting extended generic
generic data model to build their specific data models. data model to build their specific data models.
Within environments that can represent the data items in the generic Within environments that can represent the data items in the generic
data model, generic CBOR encoders and decoders can be implemented data model, generic CBOR encoders and decoders can be implemented
(which usually involves defining additional implementation data types (which usually involves defining additional implementation data types
for those data items that do not already have a natural for those data items that do not already have a natural
representation in the environment). The ability to provide generic representation in the environment). The ability to provide generic
encoders and decoders is an explicit design goal of CBOR; however encoders and decoders is an explicit design goal of CBOR; however
many applications will provide their own application-specific many applications will provide their own application-specific
encoders and/or decoders. encoders and/or decoders.
In the basic (un-extended) generic data model, a data item is one of: In the basic (un-extended) generic data model defined in Section 3, a
data item is one of:
* an integer in the range -2**64..2**64-1 inclusive * an integer in the range -2**64..2**64-1 inclusive
* a simple value, identified by a number between 0 and 255, but * a simple value, identified by a number between 0 and 255, but
distinct from that number itself distinct from that number itself
* a floating-point value, distinct from an integer, out of the set * a floating-point value, distinct from an integer, out of the set
representable by IEEE 754 binary64 (including non-finites) representable by IEEE 754 binary64 (including non-finites)
[IEEE754] [IEEE754]
skipping to change at page 9, line 32 skipping to change at page 9, line 35
precision than the above (tag numbers 2 to 5) precision than the above (tag numbers 2 to 5)
* application data types such as a point in time or an RFC 3339 * application data types such as a point in time or an RFC 3339
date/time string (tag numbers 1, 0) date/time string (tag numbers 1, 0)
Further elements of the extended generic data model can be (and have Further elements of the extended generic data model can be (and have
been) defined via the IANA registries created for CBOR. Even if such been) defined via the IANA registries created for CBOR. Even if such
an extension is unknown to a generic encoder or decoder, data items an extension is unknown to a generic encoder or decoder, data items
using that extension can be passed to or from the application by using that extension can be passed to or from the application by
representing them at the interface to the application within the representing them at the interface to the application within the
basic generic data model, i.e., as generic values of a simple type or basic generic data model, i.e., as generic simple values or generic
generic tags. tags.
In other words, the basic generic data model is stable as defined in In other words, the basic generic data model is stable as defined in
this document, while the extended generic data model expands by the this document, while the extended generic data model expands by the
registration of new simple values or tag numbers, but never shrinks. registration of new simple values or tag numbers, but never shrinks.
While there is a strong expectation that generic encoders and While there is a strong expectation that generic encoders and
decoders can represent "false", "true", and "null" ("undefined" is decoders can represent "false", "true", and "null" ("undefined" is
intentionally omitted) in the form appropriate for their programming intentionally omitted) in the form appropriate for their programming
environment, implementation of the data model extensions created by environment, implementation of the data model extensions created by
tags is truly optional and a matter of implementation quality. tags is truly optional and a matter of implementation quality.
skipping to change at page 10, line 23 skipping to change at page 10, line 32
representations of integral values are equivalent, using both map representations of integral values are equivalent, using both map
keys "0" and "0.0" in a single map would be considered duplicates, keys "0" and "0.0" in a single map would be considered duplicates,
even while encoded as different major types, and so invalid; and an even while encoded as different major types, and so invalid; and an
encoder could encode integral-valued floats as integers or vice encoder could encode integral-valued floats as integers or vice
versa, perhaps to save encoded bytes. versa, perhaps to save encoded bytes.
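The following Python sketch illustrates a specific data model of the kind described above, in which integer and floating-point representations of integral values are equivalent. The function names `normalize_key` and `has_duplicate_keys` are hypothetical. The sketch only demonstrates the equivalence rule and is not a CBOR encoder.

```python
import math

def normalize_key(key):
    """In a (hypothetical) specific data model where integral floats equal
    integers, map a finite, integral-valued float to the equal integer.
    Under this rule -0.0 also maps to 0."""
    if isinstance(key, float) and math.isfinite(key) and key.is_integer():
        return int(key)
    return key

def has_duplicate_keys(pairs) -> bool:
    """True if two (key, value) pairs have keys that are equivalent after
    normalization; such a map would be invalid in this data model."""
    seen = set()
    for key, _value in pairs:
        normalized = normalize_key(key)
        if normalized in seen:
            return True
        seen.add(normalized)
    return False

# 0 (major type 0) and 0.0 (major type 7) encode differently but are one key here.
assert has_duplicate_keys([(0, "a"), (0.0, "b")])
assert not has_duplicate_keys([(0, "a"), (0.5, "b")])
# An encoder for this model may emit 2.0 as the integer 2 to save bytes.
assert normalize_key(2.0) == 2 and type(normalize_key(2.0)) is int
```

Python itself already treats `0 == 0.0` with equal hashes, so a plain `dict` would silently merge such keys. The explicit normalization step makes the data-model rule visible instead of relying on the host language's behavior.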
3. Specification of the CBOR Encoding 3. Specification of the CBOR Encoding
A CBOR data item (Section 2) is encoded to or decoded from a byte A CBOR data item (Section 2) is encoded to or decoded from a byte
string carrying a well-formed encoded data item as described in this string carrying a well-formed encoded data item as described in this
section. The encoding is summarized in Table 7, indexed by the section. The encoding is summarized in Table 7 in Appendix B,
initial byte. An encoder MUST produce only well-formed encoded data indexed by the initial byte. An encoder MUST produce only well-
items. A decoder MUST NOT return a decoded data item when it formed encoded data items. A decoder MUST NOT return a decoded data
encounters input that is not a well-formed encoded CBOR data item item when it encounters input that is not a well-formed encoded CBOR
(this does not detract from the usefulness of diagnostic and recovery data item (this does not detract from the usefulness of diagnostic
tools that might make available some information from a damaged and recovery tools that might make available some information from a
encoded CBOR data item). damaged encoded CBOR data item).
The initial byte of each encoded data item contains both information The initial byte of each encoded data item contains both information
about the major type (the high-order 3 bits, described in about the major type (the high-order 3 bits, described in
Section 3.1) and additional information (the low-order 5 bits). With Section 3.1) and additional information (the low-order 5 bits). With
a few exceptions, the additional information's value describes how to a few exceptions, the additional information's value describes how to
load an unsigned integer "argument": load an unsigned integer "argument":
Less than 24: The argument's value is the value of the additional Less than 24: The argument's value is the value of the additional
information. information.
skipping to change at page 11, line 6 skipping to change at page 11, line 16
are not used as an integer argument, but as a floating-point value are not used as an integer argument, but as a floating-point value
(see Section 3.3). (see Section 3.3).
28, 29, 30: These values are reserved for future additions to the 28, 29, 30: These values are reserved for future additions to the
CBOR format. In the present version of CBOR, the encoded item is CBOR format. In the present version of CBOR, the encoded item is
not well-formed. not well-formed.
31: No argument value is derived. If the major type is 0, 1, or 6, 31: No argument value is derived. If the major type is 0, 1, or 6,
the encoded item is not well-formed. For major types 2 to 5, the the encoded item is not well-formed. For major types 2 to 5, the
item's length is indefinite, and for major type 7, the byte does item's length is indefinite, and for major type 7, the byte does
not consitute a data item at all but terminates an indefinite not constitute a data item at all but terminates an indefinite
length item; both are described in Section 3.2. length item; all are described in Section 3.2.
The initial byte and any additional bytes consumed to construct the The initial byte and any additional bytes consumed to construct the
argument are collectively referred to as the "head" of the data item. argument are collectively referred to as the "head" of the data item.
The meaning of this argument depends on the major type. For example, The meaning of this argument depends on the major type. For example,
in major type 0, the argument is the value of the data item itself in major type 0, the argument is the value of the data item itself
(and in major type 1 the value of the data item is computed from the (and in major type 1 the value of the data item is computed from the
argument); in major type 2 and 3 it gives the length of the string argument); in major type 2 and 3 it gives the length of the string
data in bytes that follows; and in major types 4 and 5 it is used to data in bytes that follows; and in major types 4 and 5 it is used to
determine the number of data items enclosed. determine the number of data items enclosed.
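As an illustration, a decoder might split a head into its parts as in the following Python sketch. The function name `decode_head` and its return shape are hypothetical. The sketch relies on the rule that additional information 24, 25, 26, and 27 means the argument is held in the following 1, 2, 4, or 8 bytes in network byte order; that text falls in a part of this diff not shown here.

```python
def decode_head(data: bytes, offset: int = 0):
    """Decode the head of the data item starting at data[offset].

    Returns (major_type, additional_info, argument, next_offset).
    argument is None when additional information is 31 (no argument).
    For major type 7 with additional information 25..27 the following bytes
    hold a float; this sketch returns them as an unsigned integer and leaves
    interpretation to the caller. Raises ValueError if the head is not
    well-formed.
    """
    if offset >= len(data):
        raise ValueError("truncated: missing initial byte")
    initial = data[offset]
    major_type = initial >> 5      # high-order 3 bits
    ai = initial & 0x1F            # low-order 5 bits
    offset += 1
    if ai < 24:                    # argument is the additional information
        return major_type, ai, ai, offset
    if ai <= 27:                   # 24..27 -> 1, 2, 4, 8 following bytes
        size = 1 << (ai - 24)
        if offset + size > len(data):
            raise ValueError("truncated argument")
        argument = int.from_bytes(data[offset:offset + size], "big")
        return major_type, ai, argument, offset + size
    if ai <= 30:
        raise ValueError("additional information 28..30 is reserved")
    # ai == 31: indefinite length (major types 2..5) or "break" (major type 7)
    if major_type in (0, 1, 6):
        raise ValueError("additional information 31 with major type 0, 1, or 6")
    return major_type, ai, None, offset

assert decode_head(bytes.fromhex("0a")) == (0, 10, 10, 1)
assert decode_head(bytes.fromhex("1901f4")) == (0, 25, 500, 3)
assert decode_head(bytes.fromhex("3901f3")) == (1, 25, 499, 3)  # value -1 - 499
assert decode_head(bytes.fromhex("9f")) == (4, 31, None, 1)     # indefinite array
```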
skipping to change at page 11, line 38 skipping to change at page 11, line 48
256 defined values for the initial byte (Table 7). A decoder in a 256 defined values for the initial byte (Table 7). A decoder in a
constrained implementation can instead use the structure of the constrained implementation can instead use the structure of the
initial byte and following bytes for more compact code (see initial byte and following bytes for more compact code (see
Appendix C for a rough impression of how this could look). Appendix C for a rough impression of how this could look).
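The reverse direction, building a head from a major type and an argument, can be sketched using the same structure of the initial byte. The following Python sketch also encodes integers, byte strings, text strings, and arrays (major types 0 to 4), so that the example encodings given for those types in Section 3.1 can be checked by hand.

Several details are assumptions of this sketch rather than part of the original text:

- The names `encode_head` and `encode_item` are hypothetical.
- The choice to always use the shortest head is an assumption, although it matches the examples in Section 3.1.
- Maps, tags, and major type 7 are out of scope.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Major type in the high 3 bits, additional information in the low
    5 bits, then 0, 1, 2, 4, or 8 argument bytes in network byte order.
    Always picks the shortest form that fits the argument."""
    if not 0 <= argument < 2**64:
        raise ValueError("argument must fit in 64 bits")
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    if argument < 2**8:
        ai, size = 24, 1
    elif argument < 2**16:
        ai, size = 25, 2
    elif argument < 2**32:
        ai, size = 26, 4
    else:
        ai, size = 27, 8
    return bytes([(major_type << 5) | ai]) + argument.to_bytes(size, "big")

def encode_item(item) -> bytes:
    """Encode int, bytes, str, and list values (major types 0 to 4 only)."""
    if isinstance(item, bool):
        raise TypeError("booleans are simple values (major type 7), not handled here")
    if isinstance(item, int):
        if item >= 0:
            return encode_head(0, item)          # major type 0: argument is the value
        return encode_head(1, -1 - item)         # major type 1: value is -1 minus argument
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item  # length in bytes, then raw content
    if isinstance(item, str):
        utf8 = item.encode("utf-8")              # raises on lone surrogates
        return encode_head(3, len(utf8)) + utf8  # length counts UTF-8 bytes, no escaping
    if isinstance(item, list):
        out = bytearray(encode_head(4, len(item)))
        for element in item:                     # elements may differ in type
            out += encode_item(element)
        return bytes(out)
    raise TypeError(f"unsupported type: {type(item).__name__}")

assert encode_item(10) == bytes([0b000_01010])
assert encode_item(500) == bytes.fromhex("1901f4")
assert encode_item(-500) == bytes.fromhex("3901f3")   # argument 499
assert -1 - (-500) == ~(-500) == 499                  # two's complement: -1 - n is ~n
assert encode_item(b"\x00" * 5)[:1] == bytes([0b010_00101])
assert encode_item(b"\x00" * 500)[:3] == bytes.fromhex("5901f4")
assert encode_item("\n") == bytes.fromhex("610a")     # newline stays the byte 0x0a
assert encode_item(list(range(10)))[:1] == bytes([0b100_01010])
assert encode_item([1, [2, 3], [4, 5]]) == bytes.fromhex("8301820203820405")
```

Because `encode_head` rejects arguments of 2**64 or more, `encode_item` also rejects integers outside -2**64..2**64-1, the range covered by major types 0 and 1.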
3.1. Major Types 3.1. Major Types
The following lists the major types and the additional information The following lists the major types and the additional information
and other bytes associated with the type. and other bytes associated with the type.
Major type 0: an integer in the range 0..2**64-1 inclusive. The Major type 0: an unsigned integer in the range 0..2**64-1 inclusive.
value of the encoded item is the argument itself. For example,
the integer 10 is denoted as the one byte 0b000_01010 (major type The value of the encoded item is the argument itself. For
0, additional information 10). The integer 500 would be example, the integer 10 is denoted as the one byte 0b000_01010
0b000_11001 (major type 0, additional information 25) followed by (major type 0, additional information 10). The integer 500 would
the two bytes 0x01f4, which is 500 in decimal. be 0b000_11001 (major type 0, additional information 25) followed
by the two bytes 0x01f4, which is 500 in decimal.
Major type 1: a negative integer in the range -2**64..-1 inclusive. Major type 1: a negative integer in the range -2**64..-1 inclusive.
The value of the item is -1 minus the argument. For example, the The value of the item is -1 minus the argument. For example, the
integer -500 would be 0b001_11001 (major type 1, additional integer -500 would be 0b001_11001 (major type 1, additional
information 25) followed by the two bytes 0x01f3, which is 499 in information 25) followed by the two bytes 0x01f3, which is 499 in
decimal. decimal.
Major type 2: a byte string. The number of bytes in the string is Major type 2: a byte string. The number of bytes in the string is
equal to the argument. For example, a byte string whose length is equal to the argument. For example, a byte string whose length is
5 would have an initial byte of 0b010_00101 (major type 2, 5 would have an initial byte of 0b010_00101 (major type 2,
skipping to change at page 12, line 18 skipping to change at page 12, line 32
initial bytes of 0b010_11001 (major type 2, additional information initial bytes of 0b010_11001 (major type 2, additional information
25 to indicate a two-byte length) followed by the two bytes 0x01f4 25 to indicate a two-byte length) followed by the two bytes 0x01f4
for a length of 500, followed by 500 bytes of binary content. for a length of 500, followed by 500 bytes of binary content.
Major type 3: a text string (Section 2), encoded as UTF-8 Major type 3: a text string (Section 2), encoded as UTF-8
([RFC3629]). The number of bytes in the string is equal to the ([RFC3629]). The number of bytes in the string is equal to the
argument. A string containing an invalid UTF-8 sequence is well- argument. A string containing an invalid UTF-8 sequence is well-
formed but invalid (Section 1.2). This type is provided for formed but invalid (Section 1.2). This type is provided for
systems that need to interpret or display human-readable text, and systems that need to interpret or display human-readable text, and
allows the differentiation between unstructured bytes and text allows the differentiation between unstructured bytes and text
that has a specified repertoire and encoding. In contrast to that has a specified repertoire (that of Unicode) and encoding
formats such as JSON, the Unicode characters in this type are (UTF-8). In contrast to formats such as JSON, the Unicode
never escaped. Thus, a newline character (U+000A) is always characters in this type are never escaped. Thus, a newline
represented in a string as the byte 0x0a, and never as the bytes character (U+000A) is always represented in a string as the byte
0x5c6e (the characters "\" and "n") or as 0x5c7530303061 (the 0x0a, and never as the bytes 0x5c6e (the characters "\" and "n")
characters "\", "u", "0", "0", "0", and "a"). nor as 0x5c7530303061 (the characters "\", "u", "0", "0", "0", and
"a").
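As a rough sketch of major type 3, the following Python code encodes a text string. The helper names are hypothetical. It shows two points from the text: the argument counts UTF-8 bytes rather than characters, and characters such as newline are never escaped.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    # Shortest-form head; argument assumed to be in 0..2**64-1.
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_text_string(text: str) -> bytes:
    """Major type 3: the argument counts UTF-8 bytes, not characters."""
    # str.encode rejects lone surrogates, so the output is always valid UTF-8.
    data = text.encode("utf-8")  # no JSON-style escaping is applied
    return encode_head(3, len(data)) + data


print(encode_text_string("a\nb").hex())    # 63610a62 (newline stays 0x0a)
print(encode_text_string("\u00fc").hex())  # 62c3bc (one character, two bytes)
```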
Major type 4: an array of data items. In other formats, arrays are Major type 4: an array of data items. In other formats, arrays are
also called lists, sequences, or tuples (a "CBOR sequence" is also called lists, sequences, or tuples (a "CBOR sequence" is
something slightly different, though [RFC8742]). The argument is something slightly different, though [RFC8742]). The argument is
the number of data items in the array. Items in an array do not the number of data items in the array. Items in an array do not
need to all be of the same type. For example, an array that need to all be of the same type. For example, an array that
contains 10 items of any type would have an initial byte of contains 10 items of any type would have an initial byte of
0b100_01010 (major type of 4, additional information of 10 for the 0b100_01010 (major type 4, additional information 10 for the
length) followed by the 10 remaining items. length) followed by the 10 remaining items.
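Here is a minimal Python sketch of the array example, assuming the items are already encoded. The `encode_array` helper is hypothetical. Only the item count goes into the head, and the items can have different major types.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    # Shortest-form head; argument assumed to be in 0..2**64-1.
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_array(encoded_items: list) -> bytes:
    """Major type 4: the argument is the item count; items follow back to back."""
    return encode_head(4, len(encoded_items)) + b"".join(encoded_items)


# Ten unsigned integers 0..9, each a single-byte data item (major type 0).
ten = encode_array([encode_head(0, i) for i in range(10)])
print(ten.hex())  # 8a00010203040506070809

# Items need not share a type: [1, "a"].
print(encode_array([encode_head(0, 1), encode_head(3, 1) + b"a"]).hex())  # 82016161
```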
Major type 5: a map of pairs of data items. Maps are also called Major type 5: a map of pairs of data items. Maps are also called
tables, dictionaries, hashes, or objects (in JSON). A map is tables, dictionaries, hashes, or objects (in JSON). A map is
comprised of pairs of data items, each pair consisting of a key comprised of pairs of data items, each pair consisting of a key
that is immediately followed by a value. The argument is the that is immediately followed by a value. The argument is the
number of _pairs_ of data items in the map. For example, a map number of _pairs_ of data items in the map. For example, a map
that contains 9 pairs would have an initial byte of 0b101_01001 that contains 9 pairs would have an initial byte of 0b101_01001
(major type of 5, additional information of 9 for the number of (major type 5, additional information 9 for the number of pairs)
pairs) followed by the 18 remaining items. The first item is the followed by the 18 remaining items. The first item is the first
first key, the second item is the first value, the third item is key, the second item is the first value, the third item is the
the second key, and so on. Because items in a map come in pairs, second key, and so on. Because items in a map come in pairs,
their total number is always even: A map that contains an odd their total number is always even: A map that contains an odd
number of items (no value data present after the last key data number of items (no value data present after the last key data
item) is not well-formed. A map that has duplicate keys may be item) is not well-formed. A map that has duplicate keys may be
well-formed, but it is not valid, and thus it causes indeterminate well-formed, but it is not valid, and thus it causes indeterminate
decoding; see also Section 5.6. decoding; see also Section 5.6.
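The map rules can be illustrated with the following Python sketch. `encode_map` is a hypothetical helper that takes already-encoded (key, value) pairs. It rejects byte-identical duplicate keys, which is only an approximation of key equality in the data model (see Section 5.6), but it is enough to show that the head counts pairs while 2N items follow.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    # Shortest-form head; argument assumed to be in 0..2**64-1.
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_map(pairs: list) -> bytes:
    """Major type 5: the argument counts pairs; keys and values alternate."""
    out = [encode_head(5, len(pairs))]
    seen = set()
    for key, value in pairs:
        # Comparing encoded bytes only catches byte-identical keys; this is a
        # simplification of key equality in the generic data model.
        if key in seen:
            raise ValueError("duplicate map key: the map would not be valid")
        seen.add(key)
        out.extend((key, value))
    return b"".join(out)


nine = encode_map([(encode_head(0, i), encode_head(0, i * 2)) for i in range(9)])
print(hex(nine[0]), len(nine))  # 0xa9 19 (head plus 18 single-byte items)
```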
Major type 6: a tagged data item ("tag") whose tag number, an Major type 6: a tagged data item ("tag") whose tag number, an
integer in the range 0..2**64-1 inclusive, is the argument and integer in the range 0..2**64-1 inclusive, is the argument and
whose enclosed data item ("tag content") is the single encoded whose enclosed data item ("tag content") is the single encoded
data item that follows the head. See Section 3.4. data item that follows the head. See Section 3.4.
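A tag is just a head with major type 6 followed by exactly one encoded data item. The Python sketch below assumes the content is already encoded, and the `encode_tag` helper name is hypothetical. The usage line builds tag number 55799 (self-described CBOR, Section 3.4.6) around the simple value null (0xf6).

```python
def encode_head(major_type: int, argument: int) -> bytes:
    # Shortest-form head; argument assumed to be in 0..2**64-1.
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_tag(tag_number: int, encoded_content: bytes) -> bytes:
    """Major type 6: the head carries the tag number; one data item follows."""
    return encode_head(6, tag_number) + encoded_content


print(encode_tag(55799, b"\xf6").hex())  # d9d9f7f6
```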
skipping to change at page 13, line 23 skipping to change at page 14, line 5
(Table 7). (Table 7).
In major types 6 and 7, many of the possible values are reserved for In major types 6 and 7, many of the possible values are reserved for
future specification. See Section 9 for more information on these future specification. See Section 9 for more information on these
values. values.
Table 1 summarizes the major types defined by CBOR, ignoring the next Table 1 summarizes the major types defined by CBOR, ignoring the next
section for now. The number N in this table stands for the argument, section for now. The number N in this table stands for the argument,
mt for the major type. mt for the major type.
+----+-----------------------+---------------------------------+ +====+=======================+=================================+
| mt | Meaning | Content | | mt | Meaning | Content |
+====+=======================+=================================+ +====+=======================+=================================+
| 0 | unsigned integer N | - | | 0 | unsigned integer N | - |
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
| 1 | negative integer -1-N | - | | 1 | negative integer -1-N | - |
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
| 2 | byte string | N bytes | | 2 | byte string | N bytes |
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
| 3 | text string | N bytes (UTF-8 text) | | 3 | text string | N bytes (UTF-8 text) |
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
skipping to change at page 14, line 16 skipping to change at page 14, line 39
Four CBOR items (arrays, maps, byte strings, and text strings) can be Four CBOR items (arrays, maps, byte strings, and text strings) can be
encoded with an indefinite length using additional information value encoded with an indefinite length using additional information value
31. This is useful if the encoding of the item needs to begin before 31. This is useful if the encoding of the item needs to begin before
the number of items inside the array or map, or the total length of the number of items inside the array or map, or the total length of
the string, is known. (The ability to start sending a data item the string, is known. (The ability to start sending a data item
before all of it is known is often referred to as "streaming" within before all of it is known is often referred to as "streaming" within
that data item.) that data item.)
Indefinite-length arrays and maps are dealt with differently than Indefinite-length arrays and maps are dealt with differently than
indefinite-length byte strings and text strings. indefinite-length strings (byte strings and text strings).
3.2.1. The "break" Stop Code 3.2.1. The "break" Stop Code
The "break" stop code is encoded with major type 7 and additional The "break" stop code is encoded with major type 7 and additional
information value 31 (0b111_11111). It is not itself a data item: it information value 31 (0b111_11111). It is not itself a data item: it
is just a syntactic feature to close an indefinite-length item. is just a syntactic feature to close an indefinite-length item.
If the "break" stop code appears anywhere where a data item is If the "break" stop code appears anywhere where a data item is
expected, other than directly inside an indefinite-length string, expected, other than directly inside an indefinite-length string,
array, or map -- for example directly inside a definite-length array array, or map -- for example directly inside a definite-length array
skipping to change at page 16, line 45 skipping to change at page 17, line 21
The data item represented by the indefinite-length string is the The data item represented by the indefinite-length string is the
concatenation of the chunks (i.e., the empty byte or text string, concatenation of the chunks (i.e., the empty byte or text string,
respectively, if no chunk is present). (Note that zero-length respectively, if no chunk is present). (Note that zero-length
chunks, while not particularly useful, are permitted.) chunks, while not particularly useful, are permitted.)
If any item between the indefinite-length string indicator If any item between the indefinite-length string indicator
(0b010_11111 or 0b011_11111) and the "break" stop code is not a (0b010_11111 or 0b011_11111) and the "break" stop code is not a
definite-length string item of the same major type, the string is not definite-length string item of the same major type, the string is not
well-formed. well-formed.
The design does not allow nesting indefinite-length strings as chunks
into indefinite-length strings. If it were allowed, it would require
decoder implementations to keep a stack, or at least a count, of
nesting levels. It is unnecessary on the encoder side because the
inner indefinite-length string would consist of chunks, and these
could instead be put directly into the outer indefinite-length
string.
If any definite-length text string inside an indefinite-length text If any definite-length text string inside an indefinite-length text
string is invalid, the indefinite-length text string is invalid. string is invalid, the indefinite-length text string is invalid.
Note that this implies that the UTF-8 bytes of a single Unicode code Note that this implies that the UTF-8 bytes of a single Unicode code
point (scalar value) cannot be spread between chunks: a new chunk of point (scalar value) cannot be spread between chunks: a new chunk of
a text string can only be started at a code point boundary. a text string can only be started at a code point boundary.
For example, assume an encoded data item consisting of the bytes: For example, assume an encoded data item consisting of the bytes:
0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111 0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111
skipping to change at page 17, line 23 skipping to change at page 18, line 11
After decoding, this results in a single byte string with seven After decoding, this results in a single byte string with seven
bytes: 0xaabbccddeeff99. bytes: 0xaabbccddeeff99.
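The chunking rules can be checked with a small Python decoder that handles only indefinite-length strings. This is a sketch, and `read_head` and `decode_indefinite_string` are hypothetical names. It rejects any chunk that is not a definite-length string of the same major type, which also rules out nested indefinite-length strings. For text strings, it requires each chunk to be valid UTF-8 on its own.

```python
def read_head(data: bytes, pos: int):
    """Return (major_type, additional_info, argument, next_pos) for the head at pos."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    initial = data[pos]
    mt, ai = initial >> 5, initial & 0x1F
    pos += 1
    if ai < 24:
        return mt, ai, ai, pos
    if ai <= 27:
        size = 1 << (ai - 24)  # 24 -> 1, 25 -> 2, 26 -> 4, 27 -> 8 bytes
        if pos + size > len(data):
            raise ValueError("truncated head")
        return mt, ai, int.from_bytes(data[pos:pos + size], "big"), pos + size
    if ai == 31:
        return mt, ai, None, pos  # indefinite length (or "break" for mt 7)
    raise ValueError("additional information 28..30 is reserved")


def decode_indefinite_string(data: bytes, pos: int = 0):
    """Decode an indefinite-length byte or text string; return (value, next_pos)."""
    mt, ai, _, pos = read_head(data, pos)
    if mt not in (2, 3) or ai != 31:
        raise ValueError("not an indefinite-length string")
    chunks = []
    while True:
        if pos >= len(data):
            raise ValueError("missing break stop code")
        if data[pos] == 0xFF:  # "break" ends the string
            pos += 1
            break
        chunk_mt, chunk_ai, length, pos = read_head(data, pos)
        # Chunks must be definite-length strings of the same major type;
        # this also rules out nesting indefinite-length strings.
        if chunk_mt != mt or chunk_ai == 31:
            raise ValueError("chunk is not a definite-length string of the same type")
        if pos + length > len(data):
            raise ValueError("truncated chunk")
        chunks.append(data[pos:pos + length])
        pos += length
    if mt == 2:
        return b"".join(chunks), pos
    # Text: each chunk must be valid UTF-8 by itself, so a code point cannot
    # be split across chunks (a failure here means invalid, not ill-formed).
    return "".join(chunk.decode("utf-8") for chunk in chunks), pos


value, end = decode_indefinite_string(bytes.fromhex("5f44aabbccdd43eeff99ff"))
print(value.hex(), end)  # aabbccddeeff99 11
```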
3.2.4. Summary of indefinite-length use of major types 3.2.4. Summary of indefinite-length use of major types
Table 2 summarizes the major types defined by CBOR as used for Table 2 summarizes the major types defined by CBOR as used for
indefinite length encoding (with additional information set to 31). indefinite length encoding (with additional information set to 31).
mt stands for the major type. mt stands for the major type.
+----+-------------------+----------------------------------+ +====+===================+==================================+
| mt | Meaning | enclosed up to "break" stop code | | mt | Meaning | enclosed up to "break" stop code |
+====+===================+==================================+ +====+===================+==================================+
| 0 | (not well-formed) | - | | 0 | (not well-formed) | - |
+----+-------------------+----------------------------------+ +----+-------------------+----------------------------------+
| 1 | (not well-formed) | - | | 1 | (not well-formed) | - |
+----+-------------------+----------------------------------+ +----+-------------------+----------------------------------+
| 2 | byte string | definite-length byte strings | | 2 | byte string | definite-length byte strings |
+----+-------------------+----------------------------------+ +----+-------------------+----------------------------------+
| 3 | text string | definite-length text strings | | 3 | text string | definite-length text strings |
+----+-------------------+----------------------------------+ +----+-------------------+----------------------------------+
skipping to change at page 18, line 12 skipping to change at page 18, line 42
major types (mt = major type, additional information = major types (mt = major type, additional information =
31) 31)
3.3. Floating-Point Numbers and Values with No Content 3.3. Floating-Point Numbers and Values with No Content
Major type 7 is for two types of data: floating-point numbers and Major type 7 is for two types of data: floating-point numbers and
"simple values" that do not need any content. Each value of the "simple values" that do not need any content. Each value of the
5-bit additional information in the initial byte has its own separate 5-bit additional information in the initial byte has its own separate
meaning, as defined in Table 3. Like the major types for integers, meaning, as defined in Table 3. Like the major types for integers,
items of this major type do not carry content data; all the items of this major type do not carry content data; all the
information is in the initial bytes. information is in the initial bytes (the head).
+-------------+---------------------------------------------------+ +=============+===================================================+
| 5-Bit Value | Semantics | | 5-Bit Value | Semantics |
+=============+===================================================+ +=============+===================================================+
| 0..23 | Simple value (value 0..23) | | 0..23 | Simple value (value 0..23) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
| 24 | Simple value (value 32..255 in following byte) | | 24 | Simple value (value 32..255 in following byte) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
| 25 | IEEE 754 Half-Precision Float (16 bits follow) | | 25 | IEEE 754 Half-Precision Float (16 bits follow) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
| 26 | IEEE 754 Single-Precision Float (32 bits follow) | | 26 | IEEE 754 Single-Precision Float (32 bits follow) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
skipping to change at page 18, line 40 skipping to change at page 19, line 31
| | (Section 3.2.1) | | | (Section 3.2.1) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
Table 3: Values for Additional Information in Major Type 7 Table 3: Values for Additional Information in Major Type 7
As with all other major types, the 5-bit value 24 signifies a single- As with all other major types, the 5-bit value 24 signifies a single-
byte extension: it is followed by an additional byte to represent the byte extension: it is followed by an additional byte to represent the
simple value. (To minimize confusion, only the values 32 to 255 are simple value. (To minimize confusion, only the values 32 to 255 are
used.) This maintains the structure of the initial bytes: as for the used.) This maintains the structure of the initial bytes: as for the
other major types, the length of these always depends on the other major types, the length of these always depends on the
additional information in the first byte. Table 4 lists the values additional information in the first byte. Table 4 lists the numeric
assigned and available for simple types. values assigned and available for simple values.
+---------+-----------------+ +=========+==============+
| Value | Semantics | | Value | Semantics |
+=========+=================+ +=========+==============+
| 0..19 | (Unassigned) | | 0..19 | (Unassigned) |
+---------+-----------------+ +---------+--------------+
| 20 | False | | 20 | False |
+---------+-----------------+ +---------+--------------+
| 21 | True | | 21 | True |
+---------+-----------------+ +---------+--------------+
| 22 | Null | | 22 | Null |
+---------+-----------------+ +---------+--------------+
| 23 | Undefined value | | 23 | Undefined |
+---------+-----------------+ +---------+--------------+
| 24..31 | (Reserved) | | 24..31 | (Reserved) |
+---------+-----------------+ +---------+--------------+
| 32..255 | (Unassigned) | | 32..255 | (Unassigned) |
+---------+-----------------+ +---------+--------------+
Table 4: Simple Values Table 4: Simple Values
An encoder MUST NOT issue two-byte sequences that start with 0xf8 An encoder MUST NOT issue two-byte sequences that start with 0xf8
(major type = 7, additional information = 24) and continue with a (major type 7, additional information 24) and continue with a byte
byte less than 0x20 (32 decimal). Such sequences are not well- less than 0x20 (32 decimal). Such sequences are not well-formed.
formed. (This implies that an encoder cannot encode false, true, (This implies that an encoder cannot encode false, true, null, or
null, or undefined in two-byte sequences, only the one-byte variants undefined in two-byte sequences, and that only the one-byte variants
of these are well-formed; more generally speaking, each simple value of these are well-formed; more generally speaking, each simple value
only has a single representation variant). only has a single representation variant).
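The single-representation rule for simple values can be sketched in Python as follows. The helper names are hypothetical. The encoder never produces 0xf8 followed by a byte below 0x20, and the decoder treats such a sequence as not well-formed.

```python
def encode_simple(value: int) -> bytes:
    """Encode a simple value in its only permitted form."""
    if 0 <= value <= 23:
        return bytes([0xE0 | value])      # one byte: major type 7, ai = value
    if 32 <= value <= 255:
        return bytes([0xF8, value])       # two bytes: ai 24 plus value byte
    raise ValueError("simple values 24..31 have no well-formed encoding")


def decode_simple(data: bytes) -> int:
    """Decode a simple value at the start of data, enforcing well-formedness."""
    if data[0] >> 5 != 7:
        raise ValueError("not major type 7")
    ai = data[0] & 0x1F
    if ai < 24:
        return ai
    if ai == 24:
        if len(data) < 2:
            raise ValueError("truncated simple value")
        if data[1] < 0x20:
            raise ValueError("0xf8 followed by a byte below 0x20 is not well-formed")
        return data[1]
    raise ValueError("not a simple value")


print([encode_simple(v).hex() for v in (20, 21, 22, 23)])  # ['f4', 'f5', 'f6', 'f7']
```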
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
IEEE 754 binary floating-point values [IEEE754]. These floating- IEEE 754 binary floating-point values [IEEE754]. These floating-
point values are encoded in the additional bytes of the appropriate point values are encoded in the additional bytes of the appropriate
size. (See Appendix D for some information about 16-bit floating- size. (See Appendix D for some information about 16-bit floating-
point numbers.) point numbers.)
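Python's standard `struct` module supports IEEE 754 binary16 (`e`), binary32 (`f`), and binary64 (`d`) formats, so additional information 25, 26, and 27 can be sketched as shown below. The helper names are hypothetical. The caller chooses the width here, and the sketch does not try to pick the shortest exact width.

```python
import struct

# Additional information value -> big-endian struct format (binary16/32/64).
FLOAT_FORMATS = {25: ">e", 26: ">f", 27: ">d"}


def encode_float(value: float, ai: int) -> bytes:
    """Encode value with the caller-chosen width (ai 25, 26, or 27)."""
    # struct rounds to the nearest representable value and raises
    # OverflowError if the value is out of range for 16 or 32 bits.
    return bytes([0xE0 | ai]) + struct.pack(FLOAT_FORMATS[ai], value)


def decode_float(data: bytes) -> float:
    """Decode a major type 7 floating-point item from the start of data."""
    mt, ai = data[0] >> 5, data[0] & 0x1F
    if mt != 7 or ai not in FLOAT_FORMATS:
        raise ValueError("not a CBOR floating-point item")
    fmt = FLOAT_FORMATS[ai]
    size = struct.calcsize(fmt)
    if len(data) < 1 + size:
        raise ValueError("truncated float")
    return struct.unpack(fmt, data[1:1 + size])[0]


print(encode_float(1.5, 25).hex())            # f93e00
print(encode_float(100000.0, 26).hex())       # fa47c35000
print(decode_float(bytes.fromhex("f97c00")))  # inf
```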
3.4. Tagging of Items 3.4. Tagging of Items
skipping to change at page 21, line 11 skipping to change at page 21, line 39
decoder; it can simply present both the tag number and the tag decoder; it can simply present both the tag number and the tag
content to the application, without interpreting the additional content to the application, without interpreting the additional
semantics of the tag. semantics of the tag.
A tag applies semantics to the data item it encloses. Tags can nest: A tag applies semantics to the data item it encloses. Tags can nest:
If tag A encloses tag B, which encloses data item C, tag A applies to If tag A encloses tag B, which encloses data item C, tag A applies to
the result of applying tag B on data item C. the result of applying tag B on data item C.
IANA maintains a registry of tag numbers as described in Section 9.2. IANA maintains a registry of tag numbers as described in Section 9.2.
Table 5 provides a list of tag numbers that were defined in Table 5 provides a list of tag numbers that were defined in
[RFC7049], with definitions in the rest of this section. Note that [RFC7049], with definitions in the rest of this section. (Tag number
many other tag numbers have been defined since the publication of 35 was also defined in [RFC7049]; a discussion of this tag number
[RFC7049]; see the registry described at Section 9.2 for the complete follows in Section 3.4.5.3.) Note that many other tag numbers have
list. been defined since the publication of [RFC7049]; see the registry
described at Section 9.2 for the complete list.
+------------+-------------+----------------------------------+ +============+=============+==================================+
| Tag Number | Data Item | Semantics | | Tag Number | Data Item | Tag Content Semantics |
+============+=============+==================================+ +============+=============+==================================+
| 0 | text string | Standard date/time string; see | | 0 | text string | Standard date/time string; see |
| | | Section 3.4.1 | | | | Section 3.4.1 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 1 | integer or | Epoch-based date/time; see | | 1 | integer or | Epoch-based date/time; see |
| | float | Section 3.4.2 | | | float | Section 3.4.2 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 2 | byte string | Positive bignum; see | | 2 | byte string | Positive bignum; see |
| | | Section 3.4.3 | | | | Section 3.4.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
skipping to change at page 22, line 5 skipping to change at page 22, line 43
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 24 | byte string | Encoded CBOR data item; see | | 24 | byte string | Encoded CBOR data item; see |
| | | Section 3.4.5.1 | | | | Section 3.4.5.1 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 32 | text string | URI; see Section 3.4.5.3 | | 32 | text string | URI; see Section 3.4.5.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 33 | text string | base64url; see Section 3.4.5.3 | | 33 | text string | base64url; see Section 3.4.5.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 34 | text string | base64; see Section 3.4.5.3 | | 34 | text string | base64; see Section 3.4.5.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 35 | text string | Regular expression; see |
| | | Section 3.4.5.3 |
+------------+-------------+----------------------------------+
| 36 | text string | MIME message; see | | 36 | text string | MIME message; see |
| | | Section 3.4.5.3 | | | | Section 3.4.5.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 55799 | (any) | Self-described CBOR; see | | 55799 | (any) | Self-described CBOR; see |
| | | Section 3.4.6 | | | | Section 3.4.6 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
Table 5: Tag numbers defined in RFC 7049 Table 5: Tag numbers defined in RFC 7049
Conceptually, tags are interpreted in the generic data model, not at Conceptually, tags are interpreted in the generic data model, not at
(de-)serialization time. A small number of tags (specifically, tag (de-)serialization time. A small number of tags (at this time, tag
number 25 and tag number 29) have been registered with semantics that number 25 and tag number 29 [IANA.cbor-tags]) have been registered
may require processing at (de-)serialization time: The decoder needs with semantics that may require processing at (de-)serialization
to be aware and the encoder needs to be in control of the exact time: The decoder needs to be aware and the encoder needs to be in
sequence in which data items are encoded into the CBOR data item. control of the exact sequence in which data items are encoded into
This means these tags cannot be implemented on top of every generic the CBOR data item. This means these tags cannot be implemented on
CBOR encoder/decoder (which might not reflect the serialization order top of an arbitrary generic CBOR encoder/decoder (which might not
for entries in a map at the data model level and vice versa); their reflect the serialization order for entries in a map at the data
implementation therefore typically needs to be integrated into the model level and vice versa); their implementation therefore typically
generic encoder/decoder. The definition of new tags with this needs to be integrated into the generic encoder/decoder. The
property is NOT RECOMMENDED. definition of new tags with this property is NOT RECOMMENDED.
IANA allocated tag numbers 65535, 4294967295, and IANA allocated tag numbers 65535, 4294967295, and
18446744073709551615 (binary all-ones in 16-bit, 32-bit, and 64-bit). 18446744073709551615 (binary all-ones in 16-bit, 32-bit, and 64-bit).
These can be used as a convenience for implementers that want a These can be used as a convenience for implementers that want a
single integer to indicate either that a specific tag is present, or single integer data structure to indicate either that a specific tag
the absence of a tag. That allocation is described in Section 10 of is present, or the absence of a tag. That allocation is described in
[I-D.bormann-cbor-notable-tags]. These tags are not intended to Section 10 of [I-D.bormann-cbor-notable-tags]. These tags are not
occur in actual CBOR data items; implementations may flag such an intended to occur in actual CBOR data items; implementations MAY flag
occurrence as an error. such an occurrence as an error.
Protocols using tag numbers 0 and 1 extend the generic data model Protocols using tag numbers 0 and 1 extend the generic data model
(Section 2) with data items representing points in time; tag numbers (Section 2) with data items representing points in time; tag numbers
2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5, 2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5,
with floating-point values of arbitrary size and precision. with floating-point values of arbitrary size and precision.
3.4.1. Standard Date/Time String 3.4.1. Standard Date/Time String
Tag number 0 contains a text string in the standard format described Tag number 0 contains a text string in the standard format described
by the "date-time" production in [RFC3339], as refined by Section 3.3 by the "date-time" production in [RFC3339], as refined by Section 3.3
of [RFC4287], representing the point in time described there. A of [RFC4287], representing the point in time described there. A
nested item of another type or that doesn't match the [RFC4287] nested item of another type or a text string that doesn't match the
format is invalid. [RFC4287] format is invalid.
3.4.2. Epoch-based Date/Time 3.4.2. Epoch-based Date/Time
Tag number 1 contains a numerical value counting the number of Tag number 1 contains a numerical value counting the number of
seconds from 1970-01-01T00:00Z in UTC time to the represented point seconds from 1970-01-01T00:00Z in UTC time to the represented point
in civil time. in civil time.
The tag content MUST be an unsigned or negative integer (major types The tag content MUST be an unsigned or negative integer (major types
0 and 1), or a floating-point number (major type 7 with additional 0 and 1), or a floating-point number (major type 7 with additional
information 25, 26, or 27). Other contained types are invalid. information 25, 26, or 27). Other contained types are invalid.
Non-negative values (major type 0 and non-negative floating-point Non-negative values (major type 0 and non-negative floating-point
numbers) stand for time values on or after 1970-01-01T00:00Z UTC and numbers) stand for time values on or after 1970-01-01T00:00Z UTC and
are interpreted according to POSIX [TIME_T]. (POSIX time is also are interpreted according to POSIX [TIME_T]. (POSIX time is also
known as UNIX Epoch time. Note that leap seconds are handled known as "UNIX Epoch time".) Leap seconds are handled specially by
specially by POSIX time and this results in a 1 second discontinuity POSIX time and this results in a 1 second discontinuity several times
several times per decade.) Note that applications that require the per decade. Note that applications that require the expression of
expression of times beyond early 2106 cannot leave out support of times beyond early 2106 cannot leave out support of 64-bit integers
64-bit integers for the tag content. for the tag content.
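Here is a minimal Python sketch of tag number 1 with integer content, using hypothetical helper names. The conversion to a calendar time is limited to non-negative values, because the text leaves negative values to the application. The usage also shows that 2**32 seconds (early 2106) no longer fits in a 32-bit argument and needs additional information 27.

```python
from datetime import datetime, timezone


def encode_head(major_type: int, argument: int) -> bytes:
    # Shortest-form head; argument assumed to be in 0..2**64-1.
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_epoch_time(seconds: int) -> bytes:
    """Tag number 1 enclosing an integer count of seconds since the epoch."""
    if seconds >= 0:
        content = encode_head(0, seconds)       # major type 0
    else:
        content = encode_head(1, -1 - seconds)  # major type 1 encodes -1-N
    return encode_head(6, 1) + content


def epoch_to_datetime(seconds: int) -> datetime:
    """Interpret a non-negative tag 1 value as POSIX time in UTC."""
    if seconds < 0:
        raise ValueError("negative values are application-defined")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(encode_epoch_time(1363896240).hex())        # c11a514b67b0
print(epoch_to_datetime(1363896240).isoformat())  # 2013-03-21T20:04:00+00:00
print(encode_epoch_time(2**32).hex())             # c11b0000000100000000
```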
Negative values (major type 1 and negative floating-point numbers) Negative values (major type 1 and negative floating-point numbers)
are interpreted as determined by the application requirements as are interpreted as determined by the application requirements as
there is no universal standard for UTC count-of-seconds time before there is no universal standard for UTC count-of-seconds time before
1970-01-01T00:00Z (this is particularly true for points in time that 1970-01-01T00:00Z (this is particularly true for points in time that
precede discontinuities in national calendars). The same applies to precede discontinuities in national calendars). The same applies to
non-finite values. non-finite values.
To indicate fractional seconds, floating-point values can be used To indicate fractional seconds, floating-point values can be used
within tag number 1 instead of integer values. Note that this within tag number 1 instead of integer values. Note that this
skipping to change at page 23, line 44 skipping to change at page 24, line 30
non-zero fractions of seconds only for a short period of time around non-zero fractions of seconds only for a short period of time around
early 1970. An application that requires tag number 1 support may early 1970. An application that requires tag number 1 support may
restrict the tag content to be an integer (or a floating-point value) restrict the tag content to be an integer (or a floating-point value)
only. only.
Note that platform types for date/time may include null or undefined Note that platform types for date/time may include null or undefined
values, which may also be desirable at an application protocol level. values, which may also be desirable at an application protocol level.
While emitting tag number 1 values with non-finite tag content values While emitting tag number 1 values with non-finite tag content values
(e.g., with NaN for undefined date/time values or with Infinite for (e.g., with NaN for undefined date/time values or with Infinite for
an expiry date that is not set) may seem an obvious way to handle an expiry date that is not set) may seem an obvious way to handle
this, using untagged null or undefined is often a better solution. this, using untagged null or undefined avoids the use of non-finites
Application protocol designers are encouraged to consider these cases and results in a shorter encoding. Application protocol designers
and include clear guidelines for handling them. are encouraged to consider these cases and include clear guidelines
for handling them.
3.4.3. Bignums 3.4.3. Bignums
Protocols using tag numbers 2 and 3 extend the generic data model Protocols using tag numbers 2 and 3 extend the generic data model
(Section 2) with "bignums" representing arbitrarily sized integers. (Section 2) with "bignums" representing arbitrarily sized integers.
In the basic generic data model, bignum values are not equal to In the basic generic data model, bignum values are not equal to
integers from the same model, but the extended generic data model integers from the same model, but the extended generic data model
created by this tag definition defines equivalence based on numeric created by this tag definition defines equivalence based on numeric
value, and preferred serialization (Section 4.1) never makes use of value, and preferred serialization (Section 4.1) never makes use of
bignums that also can be expressed as basic integers (see below). bignums that also can be expressed as basic integers (see below).
skipping to change at page 25, line 28 skipping to change at page 26, line 15
A decimal fraction or a bigfloat is represented as a tagged array A decimal fraction or a bigfloat is represented as a tagged array
that contains exactly two integer numbers: an exponent e and a that contains exactly two integer numbers: an exponent e and a
mantissa m. Decimal fractions (tag number 4) use base-10 exponents; mantissa m. Decimal fractions (tag number 4) use base-10 exponents;
the value of a decimal fraction data item is m*(10**e). Bigfloats the value of a decimal fraction data item is m*(10**e). Bigfloats
(tag number 5) use base-2 exponents; the value of a bigfloat data (tag number 5) use base-2 exponents; the value of a bigfloat data
item is m*(2**e). The exponent e MUST be represented in an integer item is m*(2**e). The exponent e MUST be represented in an integer
of major type 0 or 1, while the mantissa can also be a bignum of major type 0 or 1, while the mantissa can also be a bignum
(Section 3.4.3). Contained items with other structures are invalid. (Section 3.4.3). Contained items with other structures are invalid.
An example of a decimal fraction is that the number 273.15 could be An example of a decimal fraction is that the number 273.15 could be
represented as 0b110_00100 (major type of 6 for the tag, additional represented as 0b110_00100 (major type 6 for tag, additional
information of 4 for the number of tag), followed by 0b100_00010 information 4 for the tag number), followed by 0b100_00010 (major
(major type of 4 for the array, additional information of 2 for the type 4 for the array, additional information 2 for the length of the
length of the array), followed by 0b001_00001 (major type of 1 for array), followed by 0b001_00001 (major type 1 for the first integer,
the first integer, additional information of 1 for the value of -2), additional information 1 for the value of -2), followed by
followed by 0b000_11001 (major type of 0 for the second integer, 0b000_11001 (major type 0 for the second integer, additional
additional information of 25 for a two-byte value), followed by information 25 for a two-byte value), followed by 0b0110101010110011
0b0110101010110011 (27315 in two bytes). In hexadecimal: (27315 in two bytes). In hexadecimal:
C4 -- Tag 4 C4 -- Tag 4
82 -- Array of length 2 82 -- Array of length 2
21 -- -2 21 -- -2
19 6ab3 -- 27315 19 6ab3 -- 27315
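The byte sequence above can be reproduced mechanically. The following Python sketch is a minimal encoder for the tag-plus-two-element-array structure. The function names are hypothetical, and bignum mantissas are deliberately left out.

```python
def encode_head(major: int, arg: int) -> bytes:
    """Initial byte plus the argument in its shortest form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | ai]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_int(n: int) -> bytes:
    # Major type 0 for n >= 0; major type 1 carries -1 - n for n < 0.
    return encode_head(0, n) if n >= 0 else encode_head(1, -1 - n)


def encode_exponent_mantissa(tag: int, exponent: int, mantissa: int) -> bytes:
    # Tag 4 (decimal fraction) or 5 (bigfloat), then the array [e, m].
    return (encode_head(6, tag) + encode_head(4, 2)
            + encode_int(exponent) + encode_int(mantissa))


print(encode_exponent_mantissa(4, -2, 27315).hex())  # prints c48221196ab3
```

The same function also covers the bigfloat example that follows: encode_exponent_mantissa(5, -1, 3) yields C5 82 20 03.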
An example of a bigfloat is that the number 1.5 could be represented An example of a bigfloat is that the number 1.5 could be represented
as 0b110_00101 (major type of 6 for the tag, additional information as 0b110_00101 (major type 6 for tag, additional information 5 for
of 5 for the number of tag), followed by 0b100_00010 (major type of 4 the tag number), followed by 0b100_00010 (major type 4 for the array,
for the array, additional information of 2 for the length of the additional information 2 for the length of the array), followed by
array), followed by 0b001_00000 (major type of 1 for the first 0b001_00000 (major type 1 for the first integer, additional
integer, additional information of 0 for the value of -1), followed information 0 for the value of -1), followed by 0b000_00011 (major
by 0b000_00011 (major type of 0 for the second integer, additional type 0 for the second integer, additional information 3 for the value
information of 3 for the value of 3). In hexadecimal: of 3). In hexadecimal:
C5 -- Tag 5 C5 -- Tag 5
82 -- Array of length 2 82 -- Array of length 2
20 -- -1 20 -- -1
03 -- 3 03 -- 3
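A decoder may want to evaluate these values without rounding. The following Python sketch computes m*(10**e) for tag 4 and m*(2**e) for tag 5 exactly, using fractions.Fraction. The function name is hypothetical. A real decoder might instead map tag 4 to a decimal type.

```python
from fractions import Fraction


def exponent_mantissa_value(tag: int, exponent: int, mantissa: int) -> Fraction:
    """Exact value m * base**e for tag 4 (base 10) or tag 5 (base 2)."""
    bases = {4: 10, 5: 2}
    if tag not in bases:
        raise ValueError("expected tag number 4 or 5")
    # Fraction keeps negative exponents exact; a binary float could not
    # represent a value such as 273.15 exactly.
    return mantissa * Fraction(bases[tag]) ** exponent
```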
Decimal fractions and bigfloats provide no representation of Decimal fractions and bigfloats provide no representation of
Infinity, -Infinity, or NaN; if these are needed in place of a Infinity, -Infinity, or NaN; if these are needed in place of a
decimal fraction or bigfloat, the IEEE 754 half-precision decimal fraction or bigfloat, the IEEE 754 half-precision
representations from Section 3.3 can be used. representations from Section 3.3 can be used.
skipping to change at page 26, line 35 skipping to change at page 27, line 19
item is being decoded. Tag number 24 (CBOR data item) can be used to item is being decoded. Tag number 24 (CBOR data item) can be used to
tag the embedded byte string as a single data item encoded in CBOR tag the embedded byte string as a single data item encoded in CBOR
format. Contained items that aren't byte strings are invalid. A format. Contained items that aren't byte strings are invalid. A
contained byte string is valid if it encodes a well-formed CBOR data contained byte string is valid if it encodes a well-formed CBOR data
item; validity checking of the decoded CBOR item is not required for item; validity checking of the decoded CBOR item is not required for
tag validity (but could be offered by a generic decoder as a special tag validity (but could be offered by a generic decoder as a special
option). option).
3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters 3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters
Tags number 21 to 23 indicate that a byte string might require a Tag numbers 21 to 23 indicate that a byte string might require a
specific encoding when interoperating with a text-based specific encoding when interoperating with a text-based
representation. These tags are useful when an encoder knows that the representation. These tags are useful when an encoder knows that the
byte string data it is writing is likely to be later converted to a byte string data it is writing is likely to be later converted to a
particular JSON-based usage. That usage specifies that some strings particular JSON-based usage. That usage specifies that some strings
are encoded as base64, base64url, and so on. The encoder uses byte are encoded as base64, base64url, and so on. The encoder uses byte
strings instead of doing the encoding itself to reduce the message strings instead of doing the encoding itself to reduce the message
size, to reduce the code size of the encoder, or both. The encoder size, to reduce the code size of the encoder, or both. The encoder
does not know whether or not the converter will be generic, and does not know whether or not the converter will be generic, and
therefore wants to say what it believes is the proper way to convert therefore wants to say what it believes is the proper way to convert
binary strings to JSON. binary strings to JSON.
skipping to change at page 27, line 12 skipping to change at page 27, line 43
contained in the data item, except for those contained in a nested contained in the data item, except for those contained in a nested
data item tagged with an expected conversion. data item tagged with an expected conversion.
These three tag numbers suggest conversions to three of the base data These three tag numbers suggest conversions to three of the base data
encodings defined in [RFC4648]. Tag number 21 suggests conversion to encodings defined in [RFC4648]. Tag number 21 suggests conversion to
base64url encoding (Section 5 of RFC 4648), where padding is not used base64url encoding (Section 5 of RFC 4648), where padding is not used
(see Section 3.2 of RFC 4648); that is, all trailing equals signs (see Section 3.2 of RFC 4648); that is, all trailing equals signs
("=") are removed from the encoded string. Tag number 22 suggests ("=") are removed from the encoded string. Tag number 22 suggests
conversion to classical base64 encoding (Section 4 of RFC 4648), with conversion to classical base64 encoding (Section 4 of RFC 4648), with
padding as defined in RFC 4648. For both base64url and base64, padding as defined in RFC 4648. For both base64url and base64,
padding bits are set to zero (see Section 3.5 of RFC 4648), and padding bits are set to zero (see Section 3.5 of RFC 4648), and the
encoding is performed without the inclusion of any line breaks, conversion to alternate encoding is performed on the contents of the
whitespace, or other additional characters. Tag number 23 suggests byte string (that is, without adding any line breaks, whitespace, or
conversion to base16 (hex) encoding, with uppercase alphabetics (see other additional characters). Tag number 23 suggests conversion to
Section 8 of RFC 4648). Note that, for all three tag numbers, the base16 (hex) encoding, with uppercase alphabetics (see Section 8 of
encoding of the empty byte string is the empty text string. RFC 4648). Note that, for all three tag numbers, the encoding of the
empty byte string is the empty text string.
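The three suggested conversions map directly onto Python's standard base64 module. In the following sketch, padding is stripped for tag number 21 only, and base64.b16encode already produces uppercase hexadecimal digits. The function name expected_conversion is hypothetical. Note that base64.b64encode, unlike base64.encodebytes, does not insert line breaks.

```python
import base64


def expected_conversion(tag: int, data: bytes) -> str:
    """Convert byte string content as suggested by tag 21, 22, or 23."""
    if tag == 21:
        # base64url (RFC 4648, Section 5) with trailing "=" removed.
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
    if tag == 22:
        # Classic base64 (RFC 4648, Section 4) with padding.
        return base64.b64encode(data).decode("ascii")
    if tag == 23:
        # base16 with uppercase alphabetics (RFC 4648, Section 8).
        return base64.b16encode(data).decode("ascii")
    raise ValueError("expected tag number 21, 22, or 23")
```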
3.4.5.3. Encoded Text 3.4.5.3. Encoded Text
Some text strings hold data that have formats widely used on the Some text strings hold data that have formats widely used on the
Internet, and sometimes those formats can be validated and presented Internet, and sometimes those formats can be validated and presented
to the application in appropriate form by the decoder. There are to the application in appropriate form by the decoder. There are
tags for some of these formats. tags for some of these formats.
* Tag number 32 is for URIs, as defined in [RFC3986]. If the text * Tag number 32 is for URIs, as defined in [RFC3986]. If the text
string doesn't match the "URI-reference" production, the string is string doesn't match the "URI-reference" production, the string is
skipping to change at page 28, line 5 skipping to change at page 28, line 33
- the padding bits in a 2- or 3-character block are not 0, or - the padding bits in a 2- or 3-character block are not 0, or
- the base64 encoding has the wrong number of padding characters, - the base64 encoding has the wrong number of padding characters,
or or
- the base64url encoding has padding characters, - the base64url encoding has padding characters,
the string is invalid. the string is invalid.
* Tag number 35 is for regular expressions that are roughly in Perl
Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a
version of the JavaScript regular expression syntax [ECMA262].
(Note that more specific identification may be necessary if the
actual version of the specification underlying the regular
expression, or more than just the text of the regular expression
itself, need to be conveyed.) Any contained string value is
valid.
* Tag number 36 is for MIME messages (including all headers), as * Tag number 36 is for MIME messages (including all headers), as
defined in [RFC2045]. A text string that isn't a valid MIME defined in [RFC2045]. A text string that isn't a valid MIME
message is invalid. (For this tag, validity checking may be message is invalid. (For this tag, validity checking may be
particularly onerous for a generic decoder and might therefore not particularly onerous for a generic decoder and might therefore not
be offered. Note that many MIME messages are general binary data be offered. Note that many MIME messages are general binary data
and can therefore not be represented in a text string; and can therefore not be represented in a text string;
[IANA.cbor-tags] lists a registration for tag number 257 that is [IANA.cbor-tags] lists a registration for tag number 257 that is
similar to tag number 36 but uses a byte string as its tag similar to tag number 36 but uses a byte string as its tag
content.) content.)
Note that tag numbers 33 and 34 differ from 21 and 22 in that the Note that tag numbers 33 and 34 differ from 21 and 22 in that the
data is transported in base-encoded form for the former and in raw data is transported in base-encoded form for the former and in raw
byte string form for the latter. byte string form for the latter.
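One way to implement the base64 and base64url validity checks for tag numbers 33 and 34 is a round-trip test. The text is decoded, the result is re-encoded, and the two must match exactly. The following Python sketch takes that approach; this is a chosen technique, not one the text prescribes, and the function names are hypothetical. The round trip rejects nonzero padding bits, missing or extra padding, and characters outside the expected alphabet without checking each condition separately.

```python
import base64


def is_valid_base64url_text(s: str) -> bool:
    """Tag 33 sketch: base64url alphabet, no padding, zero padding bits."""
    if "=" in s or len(s) % 4 == 1:
        return False
    try:
        # Re-add padding only so the standard decoder accepts the input.
        raw = base64.b64decode(s + "=" * (-len(s) % 4), altchars=b"-_",
                               validate=True)
    except ValueError:  # includes binascii.Error and non-ASCII input
        return False
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii") == s


def is_valid_base64_text(s: str) -> bool:
    """Tag 34 sketch: classic base64 with exactly the required padding."""
    try:
        raw = base64.b64decode(s, validate=True)
    except ValueError:
        return False
    return base64.b64encode(raw).decode("ascii") == s
```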
[RFC7049] also defined a tag number 35, for regular expressions that
are in Perl Compatible Regular Expressions (PCRE/PCRE2) form [PCRE]
or in JavaScript regular expression syntax [ECMA262]. The state of
the art in these regular expression specifications has since advanced
and is continually advancing, so the present specification does not
attempt to update the references to a snapshot that is current at the
time of writing. Instead, this tag remains available (as registered
in [RFC7049]) for applications that specify the particular regular
expression variant they use out-of-band (possibly by limiting the
usage to a defined common subset of both PCRE and ECMA262). As the
present specification clarifies tag validity beyond [RFC7049], we
note that due to the open way the tag was defined in [RFC7049], any
contained string value needs to be valid at the CBOR tag level (but
may then not be "expected" at the application level).
3.4.6. Self-Described CBOR 3.4.6. Self-Described CBOR
In many applications, it will be clear from the context that CBOR is In many applications, it will be clear from the context that CBOR is
being employed for encoding a data item. For instance, a specific being employed for encoding a data item. For instance, a specific
protocol might specify the use of CBOR, or a media type is indicated protocol might specify the use of CBOR, or a media type is indicated
that specifies its use. However, there may be applications where that specifies its use. However, there may be applications where
such context information is not available, such as when CBOR data is such context information is not available, such as when CBOR data is
stored in a file that does not have disambiguating metadata. Here, stored in a file that does not have disambiguating metadata. Here,
it may help to have some distinguishing characteristics for the data it may help to have some distinguishing characteristics for the data
itself. itself.
skipping to change at page 29, line 40 skipping to change at page 30, line 24
say, always uses 64-bit integers. say, always uses 64-bit integers.
Similarly, a constrained encoder may be limited in the variety of Similarly, a constrained encoder may be limited in the variety of
representation variants it supports in such a way that it does not representation variants it supports in such a way that it does not
emit preferred serializations ("variant encoder"): Say, it could be emit preferred serializations ("variant encoder"): Say, it could be
designed to always use the 32-bit variant for an integer that it designed to always use the 32-bit variant for an integer that it
encodes even if a short representation is available (again, assuming encodes even if a short representation is available (again, assuming
that there is no application need for integers that can only be that there is no application need for integers that can only be
represented with the 64-bit variant). A decoder that does not rely represented with the 64-bit variant). A decoder that does not rely
on only ever receiving preferred serializations ("variation-tolerant on only ever receiving preferred serializations ("variation-tolerant
decoder") can there be said to be more universally interoperable (it decoder") can therefore be said to be more universally interoperable
might very well optimize for the case of receiving preferred (it might very well optimize for the case of receiving preferred
serializations, though). Full implementations of CBOR decoders are serializations, though). Full implementations of CBOR decoders are
by definition variation-tolerant; the distinction is only relevant if by definition variation-tolerant; the distinction is only relevant if
a constrained implementation of a CBOR decoder meets a variant a constrained implementation of a CBOR decoder meets a variant
encoder. encoder.
The preferred serialization always uses the shortest form of The preferred serialization always uses the shortest form of
representing the argument (Section 3); it also uses the shortest representing the argument (Section 3); it also uses the shortest
floating-point encoding that preserves the value being encoded. floating-point encoding that preserves the value being encoded.
The preferred serialization for a floating-point value is the The preferred serialization for a floating-point value is the
skipping to change at page 30, line 49 skipping to change at page 31, line 38
- 24 to 255 and -25 to -256 MUST be expressed only with an - 24 to 255 and -25 to -256 MUST be expressed only with an
additional uint8_t; additional uint8_t;
- 256 to 65535 and -257 to -65536 MUST be expressed only with an - 256 to 65535 and -257 to -65536 MUST be expressed only with an
additional uint16_t; additional uint16_t;
- 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed - 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
only with an additional uint32_t. only with an additional uint32_t.
Floating-point values also MUST use the shortest form that Floating-point values also MUST use the shortest form that
preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5 preserves the value, e.g. 1.5 is encoded as 0xf93e00 (binary16)
as 0xfa49742408. (One implementation of this is to have all and 1000000.5 as 0xfa49742408 (binary32). (One implementation of
floats start as a 64-bit float, then do a test conversion to a this is to have all floats start as a 64-bit float, then do a test
32-bit float; if the result is the same numeric value, use the conversion to a 32-bit float; if the result is the same numeric
shorter form and repeat the process with a test conversion to a value, use the shorter form and repeat the process with a test
16-bit float. This also works to select 16-bit float for positive conversion to a 16-bit float. This also works to select 16-bit
and negative Infinity as well.) float for positive and negative Infinity as well.)
* Indefinite-length items MUST NOT appear. They can be encoded as * Indefinite-length items MUST NOT appear. They can be encoded as
definite-length items instead. definite-length items instead.
* The keys in every map MUST be sorted in the bytewise lexicographic * The keys in every map MUST be sorted in the bytewise lexicographic
order of their deterministic encodings. For example, the order of their deterministic encodings. For example, the
following keys are sorted correctly: following keys are sorted correctly:
1. 10, encoded as 0x0a. 1. 10, encoded as 0x0a.
skipping to change at page 31, line 31 skipping to change at page 32, line 21
4. "z", encoded as 0x617a. 4. "z", encoded as 0x617a.
5. "aa", encoded as 0x626161. 5. "aa", encoded as 0x626161.
6. [100], encoded as 0x811864. 6. [100], encoded as 0x811864.
7. [-1], encoded as 0x8120. 7. [-1], encoded as 0x8120.
8. false, encoded as 0xf4. 8. false, encoded as 0xf4.
(Implementation note: the self-delimiting nature of the CBOR
encoding means that there are no two well-formed CBOR encoded data
items where one is a prefix of the other. The bytewise
lexicographic comparison of deterministic encodings of different
map keys therefore always ends in a position where the byte
differs between the keys, before the end of a key is reached.)
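The rules above can be combined into a small deterministic encoder. The following Python sketch handles only:

* integers within the 64-bit range;
* floats;
* text and byte strings;
* arrays (lists or tuples) and maps (dicts);
* the simple values false, true, and null.

It never emits indefinite-length items, and it sorts each map's entries by the bytes of their encoded keys. Python tuples stand in for arrays used as map keys, because lists are not hashable. Float selection follows the test-conversion approach described above, using the struct formats "f" (binary32) and "e" (binary16). Collapsing every NaN to 0xf97e00 is an assumption of this sketch. All function names are hypothetical.

```python
import math
import struct


def encode_head(major: int, arg: int) -> bytes:
    """Initial byte plus the argument in its shortest form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | ai]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_float(x: float) -> bytes:
    """Shortest of binary64/binary32/binary16 that preserves the value."""
    if math.isnan(x):
        return b"\xf9\x7e\x00"  # assumption: all NaNs collapse to this one
    best = b"\xfb" + struct.pack(">d", x)
    # Test-convert to binary32, then binary16; stop at the first loss.
    for prefix, fmt in ((b"\xfa", ">f"), (b"\xf9", ">e")):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:
            break  # out of range for this format
        if struct.unpack(fmt, packed)[0] != x:
            break  # precision lost
        best = prefix + packed
    return best


def encode(item) -> bytes:
    """Deterministic encoding of a small subset of Python values."""
    if item is False:
        return b"\xf4"
    if item is True:
        return b"\xf5"
    if item is None:
        return b"\xf6"
    if isinstance(item, int):
        return encode_head(0, item) if item >= 0 else encode_head(1, -1 - item)
    if isinstance(item, float):
        return encode_float(item)
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item
    if isinstance(item, str):
        utf8 = item.encode("utf-8")
        return encode_head(3, len(utf8)) + utf8
    if isinstance(item, (list, tuple)):
        # Definite length only: indefinite-length items are never emitted.
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        # Sort entries by the bytewise order of their encoded keys.
        pairs = sorted((encode(k), encode(v)) for k, v in item.items())
        return encode_head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(item).__name__}")
```

Python compares bytes objects lexicographically. As a result, sorted(keys, key=encode) reproduces the key order listed above.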
4.2.2. Additional Deterministic Encoding Considerations 4.2.2. Additional Deterministic Encoding Considerations
CBOR tags present additional considerations for deterministic CBOR tags present additional considerations for deterministic
encoding. If a CBOR-based protocol were to provide the same encoding. If a CBOR-based protocol were to provide the same
semantics for the presence and absence of a specific tag (e.g., by semantics for the presence and absence of a specific tag (e.g., by
allowing both tag 1 data items and raw numbers in a date/time allowing both tag 1 data items and raw numbers in a date/time
position, treating the latter as if they were tagged), the position, treating the latter as if they were tagged), the
deterministic format would not allow the presence of the tag, based deterministic format would not allow the presence of the tag, based
on the "shortest form" principle. For example, a protocol might give on the "shortest form" principle. For example, a protocol might give
encoders the choice of representing a URL as either a text string or, encoders the choice of representing a URL as either a text string or,
skipping to change at page 32, line 21 skipping to change at page 33, line 14
Protocols that include floating-point values, whether represented Protocols that include floating-point values, whether represented
using basic floating-point values (Section 3.3) or using tags (or using basic floating-point values (Section 3.3) or using tags (or
both), may need to define extra requirements on their deterministic both), may need to define extra requirements on their deterministic
encodings, such as: encodings, such as:
* Although IEEE floating-point values can represent both positive * Although IEEE floating-point values can represent both positive
and negative zero as distinct values, the application might not and negative zero as distinct values, the application might not
distinguish these and might decide to represent all zero values distinguish these and might decide to represent all zero values
with a positive sign, disallowing negative zero. (The application with a positive sign, disallowing negative zero. (The application
may also want to restrict the precision of floating point values may also want to restrict the precision of floating-point values
in such a way that there is never a need to represent 64-bit -- or in such a way that there is never a need to represent 64-bit -- or
even 32-bit -- floating-point values.) even 32-bit -- floating-point values.)
* If a protocol includes a field that can express floating-point * If a protocol includes a field that can express floating-point
values, with a specific data model that declares integer and values, with a specific data model that declares integer and
floating-point values to be interchangeable, the protocol's floating-point values to be interchangeable, the protocol's
deterministic encoding needs to specify whether the integer 1.0 is deterministic encoding needs to specify whether (for example) the
encoded as 0x01, 0xf93c00, 0xfa3f800000, or 0xfb3ff0000000000000. integer 1.0 is encoded as 0x01 (unsigned integer), 0xf93c00
Example rules for this are: (binary16), 0xfa3f800000 (binary32), or 0xfb3ff0000000000000
(binary64). Example rules for this are:
1. Encode integral values that fit in 64 bits as values from 1. Encode integral values that fit in 64 bits as values from
major types 0 and 1, and other values as the preferred major types 0 and 1, and other values as the preferred
(smallest of 16-, 32-, or 64-bit) floating-point (smallest of 16-, 32-, or 64-bit) floating-point
representation that accurately represents the value, representation that accurately represents the value,
2. Encode all values as the preferred floating-point 2. Encode all values as the preferred floating-point
representation that accurately represents the value, even for representation that accurately represents the value, even for
integral values, or integral values, or
skipping to change at page 34, line 33 skipping to change at page 35, line 25
Data formats such as CBOR are often used in environments where there Data formats such as CBOR are often used in environments where there
is no format negotiation. A specific design goal of CBOR is to not is no format negotiation. A specific design goal of CBOR is to not
need any included or assumed schema: a decoder can take a CBOR item need any included or assumed schema: a decoder can take a CBOR item
and decode it with no other knowledge. and decode it with no other knowledge.
Of course, in real-world implementations, the encoder and the decoder Of course, in real-world implementations, the encoder and the decoder
will have a shared view of what should be in a CBOR data item. For will have a shared view of what should be in a CBOR data item. For
example, an agreed-to format might be "the item is an array whose example, an agreed-to format might be "the item is an array whose
first value is a UTF-8 string, second value is an integer, and first value is a UTF-8 string, second value is an integer, and
subsequent values are zero or more floating-point numbers" or "the subsequent values are zero or more floating-point numbers" or "the
item is a map that has byte strings for keys and contains at least item is a map that has byte strings for keys and contains a pair
one pair whose key is 0xab01". whose key is 0xab01".
CBOR-based protocols MUST specify how their decoders handle invalid CBOR-based protocols MUST specify how their decoders handle invalid
and other unexpected data. CBOR-based protocols MAY specify that and other unexpected data. CBOR-based protocols MAY specify that
they treat arbitrary valid data as unexpected. Encoders for CBOR- they treat arbitrary valid data as unexpected. Encoders for CBOR-
based protocols MUST produce only valid items, that is, the protocol based protocols MUST produce only valid items, that is, the protocol
cannot be designed to make use of invalid items. An encoder can be cannot be designed to make use of invalid items. An encoder can be
capable of encoding as many or as few types of values as is required capable of encoding as many or as few types of values as is required
by the protocol in which it is used; a decoder can be capable of by the protocol in which it is used; a decoder can be capable of
understanding as many or as few types of values as is required by the understanding as many or as few types of values as is required by the
protocols in which it is used. This lack of restrictions allows CBOR protocols in which it is used. This lack of restrictions allows CBOR
skipping to change at page 35, line 26 skipping to change at page 36, line 11
sequence of CBOR data items concatenated back-to-back. In such an sequence of CBOR data items concatenated back-to-back. In such an
environment, the decoder immediately begins decoding a new data item environment, the decoder immediately begins decoding a new data item
if data is found after the end of a previous data item. if data is found after the end of a previous data item.
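Because each CBOR data item is self-delimiting, a decoder can find where one item ends by walking its structure. The following Python sketch splits a sequence of back-to-back items this way. The function names are hypothetical. For brevity, the sketch rejects indefinite-length items and reserved additional-information values, and it does not perform every well-formedness check a full decoder would.

```python
def _head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Return (major type, argument, position after the head)."""
    if pos >= len(buf):
        raise ValueError("truncated item")
    major, ai = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if ai < 24:
        return major, ai, pos
    if ai in (24, 25, 26, 27):
        size = 1 << (ai - 24)  # 1, 2, 4, or 8 bytes follow
        if pos + size > len(buf):
            raise ValueError("truncated argument")
        return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size
    raise ValueError("reserved or indefinite-length additional information")


def item_end(buf: bytes, pos: int = 0) -> int:
    """Offset just past the data item that starts at pos."""
    major, arg, pos = _head(buf, pos)
    if major in (2, 3):  # byte or text string: skip arg content bytes
        if pos + arg > len(buf):
            raise ValueError("truncated string")
        return pos + arg
    if major in (4, 5):  # array: arg items; map: 2 * arg items
        for _ in range(arg if major == 4 else 2 * arg):
            pos = item_end(buf, pos)
        return pos
    if major == 6:  # tag: exactly one enclosed item
        return item_end(buf, pos)
    return pos  # major types 0, 1, 7: the head is the whole item


def split_sequence(buf: bytes) -> list[bytes]:
    """Split concatenated CBOR data items into separate encodings."""
    items, pos = [], 0
    while pos < len(buf):
        end = item_end(buf, pos)
        items.append(buf[pos:end])
        pos = end
    return items
```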
Not all of the bytes making up a data item may be immediately Not all of the bytes making up a data item may be immediately
available to the decoder; some decoders will buffer additional data available to the decoder; some decoders will buffer additional data
until a complete data item can be presented to the application. until a complete data item can be presented to the application.
Other decoders can present partial information about a top-level data Other decoders can present partial information about a top-level data
item to an application, such as the nested data items that could item to an application, such as the nested data items that could
already be decoded, or even parts of a byte string that hasn't already be decoded, or even parts of a byte string that hasn't
completely arrived yet. completely arrived yet. Such an application also MUST have a matching
streaming security mechanism, where the desired protection is
available for incremental data presented to the application.
Note that some applications and protocols will not want to use Note that some applications and protocols will not want to use
indefinite-length encoding. Using indefinite-length encoding allows indefinite-length encoding. Using indefinite-length encoding allows
an encoder to not need to marshal all the data for counting, but it an encoder to not need to marshal all the data for counting, but it
requires a decoder to allocate increasing amounts of memory while requires a decoder to allocate increasing amounts of memory while
waiting for the end of the item. This might be fine for some waiting for the end of the item. This might be fine for some
applications but not others. applications but not others.
5.2. Generic Encoders and Decoders 5.2. Generic Encoders and Decoders
skipping to change at page 37, line 44 skipping to change at page 38, line 35
needs to have an API that reports an error (and does not return data) needs to have an API that reports an error (and does not return data)
for a CBOR data item that contains any of the validity errors listed for a CBOR data item that contains any of the validity errors listed
in the previous subsection. in the previous subsection.
The set of tags defined in the tag registry (Section 9.2), as well as The set of tags defined in the tag registry (Section 9.2), as well as
the set of simple values defined in the simple values registry the set of simple values defined in the simple values registry
(Section 9.1), can grow at any time beyond the set understood by a (Section 9.1), can grow at any time beyond the set understood by a
generic decoder. A validity-checking decoder can do one of two generic decoder. A validity-checking decoder can do one of two
things when it encounters such a case that it does not recognize: things when it encounters such a case that it does not recognize:
* It can report an error (and not return data). Note that this * It can report an error (and not return data). Note that treating
error is not a validity error per se. This kind of error is more this case as an error can cause ossification, and is thus not
likely to be raised by a decoder that would be performing validity encouraged. This error is not a validity error per se. This kind
checking if this were a known case. of error is more likely to be raised by a decoder that would be
performing validity checking if this were a known case.
* It can emit the unknown item (type, value, and, for tags, the * It can emit the unknown item (type, value, and, for tags, the
decoded tagged data item) to the application calling the decoder, decoded tagged data item) to the application calling the decoder,
with an indication that the decoder did not recognize that tag with an indication that the decoder did not recognize that tag
number or simple value. number or simple value.
The latter approach, which is also appropriate for decoders that do The latter approach, which is also appropriate for decoders that do
not support validity checking, provides forward compatibility with not support validity checking, provides forward compatibility with
newly registered tags and simple values without the requirement to newly registered tags and simple values without the requirement to
update the encoder at the same time as the calling application. (For update the encoder at the same time as the calling application. (For
skipping to change at page 38, line 36 skipping to change at page 39, line 31
reliably limits its output to valid CBOR, independent of whether or reliably limits its output to valid CBOR, independent of whether or
not its application is indeed providing API-conformant data. not its application is indeed providing API-conformant data.
5.5. Numbers 5.5. Numbers
CBOR-based protocols should take into account that different language CBOR-based protocols should take into account that different language
environments pose different restrictions on the range and precision environments pose different restrictions on the range and precision
of numbers that are representable. For example, the basic JavaScript of numbers that are representable. For example, the basic JavaScript
number system treats all numbers as floating-point values, which may number system treats all numbers as floating-point values, which may
result in silent loss of precision in decoding integers with more result in silent loss of precision in decoding integers with more
than 53 significant bits. A protocol that uses numbers should define than 53 significant bits. Another example is that, since CBOR keeps
its expectations on the handling of non-trivial numbers in decoders the sign bit for its integer representation in the major type, it has
and receiving applications. one bit more for signed numbers of a certain length (e.g.,
-2**64..2**64-1 for 1+8-byte integers) than the typical platform
signed integer representation of the same length (-2**63..2**63-1 for
8-byte int64_t). A protocol that uses numbers should define its
expectations on the handling of non-trivial numbers in decoders and
receiving applications.
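A short Python sketch makes the range asymmetry concrete. The helper names are hypothetical. A major type 1 item carries -1 - n in its argument, so the largest 8-byte argument decodes to -2**64, which lies outside the int64_t range. The sketch also checks whether an integer survives conversion to a binary64 double, as used by basic JavaScript numbers.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def decode_integer(major: int, arg: int) -> int:
    """Value of a major type 0 or 1 item whose argument is arg."""
    if not 0 <= arg < 2**64:
        raise ValueError("argument must fit in 64 bits")
    if major == 0:
        return arg
    if major == 1:
        return -1 - arg
    raise ValueError("not an integer major type")


def fits_int64(n: int) -> bool:
    return INT64_MIN <= n <= INT64_MAX


def exact_as_double(n: int) -> bool:
    # True if converting n to a binary64 double loses no information.
    return int(float(n)) == n
```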
A CBOR-based protocol that includes floating-point numbers can A CBOR-based protocol that includes floating-point numbers can
restrict which of the three formats (half-precision, single- restrict which of the three formats (half-precision, single-
precision, and double-precision) are to be supported. For an precision, and double-precision) are to be supported. For an
integer-only application, a protocol may want to completely exclude integer-only application, a protocol may want to completely exclude
the use of floating-point values. the use of floating-point values.
A CBOR-based protocol designed for compactness may want to exclude A CBOR-based protocol designed for compactness may want to exclude
specific integer encodings that are longer than necessary for the specific integer encodings that are longer than necessary for the
application, such as to save the need to implement 64-bit integers. application, such as to save the need to implement 64-bit integers.
skipping to change at page 40, line 5 skipping to change at page 41, line 5
A CBOR-based protocol MUST define what to do when a receiving A CBOR-based protocol MUST define what to do when a receiving
application does see multiple identical keys in a map. The resulting application does see multiple identical keys in a map. The resulting
rule in the protocol MUST respect the CBOR data model: it cannot rule in the protocol MUST respect the CBOR data model: it cannot
prescribe a specific handling of the entries with the identical keys, prescribe a specific handling of the entries with the identical keys,
except that it might have a rule that having identical keys in a map except that it might have a rule that having identical keys in a map
indicates a malformed map and that the decoder has to stop with an indicates a malformed map and that the decoder has to stop with an
error. When processing maps that exhibit entries with duplicate error. When processing maps that exhibit entries with duplicate
keys, a generic decoder might do one of the following: keys, a generic decoder might do one of the following:
* Not accept maps duplicate keys (that is, enforce validity for * Not accept maps with duplicate keys (that is, enforce validity for
maps, see also Section 5.4). These generic decoders are maps, see also Section 5.4). These generic decoders are
universally useful. An application may still need to perform universally useful. An application may still need to perform
its own duplicate checking based on application rules (for its own duplicate checking based on application rules (for
instance if the application equates integers and floating point instance if the application equates integers and floating-point
values in map key positions for specific maps). values in map key positions for specific maps).
* Pass all map entries to the application, including ones with * Pass all map entries to the application, including ones with
duplicate keys. This requires the application to handle (check duplicate keys. This requires the application to handle (check
against) duplicate keys, even if the application rules are against) duplicate keys, even if the application rules are
identical to the generic data model rules. identical to the generic data model rules.
* Lose some entries with duplicate keys, e.g. by only delivering the * Lose some entries with duplicate keys, e.g. by only delivering the
final (or first) entry out of the entries with the same key. With final (or first) entry out of the entries with the same key. With
such a generic decoder, applications may get different results for such a generic decoder, applications may get different results for
skipping to change at page 41, line 34 skipping to change at page 42, line 34
element, and are equal if they have the same number of bytes/elements element, and are equal if they have the same number of bytes/elements
and the same values at the same positions. Two maps are equal if and the same values at the same positions. Two maps are equal if
they have the same set of pairs regardless of their order; pairs are they have the same set of pairs regardless of their order; pairs are
equal if both the key and value are equal. equal if both the key and value are equal.
Tagged values are equal if both the tag number and the tag content Tagged values are equal if both the tag number and the tag content
are equal. (Note that a generic decoder that provides processing for are equal. (Note that a generic decoder that provides processing for
a specific tag may not be able to distinguish some semantically a specific tag may not be able to distinguish some semantically
equivalent values, e.g. if leading zeroes occur in the content of tag equivalent values, e.g. if leading zeroes occur in the content of tag
2/3 (Section 3.4.3).) Simple values are equal if they simply have 2/3 (Section 3.4.3).) Simple values are equal if they simply have
the same value. Nothing else is equal in the generic data model, a the same value. Nothing else is equal in the generic data model; a
simple value 2 is not equivalent to an integer 2 and an array is simple value 2 is not equivalent to an integer 2 and an array is
never equivalent to a map. never equivalent to a map.
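The equality rules above can be sketched in Python. The classes Tag, Simple and Map and the function cbor_equal are hypothetical illustrations; maps are kept as lists of pairs because a Python dict would already merge keys such as 1 and 1.0. Floating-point values are compared with Python's == here, which is a simplification this sketch does not try to refine.

```python
# Illustrative sketch of generic-data-model equality; all names are hypothetical.
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Tag:
    number: int
    content: Any


@dataclass(frozen=True)
class Simple:
    value: int


@dataclass(frozen=True)
class Map:
    pairs: Tuple[Tuple[Any, Any], ...]  # (key, value) pairs in encoded order


def cbor_equal(a: Any, b: Any) -> bool:
    """Equality in the generic data model (sketch)."""
    if type(a) is not type(b):
        return False  # e.g. Simple(2) vs 2, int vs float, list (array) vs Map
    if isinstance(a, list):
        return len(a) == len(b) and all(cbor_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, Map):
        # Order-independent: each pair of a must match a distinct pair of b.
        remaining = list(b.pairs)
        for ka, va in a.pairs:
            for i, (kb, vb) in enumerate(remaining):
                if cbor_equal(ka, kb) and cbor_equal(va, vb):
                    del remaining[i]
                    break
            else:
                return False
        return not remaining
    if isinstance(a, Tag):
        return a.number == b.number and cbor_equal(a.content, b.content)
    return a == b  # int, bytes, str, Simple: equal if they have the same value
```

Note that Tag(2, b"\x01") and Tag(2, b"\x00\x01") are unequal here, even though a decoder that interprets tag 2 might map both to the number one.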
As discussed in Section 2.2, specific data models can make values As discussed in Section 2.2, specific data models can make values
equivalent for the purpose of comparing map keys that are distinct in equivalent for the purpose of comparing map keys that are distinct in
the generic data model. Note that this implies that a generic the generic data model. Note that this implies that a generic
decoder may deliver a decoded map to an application that needs to be decoder may deliver a decoded map to an application that needs to be
checked for duplicate map keys by that application (alternatively, checked for duplicate map keys by that application (alternatively,
the decoder may provide a programming interface to perform this the decoder may provide a programming interface to perform this
service for the application). Specific data models cannot service for the application). Specific data models are not able to
distinguish values for map keys that are equal for this purpose at distinguish values for map keys that are equal for this purpose at
the generic data model level. the generic data model level.
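A minimal sketch of such an application-side duplicate check follows, assuming the decoder hands over map entries as (key, value) pairs with hashable scalar keys. The rule integral_floats_as_ints is a hypothetical application rule of the kind mentioned in the list of decoder behaviors above, not something this document defines.

```python
# Illustrative sketch; the normalization rules are hypothetical examples.
from typing import Any, Callable, Hashable, Iterable, Optional, Tuple


def generic_key(key: Any) -> Hashable:
    # Generic data model: keys of different kinds never collide (1 vs 1.0).
    return (type(key).__name__, key)


def integral_floats_as_ints(key: Any) -> Hashable:
    # Hypothetical application rule: 1.0 is treated as the same map key as 1.
    if isinstance(key, float) and key.is_integer():
        return ("int", int(key))
    return generic_key(key)


def find_duplicate_key(pairs: Iterable[Tuple[Any, Any]],
                       normalize: Callable[[Any], Hashable]) -> Optional[Any]:
    """Return the first key that duplicates an earlier one, or None."""
    seen = set()
    for key, _value in pairs:
        norm = normalize(key)
        if norm in seen:
            return key
        seen.add(norm)
    return None
```

With the entries [(1, "a"), (1.0, "b")], the generic check finds no duplicate, while the application rule reports 1.0 as a duplicate of 1.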
5.7. Undefined Values 5.7. Undefined Values
In some CBOR-based protocols, the simple value (Section 3.3) of In some CBOR-based protocols, the simple value (Section 3.3) of
Undefined might be used by an encoder as a substitute for a data item Undefined might be used by an encoder as a substitute for a data item
with an encoding problem, in order to allow the rest of the enclosing with an encoding problem, in order to allow the rest of the enclosing
data items to be encoded without harm. data items to be encoded without harm.
6. Converting Data between CBOR and JSON 6. Converting Data between CBOR and JSON
This section gives non-normative advice about converting between CBOR This section gives non-normative advice about converting between CBOR
and JSON. Implementations of converters are free to use whichever and JSON. Implementations of converters MAY use whichever advice
advice here they want. here they want.
It is worth noting that a JSON text is a sequence of characters, not It is worth noting that a JSON text is a sequence of characters, not
an encoded sequence of bytes, while a CBOR data item consists of an encoded sequence of bytes, while a CBOR data item consists of
bytes, not characters. bytes, not characters.
6.1. Converting from CBOR to JSON 6.1. Converting from CBOR to JSON
Most of the types in CBOR have direct analogs in JSON. However, some Most of the types in CBOR have direct analogs in JSON. However, some
do not, and someone implementing a CBOR-to-JSON converter has to do not, and someone implementing a CBOR-to-JSON converter has to
consider what to do in those cases. The following non-normative consider what to do in those cases. The following non-normative
skipping to change at page 43, line 31 skipping to change at page 44, line 31
value not yet discussed) is represented by the substitute value. value not yet discussed) is represented by the substitute value.
* A bignum (major type 6, tag number 2 or 3) is represented by * A bignum (major type 6, tag number 2 or 3) is represented by
encoding its byte string in base64url without padding and becomes encoding its byte string in base64url without padding and becomes
a JSON string. For tag number 3 (negative bignum), a "~" (ASCII a JSON string. For tag number 3 (negative bignum), a "~" (ASCII
tilde) is inserted before the base-encoded value. (The conversion tilde) is inserted before the base-encoded value. (The conversion
to a binary blob instead of a number is to prevent a likely to a binary blob instead of a number is to prevent a likely
numeric overflow for the JSON decoder.) numeric overflow for the JSON decoder.)
* A byte string with an encoding hint (major type 6, tag number 21 * A byte string with an encoding hint (major type 6, tag number 21
through 23) is encoded as described and becomes a JSON string. through 23) is encoded as described by the hint and becomes a JSON
string.
* For all other tags (major type 6, any other tag number), the tag * For all other tags (major type 6, any other tag number), the tag
content is represented as a JSON value; the tag number is ignored. content is represented as a JSON value; the tag number is ignored.
* Indefinite-length items are made definite before conversion. * Indefinite-length items are made definite before conversion.
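The bignum rule in the list above can be sketched with Python's standard base64 module; bignum_to_json_string is a hypothetical name. The byte string is copied as-is; for tag number 3 the JSON consumer has to know that the content stands for -1 minus the encoded magnitude.

```python
# Illustrative sketch; the function name is hypothetical.
import base64


def bignum_to_json_string(tag_number: int, content: bytes) -> str:
    """Represent a tag 2/3 bignum as base64url without padding, '~' for tag 3."""
    if tag_number not in (2, 3):
        raise ValueError("not a bignum tag")
    encoded = base64.urlsafe_b64encode(content).decode("ascii").rstrip("=")
    return "~" + encoded if tag_number == 3 else encoded
```

For instance, 2**64 (tag 2, content 0x010000000000000000) becomes the JSON string "AQAAAAAAAAAA".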
A CBOR-to-JSON converter may want to keep to the JSON profile I-JSON
[RFC7493], to maximize interoperability and increase confidence that
the JSON output can be processed with predictable results. For
example, this has implications on the range of integers that can be
represented reliably, as well as on the top-level items that may be
supported by older JSON implementations.
6.2. Converting from JSON to CBOR 6.2. Converting from JSON to CBOR
All JSON values, once decoded, directly map into one or more CBOR All JSON values, once decoded, directly map into one or more CBOR
values. As with any kind of CBOR generation, decisions have to be values. As with any kind of CBOR generation, decisions have to be
made with respect to number representation. In a suggested made with respect to number representation. In a suggested
conversion: conversion:
* JSON numbers without fractional parts (integer numbers) are * JSON numbers without fractional parts (integer numbers) are
represented as integers (major types 0 and 1, possibly major type represented as integers (major types 0 and 1, possibly major type
6 tag number 2 and 3), choosing the shortest form; integers longer 6 tag number 2 and 3), choosing the shortest form; integers longer
skipping to change at page 44, line 14 skipping to change at page 45, line 23
converter implementation, may choose -2**32..2**32-1 or converter implementation, may choose -2**32..2**32-1 or
-2**64..2**64-1 (fully using the integer ranges available in CBOR -2**64..2**64-1 (fully using the integer ranges available in CBOR
with uint32_t or uint64_t, respectively) or even -2**31..2**31-1 with uint32_t or uint64_t, respectively) or even -2**31..2**31-1
or -2**63..2**63-1 (using popular ranges for two's complement or -2**63..2**63-1 (using popular ranges for two's complement
signed integers). (If the JSON was generated from a JavaScript signed integers). (If the JSON was generated from a JavaScript
implementation, its precision is already limited to 53 bits implementation, its precision is already limited to 53 bits
maximum.) maximum.)
* Numbers with fractional parts are represented as floating-point * Numbers with fractional parts are represented as floating-point
values, performing the decimal-to-binary conversion based on the values, performing the decimal-to-binary conversion based on the
precision provided by IEEE 754 binary64. Then, when encoding in precision provided by IEEE 754 binary64. The mathematical value
CBOR, the preferred serialization uses the shortest floating-point of the JSON number is converted to binary64 using the
representation exactly representing this conversion result; for roundTiesToEven procedure in Section 4.3.1 of [IEEE754]. Then,
instance, 1.5 is represented in a 16-bit floating-point value (not when encoding in CBOR, the preferred serialization uses the
all implementations will be capable of efficiently finding the shortest floating-point representation exactly representing this
minimum form, though). Instead of using the default binary64 conversion result; for instance, 1.5 is represented in a 16-bit
precision, there may be an implementation-defined limit to the floating-point value (not all implementations will be capable of
precision of the conversion that will affect the precision of the efficiently finding the minimum form, though). Instead of using
represented values. Decimal representation should only be used on the default binary64 precision, there may be an implementation-
the CBOR side if that is specified in a protocol. defined limit to the precision of the conversion that will affect
the precision of the represented values. Decimal representation
should only be used on the CBOR side if that is specified in a
protocol.
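The suggested conversion can be sketched as follows. The sketch assumes the JSON parser has already produced either a Python int or a correctly rounded binary64 float (CPython's float() on a decimal string rounds to nearest, ties to even). It picks the full -2**64..2**64-1 range, one of the options listed above; all function names are hypothetical.

```python
# Illustrative sketch; function names and the chosen integer range are assumptions.
import struct


def _head(major: int, arg: int) -> bytes:
    """CBOR head using the shortest argument form (preferred serialization)."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + arg.to_bytes(size, "big")
    raise OverflowError("argument does not fit in 64 bits")


def encode_json_integer(n: int) -> bytes:
    if 0 <= n < 2**64:
        return _head(0, n)
    if -2**64 <= n < 0:
        return _head(1, -1 - n)
    # Beyond that range: bignum, tag 2 (n >= 0) or tag 3 (content is -1 - n).
    tag, mag = (2, n) if n >= 0 else (3, -1 - n)
    content = mag.to_bytes((mag.bit_length() + 7) // 8, "big")
    return _head(6, tag) + _head(2, len(content)) + content


def encode_binary64(x: float) -> bytes:
    """Shortest of half/single/double precision that represents x exactly."""
    for ai, fmt in ((25, ">e"), (26, ">f")):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:  # magnitude too large for this format
            continue
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([0xE0 | ai]) + packed
    return b"\xfb" + struct.pack(">d", x)
```

Here 1.5 becomes the 3-byte half-precision item 0xf93e00, while 1.1 needs the 9-byte double 0xfb3ff199999999999a and 1e9 the 5-byte single 0xfa4e6e6b28, which also illustrates the expansion cases discussed below.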
CBOR has been designed to generally provide a more compact encoding CBOR has been designed to generally provide a more compact encoding
than JSON. One implementation strategy that might come to mind is to than JSON. One implementation strategy that might come to mind is to
perform a JSON-to-CBOR encoding in place in a single buffer. This perform a JSON-to-CBOR encoding in place in a single buffer. This
strategy would need to carefully consider a number of pathological strategy would need to carefully consider a number of pathological
cases, such as that some strings represented with no or very few cases, such as that some strings represented with no or very few
escapes and longer (or much longer) than 255 bytes may expand when escapes and longer (or much longer) than 255 bytes may expand when
encoded as UTF-8 strings in CBOR. Similarly, a few of the binary encoded as UTF-8 strings in CBOR. Similarly, a few of the binary
floating-point representations might cause expansion from some short floating-point representations might cause expansion from some short
decimal representations (1.1, 1e9) in JSON. This may be hard to get decimal representations (1.1, 1e9) in JSON. This may be hard to get
skipping to change at page 47, line 30 skipping to change at page 48, line 45
actual encodings do not overlap, so the string remains unambiguous). actual encodings do not overlap, so the string remains unambiguous).
For example, the byte string 0x12345678 could be written h'12345678', For example, the byte string 0x12345678 could be written h'12345678',
b32'CI2FM6A', or b64'EjRWeA'. b32'CI2FM6A', or b64'EjRWeA'.
Unassigned simple values are given as "simple()" with the appropriate Unassigned simple values are given as "simple()" with the appropriate
integer in the parentheses. For example, "simple(42)" indicates integer in the parentheses. For example, "simple(42)" indicates
major type 7, value 42. major type 7, value 42.
A number of useful extensions to the diagnostic notation defined here A number of useful extensions to the diagnostic notation defined here
are provided in Appendix G of [RFC8610], "Extended Diagnostic are provided in Appendix G of [RFC8610], "Extended Diagnostic
Notation" (EDN). Notation" (EDN). Similarly, an extension of this notation could be
provided in a separate document to provide for the documentation of
NaN payloads, which are not covered in the present document.
8.1. Encoding Indicators 8.1. Encoding Indicators
Sometimes it is useful to indicate in the diagnostic notation which Sometimes it is useful to indicate in the diagnostic notation which
of several alternative representations were actually used; for of several alternative representations were actually used; for
example, a data item written >1.5< by a diagnostic decoder might have example, a data item written >1.5< by a diagnostic decoder might have
been encoded as a half-, single-, or double-precision float. been encoded as a half-, single-, or double-precision float.
The convention for encoding indicators is that anything starting with The convention for encoding indicators is that anything starting with
an underscore and all following characters that are alphanumeric or an underscore and all following characters that are alphanumeric or
skipping to change at page 48, line 14 skipping to change at page 49, line 33
An underscore followed by a decimal digit n indicates that the An underscore followed by a decimal digit n indicates that the
preceding item (or, for arrays and maps, the item starting with the preceding item (or, for arrays and maps, the item starting with the
preceding bracket or brace) was encoded with an additional preceding bracket or brace) was encoded with an additional
information value of 24+n. For example, 1.5_1 is a half-precision information value of 24+n. For example, 1.5_1 is a half-precision
floating-point number, while 1.5_3 is encoded as double precision. floating-point number, while 1.5_3 is encoded as double precision.
This encoding indicator is not shown in Appendix A. (Note that the This encoding indicator is not shown in Appendix A. (Note that the
encoding indicator "_" is thus an abbreviation of the full form "_7", encoding indicator "_" is thus an abbreviation of the full form "_7",
which is not used.) which is not used.)
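The byte-level meaning of an encoding indicator _n can be sketched as follows: the additional information is 24+n, so the argument occupies 2**n bytes. The helper names are hypothetical; for floats only _1, _2 and _3 apply, because additional information 24 in major type 7 is a simple value rather than a float.

```python
# Illustrative sketch; helper names are hypothetical.
import struct

_FLOAT_FORMAT = {1: ">e", 2: ">f", 3: ">d"}  # _1 half, _2 single, _3 double


def encode_head_with_indicator(major: int, arg: int, n: int) -> bytes:
    """Head with additional information 24+n, i.e. a 2**n-byte argument."""
    if not 0 <= n <= 3:
        raise ValueError("indicator digit must be 0..3")
    return bytes([(major << 5) | (24 + n)]) + arg.to_bytes(1 << n, "big")


def encode_float_with_indicator(x: float, n: int) -> bytes:
    """Float (major type 7) that diagnostic notation would write as x_n."""
    if n not in _FLOAT_FORMAT:
        raise ValueError("float indicators are _1, _2 and _3")
    return bytes([0xE0 | (24 + n)]) + struct.pack(_FLOAT_FORMAT[n], x)
```

So 1.5_1 corresponds to 0xf93e00 and 1.5_2 to 0xfa3fc00000, while 1_0 corresponds to 0x1801.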
Byte and text strings of indefinite length can be notated in the form The detailed chunk structure of byte and text strings of indefinite
(_ h'0123', h'4567') and (_ "foo", "bar"). length can be notated in the form (_ h'0123', h'4567') and (_ "foo",
"bar"). However, for an indefinite length string with no chunks
inside, (_ ) would be ambiguous whether a byte string (0x5fff) or a
text string (0x7fff) is meant and is therefore not used. The basic
forms ''_ and ""_ can be used instead and are reserved for the case
with no chunks only -- not as short forms for the (permitted, but not
really useful) encodings with only empty chunks, which to preserve
the chunk structure need to be notated as (_ ''), (_ ""), etc.
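A short Python sketch shows why the chunk structure matters: with no chunks, only the initial byte distinguishes a byte string (0x5f) from a text string (0x7f), and an empty chunk adds a byte. The function name encode_indefinite_string is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical.
def _head(major: int, arg: int) -> bytes:
    # Shortest-form head, used here for the chunk lengths.
    if arg < 24:
        return bytes([(major << 5) | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + arg.to_bytes(size, "big")
    raise OverflowError("chunk too long")


def encode_indefinite_string(chunks, text: bool) -> bytes:
    """(_ chunk, ...) as an indefinite-length byte string or text string."""
    major = 3 if text else 2
    out = bytearray([(major << 5) | 31])  # 0x5f or 0x7f
    for chunk in chunks:
        data = chunk.encode("utf-8") if text else bytes(chunk)
        out += _head(major, len(data)) + data  # each chunk is definite-length
    out.append(0xFF)  # "break" stop code
    return bytes(out)
```

With no chunks this yields 0x5fff for ''_ and 0x7fff for ""_, whereas (_ '') yields 0x5f40ff.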
9. IANA Considerations 9. IANA Considerations
IANA has created two registries for new CBOR values. The registries IANA has created two registries for new CBOR values. The registries
are separate, that is, not under an umbrella registry, and follow the are separate, that is, not under an umbrella registry, and follow the
rules in [RFC8126]. IANA has also assigned a new MIME media type and rules in [RFC8126]. IANA has also assigned a new MIME media type and
an associated Constrained Application Protocol (CoAP) Content-Format an associated Constrained Application Protocol (CoAP) Content-Format
entry. entry.
[To be removed by RFC editor:] IANA is requested to update these [To be removed by RFC editor:] IANA is requested to update these
skipping to change at page 48, line 47 skipping to change at page 50, line 24
contiguous blocks (if any). contiguous blocks (if any).
New entries in the range 32 to 255 are assigned by Specification New entries in the range 32 to 255 are assigned by Specification
Required. Required.
9.2. Tags Registry 9.2. Tags Registry
IANA has created the "Concise Binary Object Representation (CBOR) IANA has created the "Concise Binary Object Representation (CBOR)
Tags" registry at [IANA.cbor-tags]. The tags that were defined in Tags" registry at [IANA.cbor-tags]. The tags that were defined in
[RFC7049] are described in detail in Section 3.4, and other tags have [RFC7049] are described in detail in Section 3.4, and other tags have
already been defined. already been defined since then.
New entries in the range 0 to 23 ("1+0") are assigned by Standards New entries in the range 0 to 23 ("1+0") are assigned by Standards
Action. New entries in the ranges 24 to 255 ("1+1") and 256 to 32767 Action. New entries in the ranges 24 to 255 ("1+1") and 256 to 32767
(lower half of "1+2") are assigned by Specification Required. New (lower half of "1+2") are assigned by Specification Required. New
entries in the range 32768 to 18446744073709551615 (upper half of entries in the range 32768 to 18446744073709551615 (upper half of
"1+2", "1+4", and "1+8") are assigned by First Come First Served. "1+2", "1+4", and "1+8") are assigned by First Come First Served.
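The registration ranges can be summarized in a small Python sketch; the function names are hypothetical. tag_head_form reports which "1+k" head a tag number needs, which also shows when a number is no longer representable in 32 bits.

```python
# Illustrative sketch; function names are hypothetical.
MAX_TAG = 18446744073709551615  # 2**64 - 1


def tag_registration_policy(tag: int) -> str:
    if not 0 <= tag <= MAX_TAG:
        raise ValueError("tag numbers are unsigned 64-bit values")
    if tag <= 23:
        return "Standards Action"        # "1+0"
    if tag <= 32767:
        return "Specification Required"  # "1+1" and lower half of "1+2"
    return "First Come First Served"     # upper half of "1+2", "1+4", "1+8"


def tag_head_form(tag: int) -> str:
    for limit, form in ((23, "1+0"), (0xFF, "1+1"), (0xFFFF, "1+2"),
                        (0xFFFFFFFF, "1+4")):
        if tag <= limit:
            return form
    return "1+8"
```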
The template for registration requests is: The template for registration requests is:
* Data item * Data item
* Semantics (short form) * Semantics (short form)
In addition, First Come First Served requests should include: In addition, First Come First Served requests should include:
* Point of contact * Point of contact
* Description of semantics (URL) - This description is optional; the * Description of semantics (URL) -- This description is optional;
URL can point to something like an Internet-Draft or a web page. the URL can point to something like an Internet-Draft or a web
page.
Applicants exercising the First Come First Served range and making a Applicants exercising the First Come First Served range and making a
suggestion for a tag number that is not representable in 32 bits suggestion for a tag number that is not representable in 32 bits
(i.e., larger than 4294967295) should be aware that this could reduce (i.e., larger than 4294967295) should be aware that this could reduce
interoperability with implementations that do not support 64-bit interoperability with implementations that do not support 64-bit
numbers. numbers.
9.3. Media Type ("MIME Type") 9.3. Media Type ("MIME Type")
The Internet media type [RFC6838] for a single encoded CBOR data item The Internet media type [RFC6838] for a single encoded CBOR data item
is application/cbor, as defined in [IANA.media-types]: is application/cbor, as defined in [IANA.media-types]:
Type name: application Type name: application
Subtype name: cbor Subtype name: cbor
Required parameters: n/a Required parameters: n/a
Optional parameters: n/a Optional parameters: n/a
Encoding considerations: binary Encoding considerations: Binary
Security considerations: See Section 10 of this document Security considerations: See Section 10 of this document
Interoperability considerations: n/a Interoperability considerations: n/a
Published specification: This document Published specification: This document
Applications that use this media type: None yet, but it is expected Applications that use this media type: Many
that this format will be deployed in protocols and applications.
Additional information: * Magic number(s): n/a Additional information:
* Magic number(s): n/a
* File extension(s): .cbor * File extension(s): .cbor
* Macintosh file type code(s): n/a * Macintosh file type code(s): n/a
Person & email address to contact for further information: IETF CBOR Person & email address to contact for further information: IETF CBOR
Working Group cbor@ietf.org (mailto:cbor@ietf.org) or IETF Working Group cbor@ietf.org (mailto:cbor@ietf.org) or IETF
Applications and Real-Time Area art@ietf.org (mailto:art@ietf.org) Applications and Real-Time Area art@ietf.org (mailto:art@ietf.org)
Intended usage: COMMON Intended usage: COMMON
Restrictions on usage: none Restrictions on usage: none
Author: IETF CBOR Working Group cbor@ietf.org (mailto:cbor@ietf.org) Author: IETF CBOR Working Group cbor@ietf.org (mailto:cbor@ietf.org)
Change controller: The IESG iesg@ietf.org (mailto:iesg@ietf.org) Change controller: The IESG iesg@ietf.org (mailto:iesg@ietf.org)
9.4. CoAP Content-Format 9.4. CoAP Content-Format
The CoAP Content-Format for CBOR is defined in The CoAP Content-Format for CBOR is registered in
[IANA.core-parameters]: [IANA.core-parameters]:
Media Type: application/cbor Media Type: application/cbor
Encoding: - Encoding: -
Id: 60 Id: 60
Reference: [RFCthis] Reference: [RFCthis]
9.5. The +cbor Structured Syntax Suffix Registration 9.5. The +cbor Structured Syntax Suffix Registration
The Structured Syntax Suffix [RFC6838] for media types based on a The Structured Syntax Suffix [RFC6838] for media types based on a
single encoded CBOR data item is +cbor, as defined in single encoded CBOR data item is +cbor, as defined in
skipping to change at page 52, line 9 skipping to change at page 53, line 33
Because CBOR decoders are often used as a first step in processing Because CBOR decoders are often used as a first step in processing
unvalidated input, they need to be fully prepared for all types of unvalidated input, they need to be fully prepared for all types of
hostile input that may be designed to corrupt, overrun, or achieve hostile input that may be designed to corrupt, overrun, or achieve
control of the system decoding the CBOR data item. A CBOR decoder control of the system decoding the CBOR data item. A CBOR decoder
needs to assume that all input may be hostile even if it has been needs to assume that all input may be hostile even if it has been
checked by a firewall, has come over a secure channel such as TLS, is checked by a firewall, has come over a secure channel such as TLS, is
encrypted or signed, or has come from some other source that is encrypted or signed, or has come from some other source that is
presumed trusted. presumed trusted.
Section 4.1 gives examples of limitations in interoperability when
using a constrained CBOR decoder with input from a CBOR encoder that
uses a non-preferred serialization. When a single data item is
consumed both by such a constrained decoder and a full decoder, it
can lead to security issues that can be exploited by an attacker who
can inject or manipulate content.
As discussed throughout this document, there are many values that can
be considered "equivalent" in some circumstances and "not equivalent"
in others. As just one example, the numeric value for the number
"one" might be expressed as an integer or a bignum. A system
interpreting CBOR input might accept either form for the number
"one", or might reject one (or both) forms. Such acceptance or
rejection can have security implications in the program that is using
the interpreted input.
Hostile input may be constructed to overrun buffers, overflow or Hostile input may be constructed to overrun buffers, overflow or
underflow integer arithmetic, or cause other decoding disruption. underflow integer arithmetic, or cause other decoding disruption.
CBOR data items might have lengths or sizes that are intentionally CBOR data items might have lengths or sizes that are intentionally
extremely large or too short. Resource exhaustion attacks might extremely large or too short. Resource exhaustion attacks might
attempt to lure a decoder into allocating very big data items attempt to lure a decoder into allocating very big data items
(strings, arrays, maps, or even arbitrary precision numbers) or (strings, arrays, maps, or even arbitrary precision numbers) or
exhaust the stack depth by setting up deeply nested items. Decoders exhaust the stack depth by setting up deeply nested items. Decoders
need to have appropriate resource management to mitigate these need to have appropriate resource management to mitigate these
attacks. (Items for which very large sizes are given can also attacks. (Items for which very large sizes are given can also
attempt to exploit integer overflow vulnerabilities.) attempt to exploit integer overflow vulnerabilities.)
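The following Python sketch shows two of these mitigations in a well-formedness checker: a nesting limit (max_depth, an assumed configuration parameter) and never trusting a declared length or item count beyond the bytes actually present. All names are hypothetical, and the sketch checks well-formedness only, not validity.

```python
# Illustrative sketch of a resource-limited well-formedness check; names are hypothetical.
class CborInputError(ValueError):
    """Input is not well-formed CBOR or exceeds a configured limit."""


def _read_head(data: bytes, pos: int):
    """Return (major, additional info, argument or None, new position)."""
    if pos >= len(data):
        raise CborInputError("truncated input")
    major, ai = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if ai < 24:
        return major, ai, ai, pos
    if ai <= 27:
        size = 1 << (ai - 24)
        if size > len(data) - pos:
            raise CborInputError("truncated argument")
        return major, ai, int.from_bytes(data[pos:pos + size], "big"), pos + size
    if ai == 31:
        return major, ai, None, pos
    raise CborInputError("reserved additional information value")


def _skip(data: bytes, pos: int, depth: int, max_depth: int) -> int:
    """Check one data item starting at pos; return the position after it."""
    if depth > max_depth:
        raise CborInputError("nesting deeper than max_depth")
    major, ai, arg, pos = _read_head(data, pos)
    remaining = len(data) - pos
    if arg is None:  # indefinite length (or a misplaced "break")
        if major not in (2, 3, 4, 5):
            raise CborInputError("indefinite length or break not allowed here")
        count = 0
        while True:
            if pos >= len(data):
                raise CborInputError("missing break")
            if data[pos] == 0xFF:
                if major == 5 and count % 2:
                    raise CborInputError("map key without a value")
                return pos + 1
            if major in (2, 3):  # chunks: definite-length strings of the same type
                cmajor, _, clen, cpos = _read_head(data, pos)
                if cmajor != major or clen is None or clen > len(data) - cpos:
                    raise CborInputError("invalid string chunk")
                pos = cpos + clen
            else:
                pos = _skip(data, pos, depth + 1, max_depth)
            count += 1
    if major in (0, 1):
        return pos
    if major in (2, 3):
        if arg > remaining:  # never allocate based on an unbacked length
            raise CborInputError("string length exceeds input")
        return pos + arg
    if major in (4, 5):
        items = arg * (2 if major == 5 else 1)
        if items > remaining:  # every item needs at least one byte
            raise CborInputError("item count exceeds input")
        for _ in range(items):
            pos = _skip(data, pos, depth + 1, max_depth)
        return pos
    if major == 6:
        return _skip(data, pos, depth + 1, max_depth)
    if ai == 24 and arg < 32:  # major type 7: simple value < 32 in two-byte form
        raise CborInputError("simple value < 32 in two-byte form")
    return pos


def check_well_formed(data: bytes, max_depth: int = 64) -> None:
    if _skip(data, 0, 0, max_depth) != len(data):
        raise CborInputError("trailing bytes after the data item")


def is_well_formed(data: bytes, max_depth: int = 64) -> bool:
    try:
        check_well_formed(data, max_depth)
        return True
    except CborInputError:
        return False
```

For example, the 9-byte input 0x5bffffffffffffffff claims a byte string of 2**64-1 bytes and is rejected immediately instead of triggering an allocation attempt.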
skipping to change at page 52, line 38 skipping to change at page 54, line 29
also perform validity checks on the CBOR data. Alternatively, it can also perform validity checks on the CBOR data. Alternatively, it can
leave those checks to the application using the decoder. This choice leave those checks to the application using the decoder. This choice
needs to be clearly documented in the decoder. Beyond the validity needs to be clearly documented in the decoder. Beyond the validity
at the CBOR level, an application also needs to ascertain that the at the CBOR level, an application also needs to ascertain that the
input is in alignment with the application protocol that is input is in alignment with the application protocol that is
serialized in CBOR. serialized in CBOR.
The input check itself may consume resources. This is usually linear The input check itself may consume resources. This is usually linear
in the size of the input, which means that an attacker has to spend in the size of the input, which means that an attacker has to spend
resources that are commensurate to the resources spent by the resources that are commensurate to the resources spent by the
defender on input validation. Processing for arbitrary-precision defender on input validation. However, an attacker might be able to
craft inputs that will take longer for a target decoder to process
than for the attacker to produce. Processing for arbitrary-precision
numbers may exceed linear effort. Also, some hash-table numbers may exceed linear effort. Also, some hash-table
implementations that are used by decoders to build in-memory implementations that are used by decoders to build in-memory
representations of maps can be attacked to spend quadratic effort, representations of maps can be attacked to spend quadratic effort,
unless a secret key (see Section 7 of [SIPHASH]) or some other unless a secret key (see Section 7 of [SIPHASH_LNCS], also
mitigation is employed. Such superlinear efforts can be exploited by [SIPHASH_OPEN]) or some other mitigation is employed. Such
an attacker to exhaust resources at or before the input validator; superlinear efforts can be exploited by an attacker to exhaust
they therefore need to be avoided in a CBOR decoder implementation. resources at or before the input validator; they therefore need to be
Note that tag number definitions and their implementations can add avoided in a CBOR decoder implementation. Note that tag number
security considerations of this kind; this should then be discussed definitions and their implementations can add security considerations
in the security considerations of the tag number definition. of this kind; this should then be discussed in the security
considerations of the tag number definition.
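As a sketch of the secret-key mitigation, the Python wrapper below makes the hash of each decoded map key depend on a per-process secret. Python's hashlib does not expose SipHash, so this substitutes BLAKE2b in keyed mode; that swap, the 32-byte key size, and the names KeyedMapKey and _key_bytes are assumptions for illustration. (CPython already randomizes str and bytes hashes per process, but hashes integers deterministically, so integer-keyed maps are one place where such a wrapper could matter.)

```python
# Illustrative sketch; keyed BLAKE2b stands in for SipHash, names are hypothetical.
import hashlib
import secrets

# Per-process secret; an attacker who cannot learn it cannot precompute collisions.
_HASH_KEY = secrets.token_bytes(32)


def _key_bytes(value) -> bytes:
    # Type-prefixed bytes so that 1, "1" and b"1" hash independently.
    if isinstance(value, bool):
        return b"t" if value else b"f"
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii")
    if isinstance(value, str):
        return b"s" + value.encode("utf-8")
    if isinstance(value, bytes):
        return b"b" + value
    raise TypeError("sketch supports int, str and bytes keys only")


class KeyedMapKey:
    """Map-key wrapper whose hash is a keyed BLAKE2b digest of the key."""
    __slots__ = ("value", "_hash")

    def __init__(self, value):
        self.value = value
        digest = hashlib.blake2b(_key_bytes(value), digest_size=8,
                                 key=_HASH_KEY).digest()
        self._hash = int.from_bytes(digest, "big")

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other) -> bool:
        return (isinstance(other, KeyedMapKey)
                and type(self.value) is type(other.value)
                and self.value == other.value)
```

A decoder building an in-memory map would then use KeyedMapKey(key) as the dict key, paying one keyed hash per entry in exchange for collisions an attacker cannot predict.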
CBOR encoders do not receive input directly from the network and are CBOR encoders do not receive input directly from the network and are
thus not directly attackable in the same way as CBOR decoders. thus not directly attackable in the same way as CBOR decoders.
However, CBOR encoders often have an API that takes input from However, CBOR encoders often have an API that takes input from
another level in the implementation and can be attacked through that another level in the implementation and can be attacked through that
API. The design and implementation of that API should assume the API. The design and implementation of that API should assume the
behavior of its caller may be based on hostile input or on coding behavior of its caller may be based on hostile input or on coding
mistakes. It should check inputs for buffer overruns, overflow and mistakes. It should check inputs for buffer overruns, overflow and
underflow of integer arithmetic, and other such errors that are aimed underflow of integer arithmetic, and other such errors that are aimed
to disrupt the encoder. to disrupt the encoder.
skipping to change at page 53, line 34 skipping to change at page 55, line 34
cannot know about all requirements that an application poses on its cannot know about all requirements that an application poses on its
input data; it is therefore not relieving the application from input data; it is therefore not relieving the application from
performing its own input checking. Also, since the set of defined performing its own input checking. Also, since the set of defined
tag numbers evolves, the application may employ a tag number that is tag numbers evolves, the application may employ a tag number that is
not yet supported for validity checking by the generic decoder it not yet supported for validity checking by the generic decoder it
uses. Generic decoders therefore need to provide documentation of which uses. Generic decoders therefore need to provide documentation of which
tag numbers they support and what validity checking they can provide tag numbers they support and what validity checking they can provide
for each of them as well as for basic CBOR validity (UTF-8 checking, for each of them as well as for basic CBOR validity (UTF-8 checking,
duplicate map key checking). duplicate map key checking).
Section 3.4.3 notes that using the non-preferred choice of a bignum
representation instead of a basic integer for encoding a number is
not intended to have application semantics, but it can have such
semantics if an application receiving CBOR data is using a decoder in
the basic generic data model. This disparity causes a security issue
if the two sets of semantics differ. Thus, applications using CBOR
need to specify the data model that they are using for each use of
CBOR data.
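The disparity can be made concrete with a tiny Python decoder sketch that handles only short unsigned integers, short byte strings, and tags. With fold_bignums enabled it interprets tag 2 as a number; without it, it stays in the basic generic data model. The names decode, Tagged, and fold_bignums are hypothetical.

```python
# Illustrative sketch; names are hypothetical and coverage is deliberately tiny.
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    number: int
    content: object


def _item(data: bytes, pos: int, fold_bignums: bool):
    major, ai = data[pos] >> 5, data[pos] & 0x1F
    if ai >= 24:
        raise NotImplementedError("sketch handles only arguments below 24")
    pos += 1
    if major == 0:
        return ai, pos
    if major == 2:
        if pos + ai > len(data):
            raise ValueError("truncated byte string")
        return data[pos:pos + ai], pos + ai
    if major == 6:
        content, pos = _item(data, pos, fold_bignums)
        if fold_bignums and ai == 2 and isinstance(content, bytes):
            return int.from_bytes(content, "big"), pos  # tag 2: unsigned bignum
        return Tagged(ai, content), pos
    raise NotImplementedError("major type not covered by this sketch")


def decode(data: bytes, fold_bignums: bool):
    value, end = _item(data, 0, fold_bignums)
    if end != len(data):
        raise ValueError("trailing bytes")
    return value
```

The number one encoded as 0x01 and as 0xc24101 decodes to the same value only when fold_bignums is set; two components that disagree on this setting would disagree about whether the inputs are equal.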
It is common to convert CBOR data to other formats. In many cases,
CBOR has more expressive types than other formats; this is
particularly true for the common conversion to JSON. The loss of
type information can cause security issues for the systems that are
processing the less-expressive data.
Section 6.2 describes a possibly common usage scenario of converting
between CBOR and JSON that could allow an attack if the attacker knows
that the application is performing the conversion.
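One concrete illustration of lost type information (not necessarily the scenario of Section 6.2): JSON has a single number type, and many JSON implementations, including those following ECMAScript [ECMA262], represent every number as an IEEE 754 double. A CBOR unsigned integer above 2^53 can then silently change value. The hypothetical check below (the name uint_survives_double is an assumption of this sketch) converts to double and back.

```c
#include <stdint.h>

/* Hypothetical check: does CBOR unsigned integer v keep its exact value
   after conversion to an IEEE 754 double and back? */
int uint_survives_double(uint64_t v)
{
    double d = (double)v;                 /* may round */
    if (d >= 18446744073709551616.0)      /* 2^64: no uint64_t can hold it */
        return 0;
    return (uint64_t)d == v;
}
```

For example, 9007199254740992 (2^53) and 1000000000000 survive, while 9007199254740993 (2^53 + 1) and 18446744073709551615 do not; the receiver of the JSON text sees a different number than the CBOR sender encoded.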
Security considerations for the use of base16 and base64 from
[RFC4648], and the use of UTF-8 from [RFC3629], are relevant to CBOR
as well.
11. References 11. References
11.1. Normative References 11.1. Normative References
[ECMA262] Ecma International, "ECMAScript 2018 Language [C] International Organization for Standardization,
Specification", ECMA Standard ECMA-262, 9th Edition, June "Information technology — Programming languages — C", ISO/
2018, <https://www.ecma- IEC 9899:2018, Fourth Edition, June 2018.
international.org/publications/files/ECMA-ST/Ecma-
262.pdf>. [Cplusplus17]
International Organization for Standardization,
"Programming languages — C++", ISO/IEC 14882:2017, Fifth
Edition, December 2017.
[IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE
Std 754-2008. Std 754-2019, DOI 10.1109/IEEESTD.2019.8766229,
<https://ieeexplore.ieee.org/document/8766229>.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996,
<https://www.rfc-editor.org/info/rfc2045>. <https://www.rfc-editor.org/info/rfc2045>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
skipping to change at page 54, line 40 skipping to change at page 57, line 18
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
Writing an IANA Considerations Section in RFCs", BCP 26, Writing an IANA Considerations Section in RFCs", BCP 26,
RFC 8126, DOI 10.17487/RFC8126, June 2017, RFC 8126, DOI 10.17487/RFC8126, June 2017,
<https://www.rfc-editor.org/info/rfc8126>. <https://www.rfc-editor.org/info/rfc8126>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[TIME_T] The Open Group Base Specifications, "Vol. 1: Base [TIME_T] The Open Group Base Specifications, "Open Group Standard:
Definitions, Issue 7", 2013 Edition, IEEE Std 1003.1, Vol. 1: Base Definitions, Issue 7", Section 4.16 'Seconds
Section 4.15 'Seconds Since the Epoch', 2013, Since the Epoch', IEEE Std 1003.1, 2018 Edition, 2018,
<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/ <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/
V1_chap04.html#tag_04_15>. V1_chap04.html#tag_04_16>.
11.2. Informative References 11.2. Informative References
[ASN.1] International Telecommunication Union, "Information [ASN.1] International Telecommunication Union, "Information
Technology -- ASN.1 encoding rules: Specification of Basic Technology ASN.1 encoding rules: Specification of Basic
Encoding Rules (BER), Canonical Encoding Rules (CER) and Encoding Rules (BER), Canonical Encoding Rules (CER) and
Distinguished Encoding Rules (DER)", ITU-T Recommendation Distinguished Encoding Rules (DER)", ITU-T Recommendation
X.690, 1994. X.690, 1994.
[BSON] Various, "BSON - Binary JSON", 2013, [BSON] Various, "BSON - Binary JSON", 2013,
<http://bsonspec.org/>. <http://bsonspec.org/>.
[ECMA262] Ecma International, "ECMAScript 2018 Language
Specification", ECMA Standard ECMA-262, 9th Edition, June
2018, <https://www.ecma-
international.org/publications/files/ECMA-ST/Ecma-
262.pdf>.
[I-D.bormann-cbor-notable-tags] [I-D.bormann-cbor-notable-tags]
Bormann, C., "Notable CBOR Tags", Work in Progress, Bormann, C., "Notable CBOR Tags", Work in Progress,
Internet-Draft, draft-bormann-cbor-notable-tags-01, 15 May Internet-Draft, draft-bormann-cbor-notable-tags-02, 25
2020, <http://www.ietf.org/internet-drafts/draft-bormann- June 2020, <http://www.ietf.org/internet-drafts/draft-
cbor-notable-tags-01.txt>. bormann-cbor-notable-tags-02.txt>.
[IANA.cbor-simple-values] [IANA.cbor-simple-values]
IANA, "Concise Binary Object Representation (CBOR) Simple IANA, "Concise Binary Object Representation (CBOR) Simple
Values", Values",
<http://www.iana.org/assignments/cbor-simple-values>. <http://www.iana.org/assignments/cbor-simple-values>.
[IANA.cbor-tags] [IANA.cbor-tags]
IANA, "Concise Binary Object Representation (CBOR) Tags", IANA, "Concise Binary Object Representation (CBOR) Tags",
<http://www.iana.org/assignments/cbor-tags>. <http://www.iana.org/assignments/cbor-tags>.
skipping to change at page 56, line 47 skipping to change at page 59, line 34
[RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR)
Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020,
<https://www.rfc-editor.org/info/rfc8742>. <https://www.rfc-editor.org/info/rfc8742>.
[RFC8746] Bormann, C., Ed., "Concise Binary Object Representation [RFC8746] Bormann, C., Ed., "Concise Binary Object Representation
(CBOR) Tags for Typed Arrays", RFC 8746, (CBOR) Tags for Typed Arrays", RFC 8746,
DOI 10.17487/RFC8746, February 2020, DOI 10.17487/RFC8746, February 2020,
<https://www.rfc-editor.org/info/rfc8746>. <https://www.rfc-editor.org/info/rfc8746>.
[SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- [SIPHASH_LNCS]
Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture Aumasson, J. and D. Bernstein, "SipHash: A Fast Short-
Notes in Computer Science pp. 489-508, 2012, Input PRF", Lecture Notes in Computer Science pp. 489-508,
DOI 10.1007/978-3-642-34931-7_28, 2012,
<https://doi.org/10.1007/978-3-642-34931-7_28>. <https://doi.org/10.1007/978-3-642-34931-7_28>.
[SIPHASH_OPEN]
Aumasson, J. and D.J. Bernstein, "SipHash: a fast short-
input PRF", <https://131002.net/siphash/siphash.pdf>.
[YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup [YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup
Language (YAML[TM]) Version 1.2", 3rd Edition, October Language (YAML[TM]) Version 1.2", 3rd Edition, October
2009, <http://www.yaml.org/spec/1.2/spec.html>. 2009, <http://www.yaml.org/spec/1.2/spec.html>.
Appendix A. Examples Appendix A. Examples of Encoded CBOR Data Items
The following table provides some CBOR-encoded values in hexadecimal The following table provides some CBOR-encoded values in hexadecimal
(right column), together with diagnostic notation for these values (right column), together with diagnostic notation for these values
(left column). Note that the string "\u00fc" is one form of (left column). Note that the string "\u00fc" is one form of
diagnostic notation for a UTF-8 string containing the single Unicode diagnostic notation for a UTF-8 string containing the single Unicode
character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut). character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut).
Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a
single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often
representing "water"), and "\ud800\udd51" is a UTF-8 string in representing "water"), and "\ud800\udd51" is a UTF-8 string in
diagnostic notation with a single character U+10151 (GREEK ACROPHONIC diagnostic notation with a single character U+10151 (GREEK ACROPHONIC
ATTIC FIFTY STATERS). (Note that all these single-character strings ATTIC FIFTY STATERS). (Note that all these single-character strings
could also be represented in native UTF-8 in diagnostic notation, could also be represented in native UTF-8 in diagnostic notation,
just not in an ASCII-only specification like the present one.) In just not in an ASCII-only specification.) In the diagnostic notation
the diagnostic notation provided for bignums, their intended numeric provided for bignums, their intended numeric value is shown as a
value is shown as a decimal number (such as 18446744073709551616) decimal number (such as 18446744073709551616) instead of showing a
instead of showing a tagged byte string (such as tagged byte string (such as 2(h'010000000000000000')).
2(h'010000000000000000')).
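The escape forms described above can be checked mechanically. The following C sketch (function names are hypothetical) first combines a diagnostic-notation surrogate pair such as "\ud800\udd51" into the code point U+10151, then emits a CBOR text string (major type 3) holding the UTF-8 form of a single character. It assumes a valid Unicode scalar value, so the UTF-8 length is at most 4 and always fits in the initial byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combines a valid UTF-16 surrogate pair
   (hi in D800..DBFF, lo in DC00..DFFF) into one code point. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + (((uint32_t)hi - 0xD800u) << 10)
                    + ((uint32_t)lo - 0xDC00u);
}

/* Hypothetical helper: writes a one-character CBOR text string for
   code point cp (assumed to be a Unicode scalar value).
   Returns the number of bytes written (2..5). */
size_t cbor_encode_char(uint32_t cp, uint8_t out[5])
{
    size_t n;

    /* UTF-8 lead byte and total sequence length */
    if (cp < 0x80)         { out[1] = (uint8_t)cp;                  n = 1; }
    else if (cp < 0x800)   { out[1] = (uint8_t)(0xC0 | (cp >> 6));  n = 2; }
    else if (cp < 0x10000) { out[1] = (uint8_t)(0xE0 | (cp >> 12)); n = 3; }
    else                   { out[1] = (uint8_t)(0xF0 | (cp >> 18)); n = 4; }
    for (size_t i = 1; i < n; i++)      /* continuation bytes, 6 bits each */
        out[1 + i] = (uint8_t)(0x80 | ((cp >> (6 * (n - 1 - i))) & 0x3F));
    out[0] = (uint8_t)(0x60 | n);       /* major type 3, length n (< 24) */
    return 1 + n;
}
```

Applied to U+00FC, U+6C34, and combine_surrogates(0xD800, 0xDD51), this reproduces the encodings 0x62c3bc, 0x63e6b0b4, and 0x64f0908591 listed in Table 6.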
+------------------------------+------------------------------------+
| Diagnostic | Encoded |
+==============================+====================================+ +==============================+====================================+
| 0 | 0x00 | |Diagnostic | Encoded |
+==============================+====================================+
|0 | 0x00 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1 | 0x01 | |1 | 0x01 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 10 | 0x0a | |10 | 0x0a |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 23 | 0x17 | |23 | 0x17 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 24 | 0x1818 | |24 | 0x1818 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 25 | 0x1819 | |25 | 0x1819 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 100 | 0x1864 | |100 | 0x1864 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1000 | 0x1903e8 | |1000 | 0x1903e8 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1000000 | 0x1a000f4240 | |1000000 | 0x1a000f4240 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1000000000000 | 0x1b000000e8d4a51000 | |1000000000000 | 0x1b000000e8d4a51000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 18446744073709551615 | 0x1bffffffffffffffff | |18446744073709551615 | 0x1bffffffffffffffff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 18446744073709551616 | 0xc249010000000000000000 | |18446744073709551616 | 0xc249010000000000000000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -18446744073709551616 | 0x3bffffffffffffffff | |-18446744073709551616 | 0x3bffffffffffffffff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -18446744073709551617 | 0xc349010000000000000000 | |-18446744073709551617 | 0xc349010000000000000000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -1 | 0x20 | |-1 | 0x20 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -10 | 0x29 | |-10 | 0x29 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -100 | 0x3863 | |-100 | 0x3863 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -1000 | 0x3903e7 | |-1000 | 0x3903e7 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 0.0 | 0xf90000 | |0.0 | 0xf90000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -0.0 | 0xf98000 | |-0.0 | 0xf98000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1.0 | 0xf93c00 | |1.0 | 0xf93c00 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1.1 | 0xfb3ff199999999999a | |1.1 | 0xfb3ff199999999999a |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1.5 | 0xf93e00 | |1.5 | 0xf93e00 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 65504.0 | 0xf97bff | |65504.0 | 0xf97bff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 100000.0 | 0xfa47c35000 | |100000.0 | 0xfa47c35000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 3.4028234663852886e+38 | 0xfa7f7fffff | |3.4028234663852886e+38 | 0xfa7f7fffff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1.0e+300 | 0xfb7e37e43c8800759c | |1.0e+300 | 0xfb7e37e43c8800759c |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 5.960464477539063e-8 | 0xf90001 | |5.960464477539063e-8 | 0xf90001 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 0.00006103515625 | 0xf90400 | |0.00006103515625 | 0xf90400 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -4.0 | 0xf9c400 | |-4.0 | 0xf9c400 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -4.1 | 0xfbc010666666666666 | |-4.1 | 0xfbc010666666666666 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| Infinity | 0xf97c00 | |Infinity | 0xf97c00 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| NaN | 0xf97e00 | |NaN | 0xf97e00 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -Infinity | 0xf9fc00 | |-Infinity | 0xf9fc00 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| Infinity | 0xfa7f800000 | |Infinity | 0xfa7f800000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| NaN | 0xfa7fc00000 | |NaN | 0xfa7fc00000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -Infinity | 0xfaff800000 | |-Infinity | 0xfaff800000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| Infinity | 0xfb7ff0000000000000 | |Infinity | 0xfb7ff0000000000000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| NaN | 0xfb7ff8000000000000 | |NaN | 0xfb7ff8000000000000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| -Infinity | 0xfbfff0000000000000 | |-Infinity | 0xfbfff0000000000000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| false | 0xf4 | |false | 0xf4 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| true | 0xf5 | |true | 0xf5 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| null | 0xf6 | |null | 0xf6 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| undefined | 0xf7 | |undefined | 0xf7 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| simple(16) | 0xf0 | |simple(16) | 0xf0 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| simple(255) | 0xf8ff | |simple(255) | 0xf8ff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 0("2013-03-21T20:04:00Z") | 0xc074323031332d30332d32315432303a | |0("2013-03-21T20:04:00Z") | 0xc074323031332d30332d32315432303a |
| | 30343a30305a | | | 30343a30305a |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1(1363896240) | 0xc11a514b67b0 | |1(1363896240) | 0xc11a514b67b0 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 1(1363896240.5) | 0xc1fb41d452d9ec200000 | |1(1363896240.5) | 0xc1fb41d452d9ec200000 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 23(h'01020304') | 0xd74401020304 | |23(h'01020304') | 0xd74401020304 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 24(h'6449455446') | 0xd818456449455446 | |24(h'6449455446') | 0xd818456449455446 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| 32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 | |32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 |
| | 616d706c652e636f6d | | | 616d706c652e636f6d |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| h'' | 0x40 | |h'' | 0x40 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| h'01020304' | 0x4401020304 | |h'01020304' | 0x4401020304 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| "" | 0x60 | |"" | 0x60 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| "a" | 0x6161 | |"a" | 0x6161 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| "IETF" | 0x6449455446 | |"IETF" | 0x6449455446 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| "\"\\" | 0x62225c | |"\"\\" | 0x62225c |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| "\u00fc" | 0x62c3bc | |"\u00fc" | 0x62c3bc |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| "\u6c34" | 0x63e6b0b4 | |"\u6c34" | 0x63e6b0b4 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| "\ud800\udd51" | 0x64f0908591 | |"\ud800\udd51" | 0x64f0908591 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| [] | 0x80 | |[] | 0x80 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| [1, 2, 3] | 0x83010203 | |[1, 2, 3] | 0x83010203 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| [1, [2, 3], [4, 5]] | 0x8301820203820405 | |[1, [2, 3], [4, 5]] | 0x8301820203820405 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| [1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x98190102030405060708090a0b0c0d0e | |[1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x98190102030405060708090a0b0c0d0e |
| 10, 11, 12, 13, 14, 15, 16, | 0f101112131415161718181819 | |10, 11, 12, 13, 14, 15, 16, | 0f101112131415161718181819 |
| 17, 18, 19, 20, 21, 22, 23, | | |17, 18, 19, 20, 21, 22, 23, | |
| 24, 25] | | |24, 25] | |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| {} | 0xa0 | |{} | 0xa0 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| {1: 2, 3: 4} | 0xa201020304 | |{1: 2, 3: 4} | 0xa201020304 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| {"a": 1, "b": [2, 3]} | 0xa26161016162820203 | |{"a": 1, "b": [2, 3]} | 0xa26161016162820203 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| ["a", {"b": "c"}] | 0x826161a161626163 | |["a", {"b": "c"}] | 0x826161a161626163 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
|{"a": "A", "b": "B", "c": "C",| 0xa5616161416162614261636143616461 | |{"a": "A", "b": "B", "c": "C",| 0xa5616161416162614261636143616461 |
| "d": "D", "e": "E"} | 4461656145 | |"d": "D", "e": "E"} | 4461656145 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| (_ h'0102', h'030405') | 0x5f42010243030405ff | |(_ h'0102', h'030405') | 0x5f42010243030405ff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| (_ "strea", "ming") | 0x7f657374726561646d696e67ff | |(_ "strea", "ming") | 0x7f657374726561646d696e67ff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| [_ ] | 0x9fff | |[_ ] | 0x9fff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| [_ 1, [2, 3], [_ 4, 5]] | 0x9f018202039f0405ffff | |[_ 1, [2, 3], [_ 4, 5]] | 0x9f018202039f0405ffff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| [_ 1, [2, 3], [4, 5]] | 0x9f01820203820405ff | |[_ 1, [2, 3], [4, 5]] | 0x9f01820203820405ff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| [1, [2, 3], [_ 4, 5]] | 0x83018202039f0405ff | |[1, [2, 3], [_ 4, 5]] | 0x83018202039f0405ff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| [1, [_ 2, 3], [4, 5]] | 0x83019f0203ff820405 | |[1, [_ 2, 3], [4, 5]] | 0x83019f0203ff820405 |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
|[_ 1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x9f0102030405060708090a0b0c0d0e0f | |[_ 1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x9f0102030405060708090a0b0c0d0e0f |
| 10, 11, 12, 13, 14, 15, 16, | 101112131415161718181819ff | |10, 11, 12, 13, 14, 15, 16, | 101112131415161718181819ff |
| 17, 18, 19, 20, 21, 22, 23, | | |17, 18, 19, 20, 21, 22, 23, | |
| 24, 25] | | |24, 25] | |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| {_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff | |{_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| ["a", {_ "b": "c"}] | 0x826161bf61626163ff | |["a", {_ "b": "c"}] | 0x826161bf61626163ff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| {_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff | |{_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
Table 6: Examples of Encoded CBOR Data Items Table 6: Examples of Encoded CBOR Data Items
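The half-precision rows of Table 6 (those starting with initial byte 0xf9) can be verified with a decoder for the IEEE 754 binary16 format. A minimal C sketch, assuming the two bytes after 0xf9 are passed in network byte order; the name cbor_decode_half is hypothetical. It scales by exact powers of two instead of calling ldexp(), so it needs <math.h> only for INFINITY and NAN.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: decodes the two bytes that follow initial byte
   0xf9 (IEEE 754 binary16, network byte order) into a double. */
double cbor_decode_half(const uint8_t b[2])
{
    unsigned half = ((unsigned)b[0] << 8) | b[1];
    unsigned expo = (half >> 10) & 0x1f;   /* 5-bit biased exponent */
    unsigned mant = half & 0x3ff;          /* 10-bit significand */
    double val;

    if (expo == 0)                         /* zero or subnormal: mant * 2^-24 */
        val = mant / 16777216.0;
    else if (expo != 31)                   /* normal: (1024 + mant) * 2^(expo-25) */
        val = (double)((uint64_t)(mant + 1024) << expo) / 33554432.0;
    else                                   /* infinity or NaN */
        val = mant == 0 ? INFINITY : NAN;
    return (half & 0x8000) ? -val : val;   /* apply the sign bit */
}
```

For example, the bytes 7b ff (from 0xf97bff) decode to 65504.0, the largest finite half-precision value, and 00 01 (from 0xf90001) decode to 2^-24, which prints as 5.960464477539063e-8.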
Appendix B. Jump Table Appendix B. Jump Table for Initial Byte
For brevity, this jump table does not show initial bytes that are For brevity, this jump table does not show initial bytes that are
reserved for future extension. It also only shows a selection of the reserved for future extension. It also only shows a selection of the
initial bytes that can be used for optional features. (All unsigned initial bytes that can be used for optional features. (All unsigned
integers are in network byte order.) integers are in network byte order.)
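The argument encoding that the jump table below summarizes (values 0..23 carried in the initial byte itself, otherwise additional information 24..27 followed by a 1-, 2-, 4-, or 8-byte unsigned integer in network byte order) can be sketched in C as follows. This is a minimal sketch assuming the shortest form is wanted; the function names are hypothetical. Negative integers use major type 1 with argument -1 - v.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the head for major type mt (0..7) and
   argument arg into out, using the shortest form.  Returns the number
   of bytes written (1, 2, 3, 5, or 9). */
size_t cbor_encode_head(unsigned mt, uint64_t arg, uint8_t out[9])
{
    unsigned ai;
    size_t n, i;

    if (arg < 24) {                      /* argument fits in the initial byte */
        out[0] = (uint8_t)((mt << 5) | arg);
        return 1;
    }
    if (arg <= 0xff)              { ai = 24; n = 1; }  /* uint8_t follows */
    else if (arg <= 0xffff)       { ai = 25; n = 2; }  /* uint16_t follows */
    else if (arg <= 0xffffffffu)  { ai = 26; n = 4; }  /* uint32_t follows */
    else                          { ai = 27; n = 8; }  /* uint64_t follows */
    out[0] = (uint8_t)((mt << 5) | ai);
    for (i = 0; i < n; i++)              /* network byte order */
        out[1 + i] = (uint8_t)(arg >> (8 * (n - 1 - i)));
    return 1 + n;
}

/* Hypothetical helper: major type 0 for v >= 0; major type 1 with
   argument -1 - v for v < 0 (cannot overflow, even for INT64_MIN). */
size_t cbor_encode_int(int64_t v, uint8_t out[9])
{
    if (v >= 0)
        return cbor_encode_head(0, (uint64_t)v, out);
    return cbor_encode_head(1, (uint64_t)(-1 - v), out);
}
```

For instance, cbor_encode_int(1000, b) yields 19 03 e8 and cbor_encode_head(6, 32, b) yields the tag head d8 20, matching the corresponding rows of Table 6. The same function covers tags (major type 6), which is why the 0xd8..0xdb row follows the same 1/2/4/8-byte pattern as unsigned integers.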
+------------+------------------------------------------------+ +============+================================================+
| Byte | Structure/Semantics | | Byte | Structure/Semantics |
+============+================================================+ +============+================================================+
| 0x00..0x17 | Unsigned integer 0x00..0x17 (0..23) | | 0x00..0x17 | Unsigned integer 0x00..0x17 (0..23) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0x18 | Unsigned integer (one-byte uint8_t follows) | | 0x18 | Unsigned integer (one-byte uint8_t follows) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0x19 | Unsigned integer (two-byte uint16_t follows) | | 0x19 | Unsigned integer (two-byte uint16_t follows) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0x1a | Unsigned integer (four-byte uint32_t follows) | | 0x1a | Unsigned integer (four-byte uint32_t follows) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
skipping to change at page 63, line 41 skipping to change at page 66, line 35
| | see Section 3.4.4) | | | see Section 3.4.4) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xc5 | Bigfloat (data item "array" follows; see | | 0xc5 | Bigfloat (data item "array" follows; see |
| | Section 3.4.4) | | | Section 3.4.4) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xc6..0xd4 | (tag) | | 0xc6..0xd4 | (tag) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xd5..0xd7 | Expected Conversion (data item follows; see | | 0xd5..0xd7 | Expected Conversion (data item follows; see |
| | Section 3.4.5.2) | | | Section 3.4.5.2) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xd8..0xdb | (more tags, 1/2/4/8 bytes and then a data item | | 0xd8..0xdb | (more tags; 1/2/4/8 bytes of tag number and |
| | follow) | | | then a data item follow) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xe0..0xf3 | (simple value) | | 0xe0..0xf3 | (simple value) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xf4 | False | | 0xf4 | False |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xf5 | True | | 0xf5 | True |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xf6 | Null | | 0xf6 | Null |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xf7 | Undefined | | 0xf7 | Undefined |
skipping to change at page 64, line 42 skipping to change at page 67, line 36
byte string. If n bytes are no longer available, take(n) fails. byte string. If n bytes are no longer available, take(n) fails.
* uint() converts a byte string into an unsigned integer by * uint() converts a byte string into an unsigned integer by
interpreting the byte string in network byte order. interpreting the byte string in network byte order.
* Arithmetic works as in C. * Arithmetic works as in C.
* All variables are unsigned integers of sufficient range. * All variables are unsigned integers of sufficient range.
Note that "well_formed" returns the major type for well-formed Note that "well_formed" returns the major type for well-formed
definite length items, but 0 for an indefinite length item (or -1 for definite length items, but 99 for an indefinite length item (or -1
a "break" stop code, only if "breakable" is set). This is used in for a "break" stop code, only if "breakable" is set). This is used
"well_formed_indefinite" to ascertain that indefinite length strings in "well_formed_indefinite" to ascertain that indefinite length
only contain definite length strings as chunks. strings only contain definite length strings as chunks.
well_formed (breakable = false) { well_formed(breakable = false) {
// process initial bytes // process initial bytes
ib = uint(take(1)); ib = uint(take(1));
mt = ib >> 5; mt = ib >> 5;
val = ai = ib & 0x1f; val = ai = ib & 0x1f;
switch (ai) { switch (ai) {
case 24: val = uint(take(1)); break; case 24: val = uint(take(1)); break;
case 25: val = uint(take(2)); break; case 25: val = uint(take(2)); break;
case 26: val = uint(take(4)); break; case 26: val = uint(take(4)); break;
case 27: val = uint(take(8)); break; case 27: val = uint(take(8)); break;
case 28: case 29: case 30: fail(); case 28: case 29: case 30: fail();
skipping to change at page 65, line 28 skipping to change at page 68, line 28
} }
// process content // process content
switch (mt) { switch (mt) {
// case 0, 1, 7 do not have content; just use val // case 0, 1, 7 do not have content; just use val
case 2: case 3: take(val); break; // bytes/UTF-8 case 2: case 3: take(val); break; // bytes/UTF-8
case 4: for (i = 0; i < val; i++) well_formed(); break; case 4: for (i = 0; i < val; i++) well_formed(); break;
case 5: for (i = 0; i < val*2; i++) well_formed(); break; case 5: for (i = 0; i < val*2; i++) well_formed(); break;
case 6: well_formed(); break; // 1 embedded data item case 6: well_formed(); break; // 1 embedded data item
case 7: if (ai == 24 && val < 32) fail(); // bad simple case 7: if (ai == 24 && val < 32) fail(); // bad simple
} }
return mt; // finite data item return mt; // definite-length data item
} }
well_formed_indefinite(mt, breakable) { well_formed_indefinite(mt, breakable) {
switch (mt) { switch (mt) {
case 2: case 3: case 2: case 3:
while ((it = well_formed(true)) != -1) while ((it = well_formed(true)) != -1)
if (it != mt) // need finite-length chunk if (it != mt) // need definite-length chunk
fail(); // of same type fail(); // of same type
break; break;
case 4: while (well_formed(true) != -1); break; case 4: while (well_formed(true) != -1); break;
case 5: while (well_formed(true) != -1) well_formed(); break; case 5: while (well_formed(true) != -1) well_formed(); break;
case 7: case 7:
if (breakable) if (breakable)
return -1; // signal break out return -1; // signal break out
else fail(); // no enclosing indefinite else fail(); // no enclosing indefinite
default: fail(); // wrong mt default: fail(); // wrong mt
} }
return 0; // no break out return 99; // indefinite-length data item
} }
Figure 1: Pseudocode for Well-Formedness Check Figure 1: Pseudocode for Well-Formedness Check
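The following C sketch is one possible translation of Figure 1 into compilable code, for readers who want to experiment with it. It is illustrative, not normative. Instead of aborting, fail() records the first error in a small reader state, and a take() that runs past the end is recorded as a separate error. This lets the caller tell apart the three kinds of error discussed in Appendix F. The names cbor_check, struct rd, skip, take_uint, and the WF_* codes are hypothetical. The sketch assumes that additional information 31 dispatches to well_formed_indefinite(), as the text above describes, and it keeps the value 99 for indefinite length items. It recurses once per nesting level, so a production decoder would likely also enforce a depth limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, one per error kind discussed in Appendix F. */
enum cbor_wf { WF_OK = 0, WF_TOO_MUCH, WF_TOO_LITTLE, WF_SYNTAX };
#define WF_BREAK (-1) /* "break" stop code seen where breakable was set */
#define WF_ERR   (-2) /* an error has been recorded in the reader state */
#define WF_INDEF 99   /* indefinite length item, as in Figure 1 */

struct rd { const uint8_t *p, *end; enum cbor_wf err; };

/* fail(): remember only the first error; callers unwind via WF_ERR. */
static void fail(struct rd *r, enum cbor_wf why) {
  if (r->err == WF_OK) r->err = why;
}
/* take(n) without using the bytes; running out is "too little data". */
static int skip(struct rd *r, uint64_t n) {
  if (r->err != WF_OK) return 0;
  if (n > (uint64_t)(r->end - r->p)) { fail(r, WF_TOO_LITTLE); return 0; }
  r->p += n;
  return 1;
}
/* uint(take(n)): n bytes interpreted in network byte order. */
static uint64_t take_uint(struct rd *r, unsigned n) {
  const uint8_t *q = r->p;
  uint64_t v = 0;
  if (!skip(r, n)) return 0;
  while (n--) v = (v << 8) | *q++;
  return v;
}

static int well_formed(struct rd *r, int breakable);

static int well_formed_indefinite(struct rd *r, int mt, int breakable) {
  int it;
  switch (mt) {
  case 2: case 3: /* chunks: definite length strings of the same type */
    while ((it = well_formed(r, 1)) != WF_BREAK) {
      if (it == WF_ERR) return WF_ERR;
      if (it != mt) { fail(r, WF_SYNTAX); return WF_ERR; }
    }
    break;
  case 4:
    while ((it = well_formed(r, 1)) != WF_BREAK)
      if (it == WF_ERR) return WF_ERR;
    break;
  case 5: /* "break" is accepted in key position only */
    while ((it = well_formed(r, 1)) != WF_BREAK)
      if (it == WF_ERR || well_formed(r, 0) == WF_ERR) return WF_ERR;
    break;
  case 7:
    if (breakable) return WF_BREAK;
    fail(r, WF_SYNTAX); /* no enclosing indefinite length item */
    return WF_ERR;
  default:
    fail(r, WF_SYNTAX); /* additional information 31 with mt 0, 1, or 6 */
    return WF_ERR;
  }
  return WF_INDEF;
}

static int well_formed(struct rd *r, int breakable) {
  int ib = (int)take_uint(r, 1), mt = ib >> 5, ai = ib & 0x1f;
  uint64_t val = (uint64_t)ai, i;
  if (r->err != WF_OK) return WF_ERR;
  switch (ai) {
  case 24: val = take_uint(r, 1); break;
  case 25: val = take_uint(r, 2); break;
  case 26: val = take_uint(r, 4); break;
  case 27: val = take_uint(r, 8); break;
  case 28: case 29: case 30: fail(r, WF_SYNTAX); return WF_ERR;
  case 31: return well_formed_indefinite(r, mt, breakable);
  }
  switch (mt) { /* skip() and take_uint() do nothing once r->err is set */
  case 2: case 3: skip(r, val); break;   /* byte or text string content */
  case 4: for (i = 0; i < val && r->err == WF_OK; i++) well_formed(r, 0); break;
  case 5:
    for (i = 0; i < val && r->err == WF_OK; i++) { well_formed(r, 0); well_formed(r, 0); }
    break;
  case 6: well_formed(r, 0); break;      /* one tag content item */
  case 7: if (ai == 24 && val < 32) fail(r, WF_SYNTAX); break; /* bad simple */
  }
  return r->err != WF_OK ? WF_ERR : mt;
}

/* Check that buf[0..len) holds exactly one well-formed data item. */
enum cbor_wf cbor_check(const uint8_t *buf, size_t len) {
  struct rd r;
  assert(buf != NULL);
  r.p = buf;
  r.end = buf + len;
  r.err = WF_OK;
  well_formed(&r, 0);
  if (r.err != WF_OK) return r.err;
  return r.p == r.end ? WF_OK : WF_TOO_MUCH; /* "no bytes are left" */
}
```

For example, cbor_check() accepts 82 01 82 02 03 and 9f 01 82 02 03 ff, reports 18 (a head missing its one-byte argument) as too little data, and reports 5f 00 ff (an unsigned integer used as a byte string chunk) as a syntax error.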
Note that the remaining complexity of a complete CBOR decoder is Note that the remaining complexity of a complete CBOR decoder is
about presenting data that has been decoded to the application in an about presenting data that has been decoded to the application in an
appropriate form. appropriate form.
Major types 0 and 1 are designed in such a way that they can be Major types 0 and 1 are designed in such a way that they can be
encoded in C from a signed integer without actually doing an if-then- encoded in C from a signed integer without actually doing an if-then-
skipping to change at page 66, line 22 skipping to change at page 69, line 22
(-1-n), the transformation for major type 1, is the same as ~n (-1-n), the transformation for major type 1, is the same as ~n
(bitwise complement) in C unsigned arithmetic; ~n can then be (bitwise complement) in C unsigned arithmetic; ~n can then be
expressed as (-1)^n for the negative case, while 0^n leaves n expressed as (-1)^n for the negative case, while 0^n leaves n
unchanged for non-negative. The sign of a number can be converted to unchanged for non-negative. The sign of a number can be converted to
-1 for negative and 0 for non-negative (0 or positive) by arithmetic- -1 for negative and 0 for non-negative (0 or positive) by arithmetic-
shifting the number by one bit less than the bit length of the number shifting the number by one bit less than the bit length of the number
(for example, by 63 for 64-bit numbers). (for example, by 63 for 64-bit numbers).
void encode_sint(int64_t n) { void encode_sint(int64_t n) {
uint64_t ui = n >> 63; // extend sign to whole length uint64_t ui = n >> 63; // extend sign to whole length
mt = ui & 0x20; // extract major type unsigned mt = ui & 0x20; // extract (shifted) major type
ui ^= n; // complement negatives ui ^= n; // complement negatives
if (ui < 24) if (ui < 24)
*p++ = mt + ui; *p++ = mt + ui;
else if (ui < 256) { else if (ui < 256) {
*p++ = mt + 24; *p++ = mt + 24;
*p++ = ui; *p++ = ui;
} else } else
... ...
Figure 2: Pseudocode for Encoding a Signed Integer Figure 2: Pseudocode for Encoding a Signed Integer
See Section 1.2 for some specific assumptions about the profile of
the C language used in these pieces of code.
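Figure 2 stops after the one-byte argument. A complete version might look like the following sketch. It assumes the caller supplies an output buffer with room for 9 bytes, and it returns the number of bytes written instead of advancing a global p, so its signature (and the name encode_sint_buf) is hypothetical. It also replaces n >> 63 with a comparison: right-shifting a negative signed integer is implementation-defined in C, whereas converting to uint64_t and XORing is fully defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n as a CBOR integer (major type 0 or 1) into out, which must
   have room for 9 bytes. Returns the number of bytes written. */
size_t encode_sint_buf(int64_t n, uint8_t *out) {
  uint64_t ui = n < 0 ? UINT64_MAX : 0; /* sign extended to whole length */
  uint8_t mt = (uint8_t)(ui & 0x20);    /* 0x00: type 0, 0x20: type 1 */
  size_t len, i;
  assert(out != NULL);
  ui ^= (uint64_t)n;                    /* -1-n for negatives, else n */
  if (ui < 24) {
    out[0] = (uint8_t)(mt + ui);
    return 1;
  }
  if (ui < 0x100) { out[0] = (uint8_t)(mt + 24); len = 1; }
  else if (ui < 0x10000) { out[0] = (uint8_t)(mt + 25); len = 2; }
  else if (ui < UINT64_C(0x100000000)) { out[0] = (uint8_t)(mt + 26); len = 4; }
  else { out[0] = (uint8_t)(mt + 27); len = 8; }
  for (i = 0; i < len; i++)             /* argument, network byte order */
    out[1 + i] = (uint8_t)(ui >> (8 * (len - 1 - i)));
  return 1 + len;
}
```

For example, -1000 is encoded as 0x39 0x03 0xe7, because -1-(-1000) = 999 = 0x3e7 needs a two-byte argument under major type 1.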
Appendix D. Half-Precision Appendix D. Half-Precision
As half-precision floating-point numbers were only added to IEEE 754 As half-precision floating-point numbers were only added to IEEE 754
in 2008 [IEEE754], today's programming platforms often still only in 2008 [IEEE754], today's programming platforms often still only
have limited support for them. It is very easy to include at least have limited support for them. It is very easy to include at least
decoding support for them even without such support. An example of a decoding support for them even without such support. An example of a
small decoder for half-precision floating-point numbers in the C small decoder for half-precision floating-point numbers in the C
language is shown in Figure 3. A similar program for Python is in language is shown in Figure 3. A similar program for Python is in
Figure 4; this code assumes that the 2-byte value has already been Figure 4; this code assumes that the 2-byte value has already been
decoded as an (unsigned short) integer in network byte order (as decoded as an (unsigned short) integer in network byte order (as
would be done by the pseudocode in Appendix C). would be done by the pseudocode in Appendix C).
#include <math.h> #include <math.h>
double decode_half(unsigned char *halfp) { double decode_half(unsigned char *halfp) {
int half = (halfp[0] << 8) + halfp[1]; unsigned half = (halfp[0] << 8) + halfp[1];
int exp = (half >> 10) & 0x1f; unsigned exp = (half >> 10) & 0x1f;
int mant = half & 0x3ff; unsigned mant = half & 0x3ff;
double val; double val;
if (exp == 0) val = ldexp(mant, -24); if (exp == 0) val = ldexp(mant, -24);
else if (exp != 31) val = ldexp(mant + 1024, exp - 25); else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
else val = mant == 0 ? INFINITY : NAN; else val = mant == 0 ? INFINITY : NAN;
return half & 0x8000 ? -val : val; return half & 0x8000 ? -val : val;
} }
Figure 3: C Code for a Half-Precision Decoder Figure 3: C Code for a Half-Precision Decoder
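decode_half() in Figure 3 maps every NaN to the same NAN value, so any NaN payload is lost. Where payloads matter, or where double-precision arithmetic is not available, a half-precision value can instead be widened to the bit pattern of an IEEE 754 single-precision value using integer operations only; this widening is always exact. The following is a sketch under those assumptions. The name half_to_single_bits() is hypothetical, and the input is taken as two bytes in network byte order, like halfp in Figure 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen a half-precision value (two bytes, network byte order) to the
   bit pattern of a single-precision value. NaN payloads are kept,
   shifted into the wider significand field. */
uint32_t half_to_single_bits(const unsigned char *halfp) {
  uint32_t half, sign, exp, mant;
  assert(halfp != NULL);
  half = ((uint32_t)halfp[0] << 8) | halfp[1];
  sign = (half & 0x8000u) << 16;
  exp = (half >> 10) & 0x1f;
  mant = half & 0x3ff;
  if (exp == 31)                       /* infinity or NaN */
    return sign | 0x7f800000u | (mant << 13);
  if (exp == 0) {
    if (mant == 0) return sign;        /* signed zero */
    exp = 113;                         /* subnormal: renormalize */
    while (!(mant & 0x400)) { mant <<= 1; exp--; }
    mant &= 0x3ff;                     /* drop the now-implicit leading 1 */
    return sign | (exp << 23) | (mant << 13);
  }
  return sign | ((exp + 112) << 23) | (mant << 13); /* rebias 15 -> 127 */
}
```

For example, the bytes 0x7b 0xff (65504.0, the largest finite half-precision value) widen to 0x477fe000, and 0x7e 0x00 (a quiet NaN) widens to 0x7fc00000.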
import struct import struct
skipping to change at page 70, line 5 skipping to change at page 73, line 5
E.5. Conciseness on the Wire E.5. Conciseness on the Wire
While CBOR's design objective of code compactness for encoders and While CBOR's design objective of code compactness for encoders and
decoders is a higher priority than its objective of conciseness on decoders is a higher priority than its objective of conciseness on
the wire, many people focus on the wire size. Table 8 shows some the wire, many people focus on the wire size. Table 8 shows some
encoding examples for the simple nested array [1, [2, 3]]. Where some encoding examples for the simple nested array [1, [2, 3]]. Where some
form of indefinite-length encoding is supported by the encoding, form of indefinite-length encoding is supported by the encoding,
[_ 1, [2, 3]] (indefinite length on the outer array) is also shown. [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.
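For comparison with the other formats, the following C sketch writes both example arrays in CBOR. It is hand-specialized to these values: every argument is below 24, so every head is a single byte. put_small() and encode_example() are hypothetical helpers. The definite length form takes 5 bytes (82 01 82 02 03) and the indefinite length form takes 6 bytes (9f 01 82 02 03 ff).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a one-byte head: major type mt (0..7), argument arg (< 24). */
static size_t put_small(uint8_t *out, unsigned mt, unsigned arg) {
  assert(mt < 8 && arg < 24);
  out[0] = (uint8_t)(mt << 5 | arg);
  return 1;
}

/* Encode [1, [2, 3]] (indefinite == 0) or [_ 1, [2, 3]] (indefinite != 0)
   into out, which must hold at least 6 bytes; returns the length. */
size_t encode_example(uint8_t *out, int indefinite) {
  size_t n = 0;
  if (indefinite)
    out[n++] = 0x9f;               /* array, indefinite length */
  else
    n += put_small(out + n, 4, 2); /* array of 2 items: 0x82 */
  n += put_small(out + n, 0, 1);   /* unsigned integer 1 */
  n += put_small(out + n, 4, 2);   /* inner array of 2 items */
  n += put_small(out + n, 0, 2);
  n += put_small(out + n, 0, 3);
  if (indefinite)
    out[n++] = 0xff;               /* "break" stop code */
  return n;
}
```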
+-------------+----------------------------+----------------+ +=============+============================+================+
| Format | [1, [2, 3]] | [_ 1, [2, 3]] | | Format | [1, [2, 3]] | [_ 1, [2, 3]] |
+=============+============================+================+ +=============+============================+================+
| RFC 713 | c2 05 81 c2 02 82 83 | | | RFC 713 | c2 05 81 c2 02 82 83 | |
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
| ASN.1 BER | 30 0b 02 01 01 30 06 02 01 | 30 80 02 01 01 | | ASN.1 BER | 30 0b 02 01 01 30 06 02 01 | 30 80 02 01 01 |
| | 02 02 01 03 | 30 06 02 01 02 | | | 02 02 01 03 | 30 06 02 01 02 |
| | | 02 01 03 00 00 | | | | 02 01 03 00 00 |
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
| MessagePack | 92 01 92 02 03 | | | MessagePack | 92 01 92 02 03 | |
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
skipping to change at page 70, line 43 skipping to change at page 73, line 43
This is only an error if the application assumed that the input This is only an error if the application assumed that the input
bytes would span exactly one data item. Where the application bytes would span exactly one data item. Where the application
uses the self-delimiting nature of CBOR encoding to permit uses the self-delimiting nature of CBOR encoding to permit
additional data after the data item, as is for example done in additional data after the data item, as is for example done in
CBOR sequences [RFC8742], the CBOR decoder can simply indicate CBOR sequences [RFC8742], the CBOR decoder can simply indicate
what part of the input has not been consumed. what part of the input has not been consumed.
* Too little data: The input data available would need additional * Too little data: The input data available would need additional
bytes added at their end for a complete CBOR data item. This may bytes added at their end for a complete CBOR data item. This may
indicate the input is truncated; it is also a common error when indicate the input is truncated; it is also a common error when
trying to decode random data as CBOR. For some applications trying to decode random data as CBOR. For some applications,
however, this may not actually be an error, as the application may however, this may not actually be an error, as the application may
not be certain it has all the data yet and can obtain or wait for not be certain it has all the data yet and can obtain or wait for
additional input bytes. Some of these applications may have an additional input bytes. Some of these applications may have an
upper limit for how much additional data can show up; here the upper limit for how much additional data can show up; here the
decoder may be able to indicate that the encoded CBOR data item decoder may be able to indicate that the encoded CBOR data item
cannot be completed within this limit. cannot be completed within this limit.
* Syntax error: The input data are not consistent with the * Syntax error: The input data are not consistent with the
requirements of the CBOR encoding, and this cannot be remedied by requirements of the CBOR encoding, and this cannot be remedied by
adding (or removing) data at the end. adding (or removing) data at the end.
In Appendix C, errors of the first kind are addressed in the first In Appendix C, errors of the first kind are addressed in the first
paragraph/bullet list (requiring "no bytes are left"), and errors of paragraph/bullet list (requiring "no bytes are left"), and errors of
the second kind are addressed in the second paragraph/bullet list the second kind are addressed in the second paragraph/bullet list
(failing "if n bytes are no longer available"). Errors of the third (failing "if n bytes are no longer available"). Errors of the third
kind are identified in the pseudocode by specific instances of kind are identified in the pseudocode by specific instances of
calling fail(), in order: calling fail(), in order:
* a reserved value is used for additional information (28, 29, 30) * a reserved value is used for additional information (28, 29, 30)
* major type 7, additional information 24, value < 32 (incorrect or * major type 7, additional information 24, value < 32 (incorrect)
incorrectly encoded simple type)
* incorrect substructure of indefinite length byte/text string (may * incorrect substructure of indefinite length byte/text string (may
only contain definite length strings of the same major type) only contain definite length strings of the same major type)
* "break" stop code (mt=7, ai=31) occurs in a value position of a * "break" stop code (mt=7, ai=31) occurs in a value position of a
map, or anywhere other than directly in an indefinite length item map, or anywhere other than directly in an indefinite length item
at a position where another enclosed data item could also occur at a position where another enclosed data item could also occur
* additional information 31 used with major type 0, 1, or 6 * additional information 31 used with major type 0, 1, or 6
skipping to change at page 72, line 33 skipping to change at page 75, line 33
(syntax error) are shown below. (syntax error) are shown below.
Subkind 1: Subkind 1:
* Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e, * Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e,
5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc, 5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
fd, fe fd, fe
Subkind 2: Subkind 2:
* Reserved two-byte encodings of simple types: f8 00, f8 01, f8 18, * Reserved two-byte encodings of simple values: f8 00, f8 01, f8 18,
f8 1f f8 1f
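Errors of subkinds 1 and 2 can be recognized from the head alone. This is useful for a receiver that reads input incrementally: given the bytes received so far, it can tell whether the head is complete, whether it should wait for more bytes ("too little data"), or whether no additional input can make the item well-formed. The sketch below also rejects additional information 31 with major type 0, 1, or 6, which the list above names as an error. check_head() and its result codes are hypothetical. Context-dependent errors, such as those of subkind 3 or a misplaced "break" stop code, still require a full check as in Appendix C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum head_status { HEAD_COMPLETE, HEAD_NEED_MORE, HEAD_SYNTAX_ERROR };

/* Classify the head at the start of buf[0..len). */
enum head_status check_head(const uint8_t *buf, size_t len) {
  unsigned mt, ai;
  size_t need;
  assert(buf != NULL);
  if (len == 0) return HEAD_NEED_MORE;
  mt = buf[0] >> 5;
  ai = buf[0] & 0x1f;
  if (ai >= 28 && ai <= 30)
    return HEAD_SYNTAX_ERROR;                 /* reserved additional information */
  if (ai == 31)                               /* indefinite length or "break" */
    return (mt == 0 || mt == 1 || mt == 6) ? HEAD_SYNTAX_ERROR : HEAD_COMPLETE;
  need = ai < 24 ? 1 : 1 + ((size_t)1 << (ai - 24)); /* 24..27: 1, 2, 4, 8 more */
  if (len < need) return HEAD_NEED_MORE;      /* wait for more input */
  if (mt == 7 && ai == 24 && buf[1] < 32)
    return HEAD_SYNTAX_ERROR;                 /* two-byte simple value below 32 */
  return HEAD_COMPLETE;
}
```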
Subkind 3: Subkind 3:
* Indefinite length string chunks not of the correct type: 5f 00 ff, * Indefinite length string chunks not of the correct type: 5f 00 ff,
5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff, 5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff,
7f 41 00 ff 7f 41 00 ff
* Indefinite length string chunks not definite length: 5f 5f 41 00 * Indefinite length string chunks not definite length: 5f 5f 41 00
ff ff, 7f 7f 61 00 ff ff ff ff, 7f 7f 61 00 ff ff
skipping to change at page 73, line 25 skipping to change at page 76, line 25
of RFC 7049, with editorial improvements, added detail, and fixed of RFC 7049, with editorial improvements, added detail, and fixed
errata. This document formally obsoletes RFC 7049, while keeping errata. This document formally obsoletes RFC 7049, while keeping
full compatibility of the interchange format from RFC 7049. This full compatibility of the interchange format from RFC 7049. This
document does not create a new version of the format. document does not create a new version of the format.
G.1. Errata processing, clerical changes G.1. Errata processing, clerical changes
The two verified errata on RFC 7049, EID 3764 and EID 3770, concerned The two verified errata on RFC 7049, EID 3764 and EID 3770, concerned
two encoding examples in the text that have been corrected two encoding examples in the text that have been corrected
(Section 3.4.3: "29" -> "49", Section 5.5: "0b000_11101" -> (Section 3.4.3: "29" -> "49", Section 5.5: "0b000_11101" ->
"0b000_11001"). Also, RFC 7049 contained an example using the simple "0b000_11001"). Also, RFC 7049 contained an example using the
type value 24 (EID 5917), which is not well-formed; this example has numeric value 24 for a simple value (EID 5917), which is not well-
been removed. Errata report 5763 pointed to an accident in the formed; this example has been removed. Errata report 5763 pointed to
wording of the definition of tags; this was resolved during a re- an accident in the wording of the definition of tags; this was
write of Section 3.4. Errata report 5434 pointed out that the UBJSON resolved during a re-write of Section 3.4. Errata report 5434
example in Appendix E no longer complied with the version of UBJSON pointed out that the UBJSON example in Appendix E no longer complied
current at the time of submitting the report. It turned out that the with the version of UBJSON current at the time of submitting the
UBJSON specification had completely changed since 2013; this example report. It turned out that the UBJSON specification had completely
therefore also was removed. Further errata reports (4409, 4963, changed since 2013; this example therefore also was removed. Further
4964) complained that the map key sorting rules for canonical errata reports (4409, 4963, 4964) complained that the map key sorting
encoding were onerous; these led to a reconsideration of the rules for canonical encoding were onerous; these led to a
canonical encoding suggestions and replacement by the deterministic reconsideration of the canonical encoding suggestions and replacement
encoding suggestions (described below). An editorial suggestion in by the deterministic encoding suggestions (described below). An
errata report 4294 was also implemented (improved symmetry by adding editorial suggestion in errata report 4294 was also implemented
"Second value" to a comment to the last example in Section 3.2.2). (improved symmetry by adding "Second value" to a comment to the last
example in Section 3.2.2).
Other more clerical changes include: Other more clerical changes include:
* use of new RFCXML functionality [RFC7991]; * use of new RFCXML functionality [RFC7991];
* explain some more of the notation used; * explain some more of the notation used;
* updated references, e.g. for RFC4627 to [RFC8259] in many places, * updated references, e.g. for RFC4627 to [RFC8259] in many places,
for CNN-TERMS to [RFC7228]; added missing reference to [IEEE754] for CNN-TERMS to [RFC7228]; added missing reference to [IEEE754]
(importing required definitions) and updated to [ECMA262]; added a (importing required definitions) and updated to [ECMA262]; added a
reference to [RFC8618] that further illustrates the discussion in reference to [RFC8618] that further illustrates the discussion in
Appendix E; Appendix E;
* the discussion of diagnostic notation mentions the "Extended * the discussion of diagnostic notation mentions the "Extended
Diagnostic Notation" (EDN) defined in [RFC8610]; Diagnostic Notation" (EDN) defined in [RFC8610] as well as the gap
diagnostic notation has in representing NaN payloads; an
explanation was added on how to represent indefinite length
strings with no chunks;
* the addition of this appendix. * the addition of this appendix.
G.2. Changes in IANA considerations G.2. Changes in IANA considerations
The IANA considerations were generally updated (clerical changes, The IANA considerations were generally updated (clerical changes,
e.g., now pointing to the CBOR working group as the author of the e.g., now pointing to the CBOR working group as the author of the
specification). References to the respective IANA registries have specification). References to the respective IANA registries have
been added to the informative references. been added to the informative references.
skipping to change at page 74, line 38 skipping to change at page 77, line 41
A significant addition in this revision is Section 2, which discusses A significant addition in this revision is Section 2, which discusses
the CBOR data model and its small variations involved in the the CBOR data model and its small variations involved in the
processing of CBOR. Introducing terms for those (basic generic, processing of CBOR. Introducing terms for those (basic generic,
extended generic, specific) enables more concise language in other extended generic, specific) enables more concise language in other
places of the document, but also helps in clarifying expectations on places of the document, but also helps in clarifying expectations on
implementations and on the extensibility features of the format. implementations and on the extensibility features of the format.
RFC 7049, as a format derived from the JSON ecosystem, was influenced RFC 7049, as a format derived from the JSON ecosystem, was influenced
by the JSON number system that was in turn inherited from JavaScript by the JSON number system that was in turn inherited from JavaScript
at the time. JSON does not provide distinct integers and floating at the time. JSON does not provide distinct integers and floating-
point values (and the latter are decimal in the format). CBOR point values (and the latter are decimal in the format). CBOR
provides binary representations of numbers, which do differ between provides binary representations of numbers, which do differ between
integers and floating point values. Experience from implementation integers and floating-point values. Experience from implementation
and use now suggested that the separation between these two number and use now suggested that the separation between these two number
domains should be more clearly drawn in the document; language that domains should be more clearly drawn in the document; language that
suggested an integer could seamlessly stand in for a floating point suggested an integer could seamlessly stand in for a floating-point
value was removed. Also, a suggestion (based on I-JSON [RFC7493]) value was removed. Also, a suggestion (based on I-JSON [RFC7493])
was added for handling these types when converting JSON to CBOR. was added for handling these types when converting JSON to CBOR, and
the use of a specific rounding mechanism has been recommended.
For a single value in the data model, CBOR often provides multiple For a single value in the data model, CBOR often provides multiple
encoding options. The revision adds a new section, Section 4, which encoding options. The revision adds a new section, Section 4, which
first introduces the term "preferred serialization" (Section 4.1) and first introduces the term "preferred serialization" (Section 4.1) and
defines it for various kinds of data items. On the basis of this defines it for various kinds of data items. On the basis of this
terminology, the section goes on to discuss how a CBOR-based protocol terminology, the section goes on to discuss how a CBOR-based protocol
can define "deterministic encoding" (Section 4.2), which now avoids can define "deterministic encoding" (Section 4.2), which now avoids
the RFC 7049 terms "canonical" and "canonicalization". The the RFC 7049 terms "canonical" and "canonicalization". The
suggestion of "Core Deterministic Encoding Requirements" suggestion of "Core Deterministic Encoding Requirements"
(Section 4.2.1) enables generic support for such protocol-defined (Section 4.2.1) enables generic support for such protocol-defined
skipping to change at page 75, line 27 skipping to change at page 78, line 33
as "syntax error", "decoding error" and "strict mode" outside as "syntax error", "decoding error" and "strict mode" outside
examples. Also, a third level of requirements beyond CBOR-level examples. Also, a third level of requirements beyond CBOR-level
validity that an application has on its input data is now explicitly validity that an application has on its input data is now explicitly
called out. Well-formed (processable at all), valid (checked by a called out. Well-formed (processable at all), valid (checked by a
validity-checking generic decoder), and expected input (as checked by validity-checking generic decoder), and expected input (as checked by
the application) are treated as a hierarchy of layers of the application) are treated as a hierarchy of layers of
acceptability. acceptability.
The handling of non-well-formed simple values was clarified in text The handling of non-well-formed simple values was clarified in text
and pseudocode. Appendix F was added to discuss well-formedness and pseudocode. Appendix F was added to discuss well-formedness
errors and provide examples for them. errors and provide examples for them. The pseudocode was updated to
be more portable and some portability considerations were added.
The discussion of validity has been sharpened in two areas. Map The discussion of validity has been sharpened in two areas. Map
validity (handling of duplicate keys) was clarified and the domain of validity (handling of duplicate keys) was clarified and the domain of
applicability of certain implementation choices explained. Also, applicability of certain implementation choices explained. Also,
while streamlining the terminology for tags, tag numbers, and tag while streamlining the terminology for tags, tag numbers, and tag
content, discussion was added on tag validity, and the restrictions content, discussion was added on tag validity, and the restrictions
pwere clarified on tag content, in general and specifically for tag were clarified on tag content, in general and specifically for tag 1.
1.
An implementation note (and note for future tag definitions) was An implementation note (and note for future tag definitions) was
added to Section 3.4 about defining tags with semantics that depend added to Section 3.4 about defining tags with semantics that depend
on serialization order. on serialization order.
Tag 35 is no longer defined in this updated document; the
registration based on the definition in RFC 7049 remains in place.
Terminology was introduced in Section 3 for "argument" and "head", Terminology was introduced in Section 3 for "argument" and "head",
simplifying further discussion. simplifying further discussion.
The security considerations were mostly rewritten and significantly The security considerations were mostly rewritten and significantly
expanded; in multiple other places, the document is now more explicit expanded; in multiple other places, the document is now more explicit
that a decoder cannot simply condone well-formedness errors. that a decoder cannot simply condone well-formedness errors.
Acknowledgements Acknowledgements
CBOR was inspired by MessagePack. MessagePack was developed and CBOR was inspired by MessagePack. MessagePack was developed and
skipping to change at page 76, line 29 skipping to change at page 79, line 33
contributed to the discussion about extending MessagePack to separate contributed to the discussion about extending MessagePack to separate
text string representation from byte string representation. text string representation from byte string representation.
The encoding of the additional information in CBOR was inspired by The encoding of the additional information in CBOR was inspired by
the encoding of length information designed by Klaus Hartke for CoAP. the encoding of length information designed by Klaus Hartke for CoAP.
This document also incorporates suggestions made by many people, This document also incorporates suggestions made by many people,
notably Dan Frost, James Manger, Jeffrey Yasskin, Joe Hildebrand, notably Dan Frost, James Manger, Jeffrey Yasskin, Joe Hildebrand,
Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael
Richardson, Nico Williams, Peter Occil, Phillip Hallam-Baker, Ray Richardson, Nico Williams, Peter Occil, Phillip Hallam-Baker, Ray
Polk, Tim Bray, Tony Finch, Tony Hansen, and Yaron Sheffer. Polk, Stuart Cheshire, Tim Bray, Tony Finch, Tony Hansen, and Yaron
Sheffer. Benjamin Kaduk provided an extensive review during IESG
processing. Éric Vyncke, Erik Kline, Robert Wilton, and Roman Danyliw
provided further IESG comments, which included an IoT directorate
review by Eve Schooler.
Authors' Addresses Authors' Addresses
Carsten Bormann Carsten Bormann
Universitaet Bremen TZI Universitaet Bremen TZI
Postfach 330440 Postfach 330440
D-28359 Bremen D-28359 Bremen
Germany Germany
Phone: +49-421-218-63921 Phone: +49-421-218-63921
End of changes. 198 change blocks.
418 lines changed or deleted 546 lines changed or added

This html diff was produced by rfcdiff 1.48.