draft-ietf-cbor-7049bis-12.txt   draft-ietf-cbor-7049bis-13.txt 
Network Working Group C. Bormann Network Working Group C. Bormann
Internet-Draft Universitaet Bremen TZI Internet-Draft Universitaet Bremen TZI
Obsoletes: 7049 (if approved) P. Hoffman Obsoletes: 7049 (if approved) P. Hoffman
Intended status: Standards Track ICANN Intended status: Standards Track ICANN
Expires: 20 June 2020 18 December 2019 Expires: 9 September 2020 8 March 2020
Concise Binary Object Representation (CBOR) Concise Binary Object Representation (CBOR)
draft-ietf-cbor-7049bis-12 draft-ietf-cbor-7049bis-13
Abstract Abstract
The Concise Binary Object Representation (CBOR) is a data format The Concise Binary Object Representation (CBOR) is a data format
whose design goals include the possibility of extremely small code whose design goals include the possibility of extremely small code
size, fairly small message size, and extensibility without the need size, fairly small message size, and extensibility without the need
for version negotiation. These design goals make it different from for version negotiation. These design goals make it different from
earlier binary serializations such as ASN.1 and MessagePack. earlier binary serializations such as ASN.1 and MessagePack.
This document is a revised edition of RFC 7049, with editorial This document is a revised edition of RFC 7049, with editorial
skipping to change at page 1, line 38 skipping to change at page 1, line 38
This document is being worked on in the CBOR Working Group. Please This document is being worked on in the CBOR Working Group. Please
contribute on the mailing list there, or in the GitHub repository for contribute on the mailing list there, or in the GitHub repository for
this draft: https://github.com/cbor-wg/CBORbis this draft: https://github.com/cbor-wg/CBORbis
The charter for the CBOR Working Group says that the WG will update The charter for the CBOR Working Group says that the WG will update
RFC 7049 to fix verified errata. Security issues and clarifications RFC 7049 to fix verified errata. Security issues and clarifications
may be addressed, but changes to this document will ensure backward may be addressed, but changes to this document will ensure backward
compatibility for popular deployed codebases. This document will be compatibility for popular deployed codebases. This document will be
targeted at becoming an Internet Standard. targeted at becoming an Internet Standard.
[RFC editor: please remove this note.]
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 20 June 2020. This Internet-Draft will expire on 9 September 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License. provided without warranty as described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7
2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 10
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11
3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13
3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13
3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16 3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16
3.3. Floating-Point Numbers and Values with No Content . . . . 16 3.2.4. Summary of indefinite-length use of major types . . . 17
3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 18 3.3. Floating-Point Numbers and Values with No Content . . . . 17
3.4.1. Standard Date/Time String . . . . . . . . . . . . . . 20 3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 19
3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 20 3.4.1. Standard Date/Time String . . . . . . . . . . . . . . 22
3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 21 3.4.2. Epoch-based Date/Time . . . . . . . . . . . . . . . . 22
3.4.4. Decimal Fractions and Bigfloats . . . . . . . . . . . 22 3.4.3. Bignums . . . . . . . . . . . . . . . . . . . . . . . 23
3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 23 3.4.4. Decimal Fractions and Bigfloats . . . . . . . . . . . 24
3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 23 3.4.5. Content Hints . . . . . . . . . . . . . . . . . . . . 25
3.4.5.1. Encoded CBOR Data Item . . . . . . . . . . . . . 25
3.4.5.2. Expected Later Encoding for CBOR-to-JSON 3.4.5.2. Expected Later Encoding for CBOR-to-JSON
Converters . . . . . . . . . . . . . . . . . . . . 24 Converters . . . . . . . . . . . . . . . . . . . . 25
3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 24 3.4.5.3. Encoded Text . . . . . . . . . . . . . . . . . . 26
3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 25 3.4.6. Self-Described CBOR . . . . . . . . . . . . . . . . . 27
4. Serialization Considerations . . . . . . . . . . . . . . . . 26
4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 26 4. Serialization Considerations . . . . . . . . . . . . . . . . 28
4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 27 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 28
4.2.1. Core Deterministic Encoding Requirements . . . . . . 27 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 29
4.2.2. Additional Deterministic Encoding Considerations . . 28 4.2.1. Core Deterministic Encoding Requirements . . . . . . 29
4.2.3. Length-first map key ordering . . . . . . . . . . . . 30 4.2.2. Additional Deterministic Encoding Considerations . . 30
5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 31 4.2.3. Length-first Map Key Ordering . . . . . . . . . . . . 32
5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 31 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 33
5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 32 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 33
5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 32 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 34
5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 33 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 35
5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 33 5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 35
5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 34 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 35
5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 35 5.4. Validity and Evolution . . . . . . . . . . . . . . . . . 36
5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 35 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 36 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 38
5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 37 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 39
6. Converting Data between CBOR and JSON . . . . . . . . . . . . 38 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 40
6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 38 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 40
6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 39 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 41
7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 40 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 42
7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 41 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 43
7.2. Curating the Additional Information Space . . . . . . . . 41 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 43
8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 42 7.2. Curating the Additional Information Space . . . . . . . . 44
8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 43 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 45
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 46
9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 44 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 46
9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 44 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 47
9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 45 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 47
9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 45 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 47
9.5. The +cbor Structured Syntax Suffix Registration . . . . . 46 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 48
10. Security Considerations . . . . . . . . . . . . . . . . . . . 47 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 49
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 48 10. Security Considerations . . . . . . . . . . . . . . . . . . . 50
11.1. Normative References . . . . . . . . . . . . . . . . . . 48 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 52
11.2. Informative References . . . . . . . . . . . . . . . . . 50 11.1. Normative References . . . . . . . . . . . . . . . . . . 52
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 51 11.2. Informative References . . . . . . . . . . . . . . . . . 53
Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 55 Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 55
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 58 Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 59
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 61 Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 62
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 65
Appendix E. Comparison of Other Binary Formats to CBOR's Design Appendix E. Comparison of Other Binary Formats to CBOR's Design
Objectives . . . . . . . . . . . . . . . . . . . . . . . 62 Objectives . . . . . . . . . . . . . . . . . . . . . . . 66
E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 63 E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 67
E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 63 E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 67
E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 64 E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 68
E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 64 E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 68
E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 64 E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 68
Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 65 Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 69
Appendix G. Well-formedness errors and examples . . . . . . . . 65 Appendix G. Well-formedness errors and examples . . . . . . . . 70
G.1. Examples for CBOR data items that are not well-formed . . 66 G.1. Examples for CBOR data items that are not well-formed . . 71
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 68 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 73
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 69 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 74
1. Introduction 1. Introduction
There are hundreds of standardized formats for binary representation There are hundreds of standardized formats for binary representation
of structured data (also known as binary serialization formats). Of of structured data (also known as binary serialization formats). Of
those, some are for specific domains of information, while others are those, some are for specific domains of information, while others are
generalized for arbitrary data. In the IETF, probably the best-known generalized for arbitrary data. In the IETF, probably the best-known
formats in the latter category are ASN.1's BER and DER [ASN.1]. formats in the latter category are ASN.1's BER and DER [ASN.1].
The format defined here follows some specific design goals that are The format defined here follows some specific design goals that are
skipping to change at page 5, line 25 skipping to change at page 5, line 29
3. Data must be able to be decoded without a schema description. 3. Data must be able to be decoded without a schema description.
* Similar to JSON, encoded data should be self-describing so * Similar to JSON, encoded data should be self-describing so
that a generic decoder can be written. that a generic decoder can be written.
4. The serialization must be reasonably compact, but data 4. The serialization must be reasonably compact, but data
compactness is secondary to code compactness for the encoder and compactness is secondary to code compactness for the encoder and
decoder. decoder.
* "Reasonable" here is bounded by JSON as an upper bound in * "Reasonable" here is bounded by JSON as an upper bound in
size, and by implementation complexity maintaining a lower size, and by the implementation complexity limiting how much
bound. Using either general compression schemes or extensive effort can go into achieving that compactness. Using either
bit-fiddling violates the complexity goals. general compression schemes or extensive bit-fiddling violates
the complexity goals.
5. The format must be applicable to both constrained nodes and high- 5. The format must be applicable to both constrained nodes and high-
volume applications. volume applications.
* This means it must be reasonably frugal in CPU usage for both * This means it must be reasonably frugal in CPU usage for both
encoding and decoding. This is relevant both for constrained encoding and decoding. This is relevant both for constrained
nodes and for potential usage in applications with a very high nodes and for potential usage in applications with a very high
volume of data. volume of data.
6. The format must support all JSON data types for conversion to and 6. The format must support all JSON data types for conversion to and
skipping to change at page 6, line 48 skipping to change at page 7, line 4
Data Stream: A sequence of zero or more data items, not further Data Stream: A sequence of zero or more data items, not further
assembled into a larger containing data item. The independent assembled into a larger containing data item. The independent
data items that make up a data stream are sometimes also referred data items that make up a data stream are sometimes also referred
to as "top-level data items". to as "top-level data items".
Well-formed: A data item that follows the syntactic structure of Well-formed: A data item that follows the syntactic structure of
CBOR. A well-formed data item uses the initial bytes and the byte CBOR. A well-formed data item uses the initial bytes and the byte
strings and/or data items that are implied by their values as strings and/or data items that are implied by their values as
defined in CBOR and does not include following extraneous data. defined in CBOR and does not include following extraneous data.
CBOR decoders by definition only return contents from well-formed CBOR decoders by definition only return contents from well-formed
data items. data items.
Valid: A data item that is well-formed and also follows the semantic Valid: A data item that is well-formed and also follows the semantic
restrictions that apply to CBOR data items. restrictions that apply to CBOR data items (Section 5.3).
Expected: Besides its normal English meaning, the term "expected" is Expected: Besides its normal English meaning, the term "expected" is
used to describe requirements beyond CBOR validity that an used to describe requirements beyond CBOR validity that an
application has on its input data. Well-formed (processable at application has on its input data. Well-formed (processable at
all), valid (checked by a validity-checking generic decoder), and all), valid (checked by a validity-checking generic decoder), and
expected (checked by the application) form a hierarchy of layers expected (checked by the application) form a hierarchy of layers
of acceptability. of acceptability.
Stream decoder: A process that decodes a data stream and makes each Stream decoder: A process that decodes a data stream and makes each
of the data items in the sequence available to an application as of the data items in the sequence available to an application as
they are received. they are received.
Terms and concepts for floating-point values such as Infinity, NaN
(not a number), negative zero, and subnormal are defined in
[IEEE754].
Where bit arithmetic or data types are explained, this document uses Where bit arithmetic or data types are explained, this document uses
the notation familiar from the programming language C, except that the notation familiar from the programming language C, except that
"**" denotes exponentiation. Similar to the "0x" notation for "**" denotes exponentiation. Similar to the "0x" notation for
hexadecimal numbers, numbers in binary notation are prefixed with hexadecimal numbers, numbers in binary notation are prefixed with
"0b". Underscores can be added to a number solely for readability, "0b". Underscores can be added to a number solely for readability,
so 0b00100001 (0x21) might be written 0b001_00001 to emphasize the so 0b00100001 (0x21) might be written 0b001_00001 to emphasize the
desired interpretation of the bits in the byte; in this case, it is desired interpretation of the bits in the byte; in this case, it is
split into three bits and five bits. Encoded CBOR data items are split into three bits and five bits. Encoded CBOR data items are
sometimes given in the "0x" or "0b" notation; these values are first sometimes given in the "0x" or "0b" notation; these values are first
interpreted as numbers as in C and are then interpreted as byte interpreted as numbers as in C and are then interpreted as byte
strings in network byte order, including any leading zero bytes strings in network byte order, including any leading zero bytes
expressed in the notation. expressed in the notation.
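
As an informal illustration of this notation (not part of either draft), the split of 0b001_00001 into three bits and five bits can be reproduced in Python, whose integer literals happen to accept the same underscore grouping. The function name "split_initial_byte" is hypothetical and chosen only for this sketch.

```python
from typing import Tuple


def split_initial_byte(b: int) -> Tuple[int, int]:
    """Split one byte into its high-order 3 bits and low-order 5 bits."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a single byte")
    return b >> 5, b & 0b000_11111


# 0b001_00001 and 0x21 denote the same number; the underscore is only visual.
assert 0b001_00001 == 0x21
high, low = split_initial_byte(0b001_00001)
print(high, low)  # 1 1

# A value given in "0x" notation is then read as a byte string in network
# byte order, keeping any leading zero bytes written in the notation:
# 0x0021 stands for the two bytes 0x00 0x21.
two_bytes = bytes.fromhex("0021")
print(len(two_bytes))  # 2
```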
Words may be _italicized_ for emphasis; in the plain text form of
this specification this is indicated by surrounding words with
underscore characters. Verbatim text (e.g., names from a programming
language) may be set in "monospace" type; in plain text this is
approximated somewhat ambiguously by surrounding the text in double
quotes (which also retain their usual meaning).
2. CBOR Data Models 2. CBOR Data Models
CBOR is explicit about its generic data model, which defines the set CBOR is explicit about its generic data model, which defines the set
of all data items that can be represented in CBOR. Its basic generic of all data items that can be represented in CBOR. Its basic generic
data model is extensible by the registration of simple type values data model is extensible by the registration of simple type values
and tags. Applications can then subset the resulting extended and tags. Applications can then subset the resulting extended
generic data model to build their specific data models. generic data model to build their specific data models.
Within environments that can represent the data items in the generic Within environments that can represent the data items in the generic
data model, generic CBOR encoders and decoders can be implemented data model, generic CBOR encoders and decoders can be implemented
skipping to change at page 8, line 4 skipping to change at page 8, line 17
(which usually involves defining additional implementation data types (which usually involves defining additional implementation data types
for those data items that do not already have a natural for those data items that do not already have a natural
representation in the environment). The ability to provide generic representation in the environment). The ability to provide generic
encoders and decoders is an explicit design goal of CBOR; however, encoders and decoders is an explicit design goal of CBOR; however,
many applications will provide their own application-specific many applications will provide their own application-specific
encoders and/or decoders. encoders and/or decoders.
In the basic (un-extended) generic data model, a data item is one of: In the basic (un-extended) generic data model, a data item is one of:
* an integer in the range -2**64..2**64-1 inclusive * an integer in the range -2**64..2**64-1 inclusive
* a simple value, identified by a number between 0 and 255, but * a simple value, identified by a number between 0 and 255, but
distinct from that number distinct from that number itself
* a floating-point value, distinct from an integer, out of the set * a floating-point value, distinct from an integer, out of the set
representable by IEEE 754 binary64 (including non-finites) representable by IEEE 754 binary64 (including non-finites)
[IEEE754] [IEEE754]
* a sequence of zero or more bytes ("byte string") * a sequence of zero or more bytes ("byte string")
* a sequence of zero or more Unicode code points ("text string") * a sequence of zero or more Unicode code points ("text string")
* a sequence of zero or more data items ("array") * a sequence of zero or more data items ("array")
* a mapping (mathematical function) from zero or more data items * a mapping (mathematical function) from zero or more data items
("keys") each to a data item ("values"), ("map") ("keys") each to a data item ("values"), ("map")
* a tagged data item ("tag"), comprising a tag number (an integer in * a tagged data item ("tag"), comprising a tag number (an integer in
the range 0..2**64-1) and a tagged value (a data item) the range 0..2**64-1) and the tag content (a data item)
Note that integer and floating-point values are distinct in this Note that integer and floating-point values are distinct in this
model, even if they have the same numeric value. model, even if they have the same numeric value.
Also note that serialization variants, such as the number of bytes of Also note that serialization variants, such as the number of bytes of
the encoded floating value, or the choice of one of the ways in which the encoded floating-point value, or the choice of one of the ways in
an integer, the length of a text or byte string, the number of which an integer, the length of a text or byte string, the number of
elements in an array or pairs in a map, or a tag number, elements in an array or pairs in a map, or a tag number,
(collectively "the argument", see Section 3) can be encoded, are not (collectively "the argument", see Section 3) can be encoded, are not
visible at the generic data model level. visible at the generic data model level.
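
The point about serialization variants can be sketched as follows. This is a rough illustration only, assuming the argument encoding of Section 3 (which is elided in this comparison): additional information values below 24 carry the argument directly, and values 24 to 27 announce 1, 2, 4, or 8 following bytes in network byte order. The helper name "decode_uint" is hypothetical, and the sketch handles major type 0 only and ignores any trailing bytes.

```python
def decode_uint(data: bytes) -> int:
    """Decode one major type 0 item (unsigned integer) from the start of data."""
    if not data:
        raise ValueError("empty input")
    initial = data[0]
    major, info = initial >> 5, initial & 0x1F
    if major != 0:
        raise ValueError("this sketch handles major type 0 only")
    if info < 24:
        # The argument is carried in the additional information itself.
        return info
    if info <= 27:
        # Assumed from Section 3: 24, 25, 26, 27 mean 1, 2, 4, 8 argument bytes.
        n = 1 << (info - 24)
        if len(data) < 1 + n:
            raise ValueError("input ends before the end of the data item")
        return int.from_bytes(data[1:1 + n], "big")
    raise ValueError("additional information not usable with major type 0")


# Four different byte sequences, one data item at the data model level.
variants = ["0a", "180a", "19000a", "1a0000000a"]
print({decode_uint(bytes.fromhex(v)) for v in variants})  # {10}
```

An application that receives the decoded value 10 cannot tell which of the four encodings was on the wire, which is what "not visible at the generic data model level" amounts to in practice.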
2.1. Extended Generic Data Models 2.1. Extended Generic Data Models
This basic generic data model comes pre-extended by the registration This basic generic data model comes pre-extended by the registration
of a number of simple values and tag numbers right in this document, of a number of simple values and tag numbers right in this document,
such as: such as:
skipping to change at page 9, line 45 skipping to change at page 10, line 11
representations of integral values are equivalent, using both map representations of integral values are equivalent, using both map
keys "0" and "0.0" in a single map would be considered duplicates, keys "0" and "0.0" in a single map would be considered duplicates,
even while encoded as different major types, and so invalid; and an even while encoded as different major types, and so invalid; and an
encoder could encode integral-valued floats as integers or vice encoder could encode integral-valued floats as integers or vice
versa, perhaps to save encoded bytes. versa, perhaps to save encoded bytes.
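
The difference between the two views of map keys can be made concrete with a small sketch. It is an illustration under stated assumptions, not an algorithm from the draft: keys are limited to Python "int" and "float" values, the function name "has_duplicate_keys" is hypothetical, and the flag "numeric_equivalence" stands in for the choice of a specific data model that treats integer and floating-point representations of integral values as equivalent.

```python
def has_duplicate_keys(keys, numeric_equivalence: bool) -> bool:
    """Report whether a list of int/float map keys contains a duplicate.

    numeric_equivalence=False models the generic data model, in which an
    integer and a floating-point value are distinct even when numerically
    equal. numeric_equivalence=True models a specific data model that
    declares them equivalent.
    """
    seen = set()
    for key in keys:
        if isinstance(key, bool) or not isinstance(key, (int, float)):
            raise TypeError("this sketch handles int and float keys only")
        # Python itself treats 0 and 0.0 as equal, so the generic model
        # needs the type added to the identity of the key.
        identity = key if numeric_equivalence else (type(key).__name__, key)
        if identity in seen:
            return True
        seen.add(identity)
    return False


print(has_duplicate_keys([0, 0.0], numeric_equivalence=False))  # False
print(has_duplicate_keys([0, 0.0], numeric_equivalence=True))   # True
```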
3. Specification of the CBOR Encoding 3. Specification of the CBOR Encoding
A CBOR data item (Section 2) is encoded to or decoded from a byte A CBOR data item (Section 2) is encoded to or decoded from a byte
string carrying a well-formed encoded data item as described in this string carrying a well-formed encoded data item as described in this
section. The encoding is summarized in Table 6, indexed by the section. The encoding is summarized in Table 7, indexed by the
initial byte. An encoder MUST produce only well-formed encoded data initial byte. An encoder MUST produce only well-formed encoded data
items. A decoder MUST NOT return a decoded data item when it items. A decoder MUST NOT return a decoded data item when it
encounters input that is not a well-formed encoded CBOR data item encounters input that is not a well-formed encoded CBOR data item
(this does not detract from the usefulness of diagnostic and recovery (this does not detract from the usefulness of diagnostic and recovery
tools that might make available some information from a damaged tools that might make available some information from a damaged
encoded CBOR data item). encoded CBOR data item).
The initial byte of each encoded data item contains both information The initial byte of each encoded data item contains both information
about the major type (the high-order 3 bits, described in about the major type (the high-order 3 bits, described in
Section 3.1) and additional information (the low-order 5 bits). With Section 3.1) and additional information (the low-order 5 bits). With
skipping to change at page 10, line 49 skipping to change at page 11, line 16
If the encoded sequence of bytes ends before the end of a data item, If the encoded sequence of bytes ends before the end of a data item,
that item is not well-formed. If the encoded sequence of bytes still that item is not well-formed. If the encoded sequence of bytes still
has bytes remaining after the outermost encoded item is decoded, that has bytes remaining after the outermost encoded item is decoded, that
encoding is not a single well-formed CBOR item; depending on the encoding is not a single well-formed CBOR item; depending on the
application, the decoder may either treat the encoding as not well- application, the decoder may either treat the encoding as not well-
formed or just identify the start of the remaining bytes to the formed or just identify the start of the remaining bytes to the
application. application.
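
The two outcomes described above (input that ends too early, and input with bytes left over) might look as follows in a decoder interface. This is a minimal sketch limited to major type 0, with the argument sizes assumed from the elided part of Section 3; the function name "decode_first_uint" and the flag "allow_trailing" are hypothetical.

```python
from typing import Tuple


def decode_first_uint(data: bytes, allow_trailing: bool = False) -> Tuple[int, int]:
    """Decode one unsigned integer and return (value, offset of remaining bytes)."""
    if not data:
        raise ValueError("input ends before the end of the data item")
    initial = data[0]
    major, info = initial >> 5, initial & 0x1F
    if major != 0:
        raise NotImplementedError("this sketch handles major type 0 only")
    if info < 24:
        value, end = info, 1
    elif info <= 27:
        end = 1 + (1 << (info - 24))  # 1, 2, 4, or 8 argument bytes
        if len(data) < end:
            raise ValueError("input ends before the end of the data item")
        value = int.from_bytes(data[1:end], "big")
    else:
        raise ValueError("additional information not usable with major type 0")
    if end < len(data) and not allow_trailing:
        # Strict mode: the input is not a single well-formed item.
        raise ValueError("bytes remain after the outermost item")
    # Lenient mode: report where the remaining bytes start.
    return value, end


print(decode_first_uint(bytes.fromhex("1864")))  # (100, 2)
print(decode_first_uint(bytes.fromhex("186400"), allow_trailing=True))  # (100, 2)
```

Returning the offset rather than silently discarding the remainder leaves the decision to the application, which matches the latitude the text gives to the decoder.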
A CBOR decoder implementation can be based on a jump table with all A CBOR decoder implementation can be based on a jump table with all
256 defined values for the initial byte (Table 6). A decoder in a 256 defined values for the initial byte (Table 7). A decoder in a
constrained implementation can instead use the structure of the constrained implementation can instead use the structure of the
initial byte and following bytes for more compact code (see initial byte and following bytes for more compact code (see
Appendix C for a rough impression of how this could look). Appendix C for a rough impression of how this could look).
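
The two implementation styles can be contrasted in a few lines. The sketch below is deliberately simplified and is not the jump table of the draft: a real table would hold one handler per initial byte, whereas this one stores only a label and the additional information. The names "MAJOR_TYPE_NAMES", "JUMP_TABLE", "classify_by_table", and "classify_by_structure" are hypothetical.

```python
from typing import Tuple

# Short labels for major types 0 to 7, used only for this illustration.
MAJOR_TYPE_NAMES = [
    "unsigned integer",
    "negative integer",
    "byte string",
    "text string",
    "array",
    "map",
    "tag",
    "simple/float",
]

# Table-driven style: one precomputed entry for each of the 256 initial bytes.
JUMP_TABLE = [(MAJOR_TYPE_NAMES[b >> 5], b & 0x1F) for b in range(256)]


def classify_by_table(initial: int) -> Tuple[str, int]:
    """Look the initial byte up in the precomputed table."""
    return JUMP_TABLE[initial]


def classify_by_structure(initial: int) -> Tuple[str, int]:
    """Derive the same answer from the bit structure, with no 256-entry table."""
    return MAJOR_TYPE_NAMES[initial >> 5], initial & 0x1F


print(classify_by_table(0b100_01010))      # ('array', 10)
print(classify_by_structure(0b100_01010))  # ('array', 10)
```

The table trades memory for a single indexed lookup, while the structural variant needs only a shift and a mask, which is the kind of saving a constrained implementation is after.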
3.1. Major Types 3.1. Major Types
The following lists the major types and the additional information The following lists the major types and the additional information
and other bytes associated with the type. and other bytes associated with the type.
Major type 0: an integer in the range 0..2**64-1 inclusive. The Major type 0: an integer in the range 0..2**64-1 inclusive. The
skipping to change at page 11, line 45 skipping to change at page 12, line 13
formed but invalid. This type is provided for systems that need formed but invalid. This type is provided for systems that need
to interpret or display human-readable text, and allows the to interpret or display human-readable text, and allows the
differentiation between unstructured bytes and text that has a differentiation between unstructured bytes and text that has a
specified repertoire and encoding. In contrast to formats such as specified repertoire and encoding. In contrast to formats such as
JSON, the Unicode characters in this type are never escaped. JSON, the Unicode characters in this type are never escaped.
Thus, a newline character (U+000A) is always represented in a Thus, a newline character (U+000A) is always represented in a
string as the byte 0x0a, and never as the bytes 0x5c6e (the string as the byte 0x0a, and never as the bytes 0x5c6e (the
characters "\" and "n") or as 0x5c7530303061 (the characters "\", characters "\" and "n") or as 0x5c7530303061 (the characters "\",
"u", "0", "0", "0", and "a"). "u", "0", "0", "0", and "a").
Major type 4: an array of data items. Arrays are also called lists, Major type 4: an array of data items. In other formats, arrays are
sequences, or tuples. The argument is the number of data items in also called lists, sequences, or tuples (a "CBOR sequence" is
the array. Items in an array do not need to all be of the same something slightly different, though [RFC8742]). The argument is
type. For example, an array that contains 10 items of any type the number of data items in the array. Items in an array do not
would have an initial byte of 0b100_01010 (major type of 4, need to all be of the same type. For example, an array that
additional information of 10 for the length) followed by the 10 contains 10 items of any type would have an initial byte of
remaining items. 0b100_01010 (major type of 4, additional information of 10 for the
length) followed by the 10 remaining items.
Major type 5: a map of pairs of data items. Maps are also called Major type 5: a map of pairs of data items. Maps are also called
tables, dictionaries, hashes, or objects (in JSON). A map is tables, dictionaries, hashes, or objects (in JSON). A map is
comprised of pairs of data items, each pair consisting of a key comprised of pairs of data items, each pair consisting of a key
that is immediately followed by a value. The argument is the that is immediately followed by a value. The argument is the
number of _pairs_ of data items in the map. For example, a map number of _pairs_ of data items in the map. For example, a map
that contains 9 pairs would have an initial byte of 0b101_01001 that contains 9 pairs would have an initial byte of 0b101_01001
(major type of 5, additional information of 9 for the number of (major type of 5, additional information of 9 for the number of
pairs) followed by the 18 remaining items. The first item is the pairs) followed by the 18 remaining items. The first item is the
first key, the second item is the first value, the third item is first key, the second item is the first value, the third item is
the second key, and so on. Because items in a map come in pairs, the second key, and so on. Because items in a map come in pairs,
their total number is always even: A map that contains an odd their total number is always even: A map that contains an odd
number of items (no value data present after the last key data number of items (no value data present after the last key data
item) is not well-formed. A map that has duplicate keys may be item) is not well-formed. A map that has duplicate keys may be
well-formed, but it is not valid, and thus it causes indeterminate well-formed, but it is not valid, and thus it causes indeterminate
decoding; see also Section 5.6. decoding; see also Section 5.6.
Major type 6: a tagged data item ("tag") whose tag number is the Major type 6: a tagged data item ("tag") whose tag number, an
argument and whose enclosed data item ("tag content") is the integer in the range 0..2**64-1 inclusive, is the argument and
single encoded data item that follows the head. See Section 3.4. whose enclosed data item ("tag content") is the single encoded
data item that follows the head. See Section 3.4.
Major type 7: floating-point numbers and simple values, as well as Major type 7: floating-point numbers and simple values, as well as
the "break" stop code. See Section 3.3. the "break" stop code. See Section 3.3.
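As an illustration of the initial-byte examples given for major types 4 and 5 above, the following Python sketch combines a major type and an additional information value into an initial byte and splits it again. The function names are hypothetical and are not part of this specification; only the bit layout (high-order 3 bits for the major type, low-order 5 bits for the additional information) is taken from the text.

```python
def initial_byte(major_type: int, additional_info: int) -> int:
    """Combine a 3-bit major type and 5-bit additional information."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= additional_info <= 31:
        raise ValueError("additional information must be in 0..31")
    return (major_type << 5) | additional_info


def split_initial_byte(byte: int) -> tuple[int, int]:
    """Return (major type, additional information) for an initial byte."""
    return byte >> 5, byte & 0x1F


# The two examples from the text: an array of 10 items, a map of 9 pairs.
array_of_10 = initial_byte(4, 10)  # 0b100_01010
map_of_9 = initial_byte(5, 9)      # 0b101_01001
```

This sketch only covers arguments below 24, which fit directly into the additional information; larger arguments are carried in the bytes following the initial byte.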
These eight major types lead to a simple table showing which of the These eight major types lead to a simple table showing which of the
256 possible values for the initial byte of a data item are used 256 possible values for the initial byte of a data item are used
(Table 6). (Table 7).
In major types 6 and 7, many of the possible values are reserved for In major types 6 and 7, many of the possible values are reserved for
future specification. See Section 9 for more information on these future specification. See Section 9 for more information on these
values. values.
Table 1 summarizes the major types defined by CBOR, ignoring the next Table 1 summarizes the major types defined by CBOR, ignoring the next
section for now. The number N in this table stands for the argument, section for now. The number N in this table stands for the argument,
mt for the major type. mt for the major type.
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
skipping to change at page 13, line 25 skipping to change at page 13, line 31
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
| 4 | array | N data items (elements) | | 4 | array | N data items (elements) |
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
| 5 | map | 2N data items (key/value pairs) | | 5 | map | 2N data items (key/value pairs) |
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
| 6 | tag of number N | 1 data item | | 6 | tag of number N | 1 data item |
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
| 7 | simple/float | - | | 7 | simple/float | - |
+----+-----------------------+---------------------------------+ +----+-----------------------+---------------------------------+
Table 1: Overview over CBOR major types (definite length Table 1: Overview over the definite-length use of CBOR major
encoded) types (mt = major type, N = argument)
3.2. Indefinite Lengths for Some Major Types 3.2. Indefinite Lengths for Some Major Types
Four CBOR items (arrays, maps, byte strings, and text strings) can be Four CBOR items (arrays, maps, byte strings, and text strings) can be
encoded with an indefinite length using additional information value encoded with an indefinite length using additional information value
31. This is useful if the encoding of the item needs to begin before 31. This is useful if the encoding of the item needs to begin before
the number of items inside the array or map, or the total length of the number of items inside the array or map, or the total length of
the string, is known. (The application of this is often referred to the string, is known. (The ability to start sending a data item
as "streaming" within a data item.) before all of it is known is often referred to as "streaming" within
that data item.)
Indefinite-length arrays and maps are dealt with differently than Indefinite-length arrays and maps are dealt with differently than
indefinite-length byte strings and text strings. indefinite-length byte strings and text strings.
3.2.1. The "break" Stop Code 3.2.1. The "break" Stop Code
The "break" stop code is encoded with major type 7 and additional The "break" stop code is encoded with major type 7 and additional
information value 31 (0b111_11111). It is not itself a data item: it information value 31 (0b111_11111). It is not itself a data item: it
is just a syntactic feature to close an indefinite-length item. is just a syntactic feature to close an indefinite-length item.
skipping to change at page 16, line 24 skipping to change at page 16, line 34
chunks, while not particularly useful, are permitted.) chunks, while not particularly useful, are permitted.)
If any item between the indefinite-length string indicator If any item between the indefinite-length string indicator
(0b010_11111 or 0b011_11111) and the "break" stop code is not a (0b010_11111 or 0b011_11111) and the "break" stop code is not a
definite-length string item of the same major type, the string is not definite-length string item of the same major type, the string is not
well-formed. well-formed.
If any definite-length text string inside an indefinite-length text If any definite-length text string inside an indefinite-length text
string is invalid, the indefinite-length text string is invalid. string is invalid, the indefinite-length text string is invalid.
Note that this implies that the bytes of a single UTF-8 character Note that this implies that the bytes of a single UTF-8 character
cannot be spread between chunks: a new chunk can only be started at a cannot be split up between chunks: a new chunk of a text string can
character boundary. only be started at a character boundary.
For example, assume the sequence: For example, assume an encoded data item consisting of the bytes:
0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111 0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111
5F -- Start indefinite-length byte string 5F -- Start indefinite-length byte string
44 -- Byte string of length 4 44 -- Byte string of length 4
aabbccdd -- Bytes content aabbccdd -- Bytes content
43 -- Byte string of length 3 43 -- Byte string of length 3
eeff99 -- Bytes content eeff99 -- Bytes content
FF -- "break" FF -- "break"
After decoding, this results in a single byte string with seven After decoding, this results in a single byte string with seven
bytes: 0xaabbccddeeff99. bytes: 0xaabbccddeeff99.
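The chunk concatenation in this example could be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete decoder: the function name is hypothetical, only major type 2 is handled, and any bytes following the "break" stop code are ignored.

```python
def decode_indefinite_byte_string(data: bytes) -> bytes:
    """Concatenate the chunks of an indefinite-length byte string."""
    if not data or data[0] != 0x5F:
        raise ValueError("not an indefinite-length byte string")
    pos = 1
    out = bytearray()
    while True:
        if pos >= len(data):
            raise ValueError("missing break stop code")
        head = data[pos]
        pos += 1
        if head == 0xFF:  # "break" stop code closes the string
            return bytes(out)
        major_type, info = head >> 5, head & 0x1F
        # Each chunk has to be a definite-length string of the same
        # major type; anything else is not well-formed.
        if major_type != 2 or info > 27:
            raise ValueError("chunk is not a definite-length byte string")
        if info < 24:
            length = info
        else:
            size = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
            if pos + size > len(data):
                raise ValueError("truncated chunk head")
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        if pos + length > len(data):
            raise ValueError("truncated chunk")
        out += data[pos:pos + length]
        pos += length


example = bytes.fromhex("5f44aabbccdd43eeff99ff")
result = decode_indefinite_byte_string(example)
```

Applied to the bytes of the example, the sketch yields the seven-byte string 0xaabbccddeeff99.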
3.2.4. Summary of indefinite-length use of major types
Table 2 summarizes the major types defined by CBOR as used for
indefinite length encoding (with additional information set to 31).
mt stands for the major type.
+----+-------------------+----------------------------------+
| mt | Meaning | enclosed up to "break" stop code |
+====+===================+==================================+
| 0 | (not well-formed) | - |
+----+-------------------+----------------------------------+
| 1 | (not well-formed) | - |
+----+-------------------+----------------------------------+
| 2 | byte string | definite-length byte strings |
+----+-------------------+----------------------------------+
| 3 | text string | definite-length text strings |
+----+-------------------+----------------------------------+
| 4 | array | data items (elements) |
+----+-------------------+----------------------------------+
| 5 | map | data items (key/value pairs) |
+----+-------------------+----------------------------------+
| 6 | (not well-formed) | - |
+----+-------------------+----------------------------------+
| 7 | "break" stop code | - |
+----+-------------------+----------------------------------+
Table 2: Overview over the indefinite-length use of CBOR
major types (mt = major type, additional information =
31)
3.3. Floating-Point Numbers and Values with No Content 3.3. Floating-Point Numbers and Values with No Content
Major type 7 is for two types of data: floating-point numbers and Major type 7 is for two types of data: floating-point numbers and
"simple values" that do not need any content. Each value of the "simple values" that do not need any content. Each value of the
5-bit additional information in the initial byte has its own separate 5-bit additional information in the initial byte has its own separate
meaning, as defined in Table 2. Like the major types for integers, meaning, as defined in Table 3. Like the major types for integers,
items of this major type do not carry content data; all the items of this major type do not carry content data; all the
information is in the initial bytes. information is in the initial bytes.
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
| 5-Bit Value | Semantics | | 5-Bit Value | Semantics |
+=============+===================================================+ +=============+===================================================+
| 0..23 | Simple value (value 0..23) | | 0..23 | Simple value (value 0..23) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
| 24 | Simple value (value 32..255 in following byte) | | 24 | Simple value (value 32..255 in following byte) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
skipping to change at page 17, line 24 skipping to change at page 18, line 24
| 26 | IEEE 754 Single-Precision Float (32 bits follow) | | 26 | IEEE 754 Single-Precision Float (32 bits follow) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
| 27 | IEEE 754 Double-Precision Float (64 bits follow) | | 27 | IEEE 754 Double-Precision Float (64 bits follow) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
| 28-30 | Reserved, not well-formed in the present document | | 28-30 | Reserved, not well-formed in the present document |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
| 31 | "break" stop code for indefinite-length items | | 31 | "break" stop code for indefinite-length items |
| | (Section 3.2.1) | | | (Section 3.2.1) |
+-------------+---------------------------------------------------+ +-------------+---------------------------------------------------+
Table 2: Values for Additional Information in Major Type 7 Table 3: Values for Additional Information in Major Type 7
As with all other major types, the 5-bit value 24 signifies a single- As with all other major types, the 5-bit value 24 signifies a single-
byte extension: it is followed by an additional byte to represent the byte extension: it is followed by an additional byte to represent the
simple value. (To minimize confusion, only the values 32 to 255 are simple value. (To minimize confusion, only the values 32 to 255 are
used.) This maintains the structure of the initial bytes: as for the used.) This maintains the structure of the initial bytes: as for the
other major types, the length of these always depends on the other major types, the length of these always depends on the
additional information in the first byte. Table 3 lists the values additional information in the first byte. Table 4 lists the values
assigned and available for simple types. assigned and available for simple types.
+---------+-----------------+ +---------+-----------------+
| Value | Semantics | | Value | Semantics |
+=========+=================+ +=========+=================+
| 0..19 | (Unassigned) | | 0..19 | (Unassigned) |
+---------+-----------------+ +---------+-----------------+
| 20 | False | | 20 | False |
+---------+-----------------+ +---------+-----------------+
| 21 | True | | 21 | True |
+---------+-----------------+ +---------+-----------------+
| 22 | Null | | 22 | Null |
+---------+-----------------+ +---------+-----------------+
| 23 | Undefined value | | 23 | Undefined value |
+---------+-----------------+ +---------+-----------------+
| 24..31 | (Reserved) | | 24..31 | (Reserved) |
+---------+-----------------+ +---------+-----------------+
| 32..255 | (Unassigned) | | 32..255 | (Unassigned) |
+---------+-----------------+ +---------+-----------------+
Table 3: Simple Values Table 4: Simple Values
An encoder MUST NOT issue two-byte sequences that start with 0xf8 An encoder MUST NOT issue two-byte sequences that start with 0xf8
(major type = 7, additional information = 24) and continue with a (major type = 7, additional information = 24) and continue with a
byte less than 0x20 (32 decimal). Such sequences are not well- byte less than 0x20 (32 decimal). Such sequences are not well-
formed. (This implies that an encoder cannot encode false, true, formed. (This implies that an encoder cannot encode false, true,
null, or undefined in two-byte sequences, only the one-byte variants null, or undefined in two-byte sequences, only the one-byte variants
of these are well-formed.) of these are well-formed.)
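The encoding rule for simple values can be illustrated with a short Python sketch. The function name is hypothetical; the sketch assumes only the rules stated above, namely that values 0..23 are carried in the additional information, values 32..255 follow a 0xf8 initial byte, and values 24..31 cannot be encoded.

```python
def encode_simple(value: int) -> bytes:
    """Encode a simple value in its only well-formed form."""
    if 0 <= value <= 23:
        # Major type 7 (0b111_00000) with the value as additional information.
        return bytes([0xE0 | value])
    if 32 <= value <= 255:
        # Major type 7, additional information 24, value in the next byte.
        return bytes([0xF8, value])
    raise ValueError("simple values 24..31 and values above 255 are not encodable")


FALSE, TRUE, NULL, UNDEFINED = (encode_simple(v) for v in (20, 21, 22, 23))
```

With this rule, false, true, null, and undefined are always the single bytes 0xf4, 0xf5, 0xf6, and 0xf7.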
The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
IEEE 754 binary floating-point values [IEEE754]. These floating- IEEE 754 binary floating-point values [IEEE754]. These floating-
point values are encoded in the additional bytes of the appropriate point values are encoded in the additional bytes of the appropriate
size. (See Appendix D for some information about 16-bit floating size. (See Appendix D for some information about 16-bit floating-
point.) point numbers.)
3.4. Tagging of Items 3.4. Tagging of Items
In CBOR, a data item can be enclosed by a tag to give it additional In CBOR, a data item can be enclosed by a tag to give it some
semantics while retaining its structure. The tag is major type 6, additional semantics, as uniquely identified by a "tag number". The
and represents an unsigned integer as indicated by the tag's argument tag is major type 6, its argument (Section 3) indicates the tag
(Section 3); the (sole) enclosed data item is carried as content number, and it contains a single enclosed data item, the "tag
data. If a tag requires structured data, this structure is encoded content". (If a tag requires further structure to its content, this
into the nested data item. The definition of a tag number usually structure is provided by the enclosed data item.) We use the term
restricts what kinds of nested data item or items are valid for tags "tag" for the entire data item consisting of both a tag number and
using this tag number. the tag content: the tag content is the data item that is being
tagged.
For example, assume that a byte string of length 12 is marked with a For example, assume that a byte string of length 12 is marked with a
tag of number 2 to indicate it is a positive bignum (Section 3.4.3). tag of number 2 to indicate it is a positive "bignum"
This would be marked as 0b110_00010 (major type 6, additional (Section 3.4.3). The encoded data item would start with a byte
information 2 for the tag number) followed by 0b010_01100 (major type 0b110_00010 (major type 6, additional information 2 for the tag
number) followed by the encoded tag content: 0b010_01100 (major type
2, additional information of 12 for the length) followed by the 12 2, additional information of 12 for the length) followed by the 12
bytes of the bignum. bytes of the bignum.
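The byte layout of this example could be produced by a sketch such as the following. The function name is hypothetical, and the sketch is limited to bignums whose byte string is shorter than 24 bytes, so that the length fits into the additional information of the byte string head.

```python
def encode_positive_bignum(value: int) -> bytes:
    """Encode a non-negative integer as tag number 2 around a byte string."""
    if value < 0:
        raise ValueError("tag number 2 carries non-negative values only")
    size = max(1, (value.bit_length() + 7) // 8)
    content = value.to_bytes(size, "big")
    if len(content) > 23:
        raise ValueError("sketch only handles byte strings shorter than 24 bytes")
    tag_head = 0xC0 | 2                # 0b110_00010: major type 6, tag number 2
    string_head = 0x40 | len(content)  # 0b010_xxxxx: major type 2, length
    return bytes([tag_head, string_head]) + content


# 2**95 needs 96 bits, that is, a byte string of length 12.
encoded = encode_positive_bignum(2**95)
```

For a value that needs 12 bytes, the result starts with 0xc2 0x4c and is followed by the 12 bytes of the bignum.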
The definition of a tag number describes the additional semantics
conveyed for tags with this tag number in the extended generic data
model. These semantics may include equivalence of some tagged data
items with other data items, including some that can already be
represented in the basic generic data model. For instance, 0xc24101,
a bignum the tag content of which is the byte string with the single
byte 0x01, is equivalent to an integer 1, which could also be encoded
for instance as 0x01, 0x1801, or 0x190001. The tag definition may
include the definition of a preferred serialization (Section 4.1)
that is recommended for generic encoders; this may prefer basic
generic data model representations over ones that employ a tag.
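The equivalence described above can be made concrete with a small Python sketch that accepts either an unsigned integer (major type 0) or a tag number 2 bignum and returns the same platform integer. The function names are hypothetical, and the sketch deliberately ignores all other data items, including indefinite-length byte strings as tag content.

```python
def read_argument(info: int, rest: bytes) -> tuple[int, bytes]:
    """Return (argument, remaining bytes) for a 5-bit additional information."""
    if info < 24:
        return info, rest
    if info > 27:
        raise ValueError("unsupported additional information")
    size = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
    if len(rest) < size:
        raise ValueError("truncated head")
    return int.from_bytes(rest[:size], "big"), rest[size:]


def decode_unsigned(data: bytes) -> int:
    """Decode major type 0 or a tag number 2 bignum to an integer."""
    if not data:
        raise ValueError("empty input")
    major_type, info = data[0] >> 5, data[0] & 0x1F
    argument, rest = read_argument(info, data[1:])
    if major_type == 0:
        return argument
    if major_type == 6 and argument == 2:
        if not rest or rest[0] >> 5 != 2:
            raise ValueError("tag number 2 requires a byte string as content")
        length, content = read_argument(rest[0] & 0x1F, rest[1:])
        if len(content) < length:
            raise ValueError("truncated byte string")
        return int.from_bytes(content[:length], "big")
    raise ValueError("not an unsigned integer or positive bignum")


encodings = ["c24101", "01", "1801", "190001"]
values = [decode_unsigned(bytes.fromhex(h)) for h in encodings]
```

All four encodings mentioned in the text decode to the integer 1 in this sketch.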
The tag definition usually restricts what kinds of nested data item
or items are valid for such tags. Tag definitions may restrict their
content to a very specific syntactic structure, as the tags defined
in this document do, or they may aim at a more semantically defined
definition of their content, as for instance tags 40 and 1040 do
[rfc8746]: These accept a number of different ways of representing
arrays.
As a matter of convention, many tags do not accept null or undefined
values as tag content; instead, the expectation is that a null or
undefined value can be used in place of the entire tag; Section 3.4.2
provides some further considerations for one specific tag about the
handling of this convention in application protocols and in mapping
to platform types.
Decoders do not need to understand tags of every tag number, and tags Decoders do not need to understand tags of every tag number, and tags
may be of little value in applications where the implementation may be of little value in applications where the implementation
creating a particular CBOR data item and the implementation decoding creating a particular CBOR data item and the implementation decoding
that stream know the semantic meaning of each item in the data flow. that stream know the semantic meaning of each item in the data flow.
Their primary purpose in this specification is to define common data Their primary purpose in this specification is to define common data
types such as dates. A secondary purpose is to provide conversion types such as dates. A secondary purpose is to provide conversion
hints when it is foreseen that the CBOR data item needs to be hints when it is foreseen that the CBOR data item needs to be
translated into a different format, requiring hints about the content translated into a different format, requiring hints about the content
of items. Understanding the semantics of tags is optional for a of items. Understanding the semantics of tags is optional for a
decoder; it can just jump over the initial bytes of the tag (that decoder; it can simply present both the tag number and the tag
encode the tag number) and interpret the tag content itself, content to the application, without interpreting the additional
presenting both tag number and tag content to the application. semantics of the tag.
A tag applies semantics to the data item it encloses. Thus, if tag A A tag applies semantics to the data item it encloses. Tags can nest:
encloses tag B, which encloses data item C, tag A applies to the If tag A encloses tag B, which encloses data item C, tag A applies to
result of applying tag B on data item C. That is, a tag is a data the result of applying tag B on data item C.
item consisting of a tag number and an enclosed value. The content
of the tag (the enclosed data item) is the data item (the value) that
is being tagged.
IANA maintains a registry of tag numbers as described in Section 9.2. IANA maintains a registry of tag numbers as described in Section 9.2.
Table 4 provides a list of tag numbers that were defined in Table 5 provides a list of tag numbers that were defined in
[RFC7049], with definitions in the rest of this section. Note that [RFC7049], with definitions in the rest of this section. Note that
many other tag numbers have been defined since the publication of many other tag numbers have been defined since the publication of
[RFC7049]; see the registry described at Section 9.2 for the complete [RFC7049]; see the registry described at Section 9.2 for the complete
list. list.
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| Tag Number | Data Item | Semantics | | Tag Number | Data Item | Semantics |
+============+=============+==================================+ +============+=============+==================================+
| 0 | text string | Standard date/time string; see | | 0 | text string | Standard date/time string; see |
| | | Section 3.4.1 | | | | Section 3.4.1 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 1 | multiple | Epoch-based date/time; see | | 1 | integer or | Epoch-based date/time; see |
| | | Section 3.4.2 | | | float | Section 3.4.2 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 2 | byte string | Positive bignum; see | | 2 | byte string | Positive bignum; see |
| | | Section 3.4.3 | | | | Section 3.4.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 3 | byte string | Negative bignum; see | | 3 | byte string | Negative bignum; see |
| | | Section 3.4.3 | | | | Section 3.4.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 4 | array | Decimal fraction; see | | 4 | array | Decimal fraction; see |
| | | Section 3.4.4 | | | | Section 3.4.4 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 5 | array | Bigfloat; see Section 3.4.4 | | 5 | array | Bigfloat; see Section 3.4.4 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 21 | multiple | Expected conversion to base64url | | 21 | (any) | Expected conversion to base64url |
| | | encoding; see Section 3.4.5.2 | | | | encoding; see Section 3.4.5.2 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 22 | multiple | Expected conversion to base64 | | 22 | (any) | Expected conversion to base64 |
| | | encoding; see Section 3.4.5.2 | | | | encoding; see Section 3.4.5.2 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 23 | multiple | Expected conversion to base16 | | 23 | (any) | Expected conversion to base16 |
| | | encoding; see Section 3.4.5.2 | | | | encoding; see Section 3.4.5.2 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 24 | byte string | Encoded CBOR data item; see | | 24 | byte string | Encoded CBOR data item; see |
| | | Section 3.4.5.1 | | | | Section 3.4.5.1 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 32 | text string | URI; see Section 3.4.5.3 | | 32 | text string | URI; see Section 3.4.5.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 33 | text string | base64url; see Section 3.4.5.3 | | 33 | text string | base64url; see Section 3.4.5.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 34 | text string | base64; see Section 3.4.5.3 | | 34 | text string | base64; see Section 3.4.5.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 35 | text string | Regular expression; see | | 35 | text string | Regular expression; see |
| | | Section 3.4.5.3 | | | | Section 3.4.5.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 36 | text string | MIME message; see | | 36 | text string | MIME message; see |
| | | Section 3.4.5.3 | | | | Section 3.4.5.3 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
| 55799 | multiple | Self-described CBOR; see | | 55799 | (any) | Self-described CBOR; see |
| | | Section 3.4.6 | | | | Section 3.4.6 |
+------------+-------------+----------------------------------+ +------------+-------------+----------------------------------+
Table 4: Tag numbers defined in RFC 7049 Table 5: Tag numbers defined in RFC 7049
Conceptually, tags are interpreted in the generic data model, not at Conceptually, tags are interpreted in the generic data model, not at
(de-)serialization time. A small number of tags (specifically, tag (de-)serialization time. A small number of tags (specifically, tag
number 25 and tag number 29) have been registered with semantics that number 25 and tag number 29) have been registered with semantics that
may require processing at (de-)serialization time: The decoder needs may require processing at (de-)serialization time: The decoder needs
to be aware and the encoder needs to be in control of the exact to be aware and the encoder needs to be in control of the exact
sequence in which data items are encoded into the CBOR data stream. sequence in which data items are encoded into the CBOR data stream.
This means these tags cannot be implemented on top of every generic This means these tags cannot be implemented on top of every generic
CBOR encoder/decoder (which might not reflect the serialization order CBOR encoder/decoder (which might not reflect the serialization order
for entries in a map at the data model level and vice versa); their for entries in a map at the data model level and vice versa); their
implementation therefore typically needs to be integrated into the implementation therefore typically needs to be integrated into the
generic encoder/decoder. The definition of new tags with this generic encoder/decoder. The definition of new tags with this
property is NOT RECOMMENDED. property is NOT RECOMMENDED.
Protocols using tag numbers 0 and 1 extend the generic data model Protocols using tag numbers 0 and 1 extend the generic data model
(Section 2) with data items representing points in time; tag numbers (Section 2) with data items representing points in time; tag numbers
2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5, 2 and 3, with arbitrarily sized integers; and tag numbers 4 and 5,
with floating point values of arbitrary size and precision. with floating-point values of arbitrary size and precision.
3.4.1. Standard Date/Time String 3.4.1. Standard Date/Time String
Tag number 0 contains a text string in the standard format described Tag number 0 contains a text string in the standard format described
by the "date-time" production in [RFC3339], as refined by Section 3.3 by the "date-time" production in [RFC3339], as refined by Section 3.3
of [RFC4287], representing the point in time described there. A of [RFC4287], representing the point in time described there. A
nested item of another type or that doesn't match the [RFC4287] nested item of another type or that doesn't match the [RFC4287]
format is invalid. format is invalid.
3.4.2. Epoch-based Date/Time 3.4.2. Epoch-based Date/Time
Tag number 1 contains a numerical value counting the number of Tag number 1 contains a numerical value counting the number of
seconds from 1970-01-01T00:00Z in UTC time to the represented point seconds from 1970-01-01T00:00Z in UTC time to the represented point
in civil time. in civil time.
The enclosed item MUST be an unsigned or negative integer (major The tag content MUST be an unsigned or negative integer (major types
types 0 and 1), or a floating-point number (major type 7 with 0 and 1), or a floating-point number (major type 7 with additional
additional information 25, 26, or 27). Other contained types are information 25, 26, or 27). Other contained types are invalid.
invalid.
Non-negative values (major type 0 and non-negative floating-point Non-negative values (major type 0 and non-negative floating-point
numbers) stand for time values on or after 1970-01-01T00:00Z UTC and numbers) stand for time values on or after 1970-01-01T00:00Z UTC and
are interpreted according to POSIX [TIME_T]. (POSIX time is also are interpreted according to POSIX [TIME_T]. (POSIX time is also
known as UNIX Epoch time. Note that leap seconds are handled known as UNIX Epoch time. Note that leap seconds are handled
specially by POSIX time and this results in a 1 second discontinuity specially by POSIX time and this results in a 1 second discontinuity
several times per decade.) Note that applications that require the several times per decade.) Note that applications that require the
expression of times beyond early 2106 cannot leave out support of expression of times beyond early 2106 cannot leave out support of
64-bit integers for the enclosed value. 64-bit integers for the tag content.
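The "early 2106" limit follows from the largest value that fits into a 32-bit unsigned argument. A short Python sketch of the arithmetic, assuming POSIX-style counting without leap seconds (the helper name is hypothetical):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_utc(seconds: int) -> datetime:
    """Convert a count of seconds since the epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


# Largest value representable with a 4-byte argument (additional
# information 26 in major type 0).
last_32_bit = epoch_to_utc(2**32 - 1)
```

Any later point in time needs an 8-byte argument (additional information 27).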
Negative values (major type 1 and negative floating-point numbers) Negative values (major type 1 and negative floating-point numbers)
are interpreted as determined by the application requirements as are interpreted as determined by the application requirements as
there is no universal standard for UTC count-of-seconds time before there is no universal standard for UTC count-of-seconds time before
1970-01-01T00:00Z (this is particularly true for points in time that 1970-01-01T00:00Z (this is particularly true for points in time that
precede discontinuities in national calendars). The same applies to precede discontinuities in national calendars). The same applies to
non-finite values. non-finite values.
To indicate fractional seconds, floating-point values can be used To indicate fractional seconds, floating-point values can be used
within tag number 1 instead of integer values. Note that this within tag number 1 instead of integer values. Note that this
generally requires binary64 support, as binary16 and binary32 provide generally requires binary64 support, as binary16 and binary32 provide
non-zero fractions of seconds only for a short period of time around non-zero fractions of seconds only for a short period of time around
early 1970. An application that requires tag number 1 support may early 1970. An application that requires tag number 1 support may
restrict the enclosed value to be an integer (or a floating-point restrict the tag content to be an integer (or a floating-point value)
value) only. only.
Note that platform types for date/time may include null or undefined
values, which may also be desirable at an application protocol level.
While emitting tag number 1 values with non-finite tag content values
(e.g., with NaN for undefined date/time values or with Infinity for
an expiry date that is not set) may seem an obvious way to handle
this, using untagged null or undefined is often a better solution.
Application protocol designers are encouraged to consider these cases
and include clear guidelines for handling them.
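
As a non-normative illustration of the rules above, the following
Python sketch interprets already-decoded tag number 1 content.  The
function name epoch_to_datetime is hypothetical, and rejecting
negative and non-finite values is merely one possible application
policy (the text leaves their interpretation to the application).

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_to_datetime(tag_content):
    """Interpret decoded tag number 1 content as a UTC datetime.

    Assumption of this sketch: negative and non-finite values are
    rejected, since their meaning is application-defined.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(tag_content, bool) or not isinstance(tag_content, (int, float)):
        raise ValueError("tag number 1 content must be an integer or a float")
    if isinstance(tag_content, float) and not math.isfinite(tag_content):
        raise ValueError("non-finite values need an application-specific rule")
    if tag_content < 0:
        raise ValueError("negative values need an application-specific rule")
    return EPOCH + timedelta(seconds=tag_content)
```

With this sketch, the integer 2**32 (the first value that no longer
fits a 32-bit unsigned argument) maps to a point in time in early
2106, which is why later times need 64-bit integer support.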
3.4.3. Bignums 3.4.3. Bignums
Protocols using tag numbers 2 and 3 extend the generic data model Protocols using tag numbers 2 and 3 extend the generic data model
(Section 2) with "bignums" representing arbitrarily sized integers. (Section 2) with "bignums" representing arbitrarily sized integers.
In the generic data model, bignum values are not equal to integers In the basic generic data model, bignum values are not equal to
from the basic data model, but specific data models can define that integers from the same model, but the extended generic data model
equivalence, and preferred encoding never makes use of bignums that created by this tag definition defines equivalence based on numeric
also can be expressed as basic integers (see below). value, and preferred serialization (Section 4.1) never makes use of
bignums that also can be expressed as basic integers (see below).
Bignums are encoded as a byte string data item, which is interpreted Bignums are encoded as a byte string data item, which is interpreted
as an unsigned integer n in network byte order. Contained items of as an unsigned integer n in network byte order. Contained items of
other types are invalid. For tag number 2, the value of the bignum other types are invalid. For tag number 2, the value of the bignum
is n. For tag number 3, the value of the bignum is -1 - n. The is n. For tag number 3, the value of the bignum is -1 - n. The
preferred encoding of the byte string is to leave out any leading preferred serialization of the byte string is to leave out any
zeroes (note that this means the preferred encoding for n = 0 is the leading zeroes (note that this means the preferred serialization for
empty byte string, but see below). Decoders that understand these n = 0 is the empty byte string, but see below). Decoders that
tags MUST be able to decode bignums that do have leading zeroes. The understand these tags MUST be able to decode bignums that do have
preferred encoding of an integer that can be represented using major leading zeroes. The preferred serialization of an integer that can
type 0 or 1 is to encode it this way instead of as a bignum (which be represented using major type 0 or 1 is to encode it this way
means that the empty string never occurs in a bignum when using instead of as a bignum (which means that the empty string never
preferred encoding). Note that this means the non-preferred choice occurs in a bignum when using preferred serialization). Note that
of a bignum representation instead of a basic integer for encoding a this means the non-preferred choice of a bignum representation
number is not intended to have application semantics (just as the instead of a basic integer for encoding a number is not intended to
choice of a longer basic integer representation than needed, such as have application semantics (just as the choice of a longer basic
0x1800 for 0x00 does not). integer representation than needed, such as 0x1800 for 0x00 does
not).
For example, the number 18446744073709551616 (2**64) is represented For example, the number 18446744073709551616 (2**64) is represented
as 0b110_00010 (major type 6, tag number 2), followed by 0b010_01001 as 0b110_00010 (major type 6, tag number 2), followed by 0b010_01001
(major type 2, length 9), followed by 0x010000000000000000 (one byte (major type 2, length 9), followed by 0x010000000000000000 (one byte
0x01 and eight bytes 0x00). In hexadecimal: 0x01 and eight bytes 0x00). In hexadecimal:
C2 -- Tag 2 C2 -- Tag 2
49 -- Byte string of length 9 49 -- Byte string of length 9
010000000000000000 -- Bytes content 010000000000000000 -- Bytes content
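
The bignum rules can be sketched in Python as follows.  This is a
minimal, non-normative sketch: the function names are hypothetical,
byte strings longer than 23 bytes are not handled, and the encoder
does not check whether the value would fit major type 0 or 1 (which
preferred serialization would use instead).

```python
def encode_bignum(value):
    """Encode an integer as tag number 2 or 3 around a byte string."""
    if value >= 0:
        tag_byte, n = 0xC2, value        # tag number 2: value is n
    else:
        tag_byte, n = 0xC3, -1 - value   # tag number 3: value is -1 - n
    # Network byte order without leading zeroes; n = 0 gives b"".
    content = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(content) > 23:
        raise ValueError("sketch only handles byte strings up to 23 bytes")
    return bytes([tag_byte, 0x40 | len(content)]) + content

def decode_bignum(data):
    """Decode the output format of encode_bignum, tolerating leading zeroes."""
    tag_byte, head, content = data[0], data[1], data[2:]
    if tag_byte not in (0xC2, 0xC3) or head >> 5 != 2 or head & 0x1F > 23:
        raise ValueError("not a short-form bignum")
    if len(content) != head & 0x1F:
        raise ValueError("byte string length mismatch")
    n = int.from_bytes(content, "big")
    return n if tag_byte == 0xC2 else -1 - n
```

For 2**64, encode_bignum yields the bytes C2 49 01 00 00 00 00 00 00
00 00 shown above.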
skipping to change at page 22, line 28 skipping to change at page 24, line 17
Protocols using tag number 4 extend the generic data model with data Protocols using tag number 4 extend the generic data model with data
items representing arbitrary-length decimal fractions of the form items representing arbitrary-length decimal fractions of the form
m*(10**e). Protocols using tag number 5 extend the generic data m*(10**e). Protocols using tag number 5 extend the generic data
model with data items representing arbitrary-length binary fractions model with data items representing arbitrary-length binary fractions
of the form m*(2**e). As with bignums, values of different types are of the form m*(2**e). As with bignums, values of different types are
not equal in the generic data model. not equal in the generic data model.
Decimal fractions combine an integer mantissa with a base-10 scaling Decimal fractions combine an integer mantissa with a base-10 scaling
factor. They are most useful if an application needs the exact factor. They are most useful if an application needs the exact
representation of a decimal fraction such as 1.1 because there is no representation of a decimal fraction such as 1.1 because there is no
exact representation for many decimal fractions in binary floating exact representation for many decimal fractions in binary floating-
point. point representations.
Bigfloats combine an integer mantissa with a base-2 scaling factor. "Bigfloats" combine an integer mantissa with a base-2 scaling factor.
They are binary floating-point values that can exceed the range or They are binary floating-point values that can exceed the range or
the precision of the three IEEE 754 formats supported by CBOR the precision of the three IEEE 754 formats supported by CBOR
(Section 3.3). Bigfloats may also be used by constrained (Section 3.3). Bigfloats may also be used by constrained
applications that need some basic binary floating-point capability applications that need some basic binary floating-point capability
without the need for supporting IEEE 754. without the need for supporting IEEE 754.
A decimal fraction or a bigfloat is represented as a tagged array A decimal fraction or a bigfloat is represented as a tagged array
that contains exactly two integer numbers: an exponent e and a that contains exactly two integer numbers: an exponent e and a
mantissa m. Decimal fractions (tag number 4) use base-10 exponents; mantissa m. Decimal fractions (tag number 4) use base-10 exponents;
the value of a decimal fraction data item is m*(10**e). Bigfloats the value of a decimal fraction data item is m*(10**e). Bigfloats
(tag number 5) use base-2 exponents; the value of a bigfloat data (tag number 5) use base-2 exponents; the value of a bigfloat data
item is m*(2**e). The exponent e MUST be represented in an integer item is m*(2**e). The exponent e MUST be represented in an integer
of major type 0 or 1, while the mantissa also can be a bignum of major type 0 or 1, while the mantissa can also be a bignum
(Section 3.4.3). Contained items with other structures are invalid. (Section 3.4.3). Contained items with other structures are invalid.
An example of a decimal fraction is that the number 273.15 could be An example of a decimal fraction is that the number 273.15 could be
represented as 0b110_00100 (major type of 6 for the tag, additional represented as 0b110_00100 (major type of 6 for the tag, additional
information of 4 for the tag number), followed by 0b100_00010 information of 4 for the tag number), followed by 0b100_00010
(major type of 4 for the array, additional information of 2 for the (major type of 4 for the array, additional information of 2 for the
length of the array), followed by 0b001_00001 (major type of 1 for length of the array), followed by 0b001_00001 (major type of 1 for
the first integer, additional information of 1 for the value of -2), the first integer, additional information of 1 for the value of -2),
followed by 0b000_11001 (major type of 0 for the second integer, followed by 0b000_11001 (major type of 0 for the second integer,
additional information of 25 for a two-byte value), followed by additional information of 25 for a two-byte value), followed by
skipping to change at page 23, line 31 skipping to change at page 25, line 19
information of 3 for the value of 3). In hexadecimal: information of 3 for the value of 3). In hexadecimal:
C5 -- Tag 5 C5 -- Tag 5
82 -- Array of length 2 82 -- Array of length 2
20 -- -1 20 -- -1
03 -- 3 03 -- 3
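
The value computation m*(10**e) or m*(2**e) can be illustrated with
exact rational arithmetic.  The following Python sketch is
non-normative; the function name is hypothetical, and the input is
assumed to be the already-decoded pair of exponent and mantissa.

```python
from fractions import Fraction

def tagged_fraction_value(tag_number, exponent, mantissa):
    """Exact value of a decimal fraction (tag 4) or bigfloat (tag 5)."""
    bases = {4: 10, 5: 2}
    if tag_number not in bases:
        raise ValueError("only tag numbers 4 and 5 are handled")
    # Fraction keeps the result exact, even for negative exponents.
    return mantissa * Fraction(bases[tag_number]) ** exponent
```

For the bigfloat shown above (exponent -1, mantissa 3), the result is
3/2, i.e., 1.5; an exponent of -2 with a mantissa of 27315 under tag
number 4 gives exactly 273.15.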
Decimal fractions and bigfloats provide no representation of Decimal fractions and bigfloats provide no representation of
Infinity, -Infinity, or NaN; if these are needed in place of a Infinity, -Infinity, or NaN; if these are needed in place of a
decimal fraction or bigfloat, the IEEE 754 half-precision decimal fraction or bigfloat, the IEEE 754 half-precision
representations from Section 3.3 can be used. For constrained representations from Section 3.3 can be used.
applications, where there is a choice between representing a specific
number as an integer and as a decimal fraction or bigfloat (such as
when the exponent is small and non-negative), there is a quality-of-
implementation expectation that the integer representation is used
directly.
3.4.5. Content Hints 3.4.5. Content Hints
The tags in this section are for content hints that might be used by The tags in this section are for content hints that might be used by
generic CBOR processors. These content hints do not extend the generic CBOR processors. These content hints do not extend the
generic data model. generic data model.
3.4.5.1. Encoded CBOR Data Item 3.4.5.1. Encoded CBOR Data Item
Sometimes it is beneficial to carry an embedded CBOR data item that Sometimes it is beneficial to carry an embedded CBOR data item that
skipping to change at page 24, line 27 skipping to change at page 26, line 11
does not know whether or not the converter will be generic, and does not know whether or not the converter will be generic, and
therefore wants to say what it believes is the proper way to convert therefore wants to say what it believes is the proper way to convert
binary strings to JSON. binary strings to JSON.
The data item tagged can be a byte string or any other data item. In The data item tagged can be a byte string or any other data item. In
the latter case, the tag applies to all of the byte string data items the latter case, the tag applies to all of the byte string data items
contained in the data item, except for those contained in a nested contained in the data item, except for those contained in a nested
data item tagged with an expected conversion. data item tagged with an expected conversion.
These three tag numbers suggest conversions to three of the base data These three tag numbers suggest conversions to three of the base data
encodings defined in [RFC4648]. For base64url encoding (tag number encodings defined in [RFC4648]. Tag number 21 suggests conversion to
21), padding is not used (see Section 3.2 of RFC 4648); that is, all base64url encoding (Section 5 of RFC 4648), where padding is not used
trailing equals signs ("=") are removed from the encoded string. For (see Section 3.2 of RFC 4648); that is, all trailing equals signs
base64 encoding (tag number 22), padding is used as defined in RFC ("=") are removed from the encoded string. Tag number 22 suggests
4648. For both base64url and base64, padding bits are set to zero conversion to classical base64 encoding (Section 4 of RFC 4648), with
(see Section 3.5 of RFC 4648), and encoding is performed without the padding as defined in RFC 4648. For both base64url and base64,
inclusion of any line breaks, whitespace, or other additional padding bits are set to zero (see Section 3.5 of RFC 4648), and
characters. Note that, for all three tag numbers, the encoding of encoding is performed without the inclusion of any line breaks,
the empty byte string is the empty text string. whitespace, or other additional characters. Tag number 23 suggests
conversion to base16 (hex) encoding, with uppercase alphabetics (see
Section 8 of RFC 4648). Note that, for all three tag numbers, the
encoding of the empty byte string is the empty text string.
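
A generic converter might implement the three suggested conversions
roughly as in the following non-normative Python sketch.  The
function name is hypothetical; the calls used are from the Python
standard library "base64" module.

```python
import base64

def expected_conversion(tag_number, data):
    """Convert a byte string to text as suggested by tag 21, 22, or 23."""
    if tag_number == 21:
        # base64url without padding: strip trailing "=" characters.
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
    if tag_number == 22:
        # Classical base64 with padding.
        return base64.b64encode(data).decode("ascii")
    if tag_number == 23:
        # base16 (hex) with uppercase alphabetics.
        return base64.b16encode(data).decode("ascii")
    raise ValueError("not an expected-conversion tag number")
```

For the two bytes 0xfb 0xff, this gives "-_8" for tag number 21,
"+/8=" for tag number 22, and "FBFF" for tag number 23.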
3.4.5.3. Encoded Text 3.4.5.3. Encoded Text
Some text strings hold data that have formats widely used on the Some text strings hold data that have formats widely used on the
Internet, and sometimes those formats can be validated and presented Internet, and sometimes those formats can be validated and presented
to the application in appropriate form by the decoder. There are to the application in appropriate form by the decoder. There are
tags for some of these formats. As with tag numbers 21 to 23, if tags for some of these formats. As with tag numbers 21 to 23, if
these tags are applied to an item other than a text string, they these tags are applied to an item other than a text string, they
apply to all text string data items it contains. apply to all text string data items it contains.
* Tag number 32 is for URIs, as defined in [RFC3986]. If the text * Tag number 32 is for URIs, as defined in [RFC3986]. If the text
string doesn't match the "URI-reference" production, the string is string doesn't match the "URI-reference" production, the string is
invalid. invalid.
* Tag numbers 33 and 34 are for base64url- and base64-encoded text * Tag numbers 33 and 34 are for base64url- and base64-encoded text
strings, as defined in [RFC4648]. If any of: strings, respectively, as defined in [RFC4648]. If any of:
- the encoded text string contains non-alphabet characters or - the encoded text string contains non-alphabet characters or
only 1 character in the last block of 4, or only 1 character in the last block of 4, or
- the padding bits in a 2- or 3-character block are not 0, or - the padding bits in a 2- or 3-character block are not 0, or
- the base64 encoding has the wrong number of padding characters, - the base64 encoding has the wrong number of padding characters,
or or
- the base64url encoding has padding characters, - the base64url encoding has padding characters,
skipping to change at page 25, line 33 skipping to change at page 27, line 21
itself, need to be conveyed.) Any contained string value is itself, need to be conveyed.) Any contained string value is
valid. valid.
* Tag number 36 is for MIME messages (including all headers), as * Tag number 36 is for MIME messages (including all headers), as
defined in [RFC2045]. A text string that isn't a valid MIME defined in [RFC2045]. A text string that isn't a valid MIME
message is invalid. (For this tag, validity checking may be message is invalid. (For this tag, validity checking may be
particularly onerous for a generic decoder and might therefore not particularly onerous for a generic decoder and might therefore not
be offered. Note that many MIME messages are general binary data be offered. Note that many MIME messages are general binary data
and can therefore not be represented in a text string; and can therefore not be represented in a text string;
[IANA.cbor-tags] lists a registration for tag number 257 that is [IANA.cbor-tags] lists a registration for tag number 257 that is
similar to tag number 36 but is used with an enclosed byte similar to tag number 36 but uses a byte string as its tag
string.) content.)
Note that tag numbers 33 and 34 differ from 21 and 22 in that the Note that tag numbers 33 and 34 differ from 21 and 22 in that the
data is transported in base-encoded form for the former and in raw data is transported in base-encoded form for the former and in raw
byte string form for the latter. byte string form for the latter.
3.4.6. Self-Described CBOR 3.4.6. Self-Described CBOR
In many applications, it will be clear from the context that CBOR is In many applications, it will be clear from the context that CBOR is
being employed for encoding a data item. For instance, a specific being employed for encoding a data item. For instance, a specific
protocol might specify the use of CBOR, or a media type is indicated protocol might specify the use of CBOR, or a media type is indicated
that specifies its use. However, there may be applications where that specifies its use. However, there may be applications where
such context information is not available, such as when CBOR data is such context information is not available, such as when CBOR data is
stored in a file that does not have disambiguating metadata. Here, stored in a file that does not have disambiguating metadata. Here,
it may help to have some distinguishing characteristics for the data it may help to have some distinguishing characteristics for the data
itself. itself.
Tag number 55799 is defined for this purpose. It does not impart any Tag number 55799 is defined for this purpose. It does not impart any
special semantics on the data item that it encloses; that is, the special semantics on the data item that it encloses; that is, the
semantics of a data item enclosed in tag number 55799 is exactly semantics of the tag content enclosed in tag number 55799 is exactly
identical to the semantics of the data item itself. identical to the semantics of the tag content itself.
The serialization of this tag's head is 0xd9d9f7, which does not The serialization of this tag's head is 0xd9d9f7, which does not
appear to be in use as a distinguishing mark for any frequently used appear to be in use as a distinguishing mark for any frequently used
file types. In particular, 0xd9d9f7 is not a valid start of a file types. In particular, 0xd9d9f7 is not a valid start of a
Unicode text in any Unicode encoding if it is followed by a valid Unicode text in any Unicode encoding if it is followed by a valid
CBOR data item. CBOR data item.
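
A reader of untyped files could check for this head as in the
following non-normative Python sketch.  The function name is
hypothetical, and treating the three-byte prefix as sufficient
evidence of CBOR is a policy assumption of the sketch, not a
requirement of this specification.

```python
SELF_DESCRIBED_HEAD = bytes.fromhex("d9d9f7")  # head of tag number 55799

def strip_self_described(data):
    """Return (had_tag, rest), removing a leading tag 55799 head if present."""
    if data.startswith(SELF_DESCRIBED_HEAD):
        return True, data[len(SELF_DESCRIBED_HEAD):]
    return False, data
```

The remaining bytes are then decoded as an ordinary data item, since
the tag does not change the semantics of its content.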
For instance, a decoder might be able to decode both CBOR and JSON. For instance, a decoder might be able to decode both CBOR and JSON.
Such a decoder would need to mechanically distinguish the two Such a decoder would need to mechanically distinguish the two
formats. An easy way for an encoder to help the decoder would be to formats. An easy way for an encoder to help the decoder would be to
skipping to change at page 26, line 31 skipping to change at page 28, line 19
4.1. Preferred Serialization 4.1. Preferred Serialization
For some values at the data model level, CBOR provides multiple For some values at the data model level, CBOR provides multiple
serializations. For many applications, it is desirable that an serializations. For many applications, it is desirable that an
encoder always chooses a preferred serialization (preferred encoder always chooses a preferred serialization (preferred
encoding); however, the present specification does not put the burden encoding); however, the present specification does not put the burden
of enforcing this preference on either encoder or decoder. of enforcing this preference on either encoder or decoder.
Some constrained decoders may be limited in their ability to decode Some constrained decoders may be limited in their ability to decode
non-preferred serializations: For example, if only integers below non-preferred serializations: For example, if only integers below
1_000_000_000 are expected in an application, the decoder may leave 1_000_000_000 (one billion) are expected in an application, the
out the code that would be needed to decode 64-bit arguments in decoder may leave out the code that would be needed to decode 64-bit
integers. An encoder that always uses preferred serialization arguments in integers. An encoder that always uses preferred
("preferred encoder") interoperates with this decoder for the numbers serialization ("preferred encoder") interoperates with this decoder
that can occur in this application. More generally speaking, it for the numbers that can occur in this application. More generally
therefore can be said that a preferred encoder is more universally speaking, it therefore can be said that a preferred encoder is more
interoperable (and also less wasteful) than one that, say, always universally interoperable (and also less wasteful) than one that,
uses 64-bit integers. say, always uses 64-bit integers.
Similarly, a constrained encoder may be limited in the variety of Similarly, a constrained encoder may be limited in the variety of
representation variants it supports in such a way that it does not representation variants it supports in such a way that it does not
emit preferred serializations ("variant encoder"): Say, it could be emit preferred serializations ("variant encoder"): Say, it could be
designed to always use the 32-bit variant for an integer that it designed to always use the 32-bit variant for an integer that it
encodes even if a short representation is available (again, assuming encodes even if a short representation is available (again, assuming
that there is no application need for integers that can only be that there is no application need for integers that can only be
represented with the 64-bit variant). A decoder that does not rely represented with the 64-bit variant). A decoder that does not rely
on only ever receiving preferred serializations ("variation-tolerant on only ever receiving preferred serializations ("variation-tolerant
decoder") can therefore be said to be more universally interoperable (it decoder") can therefore be said to be more universally interoperable (it
might very well optimize for the case of receiving preferred might very well optimize for the case of receiving preferred
serializations, though). Full implementations of CBOR decoders are serializations, though). Full implementations of CBOR decoders are
by definition variation-tolerant; the distinction is only relevant if by definition variation-tolerant; the distinction is only relevant if
a constrained implementation of a CBOR decoder meets a variant a constrained implementation of a CBOR decoder meets a variant
encoder. encoder.
The preferred serialization always uses the shortest form of The preferred serialization always uses the shortest form of
representing the argument (Section 3)); it also uses the shortest representing the argument (Section 3); it also uses the shortest
floating-point encoding that preserves the value being encoded (see floating-point encoding that preserves the value being encoded.
Section 5.5). Definite length encoding is preferred whenever the
length is known at the time the serialization of the item starts. The preferred serialization for a floating-point value is the
shortest floating-point encoding that preserves its value, e.g.,
0xf94580 for the number 5.5, and 0xfa45ad9c00 for the number 5555.5.
For NaN values, a shorter encoding is preferred if zero-padding the
shorter significand towards the right reconstitutes the original NaN
value (for many applications, the single NaN encoding 0xf97e00 will
suffice).
Definite length encoding is preferred whenever the length is known at
the time the serialization of the item starts.
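
The selection of the shortest floating-point encoding that preserves
the value can be sketched as follows.  This Python sketch is
non-normative; the function name is hypothetical, and it simplifies
NaN handling by always emitting the single NaN encoding 0xf97e00
(that is, it assumes NaN payloads do not need to be preserved).

```python
import math
import struct

def preferred_float(value):
    """Serialize a float in the shortest of binary16/32/64 that is exact."""
    if math.isnan(value):
        return bytes.fromhex("f97e00")  # assumption: payloads are not kept
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # finite value too large for this format
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)
```

Applied to 5.5 and 5555.5, the sketch produces 0xf94580 and
0xfa45ad9c00, matching the examples above.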
4.2. Deterministically Encoded CBOR 4.2. Deterministically Encoded CBOR
Some protocols may want encoders to only emit CBOR in a particular Some protocols may want encoders to only emit CBOR in a particular
deterministic format; those protocols might also have the decoders deterministic format; those protocols might also have the decoders
check that their input is in that deterministic format. Those check that their input is in that deterministic format. Those
protocols are free to define what they mean by a "deterministic protocols are free to define what they mean by a "deterministic
format" and what encoders and decoders are expected to do. This format" and what encoders and decoders are expected to do. This
section defines a set of restrictions that can serve as the base of section defines a set of restrictions that can serve as the base of
such a deterministic format. such a deterministic format.
skipping to change at page 27, line 45 skipping to change at page 29, line 42
- 24 to 255 and -25 to -256 MUST be expressed only with an - 24 to 255 and -25 to -256 MUST be expressed only with an
additional uint8_t; additional uint8_t;
- 256 to 65535 and -257 to -65536 MUST be expressed only with an - 256 to 65535 and -257 to -65536 MUST be expressed only with an
additional uint16_t; additional uint16_t;
- 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed - 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
only with an additional uint32_t. only with an additional uint32_t.
Floating point values also MUST use the shortest form that Floating-point values also MUST use the shortest form that
preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5 preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5
as 0xfa49742408. as 0xfa49742408. (One implementation of this is to have all
floats start as a 64-bit float, then do a test conversion to a
32-bit float; if the result is the same numeric value, use the
shorter form and repeat the process with a test conversion to a
16-bit float. This also selects the 16-bit float for positive
and negative Infinity.)
* Indefinite-length items MUST NOT appear. They can be encoded as * Indefinite-length items MUST NOT appear. They can be encoded as
definite-length items instead. definite-length items instead.
* The keys in every map MUST be sorted in the bytewise lexicographic * The keys in every map MUST be sorted in the bytewise lexicographic
order of their deterministic encodings. For example, the order of their deterministic encodings. For example, the
following keys are sorted correctly: following keys are sorted correctly:
1. 10, encoded as 0x0a. 1. 10, encoded as 0x0a.
skipping to change at page 28, line 27 skipping to change at page 30, line 30
5. "aa", encoded as 0x626161. 5. "aa", encoded as 0x626161.
6. [100], encoded as 0x811864. 6. [100], encoded as 0x811864.
7. [-1], encoded as 0x8120. 7. [-1], encoded as 0x8120.
8. false, encoded as 0xf4. 8. false, encoded as 0xf4.
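
The shortest-form rule for integers and the key ordering rule can be
sketched together in Python.  The sketch is non-normative and the
function names are hypothetical; it relies on the fact that Python
compares "bytes" objects bytewise lexicographically, and it assumes
that each key has already been deterministically encoded.

```python
import struct

def encode_int(value):
    """Shortest-form encoding of an integer using major type 0 or 1."""
    major, arg = (0, value) if value >= 0 else (1, -1 - value)
    if arg < 24:
        return bytes([major << 5 | arg])
    # (additional information, struct format, exclusive upper bound)
    for additional, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                                   (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([major << 5 | additional]) + struct.pack(fmt, arg)
    raise ValueError("outside the range of major types 0 and 1")

def sort_map_keys(encoded_keys):
    """Order already-encoded keys bytewise lexicographically."""
    return sorted(encoded_keys)
```

For instance, sorting the encodings 0xf4, 0x8120, 0x0a, 0x811864, and
0x626161 yields the relative order 0x0a, 0x626161, 0x811864, 0x8120,
0xf4, as in the list above.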
4.2.2. Additional Deterministic Encoding Considerations 4.2.2. Additional Deterministic Encoding Considerations
If a protocol allows for IEEE floats, then additional deterministic
encoding rules might need to be added. One example rule might be to
have all floats start as a 64-bit float, then do a test conversion to
a 32-bit float; if the result is the same numeric value, use the
shorter value and repeat the process with a test conversion to a
16-bit float. (This rule selects 16-bit float for positive and
negative Infinity as well.) Although IEEE floats can represent both
positive and negative zero as distinct values, the application might
not distinguish these and might decide to represent all zero values
with a positive sign, disallowing negative zero.
CBOR tags present additional considerations for deterministic CBOR tags present additional considerations for deterministic
encoding. If a CBOR-based protocol were to provide the same encoding. If a CBOR-based protocol were to provide the same
semantics for the presence and absence of a specific tag (e.g., by semantics for the presence and absence of a specific tag (e.g., by
allowing both tag 1 data items and raw numbers in a date/time allowing both tag 1 data items and raw numbers in a date/time
position, treating the latter as if they were tagged), the position, treating the latter as if they were tagged), the
deterministic format would not allow them. In a protocol that deterministic format would not allow the presence of the tag, based
requires tags in certain places to obtain specific semantics, the tag on the "shortest form" principle. For example, a protocol might give
needs to appear in the deterministic format as well. Deterministic encoders the choice of representing a URL as either a text string or,
encoding considerations also apply to the content of tags. using Section 3.4.5.3, tag number 32 containing a text string. This
protocol's deterministic encoding needs to either require that the
tag is present or require that it is absent, not allow either one.
Protocols that include floating, big integer, or other complex values In a protocol that does require tags in certain places to obtain
need to define extra requirements on their deterministic encodings. specific semantics, the tag needs to appear in the deterministic
For example: format as well. Deterministic encoding considerations also apply to
the content of tags.
If a protocol includes a field that can express integers with an
absolute value of 2^64 or larger using tag numbers 2 or 3
(Section 3.4.3), the protocol's deterministic encoding needs to
specify whether smaller integers are also expressed using these tags
or using major types 0 and 1. Preferred serialization uses the
latter choice, which is therefore recommended.
Protocols that include floating-point values, whether represented
using basic floating-point values (Section 3.3) or using tags (or
both), may need to define extra requirements on their deterministic
encodings, such as:
* Although IEEE floating-point values can represent both positive
and negative zero as distinct values, the application might not
distinguish these and might decide to represent all zero values
with a positive sign, disallowing negative zero. (The application
may also want to restrict the precision of floating point values
in such a way that there is never a need to represent 64-bit -- or
even 32-bit -- floating-point values.)
* If a protocol includes a field that can express floating-point * If a protocol includes a field that can express floating-point
values (Section 3.3), the protocol's deterministic encoding needs values, with a specific data model that declares integer and
to specify whether the integer 1.0 is encoded as 0x01, 0xf93c00, floating-point values to be interchangeable, the protocol's
0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for deterministic encoding needs to specify whether the integer 1.0 is
this are: encoded as 0x01, 0xf93c00, 0xfa3f800000, or 0xfb3ff0000000000000.
Example rules for this are:
1. Encode integral values that fit in 64 bits as values from 1. Encode integral values that fit in 64 bits as values from
major types 0 and 1, and other values as the smallest of 16-, major types 0 and 1, and other values as the preferred
32-, or 64-bit floating point that accurately represents the (smallest of 16-, 32-, or 64-bit) floating-point
value, representation that accurately represents the value,
2. Encode all values as the smallest of 16-, 32-, or 64-bit 2. Encode all values as the preferred floating-point
floating point that accurately represents the value, even for representation that accurately represents the value, even for
integral values, or integral values, or
3. Encode all values as 64-bit floating point. 3. Encode all values as 64-bit floating-point representations.
Rule 1 straddles the boundaries between integers and floating Rule 1 straddles the boundaries between integers and floating-
point values, and Rule 3 does not use preferred encoding, so Rule point values, and Rule 3 does not use preferred serialization, so
2 may be a good choice in many cases. Rule 2 may be a good choice in many cases.
If NaN is an allowed value and there is no intent to support NaN * If NaN is an allowed value and there is no intent to support NaN
payloads or signaling NaNs, the protocol needs to pick a single payloads or signaling NaNs, the protocol needs to pick a single
representation, for example 0xf97e00. If that simple choice is representation, typically 0xf97e00. If that simple choice is not
not possible, specific attention will be needed for NaN handling. possible, specific attention will be needed for NaN handling.
Subnormal numbers (nonzero numbers with the lowest possible * Subnormal numbers (nonzero numbers with the lowest possible
exponent of a given IEEE 754 number format) may be flushed to zero exponent of a given IEEE 754 number format) may be flushed to zero
outputs or be treated as zero inputs in some floating point outputs or be treated as zero inputs in some floating-point
implementations. A protocol's deterministic encoding may want to implementations. A protocol's deterministic encoding may want to
exclude them from interchange, interchanging zero instead. specifically accommodate such implementations while creating an
onus on other implementations, by excluding subnormal numbers from
* If a protocol includes a field that can express integers with an interchange, interchanging zero instead.
absolute value of 2^64 or larger using tag numbers 2 or 3
(Section 3.4.3), the protocol's deterministic encoding needs to
specify whether small integers are expressed using the tag or
major types 0 and 1.
* A protocol might give encoders the choice of representing a URL as * The same number can be represented by different decimal fractions,
either a text string or, using Section 3.4.5.3, tag number 32 by different bigfloats, and by different forms under other tags
containing a text string. This protocol's deterministic encoding that may be defined to express numeric values. Depending on the
needs to either require that the tag is present or require that implementation, it may not always be practical to determine
it's absent, not allow either one. whether any of these forms (or forms in the basic generic data
model) are equivalent. An application protocol that presents
choices of this kind for the representation format of numbers
needs to be explicit in how the formats are to be chosen for
deterministic encoding.
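
Rule 2 above (always use the preferred floating-point representation) can be sketched as follows. This is a minimal Python illustration, not text from either draft revision; the function name encode_float_preferred is hypothetical, and mapping every NaN to 0xf97e00 assumes a protocol with no intent to support NaN payloads or signaling NaNs.

```python
import math
import struct


def encode_float_preferred(value: float) -> bytes:
    """Encode value as the shortest of 16-, 32-, or 64-bit floating point
    that represents it exactly (hypothetical helper for illustration)."""
    if math.isnan(value):
        # Assumption: a single NaN representation is enough for the protocol.
        return bytes.fromhex("f97e00")
    # Try half precision (0xf9), then single precision (0xfa).
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # finite value too large for this format
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    # Fall back to double precision (0xfb), which always round-trips.
    return bytes([0xFB]) + struct.pack(">d", value)


print(encode_float_preferred(1.0).hex())     # f93c00
print(encode_float_preferred(5555.5).hex())  # fa45ad9c00
```

Under Rule 3, the same function would skip the loop and always emit the 0xfb form.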
4.2.3. Length-first map key ordering 4.2.3. Length-first Map Key Ordering
The core deterministic encoding requirements sort map keys in a The core deterministic encoding requirements (Section 4.2.1) sort map
different order from the one suggested by Section 3.9 of [RFC7049] keys in a different order from the one suggested by Section 3.9 of
(called "Canonical CBOR" there). Protocols that need to be [RFC7049] (called "Canonical CBOR" there). Protocols that need to be
compatible with [RFC7049]'s order can instead be specified in terms compatible with [RFC7049]'s order can instead be specified in terms
of this specification's "length-first core deterministic encoding of this specification's "length-first core deterministic encoding
requirements": requirements":
A CBOR encoding satisfies the "length-first core deterministic A CBOR encoding satisfies the "length-first core deterministic
encoding requirements" if it satisfies the core deterministic encoding requirements" if it satisfies the core deterministic
encoding requirements except that the keys in every map MUST be encoding requirements except that the keys in every map MUST be
sorted such that: sorted such that:
1. If two keys have different lengths, the shorter one sorts 1. If two keys have different lengths, the shorter one sorts
skipping to change at page 31, line 31 skipping to change at page 33, line 39
and other unexpected data. CBOR-based protocols MAY specify that and other unexpected data. CBOR-based protocols MAY specify that
they treat arbitrary valid data as unexpected. Encoders for CBOR- they treat arbitrary valid data as unexpected. Encoders for CBOR-
based protocols MUST produce only valid items, that is, the protocol based protocols MUST produce only valid items, that is, the protocol
cannot be designed to make use of invalid items. An encoder can be cannot be designed to make use of invalid items. An encoder can be
capable of encoding as many or as few types of values as is required capable of encoding as many or as few types of values as is required
by the protocol in which it is used; a decoder can be capable of by the protocol in which it is used; a decoder can be capable of
understanding as many or as few types of values as is required by the understanding as many or as few types of values as is required by the
protocols in which it is used. This lack of restrictions allows CBOR protocols in which it is used. This lack of restrictions allows CBOR
to be used in extremely constrained environments. to be used in extremely constrained environments.
This section discusses some considerations in creating CBOR-based The rest of this section discusses some considerations in creating
protocols. With few exceptions, it is advisory only and explicitly CBOR-based protocols. With few exceptions, it is advisory only and
excludes any language from BCP 14 other than words that could be explicitly excludes any language from BCP 14 other than words that
interpreted as "MAY" in the sense of BCP 14. The exceptions aim at could be interpreted as "MAY" in the sense of BCP 14. The exceptions
facilitating interoperability of CBOR-based protocols while making aim at facilitating interoperability of CBOR-based protocols while
use of a wide variety of both generic and application-specific making use of a wide variety of both generic and application-specific
encoders and decoders. encoders and decoders.
5.1. CBOR in Streaming Applications 5.1. CBOR in Streaming Applications
In a streaming application, a data stream may be composed of a In a streaming application, a data stream may be composed of a
sequence of CBOR data items concatenated back-to-back. In such an sequence of CBOR data items concatenated back-to-back. In such an
environment, the decoder immediately begins decoding a new data item environment, the decoder immediately begins decoding a new data item
if data is found after the end of a previous data item. if data is found after the end of a previous data item.
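
As a rough illustration of decoding items concatenated back-to-back, the following Python sketch finds the end of each data item and starts on the next one immediately. The names read_head, skip_item, and split_sequence are hypothetical, the whole stream is assumed to be in memory already, and indefinite-length items are deliberately not handled.

```python
def read_head(data: bytes, pos: int):
    """Return (major_type, argument, next_pos) for the item head at pos."""
    if pos >= len(data):
        raise ValueError("truncated item")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("indefinite-length or reserved value not handled here")
    size = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
    if pos + size > len(data):
        raise ValueError("truncated item")
    return major, int.from_bytes(data[pos:pos + size], "big"), pos + size


def skip_item(data: bytes, pos: int = 0) -> int:
    """Return the offset just past the data item that starts at pos."""
    major, argument, pos = read_head(data, pos)
    if major in (2, 3):  # byte or text string: argument is the length
        if pos + argument > len(data):
            raise ValueError("truncated item")
        return pos + argument
    # Arrays hold `argument` items, maps twice that, tags exactly one.
    nested = {4: argument, 5: 2 * argument, 6: 1}.get(major, 0)
    for _ in range(nested):
        pos = skip_item(data, pos)
    return pos


def split_sequence(data: bytes) -> list:
    """Split a stream into the encoded data items it is composed of."""
    items, pos = [], 0
    while pos < len(data):
        end = skip_item(data, pos)
        items.append(data[pos:end])
        pos = end
    return items
```

For example, split_sequence(bytes.fromhex("016161820102f93c00")) yields four items: the integer 1, the text string "a", the array [1, 2], and the half-precision float 1.0.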
Not all of the bytes making up a data item may be immediately Not all of the bytes making up a data item may be immediately
skipping to change at page 33, line 37 skipping to change at page 35, line 51
Invalid UTF-8 string: A decoder might or might not want to verify Invalid UTF-8 string: A decoder might or might not want to verify
that the sequence of bytes in a UTF-8 string (major type 3) is that the sequence of bytes in a UTF-8 string (major type 3) is
actually valid UTF-8 and react appropriately. actually valid UTF-8 and react appropriately.
5.3.2. Tag validity 5.3.2. Tag validity
Two additional kinds of validity errors are introduced by adding tags Two additional kinds of validity errors are introduced by adding tags
to the basic generic data model: to the basic generic data model:
Inadmissible type for tag content: Tags (Section 3.4) specify what Inadmissible type for tag content: Tag numbers (Section 3.4) specify
type of data item is supposed to be enclosed by the tag; for what type of data item is supposed to be used as their tag
example, the tags for positive or negative bignums are supposed to content; for example, the tag numbers for positive or negative
be put on byte strings. A decoder that decodes the tagged data bignums are supposed to be put on byte strings. A decoder that
item into a native representation (a native big integer in this decodes the tagged data item into a native representation (a
example) is expected to check the type of the data item being native big integer in this example) is expected to check the type
tagged. Even decoders that don't have such native representations of the data item being tagged. Even decoders that don't have such
available in their environment may perform the check on those tags native representations available in their environment may perform
known to them and react appropriately. the check on those tags known to them and react appropriately.
Inadmissible value for tag content: The type of data item may be Inadmissible value for tag content: The type of data item may be
admissible for a tag's content, but the specific value may not be; admissible for a tag's content, but the specific value may not be;
e.g., a value of "yesterday" is not acceptable for the content of e.g., a value of "yesterday" is not acceptable for the content of
tag 0, even though it properly is a text string. A decoder that tag 0, even though it properly is a text string. A decoder that
normally ingests such tags into equivalent platform types might normally ingests such tags into equivalent platform types might
present this tag to the application in a similar way to how it present this tag to the application in a similar way to how it
would present a tag with an unknown tag number (Section 5.4). would present a tag with an unknown tag number (Section 5.4).
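
The type check described for bignum tags might look like the following Python sketch. The function name bignum_from_tag and the choice of exceptions are assumptions made for illustration; a real decoder could instead pass the tag through to the application as it would an unknown tag number.

```python
def bignum_from_tag(tag_number: int, content: object) -> int:
    """Convert the content of tag number 2 or 3 into a native integer."""
    if tag_number not in (2, 3):
        raise ValueError("not a bignum tag number")
    # Inadmissible type for tag content: bignums go on byte strings only.
    if not isinstance(content, bytes):
        raise TypeError("bignum tag content must be a byte string")
    n = int.from_bytes(content, "big")
    # Tag number 3 (negative bignum) represents the value -1 - n.
    return n if tag_number == 2 else -1 - n


print(bignum_from_tag(2, bytes.fromhex("010000000000000000")))  # 18446744073709551616
```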
5.4. Validity and Evolution 5.4. Validity and Evolution
skipping to change at page 34, line 38 skipping to change at page 37, line 4
with an indication that the decoder did not recognize that tag with an indication that the decoder did not recognize that tag
number or simple value. number or simple value.
The latter approach, which is also appropriate for decoders that do The latter approach, which is also appropriate for decoders that do
not support validity checking, provides forward compatibility with not support validity checking, provides forward compatibility with
newly registered tags and simple values without the requirement to newly registered tags and simple values without the requirement to
update the encoder at the same time as the calling application. (For update the encoder at the same time as the calling application. (For
this, the API for the decoder needs to have a way to mark unknown this, the API for the decoder needs to have a way to mark unknown
items so that the calling application can handle them in a manner items so that the calling application can handle them in a manner
appropriate for the program.) appropriate for the program.)
Since some of the processing needed for validity checking may have an Since some of the processing needed for validity checking may have an
appreciable cost (in particular with duplicate detection for maps), appreciable cost (in particular with duplicate detection for maps),
support of validity checking is not a requirement placed on all CBOR support of validity checking is not a requirement placed on all CBOR
decoders. decoders.
Some encoders will rely on their applications to provide input data Some encoders will rely on their applications to provide input data
in such a way that valid CBOR results from the encoder. A generic in such a way that valid CBOR results from the encoder. A generic
encoder also may want to provide a validity-checking mode where it encoder may also want to provide a validity-checking mode where it
reliably limits its output to valid CBOR, independent of whether or reliably limits its output to valid CBOR, independent of whether or
not its application is indeed providing API-conformant data. not its application is indeed providing API-conformant data.
5.5. Numbers 5.5. Numbers
CBOR-based protocols should take into account that different language CBOR-based protocols should take into account that different language
environments pose different restrictions on the range and precision environments pose different restrictions on the range and precision
of numbers that are representable. For example, the JavaScript of numbers that are representable. For example, the basic JavaScript
number system treats all numbers as floating point, which may result number system treats all numbers as floating-point values, which may
in silent loss of precision in decoding integers with more than 53 result in silent loss of precision in decoding integers with more
significant bits. A protocol that uses numbers should define its than 53 significant bits. A protocol that uses numbers should define
expectations on the handling of non-trivial numbers in decoders and its expectations on the handling of non-trivial numbers in decoders
receiving applications. and receiving applications.
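
The silent loss of precision mentioned above can be reproduced in any environment that stores numbers as IEEE 754 doubles. The following Python sketch uses float as a stand-in for such a number system; the helper name survives_double is hypothetical.

```python
def survives_double(n: int) -> bool:
    """Report whether integer n is unchanged after conversion to a double."""
    try:
        return int(float(n)) == n
    except OverflowError:
        return False  # magnitude exceeds the range of a double


print(survives_double(2**53))      # True
print(survives_double(2**53 + 1))  # False: needs 54 significant bits
```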
A CBOR-based protocol that includes floating-point numbers can A CBOR-based protocol that includes floating-point numbers can
restrict which of the three formats (half-precision, single- restrict which of the three formats (half-precision, single-
precision, and double-precision) are to be supported. For an precision, and double-precision) are to be supported. For an
integer-only application, a protocol may want to completely exclude integer-only application, a protocol may want to completely exclude
the use of floating-point values. the use of floating-point values.
A CBOR-based protocol designed for compactness may want to exclude A CBOR-based protocol designed for compactness may want to exclude
specific integer encodings that are longer than necessary for the specific integer encodings that are longer than necessary for the
application, such as to save the need to implement 64-bit integers. application, such as to save the need to implement 64-bit integers.
There is an expectation that encoders will use the most compact There is an expectation that encoders will use the most compact
integer representation that can represent a given value. However, a integer representation that can represent a given value. However, a
compact application should accept values that use a longer-than- compact application that does not require deterministic encoding
needed encoding (such as encoding "0" as 0b000_11001 followed by two should accept values that use a longer-than-needed encoding (such as
bytes of 0x00) as long as the application can decode an integer of encoding "0" as 0b000_11001 followed by two bytes of 0x00) as long as
the given size. the application can decode an integer of the given size. Similar
considerations apply to floating-point values; decoding both
preferred serializations and longer-than-needed ones is recommended.
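
A decoder that accepts longer-than-needed integer encodings could be sketched as below. This Python fragment only handles major type 0 and uses the hypothetical name decode_uint; it shows that the one-byte encoding of 0 and the three-byte encoding from the example above yield the same value.

```python
def decode_uint(data: bytes) -> int:
    """Decode one unsigned integer (major type 0) of any argument length."""
    if not data or data[0] >> 5 != 0:
        raise ValueError("not an unsigned integer")
    info = data[0] & 0x1F
    if info < 24:
        size, value = 0, info  # argument is in the initial byte itself
    elif info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
        value = int.from_bytes(data[1:1 + size], "big")
    else:
        raise ValueError("not a valid integer encoding")
    if len(data) != 1 + size:
        raise ValueError("wrong length for this encoding")
    return value


print(decode_uint(bytes.fromhex("00")))      # 0
print(decode_uint(bytes.fromhex("190000")))  # 0 (0b000_11001, then two 0x00 bytes)
```

A decoder for a protocol that does require deterministic encoding would instead reject the second form.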
The preferred encoding for a floating-point value is the shortest CBOR-based protocols for constrained applications that provide a
floating-point encoding that preserves its value, e.g., 0xf94580 for choice between representing a specific number as an integer and as a
the number 5.5, and 0xfa45ad9c00 for the number 5555.5, unless the decimal fraction or bigfloat (such as when the exponent is small and
CBOR-based protocol specifically excludes the use of the shorter non-negative), might express a quality-of-implementation expectation
floating-point encodings. For NaN values, a shorter encoding is that the integer representation is used directly.
preferred if zero-padding the shorter significand towards the right
reconstitutes the original NaN value (for many applications, the
single NaN encoding 0xf97e00 will suffice).
5.6. Specifying Keys for Maps 5.6. Specifying Keys for Maps
The encoding and decoding applications need to agree on what types of The encoding and decoding applications need to agree on what types of
keys are going to be used in maps. In applications that need to keys are going to be used in maps. In applications that need to
interwork with JSON-based applications, keys probably should be interwork with JSON-based applications, conversion is simplified by
limited to UTF-8 strings only; otherwise, there has to be a specified limiting keys to text strings only; otherwise, there has to be a
mapping from the other CBOR types to Unicode characters, and this specified mapping from the other CBOR types to text strings, and this
often leads to implementation errors. In applications where keys are often leads to implementation errors. In applications where keys are
numeric in nature and numeric ordering of keys is important to the numeric in nature and numeric ordering of keys is important to the
application, directly using the numbers for the keys is useful. application, directly using the numbers for the keys is useful.
If multiple types of keys are to be used, consideration should be If multiple types of keys are to be used, consideration should be
given to how these types would be represented in the specific given to how these types would be represented in the specific
programming environments that are to be used. For example, in programming environments that are to be used. For example, in
JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished
from a key of floating-point 1.0. This means that, if integer keys from a key of floating-point 1.0. This means that, if integer keys
are used, the protocol needs to avoid use of floating-point keys the are used, the protocol needs to avoid use of floating-point keys the
skipping to change at page 36, line 27 skipping to change at page 38, line 38
the enclosing data item is completely available ("streaming encoder") the enclosing data item is completely available ("streaming encoder")
may want to reduce its overhead significantly by relying on its data may want to reduce its overhead significantly by relying on its data
source to maintain uniqueness. source to maintain uniqueness.
A CBOR-based protocol MUST define what to do when a receiving A CBOR-based protocol MUST define what to do when a receiving
application does see multiple identical keys in a map. The resulting application does see multiple identical keys in a map. The resulting
rule in the protocol MUST respect the CBOR data model: it cannot rule in the protocol MUST respect the CBOR data model: it cannot
prescribe a specific handling of the entries with the identical keys, prescribe a specific handling of the entries with the identical keys,
except that it might have a rule that having identical keys in a map except that it might have a rule that having identical keys in a map
indicates a malformed map and that the decoder has to stop with an indicates a malformed map and that the decoder has to stop with an
error. Duplicate keys are also prohibited by CBOR decoders that error. When processing maps that exhibit entries with duplicate
enforce validity (Section 5.4). keys, a generic decoder might do one of the following:
* Not accept maps with duplicate keys (that is, enforce validity for
maps, see also Section 5.4). These generic decoders are
universally useful. An application may still need to perform
its own duplicate checking based on application rules (for
instance if the application equates integers and floating-point
values in map key positions for specific maps).
* Pass all map entries to the application, including ones with
duplicate keys. This requires the application to handle (check
against) duplicate keys, even if the application rules are
identical to the generic data model rules.
* Lose some entries with duplicate keys, e.g. by only delivering the
final (or first) entry out of the entries with the same key. With
such a generic decoder, applications may get different results for
a specific key on different runs and with different generic
decoders as which value is returned is based on generic decoder
implementation and the actual order of keys in the map. In
particular, applications cannot validate key uniqueness on their
own as they do not necessarily see all entries; they may not be
able to use such a generic decoder if they do need to validate key
uniqueness. These generic decoders can only be used in situations
where the data source and transfer can be relied upon to always
provide valid maps; this is not possible if the data source and
transfer can be attacked.
Generic decoders need to document which of these three approaches
they implement.
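
The first approach (rejecting maps with duplicate keys) can be sketched in Python as follows. The function name reject_duplicates is hypothetical, the decoded map is assumed to arrive as a list of (key, value) pairs with hashable keys, and NaN keys are not treated specially. Keys are compared together with their type because a plain Python dict would equate integer 1 and floating-point 1.0, which the generic data model keeps distinct.

```python
def reject_duplicates(pairs: list) -> list:
    """Return pairs unchanged, or raise if two entries share a key."""
    seen = set()
    for key, _value in pairs:
        marker = (type(key), key)  # keeps 1 and 1.0 apart
        if marker in seen:
            raise ValueError(f"duplicate map key: {key!r}")
        seen.add(marker)
    return list(pairs)


# Integer 1 and floating-point 1.0 are different keys in the generic model.
print(reject_duplicates([(1, "a"), (1.0, "b")]))
```

An application that equates integers and floating-point values in key positions would still need its own check on top of this one.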
The CBOR data model for maps does not allow ascribing semantics to The CBOR data model for maps does not allow ascribing semantics to
the order of the key/value pairs in the map representation. Thus, a the order of the key/value pairs in the map representation. Thus, a
CBOR-based protocol MUST NOT specify that changing the key/value pair CBOR-based protocol MUST NOT specify that changing the key/value pair
order in a map would change the semantics, except to specify that order in a map would change the semantics, except to specify that
some, orders are disallowed, for example where they would not meet some orders are disallowed, for example where they would not meet the
the requirements of a deterministic encoding (Section 4.2). (Any requirements of a deterministic encoding (Section 4.2). (Any
secondary effects of map ordering such as on timing, cache usage, and secondary effects of map ordering such as on timing, cache usage, and
other potential side channels are not considered part of the other potential side channels are not considered part of the
semantics but may be enough reason on its own for a protocol to semantics but may be enough reason on their own for a protocol to
require a deterministic encoding format.) require a deterministic encoding format.)
Applications for constrained devices that have maps where a small Applications for constrained devices that have maps where a small
number of frequently used keys can be identified should consider number of frequently used keys can be identified should consider
using small integers as keys; for instance, a set of 24 or fewer using small integers as keys; for instance, a set of 24 or fewer
frequent keys can be encoded in a single byte as unsigned integers, frequent keys can be encoded in a single byte as unsigned integers,
up to 48 if negative integers are also used. Less frequently up to 48 if negative integers are also used. Less frequently
occurring keys can then use integers with longer encodings. occurring keys can then use integers with longer encodings.
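
The single-byte range for integer keys can be checked with a small encoder. The following Python sketch covers major types 0 and 1 only; the name encode_int is hypothetical.

```python
def encode_int(n: int) -> bytes:
    """Encode n in the shortest form of major type 0 or 1."""
    major = 0 if n >= 0 else 1
    argument = n if n >= 0 else -1 - n
    if argument < 24:
        return bytes([major << 5 | argument])  # fits in the initial byte
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("out of range for major types 0 and 1")


single_byte_keys = [n for n in range(-100, 100) if len(encode_int(n)) == 1]
print(len(single_byte_keys))  # 48: the integers -24 through 23
```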
5.6.1. Equivalence of Keys 5.6.1. Equivalence of Keys
skipping to change at page 37, line 24 skipping to change at page 40, line 17
purpose of map key equivalence, NaN (not a number) values are purpose of map key equivalence, NaN (not a number) values are
equivalent if they have the same significand after zero-extending equivalent if they have the same significand after zero-extending
both significands at the right to 64 bits. both significands at the right to 64 bits.
(Byte and text) strings are compared byte by byte, arrays element by (Byte and text) strings are compared byte by byte, arrays element by
element, and are equal if they have the same number of bytes/elements element, and are equal if they have the same number of bytes/elements
and the same values at the same positions. Two maps are equal if and the same values at the same positions. Two maps are equal if
they have the same set of pairs regardless of their order; pairs are they have the same set of pairs regardless of their order; pairs are
equal if both the key and value are equal. equal if both the key and value are equal.
Tagged values are equal if both the tag number and the enclosed item Tagged values are equal if both the tag number and the tag content
are equal. (Note that a generic decoder that provides processing for are equal. (Note that a generic decoder that provides processing for
a specific tag may not be able to distinguish some semantically a specific tag may not be able to distinguish some semantically
equivalent values, e.g. if leading zeroes occur in the content of tag equivalent values, e.g. if leading zeroes occur in the content of tag
2/3 (Section 3.4.3).) Simple values are equal if they simply have 2/3 (Section 3.4.3).) Simple values are equal if they simply have
the same value. Nothing else is equal in the generic data model, a the same value. Nothing else is equal in the generic data model, a
simple value 2 is not equivalent to an integer 2 and an array is simple value 2 is not equivalent to an integer 2 and an array is
never equivalent to a map. never equivalent to a map.
As discussed in Section 2.2, specific data models can make values As discussed in Section 2.2, specific data models can make values
equivalent for the purpose of comparing map keys that are distinct in equivalent for the purpose of comparing map keys that are distinct in
skipping to change at page 39, line 27 skipping to change at page 42, line 15
* A bignum (major type 6, tag number 2 or 3) is represented by * A bignum (major type 6, tag number 2 or 3) is represented by
encoding its byte string in base64url without padding and becomes encoding its byte string in base64url without padding and becomes
a JSON string. For tag number 3 (negative bignum), a "~" (ASCII a JSON string. For tag number 3 (negative bignum), a "~" (ASCII
tilde) is inserted before the base-encoded value. (The conversion tilde) is inserted before the base-encoded value. (The conversion
to a binary blob instead of a number is to prevent a likely to a binary blob instead of a number is to prevent a likely
numeric overflow for the JSON decoder.) numeric overflow for the JSON decoder.)
* A byte string with an encoding hint (major type 6, tag number 21 * A byte string with an encoding hint (major type 6, tag number 21
through 23) is encoded as described and becomes a JSON string. through 23) is encoded as described and becomes a JSON string.
* For all other tags (major type 6, any other tag number), the * For all other tags (major type 6, any other tag number), the tag
enclosed CBOR item is represented as a JSON value; the tag number content is represented as a JSON value; the tag number is ignored.
is ignored.
* Indefinite-length items are made definite before conversion. * Indefinite-length items are made definite before conversion.
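
The bignum rule in the list above can be illustrated with the Python standard library. This is a sketch only; the function name bignum_to_json_string is hypothetical, and the caller is assumed to have already extracted the tag number and the byte string content.

```python
import base64


def bignum_to_json_string(tag_number: int, content: bytes) -> str:
    """Render a bignum (tag number 2 or 3) as the text of a JSON string."""
    if tag_number not in (2, 3):
        raise ValueError("not a bignum tag number")
    # base64url without padding
    text = base64.urlsafe_b64encode(content).rstrip(b"=").decode("ascii")
    # A "~" (ASCII tilde) goes before the value for negative bignums.
    return "~" + text if tag_number == 3 else text


print(bignum_to_json_string(3, bytes.fromhex("010000000000000000")))  # ~AQAAAAAAAAAA
```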
6.2. Converting from JSON to CBOR 6.2. Converting from JSON to CBOR
All JSON values, once decoded, directly map into one or more CBOR All JSON values, once decoded, directly map into one or more CBOR
values. As with any kind of CBOR generation, decisions have to be values. As with any kind of CBOR generation, decisions have to be
made with respect to number representation. In a suggested made with respect to number representation. In a suggested
conversion: conversion:
skipping to change at page 41, line 34 skipping to change at page 44, line 21
been allocated. Implementations receiving an unknown simple data been allocated. Implementations receiving an unknown simple data
item may be able to process it as such, given that the structure item may be able to process it as such, given that the structure
of the value is indeed simple. The IANA registry in Section 9.1 of the value is indeed simple. The IANA registry in Section 9.1
is the appropriate way to address the extensibility of this is the appropriate way to address the extensibility of this
codepoint space. codepoint space.
* the "tag" space (values in major type 6). Again, only a small * the "tag" space (values in major type 6). Again, only a small
part of the codepoint space has been allocated, and the space is part of the codepoint space has been allocated, and the space is
abundant (although the early numbers are more efficient than the abundant (although the early numbers are more efficient than the
later ones). Implementations receiving an unknown tag number can later ones). Implementations receiving an unknown tag number can
choose to simply ignore it or to process it as an unknown tag choose to simply ignore it (process just the enclosed tag content)
number wrapping the enclosed data item. The IANA registry in or to process it as an unknown tag number wrapping the tag
Section 9.2 is the appropriate way to address the extensibility of content. The IANA registry in Section 9.2 is the appropriate way
this codepoint space. to address the extensibility of this codepoint space.
* the "additional information" space. An implementation receiving * the "additional information" space. An implementation receiving
an unknown additional information value has no way to continue an unknown additional information value has no way to continue
decoding, so allocating codepoints to this space is a major step. decoding, so allocating codepoints to this space is a major step.
There are also very few codepoints left. There are also very few codepoints left. See also Section 7.2.
7.2. Curating the Additional Information Space 7.2. Curating the Additional Information Space
The human mind is sometimes drawn to filling in little perceived gaps The human mind is sometimes drawn to filling in little perceived gaps
to make something neat. We expect the remaining gaps in the to make something neat. We expect the remaining gaps in the
codepoint space for the additional information values to be an codepoint space for the additional information values to be an
attractor for new ideas, just because they are there. attractor for new ideas, just because they are there.
The present specification does not manage the additional information The present specification does not manage the additional information
codepoint space by an IANA registry. Instead, allocations out of codepoint space by an IANA registry. Instead, allocations out of
this space can only be done by updating this specification. this space can only be done by updating this specification.
For an additional information value of n >= 24, the size of the For an additional information value of n >= 24, the size of the
additional data typically is 2**(n-24) bytes. Therefore, additional additional data typically is 2**(n-24) bytes. Therefore, additional
information values 28 and 29 should be viewed as candidates for information values 28 and 29 should be viewed as candidates for
128-bit and 256-bit quantities, in case a need arises to add them to 128-bit and 256-bit quantities, in case a need arises to add them to
the protocol. Additional information value 30 is then the only the protocol. Additional information value 30 is then the only
additional information value available for general allocation, and additional information value available for general allocation, and
there should be a very good reason for allocating it before assigning there should be a very good reason for allocating it before assigning
it through an update of this protocol. it through an update of the present specification.
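
The size pattern behind this reasoning can be tabulated quickly. In the Python sketch below, the name pattern_size is hypothetical, and the rows for 28 and 29 are only what the 2**(n-24) pattern would imply; those values are not allocated.

```python
def pattern_size(additional_info: int) -> int:
    """Size in bytes suggested by the 2**(n-24) pattern for n >= 24."""
    if additional_info < 24:
        raise ValueError("values below 24 carry the argument directly")
    return 2 ** (additional_info - 24)


for n in range(24, 30):
    status = "allocated" if n <= 27 else "candidate only"
    print(n, pattern_size(n) * 8, "bits", status)
```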
8. Diagnostic Notation 8. Diagnostic Notation
CBOR is a binary interchange format. To facilitate documentation and CBOR is a binary interchange format. To facilitate documentation and
debugging, and in particular to facilitate communication between debugging, and in particular to facilitate communication between
entities cooperating in debugging, this section defines a simple entities cooperating in debugging, this section defines a simple
human-readable diagnostic notation. All actual interchange always human-readable diagnostic notation. All actual interchange always
happens in the binary format. happens in the binary format.
Note that this truly is a diagnostic format; it is not meant to be Note that this truly is a diagnostic format; it is not meant to be
parsed. Therefore, no formal definition (as in ABNF) is given in parsed. Therefore, no formal definition (as in ABNF) is given in
this document. (Implementers looking for a text-based format for this document. (Implementers looking for a text-based format for
representing CBOR data items in configuration files may also want to representing CBOR data items in configuration files may also want to
consider YAML [YAML].) consider YAML [YAML].)
The diagnostic notation is loosely based on JSON as it is defined in The diagnostic notation is loosely based on JSON as it is defined in
RFC 8259, extending it where needed. RFC 8259, extending it where needed.
The notation borrows the JSON syntax for numbers (integer and The notation borrows the JSON syntax for numbers (integer and
floating point), True (>true<), False (>false<), Null (>null<), UTF-8 floating-point), True (>true<), False (>false<), Null (>null<), UTF-8
strings, arrays, and maps (maps are called objects in JSON; the strings, arrays, and maps (maps are called objects in JSON; the
diagnostic notation extends JSON here by allowing any data item in diagnostic notation extends JSON here by allowing any data item in
the key position). Undefined is written >undefined< as in the key position). Undefined is written >undefined< as in
JavaScript. The non-finite floating-point numbers Infinity, JavaScript. The non-finite floating-point numbers Infinity,
-Infinity, and NaN are written exactly as in this sentence (this is -Infinity, and NaN are written exactly as in this sentence (this is
also a way they can be written in JavaScript, although JSON does not also a way they can be written in JavaScript, although JSON does not
allow them). A tag is written as an integer number for the tag allow them). A tag is written as an integer number for the tag
number, followed by the tag content in parentheses; for instance, an number, followed by the tag content in parentheses; for instance, an
RFC 3339 (ISO 8601) date could be notated as: RFC 3339 (ISO 8601) date could be notated as:
skipping to change at page 43, line 16 skipping to change at page 45, line 51
padding, enclosed in single quotes, prefixed by >h< for base16, >b32< padding, enclosed in single quotes, prefixed by >h< for base16, >b32<
for base32, >h32< for base32hex, >b64< for base64 or base64url (the for base32, >h32< for base32hex, >b64< for base64 or base64url (the
actual encodings do not overlap, so the string remains unambiguous). actual encodings do not overlap, so the string remains unambiguous).
For example, the byte string 0x12345678 could be written h'12345678', For example, the byte string 0x12345678 could be written h'12345678',
b32'CI2FM6A', or b64'EjRWeA'. b32'CI2FM6A', or b64'EjRWeA'.
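The three spellings in the example above can be reproduced with Python's standard base64 module. This is only an illustrative sketch; the function name diag_bytes is an assumption, and the padding is stripped by hand because the notation omits it.

```python
import base64


def diag_bytes(data: bytes) -> dict:
    """Render a byte string in h'', b32'' and b64'' diagnostic spellings."""

    def unpadded(encoded: bytes) -> str:
        # The diagnostic notation is written without "=" padding.
        return encoded.decode("ascii").rstrip("=")

    return {
        "h": f"h'{data.hex()}'",
        "b32": f"b32'{unpadded(base64.b32encode(data))}'",
        "b64": f"b64'{unpadded(base64.b64encode(data))}'",
    }


print(diag_bytes(bytes.fromhex("12345678")))
```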
Unassigned simple values are given as "simple()" with the appropriate Unassigned simple values are given as "simple()" with the appropriate
integer in the parentheses. For example, "simple(42)" indicates integer in the parentheses. For example, "simple(42)" indicates
major type 7, value 42. major type 7, value 42.
A number of useful extensions to the diagnostic notation defined here
are provided in Appendix G of [RFC8610], "Extended Diagnostic
Notation" (EDN).
8.1. Encoding Indicators 8.1. Encoding Indicators
Sometimes it is useful to indicate in the diagnostic notation which Sometimes it is useful to indicate in the diagnostic notation which
of several alternative representations were actually used; for of several alternative representations were actually used; for
example, a data item written >1.5< by a diagnostic decoder might have example, a data item written >1.5< by a diagnostic decoder might have
been encoded as a half-, single-, or double-precision float. been encoded as a half-, single-, or double-precision float.
The convention for encoding indicators is that anything starting with The convention for encoding indicators is that anything starting with
an underscore and all following characters that are alphanumeric or an underscore and all following characters that are alphanumeric or
underscore, is an encoding indicator, and can be ignored by anyone underscore, is an encoding indicator, and can be ignored by anyone
not interested in this information. Encoding indicators are always not interested in this information. For example, "_" or "_3".
optional. Encoding indicators are always optional.
A single underscore can be written after the opening brace of a map A single underscore can be written after the opening brace of a map
or the opening bracket of an array to indicate that the data item was or the opening bracket of an array to indicate that the data item was
represented in indefinite-length format. For example, [_ 1, 2] represented in indefinite-length format. For example, [_ 1, 2]
contains an indicator that an indefinite-length representation was contains an indicator that an indefinite-length representation was
used to represent the data item [1, 2]. used to represent the data item [1, 2].
An underscore followed by a decimal digit n indicates that the An underscore followed by a decimal digit n indicates that the
preceding item (or, for arrays and maps, the item starting with the preceding item (or, for arrays and maps, the item starting with the
preceding bracket or brace) was encoded with an additional preceding bracket or brace) was encoded with an additional
information value of 24+n. For example, 1.5_1 is a half-precision information value of 24+n. For example, 1.5_1 is a half-precision
floating-point number, while 1.5_3 is encoded as double precision. floating-point number, while 1.5_3 is encoded as double precision.
This encoding indicator is not shown in Appendix A. (Note that the This encoding indicator is not shown in Appendix A. (Note that the
encoding indicator "_" is thus an abbreviation of the full form "_7", encoding indicator "_" is thus an abbreviation of the full form "_7",
which is not used.) which is not used.)
As a special case, byte and text strings of indefinite length can be Byte and text strings of indefinite length can be notated in the form
notated in the form (_ h'0123', h'4567') and (_ "foo", "bar"). (_ h'0123', h'4567') and (_ "foo", "bar").
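As a rough sketch of the convention described in this section, the indicator can be derived from the low five bits (the additional information) of an item's initial byte. The function name encoding_indicator is hypothetical, and the sketch assumes that the byte passed in really is the initial byte of a data item, not a "break" stop code.

```python
def encoding_indicator(initial_byte: int) -> str:
    """Return the encoding indicator suggested by an item's initial byte.

    Assumption: initial_byte starts a data item (it is not 0xff, "break").
    """
    ai = initial_byte & 0x1F          # additional information
    if 24 <= ai <= 27:
        return f"_{ai - 24}"          # additional information 24+n -> "_n"
    if ai == 31:
        return "_"                    # indefinite length, short for "_7"
    return ""                         # argument carried in the initial byte


print("1.5" + encoding_indicator(0xF9))   # half-precision float
print("1.5" + encoding_indicator(0xFB))   # double-precision float
print("[" + encoding_indicator(0x9F) + " 1, 2]")
```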
9. IANA Considerations 9. IANA Considerations
IANA has created two registries for new CBOR values. The registries IANA has created two registries for new CBOR values. The registries
are separate, that is, not under an umbrella registry, and follow the are separate, that is, not under an umbrella registry, and follow the
rules in [RFC8126]. IANA has also assigned a new MIME media type and rules in [RFC8126]. IANA has also assigned a new MIME media type and
an associated Constrained Application Protocol (CoAP) Content-Format an associated Constrained Application Protocol (CoAP) Content-Format
entry. entry.
[To be removed by RFC editor:] IANA is requested to update these [To be removed by RFC editor:] IANA is requested to update these
registries to point to the present document instead of RFC 7049. registries to point to the present document instead of RFC 7049.
9.1. Simple Values Registry 9.1. Simple Values Registry
IANA has created the "Concise Binary Object Representation (CBOR) IANA has created the "Concise Binary Object Representation (CBOR)
Simple Values" registry at [IANA.cbor-simple-values]. The initial Simple Values" registry at [IANA.cbor-simple-values]. The initial
values are shown in Table 3. values are shown in Table 4.
New entries in the range 0 to 19 are assigned by Standards Action. New entries in the range 0 to 19 are assigned by Standards Action.
It is suggested that these Standards Actions allocate values starting It is suggested that these Standards Actions allocate values starting
with the number 16 in order to reserve the lower numbers for with the number 16 in order to reserve the lower numbers for
contiguous blocks (if any). contiguous blocks (if any).
New entries in the range 32 to 255 are assigned by Specification New entries in the range 32 to 255 are assigned by Specification
Required. Required.
9.2. Tags Registry 9.2. Tags Registry
IANA has created the "Concise Binary Object Representation (CBOR) IANA has created the "Concise Binary Object Representation (CBOR)
Tags" registry at [IANA.cbor-tags]. The tags that were defined in Tags" registry at [IANA.cbor-tags]. The tags that were defined in
[RFC7049] are described in detail in Section 3.4, but other tags have [RFC7049] are described in detail in Section 3.4, and other tags have
already been defined. already been defined.
New entries in the range 0 to 23 are assigned by Standards Action. New entries in the range 0 to 23 are assigned by Standards Action.
New entries in the range 24 to 255 are assigned by Specification New entries in the range 24 to 255 are assigned by Specification
Required. New entries in the range 256 to 18446744073709551615 are Required. New entries in the range 256 to 18446744073709551615 are
assigned by First Come First Served. The template for registration assigned by First Come First Served. The template for registration
requests is: requests is:
* Data item * Data item
skipping to change at page 45, line 8 skipping to change at page 47, line 46
In addition, First Come First Served requests should include: In addition, First Come First Served requests should include:
* Point of contact * Point of contact
* Description of semantics (URL) - This description is optional; the * Description of semantics (URL) - This description is optional; the
URL can point to something like an Internet-Draft or a web page. URL can point to something like an Internet-Draft or a web page.
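The tag number ranges given in Section 9.2 can be summarized in a small lookup. This is a sketch for orientation only; the function name tag_registration_policy is an assumption, and the IANA registry remains the authoritative source.

```python
MAX_TAG = 18446744073709551615  # upper bound of the range quoted in the text


def tag_registration_policy(tag: int) -> str:
    """Map a tag number to the registration policy named in Section 9.2."""
    if not 0 <= tag <= MAX_TAG:
        raise ValueError("tag number out of range")
    if tag <= 23:
        return "Standards Action"
    if tag <= 255:
        return "Specification Required"
    return "First Come First Served"


for tag in (0, 23, 24, 255, 256, MAX_TAG):
    print(tag, tag_registration_policy(tag))
```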
9.3. Media Type ("MIME Type") 9.3. Media Type ("MIME Type")
The Internet media type [RFC6838] for a single encoded CBOR data item The Internet media type [RFC6838] for a single encoded CBOR data item
is application/cbor. is application/cbor, as defined in [IANA.media-types]:
Type name: application Type name: application
Subtype name: cbor Subtype name: cbor
Required parameters: n/a Required parameters: n/a
Optional parameters: n/a Optional parameters: n/a
Encoding considerations: binary Encoding considerations: binary
Security considerations: See Section 10 of this document Security considerations: See Section 10 of this document
Interoperability considerations: n/a Interoperability considerations: n/a
Published specification: This document Published specification: This document
skipping to change at page 45, line 29 skipping to change at page 48, line 17
Security considerations: See Section 10 of this document Security considerations: See Section 10 of this document
Interoperability considerations: n/a Interoperability considerations: n/a
Published specification: This document Published specification: This document
Applications that use this media type: None yet, but it is expected Applications that use this media type: None yet, but it is expected
that this format will be deployed in protocols and applications. that this format will be deployed in protocols and applications.
Additional information: Additional information:
Magic number(s): n/a * Magic number(s): n/a
File extension(s): .cbor * File extension(s): .cbor
Macintosh file type code(s): n/a * Macintosh file type code(s): n/a
Person & email address to contact for further information: Person & email address to contact for further information: IETF CBOR
Carsten Bormann Working Group cbor@ietf.org (mailto:cbor@ietf.org) or IETF
cabo@tzi.org Applications and Real-Time Area art@ietf.org (mailto:art@ietf.org)
Intended usage: COMMON Intended usage: COMMON
Restrictions on usage: none Restrictions on usage: none
Author: Author: IETF CBOR Working Group cbor@ietf.org (mailto:cbor@ietf.org)
Carsten Bormann <cabo@tzi.org>
Change controller: Change controller: The IESG iesg@ietf.org (mailto:iesg@ietf.org)
The IESG <iesg@ietf.org>
9.4. CoAP Content-Format 9.4. CoAP Content-Format
The CoAP Content-Format for CBOR is defined in
[IANA.core-parameters]:
Media Type: application/cbor Media Type: application/cbor
Encoding: - Encoding: -
Id: 60 Id: 60
Reference: [RFCthis] Reference: [RFCthis]
9.5. The +cbor Structured Syntax Suffix Registration 9.5. The +cbor Structured Syntax Suffix Registration
The Structured Syntax Suffix [RFC6838] for media types based on a
single encoded CBOR data item is +cbor, as defined in
[IANA.media-type-structured-suffix]:
Name: Concise Binary Object Representation (CBOR) Name: Concise Binary Object Representation (CBOR)
+suffix: +cbor +suffix: +cbor
References: [RFCthis] References: [RFCthis]
Encoding Considerations: CBOR is a binary format. Encoding Considerations: CBOR is a binary format.
Interoperability Considerations: n/a Interoperability Considerations: n/a
Fragment Identifier Considerations: Fragment Identifier Considerations: The syntax and semantics of
The syntax and semantics of fragment identifiers specified for fragment identifiers specified for +cbor SHOULD be as specified
+cbor SHOULD be as specified for "application/cbor". (At for "application/cbor". (At publication of this document, there
publication of this document, there is no fragment identification is no fragment identification syntax defined for "application/
syntax defined for "application/cbor".) cbor".)
The syntax and semantics for fragment identifiers for a specific The syntax and semantics for fragment identifiers for a specific
"xxx/yyy+cbor" SHOULD be processed as follows: "xxx/yyy+cbor" SHOULD be processed as follows:
For cases defined in +cbor, where the fragment identifier resolves * For cases defined in +cbor, where the fragment identifier
per the +cbor rules, then process as specified in +cbor. resolves per the +cbor rules, then process as specified in
+cbor.
For cases defined in +cbor, where the fragment identifier does * For cases defined in +cbor, where the fragment identifier does
not resolve per the +cbor rules, then process as specified in not resolve per the +cbor rules, then process as specified in
"xxx/yyy+cbor". "xxx/yyy+cbor".
For cases not defined in +cbor, then process as specified in * For cases not defined in +cbor, then process as specified in
"xxx/yyy+cbor". "xxx/yyy+cbor".
Security Considerations: See Section 10 of this document Security Considerations: See Section 10 of this document
Contact: Contact: IETF CBOR Working Group cbor@ietf.org
Apps Area Working Group (apps-discuss@ietf.org) (mailto:cbor@ietf.org) or IETF Applications and Real-Time Area
art@ietf.org (mailto:art@ietf.org)
Author/Change Controller: Author/Change Controller: The IESG iesg@ietf.org
The Apps Area Working Group. (mailto:iesg@ietf.org)
The IESG has change control over this registration. // Editors' note: RFC 6838 has a template field Author/Change
// controller, the descriptive text of which makes clear that this is
// the change controller, not the author. Go figure. There is no
// separate author entry as in the media types registry. (RFC
// editor: Please remove this note before publication.)
10. Security Considerations 10. Security Considerations
A network-facing application can exhibit vulnerabilities in its A network-facing application can exhibit vulnerabilities in its
processing logic for incoming data. Complex parsers are well known processing logic for incoming data. Complex parsers are well known
as a likely source of such vulnerabilities, such as the ability to as a likely source of such vulnerabilities, such as the ability to
remotely crash a node, or even remotely execute arbitrary code on it. remotely crash a node, or even remotely execute arbitrary code on it.
CBOR attempts to narrow the opportunities for introducing such CBOR attempts to narrow the opportunities for introducing such
vulnerabilities by reducing parser complexity, by giving the entire vulnerabilities by reducing parser complexity, by giving the entire
range of encodable values a meaning where possible. range of encodable values a meaning where possible.
skipping to change at page 50, line 19 skipping to change at page 53, line 26
[ASN.1] International Telecommunication Union, "Information [ASN.1] International Telecommunication Union, "Information
Technology -- ASN.1 encoding rules: Specification of Basic Technology -- ASN.1 encoding rules: Specification of Basic
Encoding Rules (BER), Canonical Encoding Rules (CER) and Encoding Rules (BER), Canonical Encoding Rules (CER) and
Distinguished Encoding Rules (DER)", ITU-T Recommendation Distinguished Encoding Rules (DER)", ITU-T Recommendation
X.690, 1994. X.690, 1994.
[BSON] Various, "BSON - Binary JSON", 2013, [BSON] Various, "BSON - Binary JSON", 2013,
<http://bsonspec.org/>. <http://bsonspec.org/>.
[I-D.ietf-cbor-sequence]
Bormann, C., "Concise Binary Object Representation (CBOR)
Sequences", Work in Progress, Internet-Draft, draft-ietf-
cbor-sequence-02, 25 September 2019, <http://www.ietf.org/
internet-drafts/draft-ietf-cbor-sequence-02.txt>.
[IANA.cbor-simple-values] [IANA.cbor-simple-values]
IANA, "Concise Binary Object Representation (CBOR) Simple IANA, "Concise Binary Object Representation (CBOR) Simple
Values", Values",
<http://www.iana.org/assignments/cbor-simple-values>. <http://www.iana.org/assignments/cbor-simple-values>.
[IANA.cbor-tags] [IANA.cbor-tags]
IANA, "Concise Binary Object Representation (CBOR) Tags", IANA, "Concise Binary Object Representation (CBOR) Tags",
<http://www.iana.org/assignments/cbor-tags>. <http://www.iana.org/assignments/cbor-tags>.
[IANA.core-parameters]
IANA, "Constrained RESTful Environments (CoRE)
Parameters",
<http://www.iana.org/assignments/core-parameters>.
[IANA.media-type-structured-suffix]
IANA, "Structured Syntax Suffix Registry",
<http://www.iana.org/assignments/media-type-structured-
suffix>.
[IANA.media-types]
IANA, "Media Types",
<http://www.iana.org/assignments/media-types>.
[MessagePack] [MessagePack]
Furuhashi, S., "MessagePack", 2013, <http://msgpack.org/>. Furuhashi, S., "MessagePack", 2013, <http://msgpack.org/>.
[PCRE] Ho, A., "PCRE - Perl Compatible Regular Expressions", [PCRE] Ho, A., "PCRE - Perl Compatible Regular Expressions",
2018, <http://www.pcre.org/>. 2018, <http://www.pcre.org/>.
[RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission [RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission
Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976, Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976,
<https://www.rfc-editor.org/info/rfc713>. <https://www.rfc-editor.org/info/rfc713>.
skipping to change at page 51, line 14 skipping to change at page 54, line 30
[RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for [RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for
Constrained-Node Networks", RFC 7228, Constrained-Node Networks", RFC 7228,
DOI 10.17487/RFC7228, May 2014, DOI 10.17487/RFC7228, May 2014,
<https://www.rfc-editor.org/info/rfc7228>. <https://www.rfc-editor.org/info/rfc7228>.
[RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493,
DOI 10.17487/RFC7493, March 2015, DOI 10.17487/RFC7493, March 2015,
<https://www.rfc-editor.org/info/rfc7493>. <https://www.rfc-editor.org/info/rfc7493>.
[RFC7991] Hoffman, P., "The "xml2rfc" Version 3 Vocabulary",
RFC 7991, DOI 10.17487/RFC7991, December 2016,
<https://www.rfc-editor.org/info/rfc7991>.
[RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
Interchange Format", STD 90, RFC 8259, Interchange Format", STD 90, RFC 8259,
DOI 10.17487/RFC8259, December 2017, DOI 10.17487/RFC8259, December 2017,
<https://www.rfc-editor.org/info/rfc8259>. <https://www.rfc-editor.org/info/rfc8259>.
[RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
Definition Language (CDDL): A Notational Convention to
Express Concise Binary Object Representation (CBOR) and
JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
June 2019, <https://www.rfc-editor.org/info/rfc8610>.
[RFC8618] Dickinson, J., Hague, J., Dickinson, S., Manderson, T., [RFC8618] Dickinson, J., Hague, J., Dickinson, S., Manderson, T.,
and J. Bond, "Compacted-DNS (C-DNS): A Format for DNS and J. Bond, "Compacted-DNS (C-DNS): A Format for DNS
Packet Capture", RFC 8618, DOI 10.17487/RFC8618, September Packet Capture", RFC 8618, DOI 10.17487/RFC8618, September
2019, <https://www.rfc-editor.org/info/rfc8618>. 2019, <https://www.rfc-editor.org/info/rfc8618>.
[RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR)
Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020,
<https://www.rfc-editor.org/info/rfc8742>.
[RFC8746] Bormann, C., Ed., "Concise Binary Object Representation
(CBOR) Tags for Typed Arrays", RFC 8746,
DOI 10.17487/RFC8746, February 2020,
<https://www.rfc-editor.org/info/rfc8746>.
[SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- [SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short-
Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture Input PRF", DOI 10.1007/978-3-642-34931-7_28, Lecture
Notes in Computer Science pp. 489-508, 2012, Notes in Computer Science pp. 489-508, 2012,
<https://doi.org/10.1007/978-3-642-34931-7_28>. <https://doi.org/10.1007/978-3-642-34931-7_28>.
[YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup [YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup
Language (YAML[TM]) Version 1.2", 3rd Edition, October Language (YAML[TM]) Version 1.2", 3rd Edition, October
2009, <http://www.yaml.org/spec/1.2/spec.html>. 2009, <http://www.yaml.org/spec/1.2/spec.html>.
Appendix A. Examples Appendix A. Examples
skipping to change at page 55, line 35 skipping to change at page 59, line 25
| 17, 18, 19, 20, 21, 22, 23, | | | 17, 18, 19, 20, 21, 22, 23, | |
| 24, 25] | | | 24, 25] | |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| {_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff | | {_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| ["a", {_ "b": "c"}] | 0x826161bf61626163ff | | ["a", {_ "b": "c"}] | 0x826161bf61626163ff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
| {_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff | | {_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
Table 5: Examples of Encoded CBOR Data Items Table 6: Examples of Encoded CBOR Data Items
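The last three rows of the table can be reassembled by hand, which may help when reading the hex. The helper names below (uint, nint, text, array, indef_array, indef_map) are hypothetical and deliberately cover only arguments below 24, so that every head fits in the initial byte.

```python
BREAK = b"\xff"   # "break" stop code
TRUE = b"\xf5"    # simple value true


def _small(n: int) -> int:
    if not 0 <= n < 24:
        raise ValueError("sketch only covers arguments 0..23")
    return n


def uint(n: int) -> bytes:
    return bytes([_small(n)])                   # major type 0


def nint(n: int) -> bytes:
    return bytes([0x20 | _small(-1 - n)])       # major type 1, argument -1-n


def text(s: str) -> bytes:
    raw = s.encode("utf-8")
    return bytes([0x60 | _small(len(raw))]) + raw   # major type 3


def array(*items: bytes) -> bytes:
    return bytes([0x80 | _small(len(items))]) + b"".join(items)


def indef_array(*items: bytes) -> bytes:
    return b"\x9f" + b"".join(items) + BREAK    # [_ ...]


def indef_map(*pairs: tuple) -> bytes:
    return b"\xbf" + b"".join(k + v for k, v in pairs) + BREAK  # {_ ...}


examples = {
    '{_ "a": 1, "b": [_ 2, 3]}': indef_map(
        (text("a"), uint(1)), (text("b"), indef_array(uint(2), uint(3)))),
    '["a", {_ "b": "c"}]': array(text("a"), indef_map((text("b"), text("c")))),
    '{_ "Fun": true, "Amt": -2}': indef_map(
        (text("Fun"), TRUE), (text("Amt"), nint(-2))),
}
for notation, encoded in examples.items():
    print(f"{notation:30} 0x{encoded.hex()}")
```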
Appendix B. Jump Table Appendix B. Jump Table
For brevity, this jump table does not show initial bytes that are For brevity, this jump table does not show initial bytes that are
reserved for future extension. It also only shows a selection of the reserved for future extension. It also only shows a selection of the
initial bytes that can be used for optional features. (All unsigned initial bytes that can be used for optional features. (All unsigned
integers are in network byte order.) integers are in network byte order.)
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| Byte | Structure/Semantics | | Byte | Structure/Semantics |
skipping to change at page 58, line 42 skipping to change at page 62, line 32
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xf9 | Half-Precision Float (two-byte IEEE 754) | | 0xf9 | Half-Precision Float (two-byte IEEE 754) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xfa | Single-Precision Float (four-byte IEEE 754) | | 0xfa | Single-Precision Float (four-byte IEEE 754) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xfb | Double-Precision Float (eight-byte IEEE 754) | | 0xfb | Double-Precision Float (eight-byte IEEE 754) |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
| 0xff | "break" stop code | | 0xff | "break" stop code |
+------------+------------------------------------------------+ +------------+------------------------------------------------+
Table 6: Jump Table for Initial Byte Table 7: Jump Table for Initial Byte
Appendix C. Pseudocode Appendix C. Pseudocode
The well-formedness of a CBOR item can be checked by the pseudocode The well-formedness of a CBOR item can be checked by the pseudocode
in Figure 1. The data is well-formed if and only if: in Figure 1. The data is well-formed if and only if:
* the pseudocode does not "fail"; * the pseudocode does not "fail";
* after execution of the pseudocode, no bytes are left in the input * after execution of the pseudocode, no bytes are left in the input
(except in streaming applications) (except in streaming applications)
The pseudocode has the following prerequisites: The pseudocode has the following prerequisites:
* take(n) reads n bytes from the input data and returns them as a * take(n) reads n bytes from the input data and returns them as a
byte string. If n bytes are no longer available, take(n) fails. byte string. If n bytes are no longer available, take(n) fails.
* uint() converts a byte string into an unsigned integer by * uint() converts a byte string into an unsigned integer by
interpreting the byte string in network byte order. interpreting the byte string in network byte order.
skipping to change at page 64, line 34 skipping to change at page 68, line 34
Message Services Data Transmission (MSDTP) is a very early example of Message Services Data Transmission (MSDTP) is a very early example of
a compact message format; it is described in [RFC0713], written in a compact message format; it is described in [RFC0713], written in
1976. It is included here for its historical value, not because it 1976. It is included here for its historical value, not because it
was ever widely used. was ever widely used.
E.5. Conciseness on the Wire E.5. Conciseness on the Wire
While CBOR's design objective of code compactness for encoders and While CBOR's design objective of code compactness for encoders and
decoders is a higher priority than its objective of conciseness on decoders is a higher priority than its objective of conciseness on
the wire, many people focus on the wire size. Table 7 shows some the wire, many people focus on the wire size. Table 8 shows some
encoding examples for the simple nested array [1, [2, 3]]; where some encoding examples for the simple nested array [1, [2, 3]]; where some
form of indefinite-length encoding is supported by the encoding, form of indefinite-length encoding is supported by the encoding,
[_ 1, [2, 3]] (indefinite length on the outer array) is also shown. [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
| Format | [1, [2, 3]] | [_ 1, [2, 3]] | | Format | [1, [2, 3]] | [_ 1, [2, 3]] |
+=============+============================+================+ +=============+============================+================+
| RFC 713 | c2 05 81 c2 02 82 83 | | | RFC 713 | c2 05 81 c2 02 82 83 | |
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
| ASN.1 BER | 30 0b 02 01 01 30 06 02 01 | 30 80 02 01 01 | | ASN.1 BER | 30 0b 02 01 01 30 06 02 01 | 30 80 02 01 01 |
skipping to change at page 65, line 25 skipping to change at page 69, line 25
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
| BSON | 22 00 00 00 10 30 00 01 00 | | | BSON | 22 00 00 00 10 30 00 01 00 | |
| | 00 00 04 31 00 13 00 00 00 | | | | 00 00 04 31 00 13 00 00 00 | |
| | 10 30 00 02 00 00 00 10 31 | | | | 10 30 00 02 00 00 00 10 31 | |
| | 00 03 00 00 00 00 00 | | | | 00 03 00 00 00 00 00 | |
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
| CBOR | 82 01 82 02 03 | 9f 01 82 02 03 | | CBOR | 82 01 82 02 03 | 9f 01 82 02 03 |
| | | ff | | | | ff |
+-------------+----------------------------+----------------+ +-------------+----------------------------+----------------+
Table 7: Examples for Different Levels of Conciseness Table 8: Examples for Different Levels of Conciseness
Appendix F. Changes from RFC 7049 Appendix F. Changes from RFC 7049
The following is a list of known changes from RFC 7049. This list is The following is a list of known changes from RFC 7049. This list is
non-authoritative. It is meant to help reviewers see the significant non-authoritative. It is meant to help reviewers see the significant
differences. differences.
* Updated reference for [RFC4627] to [RFC8259] in many places * Made some use of new RFCXML functionality [RFC7991]
* Updated reference for [CNN-TERMS] to [RFC7228] * Updated references, e.g. for [RFC4627] to [RFC8259] in many
places, for [CNN-TERMS] to [RFC7228]; added missing reference to
[IEEE754] and updated to [ECMA262]
* Added a comment to the last example in Section 2.2.1 (added * Fixed errata: in the example in Section 2.4.2 ("29" -> "49"), and
in the last paragraph of Section 3.6 ("0b000_11101" ->
"0b000_11001")
* Added a comment to the last example in Section 3.2.2 (added
"Second value") "Second value")
* Fixed a bug in the example in Section 2.4.2 ("29" -> "49") * Applied numerous small editorial changes
* Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> * Added a few tables for illustration
"0b000_11001")
* More stringently used terminology for well-formed and valid data,
avoiding less well-defined alternative terms such as "syntax
error", "decoding error" and "strict mode" outside examples
* Streamlined terminology to talk about tags, tag numbers, and tag
content
* Clarified the restrictions on tag content, in general and
specifically for tag 1
* Added text about the CBOR data model and its small variations
(basic generic, extended generic, specific)
* More clearly separated integers from floating-point values;
provided a suggestion (based on I-JSON [RFC7493]) for handling
these types when converting JSON to CBOR
* Added term "preferred serialization" and defined it for various
kinds of data items
* Added comment about tags with semantics that depend on
serialization order
* Defined "deterministic encoding", making use of "preferred
serialization", and simplified the suggested map ordering for the
"Core Deterministic Encoding Requirements", easing implementation,
while keeping RFC 7049 map ordering as an alternative "length-
first map key ordering"; now avoiding the terms "canonical" and
"canonicalization"
* Clarified map validity (handling of duplicate keys) and explained
the domain of applicability of certain implementation choices
* Updated IANA considerations
* Added security considerations
* Clarified handling of non-well-formed simple values in text and
pseudocode
* Added Appendix G, well-formedness errors and examples
* Removed UBJSON from Appendix E, as that format has completely
changed since RFC 7049; added reference to [RFC8618]
Appendix G. Well-formedness errors and examples Appendix G. Well-formedness errors and examples
There are three basic kinds of well-formedness errors that can occur There are three basic kinds of well-formedness errors that can occur
in decoding a CBOR data item: in decoding a CBOR data item:
* Too much data: There are input bytes left that were not consumed.
This is only an error if the application assumed that the input
bytes would span exactly one data item. Where the application
uses the self-delimiting nature of CBOR encoding to permit
additional data after the data item, as is for example done in
CBOR sequences [RFC8742], the CBOR decoder can simply indicate
what part of the input has not been consumed.
* Too little data: The input data available would need additional
bytes added at their end for a complete CBOR data item. This may
indicate the input is truncated; it is also a common error when
trying to decode random data as CBOR. For some applications,
however, this may not actually be an error, as the application may
not be certain it has all the data yet and can obtain or wait for
additional input bytes. Some of these applications may have an
upper limit for how much additional data can show up; here the
decoder may be able to indicate that the encoded CBOR data item
cannot be completed within this limit.
* Syntax error: The input data are not consistent with the
requirements of the CBOR encoding, and this cannot be remedied by
adding (or removing) data at the end.
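As an illustration only, and not as part of the specification, the
three kinds of error can be told apart by a small checker modeled
loosely on the well-formedness pseudocode of Appendix C. The Python
sketch below makes several assumptions of its own: the names
classify, TooLittleData and MalformedSyntax, the BREAK and INDEFINITE
markers, and the "exact" parameter are all hypothetical and chosen
for this example. It uses plain recursion, so very deeply nested
input can exhaust the interpreter's recursion limit.

```python
class TooLittleData(Exception):
    """Input ended before the data item was complete (kind 2)."""

class MalformedSyntax(Exception):
    """Input violates the CBOR encoding rules (kind 3)."""

BREAK = -1        # hypothetical marker: a "break" stop code was read
INDEFINITE = 99   # hypothetical marker: an indefinite-length item was read

def _take(data, pos, n):
    """Return the position after n more bytes, or signal truncation."""
    if pos + n > len(data):
        raise TooLittleData("need more input bytes")
    return pos + n

def _item(data, pos, breakable=False):
    """Skip one data item at pos; return (new position, kind of item)."""
    end = _take(data, pos, 1)
    mt, ai = data[pos] >> 5, data[pos] & 0x1F   # major type, additional info
    pos, val = end, ai
    if 24 <= ai <= 27:                          # 1-, 2-, 4- or 8-byte argument
        end = _take(data, pos, 1 << (ai - 24))
        val = int.from_bytes(data[pos:end], "big")
        pos = end
    elif 28 <= ai <= 30:
        raise MalformedSyntax("reserved additional information value")
    elif ai == 31:
        return _indefinite(data, pos, mt, breakable)
    if mt in (2, 3):                            # byte/text string content
        pos = _take(data, pos, val)
    elif mt in (4, 5):                          # array items / map key-value pairs
        for _ in range(val * (2 if mt == 5 else 1)):
            pos, _kind = _item(data, pos)
    elif mt == 6:                               # tag content
        pos, _kind = _item(data, pos)
    elif mt == 7 and ai == 24 and val < 32:
        raise MalformedSyntax("two-byte simple value below 32")
    return pos, mt

def _indefinite(data, pos, mt, breakable):
    """Handle an initial byte whose additional information is 31."""
    if mt == 7:                                 # the "break" stop code itself
        if not breakable:
            raise MalformedSyntax("break outside indefinite-length item")
        return pos, BREAK
    if mt not in (2, 3, 4, 5):
        raise MalformedSyntax("indefinite length not allowed for this type")
    while True:
        pos, kind = _item(data, pos, breakable=True)
        if kind == BREAK:
            return pos, INDEFINITE
        if mt in (2, 3) and kind != mt:         # chunks: definite, same type
            raise MalformedSyntax("bad chunk in indefinite-length string")
        if mt == 5:                             # map: a value must follow the key
            pos, _kind = _item(data, pos)

def classify(data, exact=True):
    """Return a verdict string for the bytes in data.

    With exact=False, trailing bytes are tolerated, as an application
    processing a sequence of items might choose to do.
    """
    try:
        end, _kind = _item(data, 0)
    except TooLittleData:
        return "too little data"
    except MalformedSyntax:
        return "syntax error"
    if exact and end != len(data):
        return "too much data"
    return "well-formed"
```

A caller that expects exactly one data item would use
classify(data); a caller that permits further data after the item
would pass exact=False and, in a fuller implementation, would also
want the number of bytes consumed.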
In Appendix C, errors of the first kind are addressed in the first
paragraph/bullet list (requiring "no bytes are left"), and errors of
the second kind are addressed in the second paragraph/bullet list
skipping to change at page 67, line 31 skipping to change at page 72, line 31
00 00, fb 00 00 00
* Definite length strings with short data: 41, 61, 5a ff ff ff ff
00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f
ff ff ff ff ff ff ff 01 02 03
* Definite length maps and arrays not closed with enough items: 81,
81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00
00
* Tag number not followed by tag content: c0
* Indefinite length strings not closed by a break stop code: 5f 41
00, 7f 61 00
* Indefinite length maps and arrays not closed by a break stop code:
9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f
ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff
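The example lists in this appendix are written as comma-separated
groups of hexadecimal bytes, with groups sometimes continuing across
a line break. For implementers who want to feed them to a decoder
under test, a helper along the following lines may be convenient.
This is a minimal sketch in Python; the function name parse_examples
is hypothetical and chosen for this example.

```python
def parse_examples(text):
    """Split a comma-separated list of hex-byte groups into byte strings.

    Whitespace inside a group, including line breaks, is ignored, so a
    list can be pasted as it is laid out in the text.
    """
    groups = (group.split() for group in text.split(","))
    return [bytes.fromhex("".join(g)) for g in groups if g]

# The "not closed with enough items" examples, laid out as in the text.
samples = parse_examples("""81,
81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00
00""")
```

Each element of samples is one truncated encoding; a decoder under
test would be expected to report "too little data" for every one of
them.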
A few examples for the five subkinds of well-formedness error kind 3
(syntax error) are shown below.
