draft-ietf-cbor-7049bis-07.txt   draft-ietf-cbor-7049bis-08.txt 
Network Working Group C. Bormann Network Working Group C. Bormann
Internet-Draft Universitaet Bremen TZI Internet-Draft Universitaet Bremen TZI
Intended status: Standards Track P. Hoffman Obsoletes: 7049 (if approved) P. Hoffman
Expires: February 26, 2020 ICANN Intended status: Standards Track ICANN
August 25, 2019 Expires: May 8, 2020 November 05, 2019
Concise Binary Object Representation (CBOR) Concise Binary Object Representation (CBOR)
draft-ietf-cbor-7049bis-07 draft-ietf-cbor-7049bis-08
Abstract Abstract
The Concise Binary Object Representation (CBOR) is a data format The Concise Binary Object Representation (CBOR) is a data format
whose design goals include the possibility of extremely small code whose design goals include the possibility of extremely small code
size, fairly small message size, and extensibility without the need size, fairly small message size, and extensibility without the need
for version negotiation. These design goals make it different from for version negotiation. These design goals make it different from
earlier binary serializations such as ASN.1 and MessagePack. earlier binary serializations such as ASN.1 and MessagePack.
This document is a revised edition of RFC 7049, with editorial This document is a revised edition of RFC 7049, with editorial
skipping to change at page 2, line 7 skipping to change at page 2, line 7
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 26, 2020. This Internet-Draft will expire on May 8, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 33 skipping to change at page 2, line 33
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7 2. CBOR Data Models . . . . . . . . . . . . . . . . . . . . . . 7
2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8 2.1. Extended Generic Data Models . . . . . . . . . . . . . . 8
2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9 2.2. Specific Data Models . . . . . . . . . . . . . . . . . . 9
3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9 3. Specification of the CBOR Encoding . . . . . . . . . . . . . 9
3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 10 3.1. Major Types . . . . . . . . . . . . . . . . . . . . . . . 11
3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13 3.2. Indefinite Lengths for Some Major Types . . . . . . . . . 13
3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13 3.2.1. The "break" Stop Code . . . . . . . . . . . . . . . . 13
3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 13 3.2.2. Indefinite-Length Arrays and Maps . . . . . . . . . . 14
3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 15 3.2.3. Indefinite-Length Byte Strings and Text Strings . . . 16
3.3. Floating-Point Numbers and Values with No Content . . . . 16 3.3. Floating-Point Numbers and Values with No Content . . . . 16
3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 17 3.4. Tagging of Items . . . . . . . . . . . . . . . . . . . . 18
3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 19 3.4.1. Date and Time . . . . . . . . . . . . . . . . . . . . 20
3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 19 3.4.2. Standard Date/Time String . . . . . . . . . . . . . . 20
3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 20 3.4.3. Epoch-based Date/Time . . . . . . . . . . . . . . . . 21
3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 20 3.4.4. Bignums . . . . . . . . . . . . . . . . . . . . . . . 21
3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 21 3.4.5. Decimal Fractions and Bigfloats . . . . . . . . . . . 22
3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 23 3.4.6. Content Hints . . . . . . . . . . . . . . . . . . . . 24
3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 23 3.4.6.1. Encoded CBOR Data Item . . . . . . . . . . . . . 24
3.4.6.2. Expected Later Encoding for CBOR-to-JSON 3.4.6.2. Expected Later Encoding for CBOR-to-JSON
Converters . . . . . . . . . . . . . . . . . . . 23 Converters . . . . . . . . . . . . . . . . . . . 24
3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 24 3.4.6.3. Encoded Text . . . . . . . . . . . . . . . . . . 25
3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 25 3.4.7. Self-Described CBOR . . . . . . . . . . . . . . . . . 26
4. Serialization Considerations . . . . . . . . . . . . . . . . 25 4. Serialization Considerations . . . . . . . . . . . . . . . . 26
4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 25 4.1. Preferred Serialization . . . . . . . . . . . . . . . . . 26
4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 26 4.2. Deterministically Encoded CBOR . . . . . . . . . . . . . 27
4.2.1. Core Deterministic Encoding Requirements . . . . . . 26 4.2.1. Core Deterministic Encoding Requirements . . . . . . 27
4.2.2. Additional Deterministic Encoding Considerations . . 27 4.2.2. Additional Deterministic Encoding Considerations . . 28
4.2.3. Length-first map key ordering . . . . . . . . . . . . 28 4.2.3. Length-first map key ordering . . . . . . . . . . . . 30
5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 29 5. Creating CBOR-Based Protocols . . . . . . . . . . . . . . . . 31
5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 30 5.1. CBOR in Streaming Applications . . . . . . . . . . . . . 31
5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 31 5.2. Generic Encoders and Decoders . . . . . . . . . . . . . . 32
5.3. Invalid Items . . . . . . . . . . . . . . . . . . . . . . 31 5.3. Validity of Items . . . . . . . . . . . . . . . . . . . . 32
5.4. Handling Unknown Simple Values and Tags . . . . . . . . . 32 5.3.1. Basic validity . . . . . . . . . . . . . . . . . . . 33
5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 33 5.3.2. Tag validity . . . . . . . . . . . . . . . . . . . . 33
5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 33 5.4. Handling Unknown Simple Values and Tag numbers . . . . . 33
5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 34 5.5. Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 35 5.6. Specifying Keys for Maps . . . . . . . . . . . . . . . . 35
5.8. Strict Decoding Mode . . . . . . . . . . . . . . . . . . 35 5.6.1. Equivalence of Keys . . . . . . . . . . . . . . . . . 36
6. Converting Data between CBOR and JSON . . . . . . . . . . . . 37 5.7. Undefined Values . . . . . . . . . . . . . . . . . . . . 37
6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 37 5.8. Validity Checking and Robustness . . . . . . . . . . . . 37
6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 38 6. Converting Data between CBOR and JSON . . . . . . . . . . . . 38
7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 39 6.1. Converting from CBOR to JSON . . . . . . . . . . . . . . 38
7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 40 6.2. Converting from JSON to CBOR . . . . . . . . . . . . . . 40
7.2. Curating the Additional Information Space . . . . . . . . 40 7. Future Evolution of CBOR . . . . . . . . . . . . . . . . . . 41
8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 41 7.1. Extension Points . . . . . . . . . . . . . . . . . . . . 41
8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 42 7.2. Curating the Additional Information Space . . . . . . . . 42
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 8. Diagnostic Notation . . . . . . . . . . . . . . . . . . . . . 42
9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 43 8.1. Encoding Indicators . . . . . . . . . . . . . . . . . . . 43
9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 43 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44
9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 43 9.1. Simple Values Registry . . . . . . . . . . . . . . . . . 44
9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 44 9.2. Tags Registry . . . . . . . . . . . . . . . . . . . . . . 44
9.5. The +cbor Structured Syntax Suffix Registration . . . . . 45 9.3. Media Type ("MIME Type") . . . . . . . . . . . . . . . . 45
10. Security Considerations . . . . . . . . . . . . . . . . . . . 45 9.4. CoAP Content-Format . . . . . . . . . . . . . . . . . . . 46
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 47 9.5. The +cbor Structured Syntax Suffix Registration . . . . . 46
11.1. Normative References . . . . . . . . . . . . . . . . . . 47 10. Security Considerations . . . . . . . . . . . . . . . . . . . 47
11.2. Informative References . . . . . . . . . . . . . . . . . 48 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 49
Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 50 11.1. Normative References . . . . . . . . . . . . . . . . . . 49
Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 54 11.2. Informative References . . . . . . . . . . . . . . . . . 50
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 57 Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 53
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 59 Appendix B. Jump Table . . . . . . . . . . . . . . . . . . . . . 57
Appendix C. Pseudocode . . . . . . . . . . . . . . . . . . . . . 60
Appendix D. Half-Precision . . . . . . . . . . . . . . . . . . . 62
Appendix E. Comparison of Other Binary Formats to CBOR's Design Appendix E. Comparison of Other Binary Formats to CBOR's Design
Objectives . . . . . . . . . . . . . . . . . . . . . 60 Objectives . . . . . . . . . . . . . . . . . . . . . 63
E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 61 E.1. ASN.1 DER, BER, and PER . . . . . . . . . . . . . . . . . 64
E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 61 E.2. MessagePack . . . . . . . . . . . . . . . . . . . . . . . 64
E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 62 E.3. BSON . . . . . . . . . . . . . . . . . . . . . . . . . . 65
E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 62 E.4. MSDTP: RFC 713 . . . . . . . . . . . . . . . . . . . . . 65
E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 62 E.5. Conciseness on the Wire . . . . . . . . . . . . . . . . . 65
Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 63 Appendix F. Changes from RFC 7049 . . . . . . . . . . . . . . . 66
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 63 Appendix G. Well-formedness errors and examples . . . . . . . . 66
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 64 G.1. Examples for CBOR data items that are not well-formed . . 67
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 69
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 70
1. Introduction 1. Introduction
There are hundreds of standardized formats for binary representation There are hundreds of standardized formats for binary representation
of structured data (also known as binary serialization formats). Of of structured data (also known as binary serialization formats). Of
those, some are for specific domains of information, while others are those, some are for specific domains of information, while others are
generalized for arbitrary data. In the IETF, probably the best-known generalized for arbitrary data. In the IETF, probably the best-known
formats in the latter category are ASN.1's BER and DER [ASN.1]. formats in the latter category are ASN.1's BER and DER [ASN.1].
The format defined here follows some specific design goals that are The format defined here follows some specific design goals that are
skipping to change at page 6, line 29 skipping to change at page 6, line 31
The term "byte" is used in its now-customary sense as a synonym for The term "byte" is used in its now-customary sense as a synonym for
"octet". All multi-byte values are encoded in network byte order "octet". All multi-byte values are encoded in network byte order
(that is, most significant byte first, also known as "big-endian"). (that is, most significant byte first, also known as "big-endian").
This specification makes use of the following terminology: This specification makes use of the following terminology:
Data item: A single piece of CBOR data. The structure of a data Data item: A single piece of CBOR data. The structure of a data
item may contain zero, one, or more nested data items. The term item may contain zero, one, or more nested data items. The term
is used both for the data item in representation format and for is used both for the data item in representation format and for
the abstract idea that can be derived from that by a decoder. the abstract idea that can be derived from that by a decoder; the
former can be addressed specifically by using "encoded data item".
Decoder: A process that decodes a well-formed CBOR data item and Decoder: A process that decodes a well-formed CBOR data item and
makes it available to an application. Formally speaking, a makes it available to an application. Formally speaking, a
decoder contains a parser to break up the input using the syntax decoder contains a parser to break up the input using the syntax
rules of CBOR, as well as a semantic processor to prepare the data rules of CBOR, as well as a semantic processor to prepare the data
in a form suitable to the application. in a form suitable to the application.
Encoder: A process that generates the representation format of a Encoder: A process that generates the representation format of a
CBOR data item from application information. CBOR data item from application information.
skipping to change at page 6, line 49 skipping to change at page 7, line 4
Data Stream: A sequence of zero or more data items, not further Data Stream: A sequence of zero or more data items, not further
assembled into a larger containing data item. The independent assembled into a larger containing data item. The independent
data items that make up a data stream are sometimes also referred data items that make up a data stream are sometimes also referred
to as "top-level data items". to as "top-level data items".
Well-formed: A data item that follows the syntactic structure of Well-formed: A data item that follows the syntactic structure of
CBOR. A well-formed data item uses the initial bytes and the byte CBOR. A well-formed data item uses the initial bytes and the byte
strings and/or data items that are implied by their values as strings and/or data items that are implied by their values as
defined in CBOR and does not include following extraneous data. defined in CBOR and does not include following extraneous data.
CBOR decoders by definition only return contents from well-formed CBOR decoders by definition only return contents from well-formed
data items. data items.
Valid: A data item that is well-formed and also follows the semantic Valid: A data item that is well-formed and also follows the semantic
restrictions that apply to CBOR data items. restrictions that apply to CBOR data items.
Expected: Besides its normal English meaning, the term "expected" is
used to describe requirements beyond CBOR validity that an
application has on its input data. Well-formed (processable at
all), valid (checked by a validity-checking generic decoder), and
expected (checked by the application) form a hierarchy of layers
of acceptability.
Stream decoder: A process that decodes a data stream and makes each Stream decoder: A process that decodes a data stream and makes each
of the data items in the sequence available to an application as of the data items in the sequence available to an application as
they are received. they are received.
Where bit arithmetic or data types are explained, this document uses Where bit arithmetic or data types are explained, this document uses
the notation familiar from the programming language C, except that the notation familiar from the programming language C, except that
"**" denotes exponentiation. Similar to the "0x" notation for "**" denotes exponentiation. Similar to the "0x" notation for
hexadecimal numbers, numbers in binary notation are prefixed with hexadecimal numbers, numbers in binary notation are prefixed with
"0b". Underscores can be added to such a number solely for "0b". Underscores can be added to such a number solely for
readability, so 0b00100001 (0x21) might be written 0b001_00001 to readability, so 0b00100001 (0x21) might be written 0b001_00001 to
skipping to change at page 12, line 13 skipping to change at page 12, line 22
pairs) followed by the 18 remaining items. The first item is the pairs) followed by the 18 remaining items. The first item is the
first key, the second item is the first value, the third item is first key, the second item is the first value, the third item is
the second key, and so on. Because items in a map come in pairs, the second key, and so on. Because items in a map come in pairs,
their total number is always even: A map that contains an odd their total number is always even: A map that contains an odd
number of items (no value data present after the last key data number of items (no value data present after the last key data
item) is not well-formed. A map that has duplicate keys may be item) is not well-formed. A map that has duplicate keys may be
well-formed, but it is not valid, and thus it causes indeterminate well-formed, but it is not valid, and thus it causes indeterminate
decoding; see also Section 5.6. decoding; see also Section 5.6.
Major type 6: a tagged data item ("tag") whose tag number is the Major type 6: a tagged data item ("tag") whose tag number is the
argument and whose enclosed data item is the single encoded data argument and whose enclosed data item ("tag content") is the
item that follows the head. See Section 3.4. single encoded data item that follows the head. See Section 3.4.
Major type 7: floating-point numbers and simple values, as well as Major type 7: floating-point numbers and simple values, as well as
the "break" stop code. See Section 3.3. the "break" stop code. See Section 3.3.
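The distinction drawn above for major type 5 (an odd number of items is not well-formed; duplicate keys are well-formed but not valid) can be sketched as follows. This is only an illustration, not part of either draft: the names NotWellFormed, NotValid, and pair_map_items are hypothetical, the input is assumed to be the flat list of already-decoded items that follow a map head, and Python's own equality is used for key comparison, which is an assumption that does not necessarily match the key equivalence rules of Section 5.6.1.

```python
class NotWellFormed(ValueError):
    """Hypothetical error: the item sequence cannot form a map at all."""


class NotValid(ValueError):
    """Hypothetical error: syntactically fine, but semantically rejected."""


def pair_map_items(items):
    """Pair up a flat [key, value, key, value, ...] list of decoded items."""
    if len(items) % 2 != 0:
        # The last key has no value data after it.
        raise NotWellFormed("map contains an odd number of items")
    keys, values = items[0::2], items[1::2]
    seen = set()
    for key in keys:
        # Assumption: keys are hashable and Python equality is good enough.
        if key in seen:
            raise NotValid("duplicate map key: %r" % (key,))
        seen.add(key)
    return list(zip(keys, values))
```

A validity-checking decoder would raise the second error; a decoder that does not check validity might instead return either value for the duplicated key, which is the indeterminate decoding mentioned above.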
These eight major types lead to a simple table showing which of the These eight major types lead to a simple table showing which of the
256 possible values for the initial byte of a data item are used 256 possible values for the initial byte of a data item are used
(Table 6). (Table 6).
In major types 6 and 7, many of the possible values are reserved for In major types 6 and 7, many of the possible values are reserved for
future specification. See Section 9 for more information on these future specification. See Section 9 for more information on these
skipping to change at page 18, line 12 skipping to change at page 19, line 12
This would be marked as 0b110_00010 (major type 6, additional This would be marked as 0b110_00010 (major type 6, additional
information 2 for the tag number) followed by 0b010_01100 (major type information 2 for the tag number) followed by 0b010_01100 (major type
2, additional information of 12 for the length) followed by the 12 2, additional information of 12 for the length) followed by the 12
bytes of the bignum. bytes of the bignum.
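As a small illustration of how the two initial bytes named above break down, the following sketch splits an initial byte into its 3-bit major type and 5-bit additional information. The function name split_initial_byte is hypothetical and chosen only for this example.

```python
from typing import Tuple


def split_initial_byte(b: int) -> Tuple[int, int]:
    """Return (major type, additional information) for one initial byte."""
    if not 0 <= b <= 0xFF:
        raise ValueError("initial byte must be in the range 0..255")
    # High-order 3 bits: major type; low-order 5 bits: additional information.
    return b >> 5, b & 0x1F


tag_head = split_initial_byte(0b110_00010)     # (6, 2): tag number 2
string_head = split_initial_byte(0b010_01100)  # (2, 12): byte string, length 12
```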
Decoders do not need to understand tags of every tag number, and tags Decoders do not need to understand tags of every tag number, and tags
may be of little value in applications where the implementation may be of little value in applications where the implementation
creating a particular CBOR data item and the implementation decoding creating a particular CBOR data item and the implementation decoding
that stream know the semantic meaning of each item in the data flow. that stream know the semantic meaning of each item in the data flow.
Their primary purpose in this specification is to define common data Their primary purpose in this specification is to define common data
types such as dates. A secondary purpose is to allow optional types such as dates. A secondary purpose is to provide conversion
tagging when the decoder is a generic CBOR decoder that might be able hints when it is foreseen that the CBOR data item needs to be
to benefit from hints about the content of items. Understanding the translated into a different format, requiring hints about the content
semantic tags is optional for a decoder; it can just jump over the of items. Understanding the semantics of tags is optional for a
initial bytes of the tag and interpret the tagged data item itself. decoder; it can just jump over the initial bytes of the tag (that
encode the tag number) and interpret the tag content itself,
presenting both tag number and tag content to the application.
A tag applies semantics to the data item it encloses. Thus, if tag A A tag applies semantics to the data item it encloses. Thus, if tag A
encloses tag B, which encloses data item C, tag A applies to the encloses tag B, which encloses data item C, tag A applies to the
result of applying tag B on data item C. That is, a tagged item is a result of applying tag B on data item C. That is, a tag is a data
data item consisting of a tag number and an enclosed value. The item consisting of a tag number and an enclosed value. The content
content of the tagged item (the enclosed data item) is the data item of the tag (the enclosed data item) is the data item (the value) that
(the value) that is being tagged. is being tagged.
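One possible way for a generic decoder to present tags it does not understand is to hand both the tag number and the tag content to the application. The sketch below assumes a hypothetical in-memory representation (the Tag class and the unwrap_tags helper are not defined by this document) and shows the nesting order described above: the outermost tag applies to the result of the inner one.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Tag:
    """Hypothetical decoded form of a tag: a tag number plus its content."""
    number: int
    content: Any


def unwrap_tags(item: Any) -> Tuple[List[int], Any]:
    """Return the tag numbers from outermost to innermost and the final content."""
    numbers = []
    while isinstance(item, Tag):
        numbers.append(item.number)
        item = item.content
    return numbers, item


# Tag A (55799) encloses tag B (2), which encloses data item C (a byte string).
numbers, content = unwrap_tags(Tag(55799, Tag(2, b"\x01")))
```

An application that does not know one of the returned tag numbers can still inspect the content, or reject the item if it regards the tag as required.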
IANA maintains a registry of tag numbers as described in Section 9.2. IANA maintains a registry of tag numbers as described in Section 9.2.
Table 4 provides a list of tag numbers that were defined in Table 4 provides a list of tag numbers that were defined in
[RFC7049], with definitions in the rest of this section. Note that [RFC7049], with definitions in the rest of this section. Note that
many other tag numbers have been defined since the publication of many other tag numbers have been defined since the publication of
[RFC7049]; see the registry described at Section 9.2 for the complete [RFC7049]; see the registry described at Section 9.2 for the complete
list. list.
+----------+----------+---------------------------------------------+ +----------+----------+---------------------------------------------+
| Tag | Data | Semantics | | Tag | Data | Semantics |
skipping to change at page 20, line 33 skipping to change at page 21, line 33
64-bit integers for the enclosed value. 64-bit integers for the enclosed value.
Negative values (major type 1 and negative floating-point numbers) Negative values (major type 1 and negative floating-point numbers)
are interpreted as determined by the application requirements as are interpreted as determined by the application requirements as
there is no universal standard for UTC count-of-seconds time before there is no universal standard for UTC count-of-seconds time before
1970-01-01T00:00Z (this is particularly true for points in time that 1970-01-01T00:00Z (this is particularly true for points in time that
precede discontinuities in national calendars). The same applies to precede discontinuities in national calendars). The same applies to
non-finite values. non-finite values.
To indicate fractional seconds, floating-point values can be used To indicate fractional seconds, floating-point values can be used
within Tag number 1 instead of integer values. Note that this within tag number 1 instead of integer values. Note that this
generally requires binary64 support, as binary16 and binary32 provide generally requires binary64 support, as binary16 and binary32 provide
non-zero fractions of seconds only for a short period of time around non-zero fractions of seconds only for a short period of time around
early 1970. An application that requires Tag number 1 support may early 1970. An application that requires tag number 1 support may
restrict the enclosed value to be an integer (or a floating-point restrict the enclosed value to be an integer (or a floating-point
value) only. value) only.
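The remark about binary64 can be checked with a short experiment. The sketch below round-trips a value through the IEEE 754 binary32 and binary64 formats using Python's struct module; the helper name roundtrip and the sample timestamp (half a second after 1572912000, a point in time in November 2019) are choices made for this illustration only.

```python
import struct


def roundtrip(value: float, fmt: str) -> float:
    """Pack a float in the given big-endian struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


t = 1572912000.5                  # epoch seconds with a half-second fraction

as_binary32 = roundtrip(t, ">f")  # fraction is lost: 1572912000.0
as_binary64 = roundtrip(t, ">d")  # fraction is kept: 1572912000.5
near_epoch = roundtrip(0.5, ">f") # close to 1970, binary32 still suffices
```

At this magnitude, adjacent binary32 values are 128 seconds apart, so sub-second precision is only available in binary64.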
3.4.4. Bignums 3.4.4. Bignums
Protocols using tag numbers 2 and 3 extend the generic data model Protocols using tag numbers 2 and 3 extend the generic data model
(Section 2) with "bignums" representing arbitrarily sized integers. (Section 2) with "bignums" representing arbitrarily sized integers.
In the generic data model, bignum values are not equal to integers In the generic data model, bignum values are not equal to integers
from the basic data model, but specific data models can define that from the basic data model, but specific data models can define that
equivalence, and preferred encoding never makes use of bignums that equivalence, and preferred encoding never makes use of bignums that
skipping to change at page 21, line 29 skipping to change at page 22, line 29
(major type 2, length 9), followed by 0x010000000000000000 (one byte (major type 2, length 9), followed by 0x010000000000000000 (one byte
0x01 and eight bytes 0x00). In hexadecimal: 0x01 and eight bytes 0x00). In hexadecimal:
C2 -- Tag 2 C2 -- Tag 2
49 -- Byte string of length 9 49 -- Byte string of length 9
010000000000000000 -- Bytes content 010000000000000000 -- Bytes content
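The byte sequence above can be reproduced with a few lines of code. This is a minimal sketch, not a complete encoder: the function name encode_unsigned_bignum is hypothetical, only non-negative values (tag number 2) are handled, and byte strings longer than 23 bytes are rejected because the sketch only emits the short one-byte head.

```python
def encode_unsigned_bignum(n: int) -> bytes:
    """Encode a non-negative integer as tag number 2 around a byte string."""
    if n < 0:
        raise ValueError("this sketch covers tag number 2 only")
    # Big-endian magnitude without leading zero bytes.
    content = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(content) > 23:
        raise ValueError("sketch limited to byte strings of at most 23 bytes")
    # 0xC2: major type 6, tag number 2; 0x40 | length: major type 2 head.
    return bytes([0xC2, 0x40 | len(content)]) + content


encoded = encode_unsigned_bignum(2**64)  # hex: c2 49 010000000000000000
```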
3.4.5. Decimal Fractions and Bigfloats 3.4.5. Decimal Fractions and Bigfloats
Protocols using tag number 4 extend the generic data model with data Protocols using tag number 4 extend the generic data model with data
items representing arbitrary-length decimal fractions m*(10*e). items representing arbitrary-length decimal fractions of the form
Protocols using tag number 5 extend the generic data model with data m*(10**e). Protocols using tag number 5 extend the generic data
items representing arbitrary-length binary fractions m*(2*e). As model with data items representing arbitrary-length binary fractions
with bignums, values of different types are not equal in the generic of the form m*(2**e). As with bignums, values of different types are
data model. not equal in the generic data model.
Decimal fractions combine an integer mantissa with a base-10 scaling Decimal fractions combine an integer mantissa with a base-10 scaling
factor. They are most useful if an application needs the exact factor. They are most useful if an application needs the exact
representation of a decimal fraction such as 1.1 because there is no representation of a decimal fraction such as 1.1 because there is no
exact representation for many decimal fractions in binary floating exact representation for many decimal fractions in binary floating
point. point.
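To make the forms m*(10**e) and m*(2**e) concrete, the sketch below evaluates them with exact rational arithmetic. The function names and sample values are assumptions made for this illustration; the comparison at the end shows why 1.1 is exactly representable as a decimal fraction while the nearest binary floating-point value differs from it.

```python
from fractions import Fraction


def decimal_fraction_value(mantissa: int, exponent: int) -> Fraction:
    """Exact value of m*(10**e), the form used with tag number 4."""
    return mantissa * Fraction(10) ** exponent


def bigfloat_value(mantissa: int, exponent: int) -> Fraction:
    """Exact value of m*(2**e), the form used with tag number 5."""
    return mantissa * Fraction(2) ** exponent


exact = decimal_fraction_value(11, -1)  # exactly 11/10
binary_float = Fraction(1.1)            # exact value of the binary64 number 1.1
differs = exact != binary_float         # True: 1.1 has no exact binary form
```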
Bigfloats combine an integer mantissa with a base-2 scaling factor. Bigfloats combine an integer mantissa with a base-2 scaling factor.
They are binary floating-point values that can exceed the range or They are binary floating-point values that can exceed the range or
the precision of the three IEEE 754 formats supported by CBOR the precision of the three IEEE 754 formats supported by CBOR
skipping to change at page 23, line 17 skipping to change at page 24, line 17
The tags in this section are for content hints that might be used by The tags in this section are for content hints that might be used by
generic CBOR processors. These content hints do not extend the generic CBOR processors. These content hints do not extend the
generic data model. generic data model.
3.4.6.1. Encoded CBOR Data Item 3.4.6.1. Encoded CBOR Data Item
Sometimes it is beneficial to carry an embedded CBOR data item that Sometimes it is beneficial to carry an embedded CBOR data item that
is not meant to be decoded immediately at the time the enclosing data is not meant to be decoded immediately at the time the enclosing data
item is being decoded. Tag number 24 (CBOR data item) can be used to item is being decoded. Tag number 24 (CBOR data item) can be used to
tag the embedded byte string as a data item encoded in CBOR format. tag the embedded byte string as a data item encoded in CBOR format.
Contained items that aren't byte strings are invalid. Any contained Contained items that aren't byte strings are invalid. A contained
byte string is valid, even if it encodes an invalid or ill-formed byte string is valid if it encodes a well-formed CBOR item; validity
CBOR item. checking of the decoded CBOR item is not required for tag validity
(but could be offered by a generic decoder as a special option).
3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters 3.4.6.2. Expected Later Encoding for CBOR-to-JSON Converters
Tag numbers 21 to 23 indicate that a byte string might require a Tag numbers 21 to 23 indicate that a byte string might require a
specific encoding when interoperating with a text-based specific encoding when interoperating with a text-based
representation. These tags are useful when an encoder knows that the representation. These tags are useful when an encoder knows that the
byte string data it is writing is likely to be later converted to a byte string data it is writing is likely to be later converted to a
particular JSON-based usage. That usage specifies that some strings particular JSON-based usage. That usage specifies that some strings
are encoded as base64, base64url, and so on. The encoder uses byte are encoded as base64, base64url, and so on. The encoder uses byte
strings instead of doing the encoding itself to reduce the message strings instead of doing the encoding itself to reduce the message
skipping to change at page 24, line 44 skipping to change at page 25, line 44
Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or a
version of the JavaScript regular expression syntax [ECMA262]. version of the JavaScript regular expression syntax [ECMA262].
(Note that more specific identification may be necessary if the (Note that more specific identification may be necessary if the
actual version of the specification underlying the regular actual version of the specification underlying the regular
expression, or more than just the text of the regular expression expression, or more than just the text of the regular expression
itself, need to be conveyed.) Any contained string value is itself, need to be conveyed.) Any contained string value is
valid. valid.
o Tag number 36 is for MIME messages (including all headers), as o Tag number 36 is for MIME messages (including all headers), as
defined in [RFC2045]. A text string that isn't a valid MIME defined in [RFC2045]. A text string that isn't a valid MIME
message is invalid. message is invalid. (For this tag, validity checking may be
particularly onerous for a generic decoder and might therefore not
be offered.)
Note that tag numbers 33 and 34 differ from 21 and 22 in that the Note that tag numbers 33 and 34 differ from 21 and 22 in that the
data is transported in base-encoded form for the former and in raw data is transported in base-encoded form for the former and in raw
byte string form for the latter. byte string form for the latter.
3.4.7. Self-Described CBOR 3.4.7. Self-Described CBOR
In many applications, it will be clear from the context that CBOR is In many applications, it will be clear from the context that CBOR is
being employed for encoding a data item. For instance, a specific being employed for encoding a data item. For instance, a specific
protocol might specify the use of CBOR, or a media type is indicated protocol might specify the use of CBOR, or a media type is indicated
skipping to change at page 25, line 39 skipping to change at page 26, line 39
formats. An easy way for an encoder to help the decoder would be to formats. An easy way for an encoder to help the decoder would be to
tag the entire CBOR item with tag number 55799, the serialization of tag the entire CBOR item with tag number 55799, the serialization of
which will never be found at the beginning of a JSON text. which will never be found at the beginning of a JSON text.
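
As a rough illustration of the self-description tag, the sketch below prepends the encoded head for tag number 55799 (major type 6 with a 16-bit argument, i.e. the bytes 0xd9d9f7) to an already encoded item. The function names are hypothetical and are not taken from any CBOR library.

```python
# Head for tag number 55799: initial byte 0xd9 (major type 6, 16-bit
# argument follows), then 0xd9f7 == 55799.
SELF_DESCRIBE_PREFIX = bytes.fromhex("d9d9f7")


def self_describe(encoded_item: bytes) -> bytes:
    """Wrap an already encoded CBOR item in tag 55799 (hypothetical helper)."""
    return SELF_DESCRIBE_PREFIX + encoded_item


def looks_self_described(data: bytes) -> bool:
    """Heuristic check: does the input start with the tag 55799 head?"""
    return data.startswith(SELF_DESCRIBE_PREFIX)
```

Because 0xd9 is not a valid first byte of a JSON text, a receiver that accepts both formats could use a check of this kind to tell them apart.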
4. Serialization Considerations 4. Serialization Considerations
4.1. Preferred Serialization 4.1. Preferred Serialization
For some values at the data model level, CBOR provides multiple For some values at the data model level, CBOR provides multiple
serializations. For many applications, it is desirable that an serializations. For many applications, it is desirable that an
encoder always chooses a preferred serialization; however, the encoder always chooses a preferred serialization (preferred
present specification does not put the burden of enforcing this encoding); however, the present specification does not put the burden
preference on either encoder or decoder. of enforcing this preference on either encoder or decoder.
Some constrained decoders may be limited in their ability to decode Some constrained decoders may be limited in their ability to decode
non-preferred serializations: For example, if only integers below non-preferred serializations: For example, if only integers below
1_000_000_000 are expected in an application, the decoder may leave 1_000_000_000 are expected in an application, the decoder may leave
out the code that would be needed to decode 64-bit arguments in out the code that would be needed to decode 64-bit arguments in
integers. An encoder that always uses preferred serialization integers. An encoder that always uses preferred serialization
("preferred encoder") interoperates with this decoder for the numbers ("preferred encoder") interoperates with this decoder for the numbers
that can occur in this application. More generally speaking, it that can occur in this application. More generally speaking, it
therefore can be said that a preferred encoder is more universally therefore can be said that a preferred encoder is more universally
interoperable (and also less wasteful) than one that, say, always interoperable (and also less wasteful) than one that, say, always
skipping to change at page 26, line 41 skipping to change at page 27, line 41
protocols are free to define what they mean by a "deterministic protocols are free to define what they mean by a "deterministic
format" and what encoders and decoders are expected to do. This format" and what encoders and decoders are expected to do. This
section defines a set of restrictions that can serve as the base of section defines a set of restrictions that can serve as the base of
such a deterministic format. such a deterministic format.
4.2.1. Core Deterministic Encoding Requirements 4.2.1. Core Deterministic Encoding Requirements
A CBOR encoding satisfies the "core deterministic encoding A CBOR encoding satisfies the "core deterministic encoding
requirements" if it satisfies the following restrictions: requirements" if it satisfies the following restrictions:
o Arguments (see Section 3) for integers, lengths in major types 2 o Preferred serialization MUST be used. In particular, this means
through 5, and tags MUST be as short as possible. In particular: that arguments (see Section 3) for integers, lengths in major
types 2 through 5, and tags MUST be as short as possible, for
instance:
* 0 to 23 and -1 to -24 MUST be expressed in the same byte as the * 0 to 23 and -1 to -24 MUST be expressed in the same byte as the
major type; major type;
* 24 to 255 and -25 to -256 MUST be expressed only with an * 24 to 255 and -25 to -256 MUST be expressed only with an
additional uint8_t; additional uint8_t;
* 256 to 65535 and -257 to -65536 MUST be expressed only with an * 256 to 65535 and -257 to -65536 MUST be expressed only with an
additional uint16_t; additional uint16_t;
* 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed * 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
only with an additional uint32_t. only with an additional uint32_t.
Floating point values also MUST use the shortest form that
preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5
as 0xfa49742408.
o Indefinite-length items MUST NOT appear. They can be encoded as
definite-length items instead.
o The keys in every map MUST be sorted in the bytewise lexicographic o The keys in every map MUST be sorted in the bytewise lexicographic
order of their deterministic encodings. For example, the order of their deterministic encodings. For example, the
following keys are sorted correctly: following keys are sorted correctly:
1. 10, encoded as 0x0a. 1. 10, encoded as 0x0a.
2. 100, encoded as 0x1864. 2. 100, encoded as 0x1864.
3. -1, encoded as 0x20. 3. -1, encoded as 0x20.
4. "z", encoded as 0x617a. 4. "z", encoded as 0x617a.
5. "aa", encoded as 0x626161. 5. "aa", encoded as 0x626161.
6. [100], encoded as 0x811864. 6. [100], encoded as 0x811864.
7. [-1], encoded as 0x8120. 7. [-1], encoded as 0x8120.
8. false, encoded as 0xf4. 8. false, encoded as 0xf4.
o Indefinite-length items MUST NOT appear. They can be encoded as
definite-length items instead.
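
The shortest-argument rule and the bytewise key ordering above can be sketched as follows. This is a minimal illustration covering only integers, text strings, arrays, and booleans; the names encode_head, encode, and sort_keys are assumptions made for this example, not an API defined by this document.

```python
import struct


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a major type and argument in the shortest possible form."""
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])
    if argument < 2**8:
        return bytes([initial | 24]) + struct.pack(">B", argument)
    if argument < 2**16:
        return bytes([initial | 25]) + struct.pack(">H", argument)
    if argument < 2**32:
        return bytes([initial | 26]) + struct.pack(">I", argument)
    if argument < 2**64:
        return bytes([initial | 27]) + struct.pack(">Q", argument)
    raise ValueError("argument does not fit in 64 bits")


def encode(item) -> bytes:
    """Encode a small subset of CBOR items with definite lengths only."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(item, bool):
        return b"\xf5" if item else b"\xf4"
    if isinstance(item, int):
        if item >= 0:
            return encode_head(0, item)
        return encode_head(1, -1 - item)
    if isinstance(item, str):
        data = item.encode("utf-8")
        return encode_head(3, len(data)) + data
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    raise TypeError("type not covered by this sketch")


def sort_keys(keys):
    """Sort map keys in bytewise lexicographic order of their encodings."""
    return sorted(keys, key=encode)
```

Applied to the example keys, sort_keys([False, [-1], "aa", 100, [100], -1, "z", 10]) yields the order listed above: 10, 100, -1, "z", "aa", [100], [-1], false.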
4.2.2. Additional Deterministic Encoding Considerations 4.2.2. Additional Deterministic Encoding Considerations
If a protocol allows for IEEE floats, then additional deterministic If a protocol allows for IEEE floats, then additional deterministic
encoding rules might need to be added. One example rule might be to encoding rules might need to be added. One example rule might be to
have all floats start as a 64-bit float, then do a test conversion to have all floats start as a 64-bit float, then do a test conversion to
a 32-bit float; if the result is the same numeric value, use the a 32-bit float; if the result is the same numeric value, use the
shorter value and repeat the process with a test conversion to a shorter value and repeat the process with a test conversion to a
16-bit float. (This rule selects 16-bit float for positive and 16-bit float. (This rule selects 16-bit float for positive and
negative Infinity as well.) Although IEEE floats can represent both negative Infinity as well.) Although IEEE floats can represent both
positive and negative zero as distinct values, the application might positive and negative zero as distinct values, the application might
not distinguish these and might decide to represent all zero values not distinguish these and might decide to represent all zero values
with a positive sign, disallowing negative zero. Also, there are with a positive sign, disallowing negative zero. Also, there are
many representations for NaN. If NaN is an allowed value, it must many representations for NaN. If NaN is an allowed value, it must
always be represented as 0xf97e00. always be represented as 0xf97e00.
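
The example rule for floats can be sketched as below. The text describes narrowing from 64 bits to 32 and then to 16; this sketch instead tries the narrowest format first, which selects the same encoding because any value exactly representable in 16 bits is also exactly representable in 32. The function name is hypothetical, and the treatment of NaN follows the 0xf97e00 rule stated above.

```python
import math
import struct


def encode_float_shortest(value: float) -> bytes:
    """Encode a float in the shortest IEEE 754 form that preserves its value."""
    if math.isnan(value):
        # Single NaN representation, as required by the rule above.
        return bytes.fromhex("f97e00")
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            # Finite value too large for this width; try the next one.
            continue
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return b"\xfb" + struct.pack(">d", value)
```

With this sketch, 1.5 encodes as 0xf93e00, 1000000.5 as 0xfa49742408, and positive Infinity as 0xf97c00. Negative zero is kept distinct here; an application that disallows it would need to normalize the value before encoding.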
CBOR tags present additional considerations for deterministic CBOR tags present additional considerations for deterministic
encoding. The absence or presence of tags in a deterministic format encoding. If a CBOR-based protocol were to provide the same
is determined by the optionality of the tags in the protocol. In a semantics for the presence and absence of a specific tag (e.g., by
CBOR-based protocol that allows optional tagging anywhere, the allowing both tag 1 data items and raw numbers in a date/time
deterministic format must not allow them. In a protocol that position, treating the latter as if they were tagged), the
requires tags in certain places, the tag needs to appear in the deterministic format would not allow them. In a protocol that
deterministic format. A CBOR-based protocol that uses deterministic requires tags in certain places to obtain specific semantics, the tag
encoding might instead say that all tags that appear in a message needs to appear in the deterministic format as well.
must be retained regardless of whether they are optional.
Protocols that include floating, big integer, or other complex values Protocols that include floating, big integer, or other complex values
need to define extra requirements on their deterministic encodings. need to define extra requirements on their deterministic encodings.
For example: For example:
o If a protocol includes a field that can express floating-point o If a protocol includes a field that can express floating-point
values (Section 3.3), the protocol's deterministic encoding needs values (Section 3.3), the protocol's deterministic encoding needs
to specify whether the integer 1.0 is encoded as 0x01, 0xf93c00, to specify whether the integer 1.0 is encoded as 0x01, 0xf93c00,
0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for 0xfa3f800000, or 0xfb3ff0000000000000. Three sensible rules for
this are: this are:
skipping to change at page 29, line 39 skipping to change at page 30, line 48
6. [-1], encoded as 0x8120. 6. [-1], encoded as 0x8120.
7. "aa", encoded as 0x626161. 7. "aa", encoded as 0x626161.
8. [100], encoded as 0x811864. 8. [100], encoded as 0x811864.
(Although [RFC7049] used the term "Canonical CBOR" for its form of (Although [RFC7049] used the term "Canonical CBOR" for its form of
requirements on deterministic encoding, this document avoids this requirements on deterministic encoding, this document avoids this
term because "canonicalization" is often associated with specific term because "canonicalization" is often associated with specific
uses of deterministic encoding only. The terms are essentially uses of deterministic encoding only. The terms are essentially
exchangeable, however, and the set of core requirements in this interchangeable, however, and the set of core requirements in this
document could also be called "Canonical CBOR", while the length- document could also be called "Canonical CBOR", while the length-
first-ordered version of that could be called "Old Canonical CBOR".) first-ordered version of that could be called "Old Canonical CBOR".)
5. Creating CBOR-Based Protocols 5. Creating CBOR-Based Protocols
Data formats such as CBOR are often used in environments where there Data formats such as CBOR are often used in environments where there
is no format negotiation. A specific design goal of CBOR is to not is no format negotiation. A specific design goal of CBOR is to not
need any included or assumed schema: a decoder can take a CBOR item need any included or assumed schema: a decoder can take a CBOR item
and decode it with no other knowledge. and decode it with no other knowledge.
skipping to change at page 31, line 31 skipping to change at page 32, line 38
registered at the time the encoder/decoder is written (Section 5.4). registered at the time the encoder/decoder is written (Section 5.4).
Generic decoders provide ways to present well-formed CBOR values, Generic decoders provide ways to present well-formed CBOR values,
both valid and invalid, to an application. The diagnostic notation both valid and invalid, to an application. The diagnostic notation
(Section 8) may be used to present well-formed CBOR values to humans. (Section 8) may be used to present well-formed CBOR values to humans.
Generic encoders provide an application interface that allows the Generic encoders provide an application interface that allows the
application to specify any well-formed value, including simple values application to specify any well-formed value, including simple values
and tags unknown to the encoder. and tags unknown to the encoder.
5.3. Invalid Items 5.3. Validity of Items
A well-formed but invalid CBOR data item presents a problem with A well-formed but invalid CBOR data item presents a problem with
interpreting the data encoded in it in the CBOR data model. A CBOR- interpreting the data encoded in it in the CBOR data model. A CBOR-
based protocol could be specified in several layers, in which the based protocol could be specified in several layers, in which the
lower layers don't process the semantics of some of the CBOR data lower layers don't process the semantics of some of the CBOR data
they forward. These layers can't notice the invalidity in data they they forward. These layers can't notice any validity errors in data
don't process and MUST forward that data as-is. The first layer that they don't process and MUST forward that data as-is. The first layer
does process the semantics of an invalid CBOR item MUST take one of that does process the semantics of an invalid CBOR item MUST take one
two choices: of two choices:
1. Replace the problematic item with an error marker and continue 1. Replace the problematic item with an error marker and continue
with the next item, or with the next item, or
2. Issue an error and stop processing altogether. 2. Issue an error and stop processing altogether.
A CBOR-based protocol MUST specify which of these options its A CBOR-based protocol MUST specify which of these options its
decoders take, for each kind of invalid item they might encounter. decoders take, for each kind of invalid item they might encounter.
Such problems might include: Such problems might occur at the basic validity level of CBOR or in
the context of tags (tag validity).
5.3.1. Basic validity
Duplicate keys in a map: Generic decoders (Section 5.2) make data Duplicate keys in a map: Generic decoders (Section 5.2) make data
available to applications using the native CBOR data model. That available to applications using the native CBOR data model. That
data model includes maps (key-value mappings with unique keys), data model includes maps (key-value mappings with unique keys),
not multimaps (key-value mappings where multiple entries can have not multimaps (key-value mappings where multiple entries can have
the same key). Thus, a generic decoder that gets a CBOR map item the same key). Thus, a generic decoder that gets a CBOR map item
that has duplicate keys will decode to a map with only one that has duplicate keys will decode to a map with only one
instance of that key, or it might stop processing altogether. On instance of that key, or it might stop processing altogether. On
the other hand, a "streaming decoder" may not even be able to the other hand, a "streaming decoder" may not even be able to
notice (Section 5.6). notice (Section 5.6).
Inadmissible type on the value enclosed by a tag: Tags (Section 3.4)
specify what type of data item is supposed to be enclosed by the
tag; for example, the tags for positive or negative bignums are
supposed to be put on byte strings. A decoder that decodes the
tagged data item into a native representation (a native big
integer in this example) is expected to check the type of the data
item being tagged. Even decoders that don't have such native
representations available in their environment may perform the
check on those tags known to them and react appropriately.
Invalid UTF-8 string: A decoder might or might not want to verify Invalid UTF-8 string: A decoder might or might not want to verify
that the sequence of bytes in a UTF-8 string (major type 3) is that the sequence of bytes in a UTF-8 string (major type 3) is
actually valid UTF-8 and react appropriately. actually valid UTF-8 and react appropriately.
5.4. Handling Unknown Simple Values and Tags 5.3.2. Tag validity
Inadmissible type for tag content: Tags (Section 3.4) specify what
type of data item is supposed to be enclosed by the tag; for
example, the tags for positive or negative bignums are supposed to
be put on byte strings. A decoder that decodes the tagged data
item into a native representation (a native big integer in this
example) is expected to check the type of the data item being
tagged. Even decoders that don't have such native representations
available in their environment may perform the check on those tags
known to them and react appropriately.
Inadmissible value for tag content: The type of data item may be
admissible for a tag's content, but the specific value may not be;
e.g., a value of "yesterday" is not acceptable for the content of
tag 0, even though it properly is a text string. A decoder that
normally ingests such tags into equivalent platform types might
present this tag to the application in a similar way to how it
would present a tag with an unknown tag number (Section 5.4).
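
The two kinds of tag validity problem described above might be distinguished along the following lines. This is only a sketch: the Tagged class, the check_tag function, and its result strings are hypothetical, and the regular expression is a loose approximation of the date/time syntax expected for tag 0, not a full validator.

```python
import re
from dataclasses import dataclass
from typing import Any


@dataclass
class Tagged:
    """A decoded tag: the tag number plus its enclosed content."""
    number: int
    content: Any


# Loose approximation of an RFC 3339 date/time string (assumption).
_DATE_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})"
)


def check_tag(item: Tagged) -> str:
    """Classify a tagged item for the few tag numbers this sketch knows."""
    if item.number == 0:
        if not isinstance(item.content, str):
            return "inadmissible type"
        if not _DATE_TIME.fullmatch(item.content):
            return "inadmissible value"
        return "valid"
    if item.number in (2, 3):
        # Bignum tags are supposed to be put on byte strings.
        if isinstance(item.content, bytes):
            return "valid"
        return "inadmissible type"
    return "unknown tag number"
```

For instance, check_tag(Tagged(0, "yesterday")) reports an inadmissible value, while check_tag(Tagged(2, "abc")) reports an inadmissible type.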
5.4. Handling Unknown Simple Values and Tag numbers
A decoder that comes across a simple value (Section 3.3) that it does A decoder that comes across a simple value (Section 3.3) that it does
not recognize, such as a value that was added to the IANA registry not recognize, such as a value that was added to the IANA registry
after the decoder was deployed or a value that the decoder chose not after the decoder was deployed or a value that the decoder chose not
to implement, might issue a warning, might stop processing to implement, might issue a warning, might stop processing
altogether, might handle the error by making the unknown value altogether, might handle the error by making the unknown value
available to the application as such (as is expected of generic available to the application as such (as is expected of generic
decoders), or take some other type of action. decoders), or take some other type of action.
A decoder that comes across a tag number (Section 3.4) that it does A decoder that comes across a tag number (Section 3.4) that it does
not recognize, such as a tag number that was added to the IANA not recognize, such as a tag number that was added to the IANA
registry after the decoder was deployed or a tag number that the registry after the decoder was deployed or a tag number that the
decoder chose not to implement, might issue a warning, might stop decoder chose not to implement, might issue a warning, might stop
processing altogether, might handle the error and present the unknown processing altogether, might handle the error and present the unknown
tag number together with the enclosed data item to the application tag number together with the enclosed data item to the application
(as is expected of generic decoders), might ignore the tag and simply (as is expected of generic decoders), or take some other type of
present the contained data item only to the application, or take some action.
other type of action.
5.5. Numbers 5.5. Numbers
CBOR-based protocols should take into account that different language CBOR-based protocols should take into account that different language
environments pose different restrictions on the range and precision environments pose different restrictions on the range and precision
of numbers that are representable. For example, the JavaScript of numbers that are representable. For example, the JavaScript
number system treats all numbers as floating point, which may result number system treats all numbers as floating point, which may result
in silent loss of precision in decoding integers with more than 53 in silent loss of precision in decoding integers with more than 53
significant bits. A protocol that uses numbers should define its significant bits. A protocol that uses numbers should define its
expectations on the handling of non-trivial numbers in decoders and expectations on the handling of non-trivial numbers in decoders and
skipping to change at page 34, line 27 skipping to change at page 35, line 38
the enclosing data item is completely available ("streaming encoder") the enclosing data item is completely available ("streaming encoder")
may want to reduce its overhead significantly by relying on its data may want to reduce its overhead significantly by relying on its data
source to maintain uniqueness. source to maintain uniqueness.
A CBOR-based protocol MUST define what to do when a receiving A CBOR-based protocol MUST define what to do when a receiving
application does see multiple identical keys in a map. The resulting application does see multiple identical keys in a map. The resulting
rule in the protocol MUST respect the CBOR data model: it cannot rule in the protocol MUST respect the CBOR data model: it cannot
prescribe a specific handling of the entries with the identical keys, prescribe a specific handling of the entries with the identical keys,
except that it might have a rule that having identical keys in a map except that it might have a rule that having identical keys in a map
indicates a malformed map and that the decoder has to stop with an indicates a malformed map and that the decoder has to stop with an
error. Duplicate keys are also prohibited by CBOR decoders that are error. Duplicate keys are also prohibited by CBOR decoders that
using strict mode (Section 5.8). enforce validity (Section 5.8).
The CBOR data model for maps does not allow ascribing semantics to The CBOR data model for maps does not allow ascribing semantics to
the order of the key/value pairs in the map representation. Thus, a the order of the key/value pairs in the map representation. Thus, a
CBOR-based protocol MUST NOT specify that changing the key/value pair CBOR-based protocol MUST NOT specify that changing the key/value pair
order in a map would change the semantics, except to specify that order in a map would change the semantics, except to specify that
some orders are disallowed, for example where they would not meet some orders are disallowed, for example where they would not meet
the requirements of a deterministic encoding (Section 4.2). (Any the requirements of a deterministic encoding (Section 4.2). (Any
secondary effects of map ordering such as on timing, cache usage, and secondary effects of map ordering such as on timing, cache usage, and
other potential side channels are not considered part of the other potential side channels are not considered part of the
semantics but may be enough reason on their own for a protocol to semantics but may be enough reason on their own for a protocol to
require a deterministic encoding format.) require a deterministic encoding format.)
Applications for constrained devices that have maps where a small
Applications for constrained devices that have maps with 24 or fewer number of frequently used keys can be identified should consider
frequently used keys should consider using small integers (and those using small integers as keys; for instance, a set of 24 or fewer
with up to 48 frequently used keys should consider also using small frequent keys can be encoded in a single byte as unsigned integers,
negative integers) because the keys can then be encoded in a single up to 48 if negative integers are also used. Less frequently
byte. occurring keys can then use integers with longer encodings.
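
The figures of 24 and 48 single-byte keys follow from the argument encoding: arguments 0 to 23 fit in the initial byte, for both unsigned (major type 0) and negative (major type 1) integers. A small sketch, with a hypothetical helper name, makes the count explicit:

```python
def encoded_key_length(key: int) -> int:
    """Number of bytes in the shortest encoding of an integer key."""
    argument = key if key >= 0 else -1 - key
    if argument < 24:
        return 1
    if argument < 2**8:
        return 2
    if argument < 2**16:
        return 3
    if argument < 2**32:
        return 5
    if argument < 2**64:
        return 9
    raise ValueError("integer does not fit in 64 bits")


# All integer keys that encode in a single byte: -24 through 23.
single_byte_keys = [k for k in range(-100, 100) if encoded_key_length(k) == 1]
```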
5.6.1. Equivalence of Keys 5.6.1. Equivalence of Keys
The specific data model applying to a CBOR data item is used to The specific data model applying to a CBOR data item is used to
determine whether keys occurring in maps are duplicates or distinct. determine whether keys occurring in maps are duplicates or distinct.
At the generic data model level, numerically equivalent integer and At the generic data model level, numerically equivalent integer and
floating-point values are distinct from each other, as they are from floating-point values are distinct from each other, as they are from
the various big numbers (Tags 2 to 5). Similarly, text strings are the various big numbers (Tags 2 to 5). Similarly, text strings are
distinct from byte strings, even if composed of the same bytes. A distinct from byte strings, even if composed of the same bytes. A
tagged value is distinct from an untagged value or from a value tagged value is distinct from an untagged value or from a value
tagged with a different tag. tagged with a different tag number.
Within each of these groups, numeric values are distinct unless they Within each of these groups, numeric values are distinct unless they
are numerically equal (specifically, -0.0 is equal to 0.0); for the are numerically equal (specifically, -0.0 is equal to 0.0); for the
purpose of map key equivalence, NaN (not a number) values are purpose of map key equivalence, NaN (not a number) values are
equivalent if they have the same significand after zero-extending equivalent if they have the same significand after zero-extending
both significands at the right to 64 bits. both significands at the right to 64 bits.
(Byte and text) strings are compared byte by byte, arrays element by (Byte and text) strings are compared byte by byte, arrays element by
element, and are equal if they have the same number of bytes/elements element, and are equal if they have the same number of bytes/elements
and the same values at the same positions. Two maps are equal if and the same values at the same positions. Two maps are equal if
they have the same set of pairs regardless of their order; pairs are they have the same set of pairs regardless of their order; pairs are
equal if both the key and value are equal. equal if both the key and value are equal.
Tagged values are equal if both the tag number and the enclosed item Tagged values are equal if both the tag number and the enclosed item
are equal. Simple values are equal if they simply have the same are equal. (Note that a generic decoder that provides processing for
value. Nothing else is equal in the generic data model, a simple a specific tag may not be able to distinguish some semantically
value 2 is not equivalent to an integer 2 and an array is never equivalent values, e.g. if leading zeroes occur in the content of tag
equivalent to a map. 2/3 (Section 3.4.4).) Simple values are equal if they simply have
the same value. Nothing else is equal in the generic data model, a
simple value 2 is not equivalent to an integer 2 and an array is
never equivalent to a map.
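
Host languages often use a looser notion of equality than the generic data model; in Python, for example, 1 == 1.0 and True == 1. One possible way to compare keys as described above is to map each decoded key to a comparison value that records its group. The sketch below does this for a subset of types; generic_key is a hypothetical name, Python bool stands in for the simple values false (20) and true (21), and NaN comparison uses only the significand of the 64-bit representation.

```python
import math
import struct


def generic_key(item):
    """Map a decoded key to a value usable for generic-model equality."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(item, bool):
        return ("simple", 21 if item else 20)
    if isinstance(item, int):
        return ("int", item)
    if isinstance(item, float):
        if math.isnan(item):
            bits = struct.unpack(">Q", struct.pack(">d", item))[0]
            return ("nan", bits & ((1 << 52) - 1))
        # 0.0 and -0.0 compare (and hash) equal in Python, as required.
        return ("float", item)
    if isinstance(item, str):
        return ("text", item)
    if isinstance(item, bytes):
        return ("bytes", item)
    if isinstance(item, (list, tuple)):
        return ("array", tuple(generic_key(x) for x in item))
    raise TypeError("type not covered by this sketch")
```

With this mapping, generic_key(1) and generic_key(1.0) differ, while generic_key(0.0) and generic_key(-0.0) are equal.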
As discussed in Section 2.2, specific data models can make values As discussed in Section 2.2, specific data models can make values
equivalent for the purpose of comparing map keys that are distinct in equivalent for the purpose of comparing map keys that are distinct in
the generic data model. Note that this implies that a generic the generic data model. Note that this implies that a generic
decoder may deliver a decoded map to an application that needs to be decoder may deliver a decoded map to an application that needs to be
checked for duplicate map keys by that application (alternatively, checked for duplicate map keys by that application (alternatively,
the decoder may provide a programming interface to perform this the decoder may provide a programming interface to perform this
service for the application). Specific data models cannot service for the application). Specific data models cannot
distinguish values for map keys that are equal for this purpose at distinguish values for map keys that are equal for this purpose at
the generic data model level. the generic data model level.
5.7. Undefined Values 5.7. Undefined Values
In some CBOR-based protocols, the simple value (Section 3.3) of In some CBOR-based protocols, the simple value (Section 3.3) of
Undefined might be used by an encoder as a substitute for a data item Undefined might be used by an encoder as a substitute for a data item
with an encoding problem, in order to allow the rest of the enclosing with an encoding problem, in order to allow the rest of the enclosing
data items to be encoded without harm. data items to be encoded without harm.
5.8. Strict Decoding Mode 5.8. Validity Checking and Robustness
Some areas of application of CBOR do not require deterministic Some areas of application of CBOR do not require deterministic
encoding (Section 4.2) but may require that different decoders reach encoding (Section 4.2) but may require that different decoders reach
the same (semantically equivalent) results, even in the presence of the same (semantically equivalent) results, even in the presence of
potentially malicious data. This can be required if one application potentially malicious data. This can be required if one application
(such as a firewall or other protecting entity) makes a decision (such as a firewall or other protecting entity) makes a decision
based on the data that another application, which independently based on the data that another application, which independently
decodes the data, relies on. decodes the data, relies on.
Normally, it is the responsibility of the sender to avoid ambiguously Normally, it is the responsibility of the sender to avoid ambiguously
decodable data. However, the sender might be an attacker specially decodable data. However, the sender might be an attacker specially
making up CBOR data such that it will be interpreted differently by making up CBOR data such that it will be interpreted differently by
different decoders in an attempt to exploit that as a vulnerability. different decoders in an attempt to exploit that as a vulnerability.
Generic decoders used in applications where this might be a problem Generic decoders used in applications where this might be a problem
need to support a strict mode in which it is also the responsibility can help by providing a validity-checking mode in which it is also
of the receiver to reject ambiguously decodable data. It is expected the responsibility of the generic decoder to reject invalid data. It
that firewalls and other security systems that decode CBOR will only is expected that firewalls and other security systems that decode
decode in strict mode. CBOR will employ their decoders with validity checking applied.
A decoder in strict mode will reliably reject any data that could be A decoder with validity checking will expend the effort to reliably
interpreted by other decoders in different ways. It will expend the detect invalid data items (Section 5.3). For example, such a decoder
effort to reliably detect invalid data items (Section 5.3). For needs to have an API that reports an error (and does not return data)
example, a strict decoder needs to have an API that reports an error for a CBOR data item that contains any of the following:
(and does not return data) for a CBOR data item that contains any of
the following:
o a map (major type 5) that has more than one entry with the same o a map (major type 5) that has more than one entry with the same
key key
o a tag that is used on a data item of the incorrect type o a tag that is used on a data item of the incorrect type
o a data item that is incorrectly formatted for the type given to o a data item that is incorrectly formatted for the type given to
it, such as invalid UTF-8 or data that cannot be interpreted with it, such as invalid UTF-8 in a text string or data that (even if
the specific tag number that it has been tagged with of the correct type) cannot be interpreted with the specific tag
number that it has been tagged with
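
The first and third checks in the list above can be sketched as follows. This is a minimal, non-normative illustration in Python; the names InvalidCBOR, check_map_pairs and check_text are hypothetical, and the sketch assumes that some generic decoder has already produced map entries as (key, value) pairs with hashable Python keys. Keys are compared together with their Python type so that 1, 1.0 and True, which Python treats as equal, are not reported as duplicates of each other.

```python
class InvalidCBOR(ValueError):
    """Hypothetical error type: reported instead of returning data."""


def check_map_pairs(pairs):
    """Return the (key, value) pairs unchanged, or raise on a duplicate key."""
    seen = set()
    for key, _value in pairs:
        # Pair the key with its type name: Python alone would consider
        # 1 == 1.0 == True, which are distinct keys in this sketch.
        marker = (type(key).__name__, key)
        if marker in seen:
            raise InvalidCBOR(f"duplicate map key: {key!r}")
        seen.add(marker)
    return list(pairs)


def check_text(raw):
    """Return the decoded text string, or raise if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as exc:
        raise InvalidCBOR("invalid UTF-8 in text string") from exc
```

The set of keys already seen is the source of the cost mentioned for duplicate detection: it grows with the number of entries in the map.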
A decoder in strict mode can do one of two things when it encounters A validity-checking decoder can do one of two things when it
a tag number or simple value that it does not recognize: encounters a tag number or simple value that it does not recognize:
o It can report an error (and not return data). o It can report an error (and not return data).
o It can emit the unknown item (type, value, and, for tags, the o It can emit the unknown item (type, value, and, for tags, the
decoded tagged data item) to the application calling the decoder decoded tagged data item) to the application calling the decoder,
with an indication that the decoder did not recognize that tag with an indication that the decoder did not recognize that tag
number or simple value. number or simple value.
The latter approach, which is also appropriate for non-strict The latter approach, which is also appropriate for decoders that do
decoders, supports forward compatibility with newly registered tags not support validity checking, provides forward compatibility with
and simple values without the requirement to update the encoder at newly registered tags and simple values without the requirement to
the same time as the calling application. (For this, the API for the update the encoder at the same time as the calling application. (For
decoder needs to have a way to mark unknown items so that the calling this, the API for the decoder needs to have a way to mark unknown
application can handle them in a manner appropriate for the program.) items so that the calling application can handle them in a manner
Since some of this processing may have an appreciable cost (in appropriate for the program.)
particular with duplicate detection for maps), support of strict mode
is not a requirement placed on all CBOR decoders. Since some of the processing needed for validity checking may have an
appreciable cost (in particular with duplicate detection for maps),
support of validity checking is not a requirement placed on all CBOR
decoders.
Some encoders will rely on their applications to provide input data Some encoders will rely on their applications to provide input data
in such a way that unambiguously decodable CBOR results. A generic in such a way that valid CBOR results. A generic encoder also may
encoder also may want to provide a strict mode where it reliably want to provide a validity-checking mode where it reliably limits its
limits its output to unambiguously decodable CBOR, independent of output to valid CBOR, independent of whether or not its application
whether or not its application is providing API-conformant data. is providing API-conformant data.
6. Converting Data between CBOR and JSON 6. Converting Data between CBOR and JSON
This section gives non-normative advice about converting between CBOR This section gives non-normative advice about converting between CBOR
and JSON. Implementations of converters are free to use whichever and JSON. Implementations of converters are free to use whichever
advice here they want. advice here they want.
It is worth noting that a JSON text is a sequence of characters, not It is worth noting that a JSON text is a sequence of characters, not
an encoded sequence of bytes, while a CBOR data item consists of an encoded sequence of bytes, while a CBOR data item consists of
bytes, not characters. bytes, not characters.
skipping to change at page 38, line 51 skipping to change at page 40, line 21
6.2. Converting from JSON to CBOR 6.2. Converting from JSON to CBOR
All JSON values, once decoded, directly map into one or more CBOR All JSON values, once decoded, directly map into one or more CBOR
values. As with any kind of CBOR generation, decisions have to be values. As with any kind of CBOR generation, decisions have to be
made with respect to number representation. In a suggested made with respect to number representation. In a suggested
conversion: conversion:
o JSON numbers without fractional parts (integer numbers) are o JSON numbers without fractional parts (integer numbers) are
represented as integers (major types 0 and 1, possibly major type represented as integers (major types 0 and 1, possibly major type
6 tag number 2 and 3), choosing the shortest form; integers longer 6 tag number 2 and 3), choosing the shortest form; integers longer
than an implementation-defined threshold (which is usually either than an implementation-defined threshold may instead be
32 or 64 bits) may instead be represented as floating-point represented as floating-point values. The default range that is
values. (If the JSON was generated from a JavaScript represented as integer is -2**53+1..2**53-1 (fully exploiting the
implementation, its precision is already limited to 53 bits range for exact integers in the binary64 representation often used
maximum.) for decoding JSON [RFC7493]); implementations may choose
-2**32..2**32-1 or -2**64..2**64-1 (fully using the integer ranges
available in CBOR with uint32_t or uint64_t, respectively) or even
-2**31..2**31-1 or -2**63..2**63-1 (using popular ranges for two's
complement signed integers). (If the JSON was generated from a
JavaScript implementation, its precision is already limited to 53
bits maximum.)
o Numbers with fractional parts are represented as floating-point o Numbers with fractional parts are represented as floating-point
values. Preferably, the shortest exact floating-point values, performing the decimal-to-binary conversion based on the
representation is used; for instance, 1.5 is represented in a precision provided by IEEE 754 binary64. Then, when encoding in
16-bit floating-point value (not all implementations will be CBOR, the preferred serialization uses the shortest floating-point
capable of efficiently finding the minimum form, though). There representation exactly representing this conversion result; for
may be an implementation-defined limit to the precision that will instance, 1.5 is represented in a 16-bit floating-point value (not
affect the precision of the represented values. Decimal all implementations will be capable of efficiently finding the
representation should only be used if that is specified in a minimum form, though). Instead of using the default binary64
protocol. precision, there may be an implementation-defined limit to the
precision of the conversion that will affect the precision of the
represented values. Decimal representation should only be used on
the CBOR side if that is specified in a protocol.
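
The suggested number conversion can be sketched as follows. This is a non-normative illustration in Python, assuming the default integer range of -2**53+1..2**53-1 and binary64 as the conversion precision; the function names are hypothetical, the input is a single JSON number token, and whether a token counts as an integer is delegated to Python's json module (which treats tokens without a fraction or exponent as integers). Out-of-range inputs such as 1e400 are not handled.

```python
import json
import struct

MAX_EXACT = 2**53 - 1  # largest integer magnitude kept as a CBOR integer


def encode_head(major, arg):
    """Encode a major type and unsigned argument in the shortest form."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | ai]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_float(value):
    """Use the shortest of binary16/32/64 that represents value exactly."""
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # too large for this format, try the next wider one
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return bytes([0xFB]) + struct.pack(">d", value)


def json_number_to_cbor(token):
    """Convert one JSON number token to encoded CBOR."""
    value = json.loads(token)
    if isinstance(value, int) and -MAX_EXACT <= value <= MAX_EXACT:
        if value >= 0:
            return encode_head(0, value)   # major type 0
        return encode_head(1, -1 - value)  # major type 1
    return encode_float(float(value))
```

With this sketch, the token 1.5 becomes the three bytes f9 3e 00 (a 16-bit floating-point value), while 1.1 needs the full binary64 form and expands to nine bytes.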
CBOR has been designed to generally provide a more compact encoding CBOR has been designed to generally provide a more compact encoding
than JSON. One implementation strategy that might come to mind is to than JSON. One implementation strategy that might come to mind is to
perform a JSON-to-CBOR encoding in place in a single buffer. This perform a JSON-to-CBOR encoding in place in a single buffer. This
strategy would need to carefully consider a number of pathological strategy would need to carefully consider a number of pathological
cases, such as that some strings represented with no or very few cases, such as that some strings represented with no or very few
escapes and longer (or much longer) than 255 bytes may expand when escapes and longer (or much longer) than 255 bytes may expand when
encoded as UTF-8 strings in CBOR. Similarly, a few of the binary encoded as UTF-8 strings in CBOR. Similarly, a few of the binary
floating-point representations might cause expansion from some short floating-point representations might cause expansion from some short
decimal representations (1.1, 1e9) in JSON. This may be hard to get decimal representations (1.1, 1e9) in JSON. This may be hard to get
skipping to change at page 41, line 43 skipping to change at page 43, line 18
RFC 8259, extending it where needed. RFC 8259, extending it where needed.
The notation borrows the JSON syntax for numbers (integer and The notation borrows the JSON syntax for numbers (integer and
floating point), True (>true<), False (>false<), Null (>null<), UTF-8 floating point), True (>true<), False (>false<), Null (>null<), UTF-8
strings, arrays, and maps (maps are called objects in JSON; the strings, arrays, and maps (maps are called objects in JSON; the
diagnostic notation extends JSON here by allowing any data item in diagnostic notation extends JSON here by allowing any data item in
the key position). Undefined is written >undefined< as in the key position). Undefined is written >undefined< as in
JavaScript. The non-finite floating-point numbers Infinity, JavaScript. The non-finite floating-point numbers Infinity,
-Infinity, and NaN are written exactly as in this sentence (this is -Infinity, and NaN are written exactly as in this sentence (this is
also a way they can be written in JavaScript, although JSON does not also a way they can be written in JavaScript, although JSON does not
allow them). A tagged item is written as an integer number for the allow them). A tag is written as an integer number for the tag
tag, followed by the item in parentheses; for instance, an RFC 3339 number, followed by the tag content in parentheses; for instance, an
(ISO 8601) date could be notated as: RFC 3339 (ISO 8601) date could be notated as:
0("2013-03-21T20:04:00Z") 0("2013-03-21T20:04:00Z")
or the equivalent relative time as or the equivalent relative time as
1(1363896240) 1(1363896240)
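
To relate the two notations above to their encoded form, the following non-normative Python sketch encodes both tags by hand. The helper names are hypothetical; only tag numbers 0 and 1, text strings (major type 3), and unsigned integers (major type 0) are covered.

```python
from datetime import datetime, timezone


def encode_head(major, arg):
    """Encode a major type and unsigned argument in the shortest form."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | ai]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_tag(tag_number, encoded_content):
    """Major type 6: the tag number, followed by the encoded tag content."""
    return encode_head(6, tag_number) + encoded_content


def encode_text(text):
    """Major type 3: the length in bytes, followed by the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return encode_head(3, len(raw)) + raw


# 0("2013-03-21T20:04:00Z")
date_string = encode_tag(0, encode_text("2013-03-21T20:04:00Z"))

# 1(1363896240), with the number computed from the same point in time
moment = datetime(2013, 3, 21, 20, 4, 0, tzinfo=timezone.utc)
epoch_seconds = int(moment.timestamp())
epoch_time = encode_tag(1, encode_head(0, epoch_seconds))
```

Here epoch_time is the six bytes c1 1a 51 4b 67 b0, and date_string is c0 74 followed by the 20 bytes of the text.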
Byte strings are notated in one of the base encodings, without Byte strings are notated in one of the base encodings, without
padding, enclosed in single quotes, prefixed by >h< for base16, >b32< padding, enclosed in single quotes, prefixed by >h< for base16, >b32<
for base32, >h32< for base32hex, >b64< for base64 or base64url (the for base32, >h32< for base32hex, >b64< for base64 or base64url (the
skipping to change at page 47, line 4 skipping to change at page 48, line 41
The input check itself may consume resources. This is usually linear The input check itself may consume resources. This is usually linear
in the size of the input, which means that an attacker has to spend in the size of the input, which means that an attacker has to spend
resources that are commensurate to the resources spent by the resources that are commensurate to the resources spent by the
defender on input validation. Processing for arbitrary-precision defender on input validation. Processing for arbitrary-precision
numbers may exceed linear effort. Also, some hash-table numbers may exceed linear effort. Also, some hash-table
implementations that are used by decoders to build in-memory implementations that are used by decoders to build in-memory
representations of maps can be attacked to spend quadratic effort, representations of maps can be attacked to spend quadratic effort,
unless a secret key is employed (see Section 7 of [SIPHASH]). Such unless a secret key is employed (see Section 7 of [SIPHASH]). Such
superlinear efforts can be employed by an attacker to exhaust superlinear efforts can be employed by an attacker to exhaust
resources at or before the input validator; they therefore need to be resources at or before the input validator; they therefore need to be
avoided in a CBOR decoder implementation. Note that Tag number avoided in a CBOR decoder implementation. Note that tag number
definitions and their implementations can add security considerations definitions and their implementations can add security considerations
of this kind; this should then be discussed in the security of this kind; this should then be discussed in the security
considerations of the Tag number definition. considerations of the tag number definition.
CBOR encoders do not receive input directly from the network and are CBOR encoders do not receive input directly from the network and are
thus not directly attackable in the same way as CBOR decoders. thus not directly attackable in the same way as CBOR decoders.
However, CBOR encoders often have an API that takes input from However, CBOR encoders often have an API that takes input from
another level in the implementation and can be attacked through that another level in the implementation and can be attacked through that
API. The design and implementation of that API should assume the API. The design and implementation of that API should assume the
behavior of its caller may be based on hostile input or on coding behavior of its caller may be based on hostile input or on coding
mistakes. It should check inputs for buffer overruns, overflow and mistakes. It should check inputs for buffer overruns, overflow and
underflow of integer arithmetic, and other such errors that are aimed underflow of integer arithmetic, and other such errors that are aimed
to disrupt the encoder. to disrupt the encoder.
Protocols that are used in a security context should be defined in Protocols should be defined in such a way that potential multiple
such a way that potential multiple interpretations are reliably interpretations are reliably reduced to a single interpretation. For
reduced to a single interpretation. For example, an attacker could example, an attacker could make use of invalid input such as
make use of invalid input such as duplicate keys in maps, or exploit duplicate keys in maps, or exploit different precision in processing
different precision in processing numbers to make one application numbers to make one application base its decisions on a different
base its decisions on a different interpretation than the one that interpretation than the one that will be used by a second
will be used by a second application. To facilitate consistent application. To facilitate consistent interpretation, encoder and
interpretation, encoder and decoder implementations used in such decoder implementations should provide a validity-checking mode of
contexts should provide at least one strict mode of operation operation (Section 5.8). Note, however, that a generic decoder
(Section 5.8). cannot know about all requirements that an application poses on its
input data; it is therefore not relieving the application from
performing its own input checking. Also, since the set of defined
tag numbers evolves, the application may employ a tag number that is
not yet supported for validity checking by the generic decoder it
uses. Generic decoders therefore need to document which
tag numbers they support and what validity checking they can provide
for each of them as well as for basic CBOR validity (UTF-8 checking,
duplicate map key checking).
11. References 11. References
11.1. Normative References 11.1. Normative References
[ECMA262] Ecma International, "ECMAScript 2018 Language [ECMA262] Ecma International, "ECMAScript 2018 Language
Specification", ECMA Standard ECMA-262, 9th Edition, June Specification", ECMA Standard ECMA-262, 9th Edition, June
2018, <https://www.ecma- 2018, <https://www.ecma-
international.org/publications/files/ECMA-ST/ international.org/publications/files/ECMA-ST/Ecma-
Ecma-262.pdf>. 262.pdf>.
[IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE
Std 754-2008. Std 754-2008.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996,
<https://www.rfc-editor.org/info/rfc2045>. <https://www.rfc-editor.org/info/rfc2045>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
skipping to change at page 49, line 5 skipping to change at page 50, line 48
[ASN.1] International Telecommunication Union, "Information [ASN.1] International Telecommunication Union, "Information
Technology -- ASN.1 encoding rules: Specification of Basic Technology -- ASN.1 encoding rules: Specification of Basic
Encoding Rules (BER), Canonical Encoding Rules (CER) and Encoding Rules (BER), Canonical Encoding Rules (CER) and
Distinguished Encoding Rules (DER)", ITU-T Recommendation Distinguished Encoding Rules (DER)", ITU-T Recommendation
X.690, 1994. X.690, 1994.
[BSON] Various, "BSON - Binary JSON", 2013, [BSON] Various, "BSON - Binary JSON", 2013,
<http://bsonspec.org/>. <http://bsonspec.org/>.
[I-D.ietf-cbor-sequence]
Bormann, C., "Concise Binary Object Representation (CBOR)
Sequences", draft-ietf-cbor-sequence-02 (work in
progress), September 2019.
[IANA.cbor-simple-values] [IANA.cbor-simple-values]
IANA, "Concise Binary Object Representation (CBOR) Simple IANA, "Concise Binary Object Representation (CBOR) Simple
Values", Values",
<http://www.iana.org/assignments/cbor-simple-values>. <http://www.iana.org/assignments/cbor-simple-values>.
[IANA.cbor-tags] [IANA.cbor-tags]
IANA, "Concise Binary Object Representation (CBOR) Tags", IANA, "Concise Binary Object Representation (CBOR) Tags",
<http://www.iana.org/assignments/cbor-tags>. <http://www.iana.org/assignments/cbor-tags>.
[MessagePack] [MessagePack]
skipping to change at page 49, line 38 skipping to change at page 51, line 38
[RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object
Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
October 2013, <https://www.rfc-editor.org/info/rfc7049>. October 2013, <https://www.rfc-editor.org/info/rfc7049>.
[RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for [RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for
Constrained-Node Networks", RFC 7228, Constrained-Node Networks", RFC 7228,
DOI 10.17487/RFC7228, May 2014, DOI 10.17487/RFC7228, May 2014,
<https://www.rfc-editor.org/info/rfc7228>. <https://www.rfc-editor.org/info/rfc7228>.
[RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493,
DOI 10.17487/RFC7493, March 2015,
<https://www.rfc-editor.org/info/rfc7493>.
[RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
Interchange Format", STD 90, RFC 8259, Interchange Format", STD 90, RFC 8259,
DOI 10.17487/RFC8259, December 2017, DOI 10.17487/RFC8259, December 2017,
<https://www.rfc-editor.org/info/rfc8259>. <https://www.rfc-editor.org/info/rfc8259>.
[RFC8618] Dickinson, J., Hague, J., Dickinson, S., Manderson, T.,
and J. Bond, "Compacted-DNS (C-DNS): A Format for DNS
Packet Capture", RFC 8618, DOI 10.17487/RFC8618, September
2019, <https://www.rfc-editor.org/info/rfc8618>.
[SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- [SIPHASH] Aumasson, J. and D. Bernstein, "SipHash: A Fast Short-
Input PRF", Lecture Notes in Computer Science pp. 489-508, Input PRF", Lecture Notes in Computer Science pp. 489-508,
DOI 10.1007/978-3-642-34931-7_28, 2012. DOI 10.1007/978-3-642-34931-7_28, 2012.
[YAML] Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup [YAML] Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup
Language (YAML[TM]) Version 1.2", 3rd Edition, October Language (YAML[TM]) Version 1.2", 3rd Edition, October
2009, <http://www.yaml.org/spec/1.2/spec.html>. 2009, <http://www.yaml.org/spec/1.2/spec.html>.
Appendix A. Examples Appendix A. Examples
skipping to change at page 56, line 29 skipping to change at page 59, line 29
| 0xc2 | Positive bignum (data item "byte string" follows) | | 0xc2 | Positive bignum (data item "byte string" follows) |
| | | | | |
| 0xc3 | Negative bignum (data item "byte string" follows) | | 0xc3 | Negative bignum (data item "byte string" follows) |
| | | | | |
| 0xc4 | Decimal Fraction (data item "array" follows; see | | 0xc4 | Decimal Fraction (data item "array" follows; see |
| | Section 3.4.5) | | | Section 3.4.5) |
| | | | | |
| 0xc5 | Bigfloat (data item "array" follows; see | | 0xc5 | Bigfloat (data item "array" follows; see |
| | Section 3.4.5) | | | Section 3.4.5) |
| | | | | |
| 0xc6..0xd4 | (tagged item) | | 0xc6..0xd4 | (tag) |
| | | | | |
| 0xd5..0xd7 | Expected Conversion (data item follows; see | | 0xd5..0xd7 | Expected Conversion (data item follows; see |
| | Section 3.4.6.2) | | | Section 3.4.6.2) |
| | | | | |
| 0xd8..0xdb | (more tagged items, 1/2/4/8 bytes and then a data | | 0xd8..0xdb | (more tags, 1/2/4/8 bytes and then a data item |
| | item follow) | | | follow) |
| | | | | |
| 0xe0..0xf3 | (simple value) | | 0xe0..0xf3 | (simple value) |
| | | | | |
| 0xf4 | False | | 0xf4 | False |
| | | | | |
| 0xf5 | True | | 0xf5 | True |
| | | | | |
| 0xf6 | Null | | 0xf6 | Null |
| | | | | |
| 0xf7 | Undefined | | 0xf7 | Undefined |
skipping to change at page 61, line 22 skipping to change at page 64, line 22
3. no schema description needed 3. no schema description needed
4. reasonably compact serialization 4. reasonably compact serialization
5. applicability to constrained and unconstrained applications 5. applicability to constrained and unconstrained applications
6. good JSON conversion 6. good JSON conversion
7. extensibility 7. extensibility
A discussion of CBOR and other formats with respect to a different
set of design objectives is provided in Section 5 and Appendix C of
[RFC8618].
E.1. ASN.1 DER, BER, and PER E.1. ASN.1 DER, BER, and PER
[ASN.1] has many serializations. In the IETF, DER and BER are the [ASN.1] has many serializations. In the IETF, DER and BER are the
most common. The serialized output is not particularly compact for most common. The serialized output is not particularly compact for
many items, and the code needed to decode numeric items can be many items, and the code needed to decode numeric items can be
complex on a constrained device. complex on a constrained device.
Few (if any) IETF protocols have adopted one of the several variants Few (if any) IETF protocols have adopted one of the several variants
of Packed Encoding Rules (PER). There could be many reasons for of Packed Encoding Rules (PER). There could be many reasons for
this, but one that is commonly stated is that PER makes use of the this, but one that is commonly stated is that PER makes use of the
skipping to change at page 63, line 44 skipping to change at page 66, line 44
o Updated reference for [CNN-TERMS] to [RFC7228] o Updated reference for [CNN-TERMS] to [RFC7228]
o Added a comment to the last example in Section 2.2.1 (added o Added a comment to the last example in Section 2.2.1 (added
"Second value") "Second value")
o Fixed a bug in the example in Section 2.4.2 ("29" -> "49") o Fixed a bug in the example in Section 2.4.2 ("29" -> "49")
o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" -> o Fixed a bug in the last paragraph of Section 3.6 ("0b000_11101" ->
"0b000_11001") "0b000_11001")
Appendix G. Well-formedness errors and examples
There are three basic kinds of well-formedness errors that can occur
in decoding a CBOR data item:
o Too much data: There are input bytes left that were not consumed.
This is only an error if the application assumed that the input
bytes would span exactly one data item. Where the application
uses the self-delimiting nature of CBOR encoding to permit
additional data after the data item, as is for example done in
CBOR sequences [I-D.ietf-cbor-sequence], the CBOR decoder can
simply indicate what part of the input has not been consumed.
o Too little data: The input data available would need additional
bytes added at their end for a complete CBOR data item. This may
indicate the input is truncated; it is also a common error when
trying to decode random data as CBOR. For some applications,
however, this may not actually be an error, as the application
may not be certain it has all the data yet and can obtain or wait
for additional input bytes. Some of these applications may have
an upper limit for how much additional data can show up; here the
decoder may be able to indicate that the encoded CBOR data item
cannot be completed within this limit.
o Syntax error: The input data are not consistent with the
requirements of the CBOR encoding, and this cannot be remedied by
adding (or removing) data at the end.
In Appendix C, errors of the first kind are addressed in the first
paragraph/bullet list (requiring "no bytes are left"), and errors of
the second kind are addressed in the second paragraph/bullet list
(failing "if n bytes are no longer available"). Errors of the third
kind are identified in the pseudocode by specific instances of
calling fail(), in order:
o a reserved value is used for additional information (28, 29, 30)
o major type 7, additional information 24, value < 32 (incorrect or
incorrectly encoded simple type)
o incorrect substructure of indefinite length byte/text string (may
only contain definite length strings of the same major type)
o break stop code (mt=7, ai=31) occurs in a value position of a map,
or anywhere other than a position directly in an indefinite length
item where another enclosed data item could also occur
o additional information 31 used with major type 0, 1, or 6
G.1. Examples for CBOR data items that are not well-formed
This subsection shows a few examples for CBOR data items that are not
well-formed. Each example is a sequence of bytes each shown in
hexadecimal; multiple examples in a list are separated by commas.
Examples for well-formedness error kind 1 (too much data) can easily
be formed by adding data to a well-formed encoded CBOR data item.
Similarly, examples for well-formedness error kind 2 (too little
data) can be formed by truncating a well-formed encoded CBOR data
item. In test suites, it may be beneficial to specifically test with
incomplete data items that would require large amounts of additional
data to be completed (for instance, by starting the encoding of a
string of a very large size).
A premature end of the input can occur in a head or within the
enclosed data, which may be bare strings or enclosed data items that
are either counted or should have been ended by a break stop code.
o End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02
03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa
00 00, fb 00 00 00
o Definite length strings with short data: 41, 61, 5a ff ff ff ff
00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f
ff ff ff ff ff ff ff 01 02 03
o Definite length maps and arrays not closed with enough items: 81,
81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00
00
o Indefinite length strings not closed by a break stop code: 5f 41
00, 7f 61 00
o Indefinite length maps and arrays not closed by a break stop code:
9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f
ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff
A few examples for the five subkinds of well-formedness error kind 3
(syntax error) are shown below.
Subkind 1:
o Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e,
5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
fd, fe
Subkind 2:
o Reserved two-byte encodings of simple types: f8 00, f8 01, f8 18,
f8 1f
Subkind 3:
o Indefinite length string chunks not of the correct type: 5f 00 ff,
5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff,
7f 41 00 ff
o Indefinite length string chunks not definite length: 5f 5f 41 00
ff ff, 7f 7f 61 00 ff ff
Subkind 4:
o Break occurring on its own outside of an indefinite length item:
ff
o Break occurring in a definite length array or map or a tag: 81 ff,
82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82
9f 81 9f 9f ff ff ff ff
o Break in indefinite length map would lead to odd number of items
(break in a value position): bf 00 ff, bf 00 00 00 ff
Subkind 5:
o Major type 0, 1, 6 with additional information 31: 1f, 3f, df
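
The three kinds of well-formedness errors can be told apart mechanically. The following non-normative Python sketch follows the structure of the pseudocode in Appendix C as described above; the names classify, TooLittleData and Malformed are hypothetical, and the sketch does not guard against very deep nesting, which a real decoder would need to limit.

```python
class TooLittleData(Exception):
    """Kind 2: more input bytes would be needed."""


class Malformed(Exception):
    """Kind 3: syntax error that no additional data can repair."""


def classify(data):
    """Return 'well-formed', 'too much data', 'too little data' or 'syntax error'."""
    pos = 0

    def take(n):
        nonlocal pos
        if len(data) - pos < n:
            raise TooLittleData()
        pos += n
        return data[pos - n:pos]

    def item(breakable=False):
        """Check one data item; return its major type, 0 for an
        indefinite length item, or -1 for a break stop code."""
        initial = take(1)[0]
        mt, ai = initial >> 5, initial & 0x1F
        val = ai
        if 24 <= ai <= 27:
            val = int.from_bytes(take(1 << (ai - 24)), "big")
        elif 28 <= ai <= 30:
            raise Malformed("reserved additional information")
        elif ai == 31:
            return indefinite(mt, breakable)
        if mt in (2, 3):
            take(val)                    # string content
        elif mt == 4:
            for _ in range(val):
                item()
        elif mt == 5:
            for _ in range(val * 2):     # keys and values
                item()
        elif mt == 6:
            item()                       # the tag content
        elif mt == 7 and ai == 24 and val < 32:
            raise Malformed("two-byte encoding of a simple value below 32")
        return mt

    def indefinite(mt, breakable):
        if mt in (2, 3):
            while (chunk := item(True)) != -1:
                if chunk != mt:          # chunk of wrong or indefinite type
                    raise Malformed("bad chunk in indefinite length string")
        elif mt == 4:
            while item(True) != -1:
                pass
        elif mt == 5:
            while item(True) != -1:
                item()                   # a value may not be a break
        elif mt == 7:
            if not breakable:
                raise Malformed("break outside an indefinite length item")
            return -1
        else:
            raise Malformed("additional information 31 with major type 0, 1 or 6")
        return 0

    try:
        item()
    except TooLittleData:
        return "too little data"
    except Malformed:
        return "syntax error"
    return "well-formed" if pos == len(data) else "too much data"
```

A test suite could feed each example from this subsection through bytes.fromhex() into classify() and compare the result with the error kind under which the example is listed.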
Acknowledgements Acknowledgements
CBOR was inspired by MessagePack. MessagePack was developed and CBOR was inspired by MessagePack. MessagePack was developed and
promoted by Sadayuki Furuhashi ("frsyuki"). This reference to promoted by Sadayuki Furuhashi ("frsyuki"). This reference to
MessagePack is solely for attribution; CBOR is not intended as a MessagePack is solely for attribution; CBOR is not intended as a
version of or replacement for MessagePack, as it has different design version of or replacement for MessagePack, as it has different design
goals and requirements. goals and requirements.
The need for functionality beyond the original MessagePack The need for functionality beyond the original MessagePack
Specification became obvious to many people at about the same time Specification became obvious to many people at about the same time
 End of changes. 57 change blocks. 
201 lines changed or deleted 403 lines changed or added
