--- 1/draft-ietf-cellar-ffv1-08.txt 2019-09-06 08:13:05.043564004 -0700
+++ 2/draft-ietf-cellar-ffv1-09.txt 2019-09-06 08:13:05.139566422 -0700
@@ -1,18 +1,18 @@
 cellar                                                    M. Niedermayer
 Internet-Draft                                                   D. Rice
 Intended status: Informational                               J. Martinez
-Expires: February 14, 2020                               August 13, 2019
+Expires: March 9, 2020                                 September 6, 2019

                FFV1 Video Coding Format Version 0, 1, and 3
-                         draft-ietf-cellar-ffv1-08
+                         draft-ietf-cellar-ffv1-09

 Abstract

    This document defines FFV1, a lossless intra-frame video encoding
    format. FFV1 is designed to efficiently compress video data in a
    variety of pixel formats. Compared to uncompressed video, FFV1
    offers storage compression, frame fixity, and self-description, which
    makes FFV1 useful as a preservation or intermediate video format.

 Status of This Memo

@@ -23,21 +23,21 @@
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF). Note that other groups may also distribute
    working documents as Internet-Drafts. The list of current Internet-
    Drafts is at https://datatracker.ietf.org/drafts/current/.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time. It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

-   This Internet-Draft will expire on February 14, 2020.
+   This Internet-Draft will expire on March 9, 2020.

 Copyright Notice

    Copyright (c) 2019 IETF Trust and the persons identified as the
    document authors. All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents (https://trustee.ietf.org/
    license-info) in effect on the date of publication of this document.
    Please review these documents carefully, as they describe your rights
@@ -84,24 +84,24 @@
        4.1.6. chroma_planes . . . . . . . . . . . . . . . . . . . . 26
        4.1.7. bits_per_raw_sample . . . . . . . . . . . . . . . . . 26
        4.1.8. log2_h_chroma_subsample . . . . . . . . . . . . . . . 27
        4.1.9. log2_v_chroma_subsample . . . . . . . . . . . . . . . 27
        4.1.10. extra_plane . . . . . . . . . . . . . . . . . . . . . 27
        4.1.11. num_h_slices . . . . . . . . . . . . . . . . . . . . 27
        4.1.12. num_v_slices . . . . . . . . . . . . . . . . . . . . 28
        4.1.13. quant_table_set_count . . . . . . . . . . . . . . . . 28
        4.1.14. states_coded . . . . . . . . . . . . . . . . . . . . 28
        4.1.15. initial_state_delta . . . . . . . . . . . . . . . . . 28
-       4.1.16. ec . . . . . . . . . . . . . . . . . . . . . . . . . 28
+       4.1.16. ec . . . . . . . . . . . . . . . . . . . . . . . . . 29
        4.1.17. intra . . . . . . . . . . . . . . . . . . . . . . . . 29
      4.2. Configuration Record . . . . . . . . . . . . . . . . . . 29
-       4.2.1. reserved_for_future_use . . . . . . . . . . . . . . . 29
+       4.2.1. reserved_for_future_use . . . . . . . . . . . . . . . 30
        4.2.2. configuration_record_crc_parity . . . . . . . . . . . 30
        4.2.3. Mapping FFV1 into Containers . . . . . . . . . . . . 30
      4.3. Frame . . . . . . . . . . . . . . . . . . . . . . . . . . 31
      4.4. Slice . . . . . . . . . . . . . . . . . . . . . . . . . . 32
      4.5. Slice Header . . . . . . . . . . . . . . . . . . . . . . 33
        4.5.1. slice_x . . . . . . . . . . . . . . . . . . . . . . . 33
        4.5.2. slice_y . . . . . . . . . . . . . . . . . . . . . . . 33
        4.5.3. slice_width . . . . . . . . . . . . . . . . . . . . . 33
        4.5.4. slice_height . . . . . . . . . . . . . . . . . . . . 34
        4.5.5. quant_table_set_index_count . . . . . . . . . . . . . 34
@@ -318,23 +318,23 @@

    log2(a) the base-two logarithm of a

    min(a,b) the smallest of two values a and b

    max(a,b) the largest of two values a and b

    median(a,b,c) the numerical middle value in a data set of a, b, and
    c, i.e. a+b+c-min(a,b,c)-max(a,b,c)

-   a_{b} the b-th value of a sequence of a
+   a_(b) the b-th value of a sequence of a

-   a_{b,c} the 'b,c'-th value of a sequence of a
+   a~b,c. the 'b,c'-th value of a sequence of a

 2.2.6. Order of Operation Precedence

    When order of precedence is not indicated explicitly by use of
    parentheses, operations are evaluated in the following order (from
    top to bottom, operations of same precedence being evaluated from
    left to right). This order of operations is based on the order of
    operations used in Standard C.

    a++, a--
@@ -493,44 +493,48 @@
    JPEG2000-RCT with Range Coder coder was implemented without this
    issue in one implementation and validated by one conformance checker.
    It is expected (to be confirmed) to remove this exception for the
    median predictor in the next version of the FFV1 bitstream.

 3.4. Context

    Relative to any "Sample" "X", the Quantized Sample Differences "L-l",
    "l-tl", "tl-t", "T-t", and "t-tr" are used as context:

-   context = Q_{0}[l − tl] +
-             Q_{1}[tl − t] +
-             Q_{2}[t − tr] +
-             Q_{3}[L − l] +
-             Q_{4}[T − t]
+   context = Q_{0}[l - tl] +
+             Q_{1}[tl - t] +
+             Q_{2}[t - tr] +
+             Q_{3}[L - l] +
+             Q_{4}[T - t]
+
+                                Figure 1

    If "context >= 0" then "context" is used and the difference between
    the "Sample" and its predicted value is encoded as is, else
    "-context" is used and the difference between the "Sample" and its
    predicted value is encoded with a flipped sign.

 3.5. Quantization Table Sets

    The FFV1 bitstream contains 1 or more Quantization Table Sets. Each
    Quantization Table Set contains exactly 5 Quantization Tables with
    each Quantization Table corresponding to 1 of the 5 Quantized Sample
    Differences. For each Quantization Table, both the number of
    quantization steps and their distribution are stored in the FFV1
    bitstream; each Quantization Table has exactly 256 entries, and the 8
    least significant bits of the Quantized Sample Difference are used as
    index:

    Q_{j}[k] = quant_tables[i][j][k&255]

+                                Figure 2
+
    In this formula, "i" is the Quantization Table Set index, "j" is the
    Quantized Table index, "k" the Quantized Sample Difference.

 3.6. Quantization Table Set Indexes

    For each "Plane" of each slice, a Quantization Table Set is selected
    from an index:

    * For Y "Plane", "quant_table_set_index[ 0 ]" index is used

@@ -589,139 +593,151 @@
    An optional transparency "Plane" can be used to code transparency
    data.

    JPEG2000-RCT is a Reversible Color Transform that codes RGB (red,
    green, blue) "Planes" losslessly in a modified YCbCr color space
    [ISO.15444-1.2016]. Reversible Pixel transformations between YCbCr
    and RGB use the following formulae.

    Cb=b-g
-
    Cr=r-g
-
    Y=g+(Cb+Cr)>>2
-
    g=Y-(Cb+Cr)>>2
-
    r=Cr+g
    b=Cb+g

+                                Figure 3
+
    Exception for the JPEG2000-RCT conversion: if bits_per_raw_sample is
    between 9 and 15 inclusive and extra_plane is 0, the following
    formulae for reversible conversions between YCbCr and RGB MUST be
    used instead of the ones above:

    Cb=g-b
-
    Cr=r-b
-
    Y=b+(Cb+Cr)>>2
-
    b=Y-(Cb+Cr)>>2
-
    r=Cr+b
-
    g=Cb+b

+                                Figure 4
+
    Background: At the time of this writing, in all known implementations
    of FFV1 bitstream, when bits_per_raw_sample was between 9 and 15
    inclusive and extra_plane is 0, GBR "Planes" were used as BGR
    "Planes" during both encoding and decoding. In the meanwhile, 16-bit
    JPEG2000-RCT was implemented without this issue in one implementation
    and validated by one conformance checker. Methods to address this
    exception for the transform are under consideration for the next
    version of the FFV1 bitstream.

    When FFV1 uses the JPEG2000-RCT, the horizontal "Lines" are
    interleaved to improve caching efficiency since it is most likely
    that the JPEG2000-RCT will immediately be converted to RGB during
    decoding. The interleaved coding order is also Y, then Cb, then Cr,
    and then if used transparency.
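
Editorial illustration, not part of either draft revision: a minimal Python sketch of the default (non-exception) JPEG2000-RCT formulae and of the per-"Line" interleaving described above. The function names `rct_forward`, `rct_inverse`, and `interleave_lines` are hypothetical. The sketch assumes that the right shift applies to `(Cb+Cr)` only; that reading is an assumption about the flat formulae in the text, chosen because it makes the round trip exact. It relies on Python's `>>` being an arithmetic (flooring) shift for negative integers.

```python
def rct_forward(r, g, b):
    """RGB -> (Y, Cb, Cr) using the default formulae from the text."""
    cb = b - g
    cr = r - g
    # Assumption: the shift covers (Cb+Cr) only, not g+(Cb+Cr).
    y = g + ((cb + cr) >> 2)
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """(Y, Cb, Cr) -> RGB, undoing rct_forward exactly."""
    g = y - ((cb + cr) >> 2)
    r = cr + g
    b = cb + g
    return r, g, b


def interleave_lines(y_plane, cb_plane, cr_plane):
    """Return samples in coding order: for each Line, Y, then Cb, then Cr.

    Each plane is a list of Lines; each Line is a list of samples.
    Transparency is left out of this sketch.
    """
    order = []
    for y_line, cb_line, cr_line in zip(y_plane, cb_plane, cr_plane):
        order.extend(y_line)
        order.extend(cb_line)
        order.extend(cr_line)
    return order


if __name__ == "__main__":
    y, cb, cr = rct_forward(255, 0, 0)
    print(y, cb, cr)
    print(rct_inverse(y, cb, cr))
```

Because the same shifted term is added in the forward direction and subtracted in the inverse, the value lost to the shift never needs to be recovered, which is what makes the transform lossless.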
    As an example, a "Frame" that is two "Pixels" wide and two "Pixels"
    high, could be comprised of the following structure:

    +------------------------+------------------------+
-   | Pixel[1,1]             | Pixel[2,1]             |
-   | Y[1,1] Cb[1,1] Cr[1,1] | Y[2,1] Cb[2,1] Cr[2,1] |
+   | Pixel(1,1)             | Pixel(2,1)             |
+   | Y(1,1) Cb(1,1) Cr(1,1) | Y(2,1) Cb(2,1) Cr(2,1) |
    +------------------------+------------------------+
-   | Pixel[1,2]             | Pixel[2,2]             |
-   | Y[1,2] Cb[1,2] Cr[1,2] | Y[2,2] Cb[2,2] Cr[2,2] |
+   | Pixel(1,2)             | Pixel(2,2)             |
+   | Y(1,2) Cb(1,2) Cr(1,2) | Y(2,2) Cb(2,2) Cr(2,2) |
    +------------------------+------------------------+

    In JPEG2000-RCT, the coding order would be left to right and then top
    to bottom, with values interleaved by "Lines" and stored in this
    order:

-   Y[1,1] Y[2,1] Cb[1,1] Cb[2,1] Cr[1,1] Cr[2,1] Y[1,2] Y[2,2] Cb[1,2]
-   Cb[2,2] Cr[1,2] Cr[2,2]
+   Y(1,1) Y(2,1) Cb(1,1) Cb(2,1) Cr(1,1) Cr(2,1) Y(1,2) Y(2,2) Cb(1,2)
+   Cb(2,2) Cr(1,2) Cr(2,2)

 3.8. Coding of the Sample Difference

    Instead of coding the n+1 bits of the Sample Difference with Huffman
    or Range coding (or n+2 bits, in the case of JPEG2000-RCT), only the
    n (or n+1, in the case of JPEG2000-RCT) least significant bits are
    used, since this is sufficient to recover the original "Sample". In
    the equation below, the term "bits" represents bits_per_raw_sample+1
    for JPEG2000-RCT or bits_per_raw_sample otherwise:

    coder_input =
-       [(sample_difference + 2^(bits−1)) & (2^bits − 1)] − 2^(bits−1)
+       [(sample_difference + 2^(bits-1)) & (2^bits - 1)] - 2^(bits-1)
+
+                                Figure 5

 3.8.1. Range Coding Mode

    Early experimental versions of FFV1 used the CABAC Arithmetic coder
    from H.264 as defined in [ISO.14496-10.2014] but due to the uncertain
    patent/royalty situation, as well as its slightly worse performance,
    CABAC was replaced by a Range coder based on an algorithm defined by
    G. Nigel and N. Martin in 1979 [range-coding].

 3.8.1.1. Range Binary Values

-   To encode binary digits efficiently a Range coder is used. "C_{i}"
-   is the i-th Context. "B_{i}" is the i-th byte of the bytestream.
-   "b_{i}" is the i-th Range coded binary value, "S_{0,i}" is the i-th
-   initial state. The length of the bytestream encoding n binary
-   symbols is "j_{n}" bytes.
+   To encode binary digits efficiently a Range coder is used. "C~i~" is
+   the i-th Context. "B~i~" is the i-th byte of the bytestream. "b~i~"
+   is the i-th Range coded binary value, "S~0,i~" is the i-th initial
+   state. The length of the bytestream encoding n binary symbols is
+   "j~n~" bytes.

    r_{i} = floor( ( R_{i} * S_{i,C_{i}} ) / 2^8 )

+                                Figure 6
+
    S_{i+1,C_{i}} = zero_state_{S_{i,C_{i}}} XOR
              l_i = L_i XOR
              t_i = R_i - r_i <==
    b_i = 0 <==>
    L_i < R_i - r_i

    S_{i+1,C_{i}} = one_state_{S_{i,C_{i}}} XOR
              l_i = L_i - R_i + r_i XOR
              t_i = r_i <==
    b_i = 1 <==>
    L_i >= R_i - r_i

+                                Figure 7
+
    S_{i+1,k} = S_{i,k} <== C_i != k
+
+                                Figure 8
+
    R_{i+1} = 2^8 * t_{i} XOR
    L_{i+1} = 2^8 * l_{i} + B_{j_{i}} XOR
    j_{i+1} = j_{i} + 1 <==
    t_{i} < 2^8

    R_{i+1} = t_{i} XOR
    L_{i+1} = l_{i} XOR
    j_{i+1} = j_{i} <==
    t_{i} >= 2^8

+                                Figure 9
+
    R_{0} = 65280

+                                Figure 10
+
    L_{0} = 2^8 * B_{0} + B_{1}

+                                Figure 11
+
    j_{0} = 2

+                                Figure 12
+
 3.8.1.1.1. Termination

    The range coder can be used in 3 modes.

    * In "Open mode" when decoding, every symbol the reader attempts to
      read is available. In this mode arbitrary data can have been
      appended without affecting the range coder output. This mode is
      not used in FFV1.

    * In "Closed mode" the length in bytes of the bytestream is provided
@@ -772,23 +788,27 @@

 3.8.1.3. Initial Values for the Context Model

    At keyframes all Range coder state variables are set to their initial
    state.

 3.8.1.4. State Transition Table

    one_state_{i} =
        default_state_transition_{i} + state_transition_delta_{i}

+                                Figure 13
+
    zero_state_{i} = 256 - one_state_{256-i}

+                                Figure 14

 3.8.1.5. default_state_transition
+
    0, 0, 0, 0, 0, 0, 0, 0, 20, 21, 22, 23, 24, 25, 26, 27,

    28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40, 41, 42,

    43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 56, 57,

    58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,

    74, 75, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,

@@ -1173,27 +1193,27 @@

                                  Table 10

    * Encoders MUST NOT store bits_per_raw_sample = 0 Decoders SHOULD
      accept and interpret bits_per_raw_sample = 0 as 8.

 4.1.8. log2_h_chroma_subsample

    "log2_h_chroma_subsample" indicates the subsample factor, stored in
    powers to which the number 2 must be raised, between luma and chroma
-   width ("chroma_width = 2^(-log2_h_chroma_subsample) * luma_width").
+   width ("chroma_width = 2^-log2_h_chroma_subsample^ * luma_width").

 4.1.9. log2_v_chroma_subsample

    "log2_v_chroma_subsample" indicates the subsample factor, stored in
    powers to which the number 2 must be raised, between luma and chroma
-   height ("chroma_height=2^(-log2_v_chroma_subsample) * luma_height").
+   height ("chroma_height=2^-log2_v_chroma_subsample^ * luma_height").

 4.1.10. extra_plane

    "extra_plane" indicates if an extra "Plane" is present.

    +-------+------------------------------+
    | value | presence                     |
    +=======+==============================+
    | 0     | extra "Plane" is not present |
    +-------+------------------------------+
@@ -1241,25 +1261,29 @@
    | 1     | initial states are present     |
    +-------+--------------------------------+

                                  Table 12

 4.1.15. initial_state_delta

    "initial_state_delta[ i ][ j ][ k ]" indicates the initial Range
    coder state, it is encoded using "k" as context index and

-   pred = j ? initial_states[ i ][j - 1][ k ] : 128
+   pred = j ? initial_states[ i ][j - 1][ k ]
+
+                                Figure 15

    initial_state[ i ][ j ][ k ] =
        ( pred + initial_state_delta[ i ][ j ][ k ] ) & 255

+                                Figure 16
+
 4.1.16. ec

    "ec" indicates the error detection/correction type.

    +-------+--------------------------------------------+
    | value | error detection/correction type            |
    +=======+============================================+
    | 0     | 32-bit CRC on the global header            |
    +-------+--------------------------------------------+
    | 1     | 32-bit CRC per slice and the global header |
@@ -1290,21 +1314,21 @@

                                  Table 14

 4.2. Configuration Record

    In the case of a FFV1 bitstream with "version >= 3", a "Configuration
    Record" is stored in the underlying "Container", at the track header
    level. It contains the "Parameters" used for all instances of
    "Frame". The size of the "Configuration Record", "NumBytes", is
    supplied by the underlying "Container".

-   pseudo-code | type --------------------------------------------------------------|----- ConfigurationRecord( NumBytes ) { | ConfigurationRecordIsPresent = 1 | Parameters( ) | while (remaining_symbols_in_syntax(NumBytes - 4)) { | reserved_for_future_use | br/ur/sr } | configuration_record_crc_parity | u(32) } |
+   pseudo-code | type -----------------------------------------------------------|----- ConfigurationRecord( NumBytes ) { | ConfigurationRecordIsPresent = 1 | Parameters( ) | while (remaining_symbols_in_syntax(NumBytes - 4)) { | reserved_for_future_use | br/ur/sr } | configuration_record_crc_parity | u(32) } |

 4.2.1. reserved_for_future_use

    "reserved_for_future_use" has semantics that are reserved for future
    use.

    Encoders conforming to this version of this specification SHALL NOT
    write this value.

    Decoders conforming to this version of this specification SHALL
@@ -1571,22 +1595,22 @@
    chroma_planes ? 2 : 0 ) ]" value is "slice_pixel_height".

    If "chroma_planes" is set to 1, "plane_pixel_height[ 1 ]" and
    "plane_pixel_height[ 2 ]" value is "ceil(slice_pixel_height /
    log2_v_chroma_subsample)".

 4.6.3. slice_pixel_height

    "slice_pixel_height" is the height in pixels of the slice.

-   Its value is "floor(( slice_y + slice_height ) * slice_pixel_height /
-   num_v_slices) - slice_pixel_y".
+   Its value is "floor( ( slice_y + slice_height ) * slice_pixel_height
+   / num_v_slices ) - slice_pixel_y".

 4.6.4. slice_pixel_y

    "slice_pixel_y" is the slice vertical position in pixels.

    Its value is "floor(slice_y * frame_pixel_height / num_v_slices)".

 4.7. Line

    A "Line" is a list of the sample differences (relative to the
@@ -1879,23 +1903,23 @@

 10. Changelog

    See https://github.com/FFmpeg/FFV1/commits/master
    (https://github.com/FFmpeg/FFV1/commits/master)

 11. Normative References

    [I-D.ietf-cellar-ffv1]
               Niedermayer, M., Rice, D., and J. Martinez, "FFV1 Video
               Coding Format Version 0, 1, and 3", draft-ietf-cellar-
-              ffv1-07 (work in progress), February 6, 2019,
+              ffv1-08 (work in progress), August 13, 2019,
               .
+              ffv1-08>.

    [ISO.15444-1.2016]
               International Organization for Standardization,
               "Information technology -- JPEG 2000 image coding system:
               Core coding system", October 2016.

    [ISO.9899.1990]
               International Organization for Standardization,
               "Programming languages - C", 1990.

@@ -1918,24 +1942,24 @@
               September 2012, .

    [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
               Specifications and Registration Procedures", BCP 13,
               RFC 6838, DOI 10.17487/RFC6838, January 2013,
               .

 12. Informative References

    [Address-Sanitizer]
-              The Clang Team, "ASAN AddressSanitizer website", August
+              The Clang Team, "ASAN AddressSanitizer website", September
               2019, .

-   [AVI]      Microsoft, "AVI RIFF File Reference", August 2019,
+   [AVI]      Microsoft, "AVI RIFF File Reference", September 2019,
               .

    [FFV1_V0]  Niedermayer, M., "Commit to mark FFV1 version 0 as non-
               experimental", April 2006, .

    [FFV1_V1]  Niedermayer, M., "Commit to release FFV1 version 1", April
               2009,
@@ -1972,26 +1996,27 @@
               matroska/>.

    [NUT]      Niedermayer, M., "NUT Open Container Format", December
               2013, .

    [range-coding]
               Nigel, G. and N. Martin, "Range encoding: an algorithm for
               removing redundancy from a digitised message.", July 1979.

    [REFIMPL]  Niedermayer, M., "The reference FFV1 implementation / the
-              FFV1 codec in FFmpeg", August 2019, .
+              FFV1 codec in FFmpeg", September 2019,
+              .

-   [VALGRIND] Valgrind Developers, "Valgrind website", August 2019,
+   [VALGRIND] Valgrind Developers, "Valgrind website", September 2019,
               .

-   [YCbCr]    Wikipedia, "YCbCr", August 2019,
+   [YCbCr]    Wikipedia, "YCbCr", September 2019,
               .

 Authors' Addresses

    Michael Niedermayer

    Email: michael@niedermayer.cc

    Dave Rice