--- 1/draft-ietf-cellar-ffv1-05.txt 2018-10-18 04:14:12.943479501 -0700 +++ 2/draft-ietf-cellar-ffv1-06.txt 2018-10-18 04:14:13.027481498 -0700 @@ -1,20 +1,20 @@ cellar M. Niedermayer Internet-Draft Intended status: Informational D. Rice -Expires: March 29, 2019 +Expires: April 21, 2019 J. Martinez - September 25, 2018 + October 18, 2018 FFV1 Video Coding Format Version 0, 1, and 3 - draft-ietf-cellar-ffv1-05 + draft-ietf-cellar-ffv1-06 Abstract This document defines FFV1, a lossless intra-frame video encoding format. FFV1 is designed to efficiently compress video data in a variety of pixel formats. Compared to uncompressed video, FFV1 offers storage compression, frame fixity, and self-description, which makes FFV1 useful as a preservation or intermediate video format. Status of This Memo @@ -25,21 +25,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on March 29, 2019. + This Internet-Draft will expire on April 21, 2019. Copyright Notice Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -57,97 +57,97 @@ 2.2. Conventions . . . . . . . . . . . . . . . . . . . . . . . 5 2.2.1. Pseudo-code . . . . . . . . . . . . . . . . . . . . . 6 2.2.2. Arithmetic Operators . . . . . . . . . . . . . . . . 
6 2.2.3. Assignment Operators . . . . . . . . . . . . . . . . 6 2.2.4. Comparison Operators . . . . . . . . . . . . . . . . 7 2.2.5. Mathematical Functions . . . . . . . . . . . . . . . 7 2.2.6. Order of Operation Precedence . . . . . . . . . . . . 8 2.2.7. Range . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2.8. NumBytes . . . . . . . . . . . . . . . . . . . . . . 8 2.2.9. Bitstream Functions . . . . . . . . . . . . . . . . . 8 - 3. General Description . . . . . . . . . . . . . . . . . . . . . 9 + 3. Sample Coding . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1. Border . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2. Samples . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.3. Median Predictor . . . . . . . . . . . . . . . . . . . . 10 3.4. Context . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.5. Quantization Table Sets . . . . . . . . . . . . . . . . . 11 - 3.6. Quantization Table Set Indexes . . . . . . . . . . . . . 11 + 3.6. Quantization Table Set Indexes . . . . . . . . . . . . . 12 3.7. Color spaces . . . . . . . . . . . . . . . . . . . . . . 12 3.7.1. YCbCr . . . . . . . . . . . . . . . . . . . . . . . . 12 - 3.7.2. RGB . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 3.7.2. RGB . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.8. Coding of the Sample Difference . . . . . . . . . . . . . 14 3.8.1. Range Coding Mode . . . . . . . . . . . . . . . . . . 14 - 3.8.2. Golomb Rice Mode . . . . . . . . . . . . . . . . . . 17 + 3.8.2. Golomb Rice Mode . . . . . . . . . . . . . . . . . . 18 4. Bitstream . . . . . . . . . . . . . . . . . . . . . . . . . . 20 - 4.1. Parameters . . . . . . . . . . . . . . . . . . . . . . . 20 - 4.1.1. version . . . . . . . . . . . . . . . . . . . . . . . 21 - 4.1.2. micro_version . . . . . . . . . . . . . . . . . . . . 22 - 4.1.3. coder_type . . . . . . . . . . . . . . . . . . . . . 22 - 4.1.4. state_transition_delta . . . . . . . . . . . . . . . 23 - 4.1.5. colorspace_type . . . . . . . . . 
. . . . . . . . . . 23 - 4.1.6. chroma_planes . . . . . . . . . . . . . . . . . . . . 23 - 4.1.7. bits_per_raw_sample . . . . . . . . . . . . . . . . . 23 - 4.1.8. log2_h_chroma_subsample . . . . . . . . . . . . . . . 24 - 4.1.9. log2_v_chroma_subsample . . . . . . . . . . . . . . . 24 - 4.1.10. alpha_plane . . . . . . . . . . . . . . . . . . . . . 24 - 4.1.11. num_h_slices . . . . . . . . . . . . . . . . . . . . 24 - 4.1.12. num_v_slices . . . . . . . . . . . . . . . . . . . . 24 - 4.1.13. quant_table_set_count . . . . . . . . . . . . . . . . 25 - 4.1.14. states_coded . . . . . . . . . . . . . . . . . . . . 25 - 4.1.15. initial_state_delta . . . . . . . . . . . . . . . . . 25 - 4.1.16. ec . . . . . . . . . . . . . . . . . . . . . . . . . 25 - 4.1.17. intra . . . . . . . . . . . . . . . . . . . . . . . . 25 - 4.2. Configuration Record . . . . . . . . . . . . . . . . . . 26 - 4.2.1. reserved_for_future_use . . . . . . . . . . . . . . . 26 - 4.2.2. configuration_record_crc_parity . . . . . . . . . . . 26 - 4.2.3. Mapping FFV1 into Containers . . . . . . . . . . . . 27 - 4.3. Frame . . . . . . . . . . . . . . . . . . . . . . . . . . 28 - 4.4. Slice . . . . . . . . . . . . . . . . . . . . . . . . . . 28 - 4.5. Slice Header . . . . . . . . . . . . . . . . . . . . . . 29 - 4.5.1. slice_x . . . . . . . . . . . . . . . . . . . . . . . 30 - 4.5.2. slice_y . . . . . . . . . . . . . . . . . . . . . . . 30 - 4.5.3. slice_width . . . . . . . . . . . . . . . . . . . . . 30 - 4.5.4. slice_height . . . . . . . . . . . . . . . . . . . . 30 - 4.5.5. quant_table_set_index_count . . . . . . . . . . . . . 30 - 4.5.6. quant_table_set_index . . . . . . . . . . . . . . . . 30 - 4.5.7. picture_structure . . . . . . . . . . . . . . . . . . 31 - 4.5.8. sar_num . . . . . . . . . . . . . . . . . . . . . . . 31 - 4.5.9. sar_den . . . . . . . . . . . . . . . . . . . . . . . 31 - 4.6. Slice Content . . . . . . . . . . . . . . . . . . . . . . 31 - 4.6.1. primary_color_count . . . . . . . . . . . . 
. . . . . 32 - 4.6.2. plane_pixel_height . . . . . . . . . . . . . . . . . 32 - 4.6.3. slice_pixel_height . . . . . . . . . . . . . . . . . 32 - 4.6.4. slice_pixel_y . . . . . . . . . . . . . . . . . . . . 32 - 4.7. Line . . . . . . . . . . . . . . . . . . . . . . . . . . 32 - 4.7.1. plane_pixel_width . . . . . . . . . . . . . . . . . . 32 - 4.7.2. slice_pixel_width . . . . . . . . . . . . . . . . . . 33 - 4.7.3. slice_pixel_x . . . . . . . . . . . . . . . . . . . . 33 - 4.7.4. sample_difference . . . . . . . . . . . . . . . . . . 33 - 4.8. Slice Footer . . . . . . . . . . . . . . . . . . . . . . 33 - 4.8.1. slice_size . . . . . . . . . . . . . . . . . . . . . 33 - 4.8.2. error_status . . . . . . . . . . . . . . . . . . . . 33 - 4.8.3. slice_crc_parity . . . . . . . . . . . . . . . . . . 34 - 4.9. Quantization Table Set . . . . . . . . . . . . . . . . . 34 - 4.9.1. quant_tables . . . . . . . . . . . . . . . . . . . . 35 - 4.9.2. context_count . . . . . . . . . . . . . . . . . . . . 35 - 5. Restrictions . . . . . . . . . . . . . . . . . . . . . . . . 35 - 6. Security Considerations . . . . . . . . . . . . . . . . . . . 36 - 7. Media Type Definition . . . . . . . . . . . . . . . . . . . . 36 - 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 38 - 9. Appendixes . . . . . . . . . . . . . . . . . . . . . . . . . 38 - 9.1. Decoder implementation suggestions . . . . . . . . . . . 38 - 9.1.1. Multi-threading Support and Independence of Slices . 38 - 10. Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . 39 - 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 39 - 11.1. Normative References . . . . . . . . . . . . . . . . . . 39 - 11.2. Informative References . . . . . . . . . . . . . . . . . 40 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 + 4.1. Parameters . . . . . . . . . . . . . . . . . . . . . . . 21 + 4.1.1. version . . . . . . . . . . . . . . . . . . . . . . . 22 + 4.1.2. micro_version . . . . . . . . 
. . . . . . . . . . . . 23 + 4.1.3. coder_type . . . . . . . . . . . . . . . . . . . . . 23 + 4.1.4. state_transition_delta . . . . . . . . . . . . . . . 24 + 4.1.5. colorspace_type . . . . . . . . . . . . . . . . . . . 24 + 4.1.6. chroma_planes . . . . . . . . . . . . . . . . . . . . 24 + 4.1.7. bits_per_raw_sample . . . . . . . . . . . . . . . . . 24 + 4.1.8. log2_h_chroma_subsample . . . . . . . . . . . . . . . 25 + 4.1.9. log2_v_chroma_subsample . . . . . . . . . . . . . . . 25 + 4.1.10. alpha_plane . . . . . . . . . . . . . . . . . . . . . 25 + 4.1.11. num_h_slices . . . . . . . . . . . . . . . . . . . . 25 + 4.1.12. num_v_slices . . . . . . . . . . . . . . . . . . . . 25 + 4.1.13. quant_table_set_count . . . . . . . . . . . . . . . . 26 + 4.1.14. states_coded . . . . . . . . . . . . . . . . . . . . 26 + 4.1.15. initial_state_delta . . . . . . . . . . . . . . . . . 26 + 4.1.16. ec . . . . . . . . . . . . . . . . . . . . . . . . . 26 + 4.1.17. intra . . . . . . . . . . . . . . . . . . . . . . . . 26 + 4.2. Configuration Record . . . . . . . . . . . . . . . . . . 27 + 4.2.1. reserved_for_future_use . . . . . . . . . . . . . . . 27 + 4.2.2. configuration_record_crc_parity . . . . . . . . . . . 27 + 4.2.3. Mapping FFV1 into Containers . . . . . . . . . . . . 28 + 4.3. Frame . . . . . . . . . . . . . . . . . . . . . . . . . . 29 + 4.4. Slice . . . . . . . . . . . . . . . . . . . . . . . . . . 29 + 4.5. Slice Header . . . . . . . . . . . . . . . . . . . . . . 30 + 4.5.1. slice_x . . . . . . . . . . . . . . . . . . . . . . . 31 + 4.5.2. slice_y . . . . . . . . . . . . . . . . . . . . . . . 31 + 4.5.3. slice_width . . . . . . . . . . . . . . . . . . . . . 31 + 4.5.4. slice_height . . . . . . . . . . . . . . . . . . . . 31 + 4.5.5. quant_table_set_index_count . . . . . . . . . . . . . 31 + 4.5.6. quant_table_set_index . . . . . . . . . . . . . . . . 31 + 4.5.7. picture_structure . . . . . . . . . . . . . . . . . . 32 + 4.5.8. sar_num . . . . . . . . . . . . . . . . . . 
. . . . . 32 + 4.5.9. sar_den . . . . . . . . . . . . . . . . . . . . . . . 32 + 4.6. Slice Content . . . . . . . . . . . . . . . . . . . . . . 32 + 4.6.1. primary_color_count . . . . . . . . . . . . . . . . . 33 + 4.6.2. plane_pixel_height . . . . . . . . . . . . . . . . . 33 + 4.6.3. slice_pixel_height . . . . . . . . . . . . . . . . . 33 + 4.6.4. slice_pixel_y . . . . . . . . . . . . . . . . . . . . 33 + 4.7. Line . . . . . . . . . . . . . . . . . . . . . . . . . . 33 + 4.7.1. plane_pixel_width . . . . . . . . . . . . . . . . . . 34 + 4.7.2. slice_pixel_width . . . . . . . . . . . . . . . . . . 34 + 4.7.3. slice_pixel_x . . . . . . . . . . . . . . . . . . . . 34 + 4.7.4. sample_difference . . . . . . . . . . . . . . . . . . 34 + 4.8. Slice Footer . . . . . . . . . . . . . . . . . . . . . . 34 + 4.8.1. slice_size . . . . . . . . . . . . . . . . . . . . . 35 + 4.8.2. error_status . . . . . . . . . . . . . . . . . . . . 35 + 4.8.3. slice_crc_parity . . . . . . . . . . . . . . . . . . 35 + 4.9. Quantization Table Set . . . . . . . . . . . . . . . . . 35 + 4.9.1. quant_tables . . . . . . . . . . . . . . . . . . . . 36 + 4.9.2. context_count . . . . . . . . . . . . . . . . . . . . 37 + 5. Restrictions . . . . . . . . . . . . . . . . . . . . . . . . 37 + 6. Security Considerations . . . . . . . . . . . . . . . . . . . 37 + 7. Media Type Definition . . . . . . . . . . . . . . . . . . . . 38 + 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 40 + 9. Appendixes . . . . . . . . . . . . . . . . . . . . . . . . . 40 + 9.1. Decoder implementation suggestions . . . . . . . . . . . 40 + 9.1.1. Multi-threading Support and Independence of Slices . 40 + 10. Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . 40 + 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 40 + 11.1. Normative References . . . . . . . . . . . . . . . . . . 40 + 11.2. Informative References . . . . . . . . . . . . . . . . . 41 + Authors' Addresses . . . . . . . . . . . 
. . . . . . . . . . . . 43 1. Introduction This document describes FFV1, a lossless video encoding format. The design of FFV1 considers the storage of image characteristics, data fixity, and the optimized use of encoding time and storage requirements. FFV1 is designed to support a wide range of lossless video applications such as long-term audiovisual preservation, scientific imaging, screen recording, and other video encoding scenarios that seek to avoid the generational loss of lossy video @@ -182,57 +182,57 @@ [YCbCr]. 2. Notation and Conventions The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 2.1. Definitions - "Frame": An encoded representation of a complete static image. - - "Slice": A spatial sub-section of a "Frame" that is encoded - separately from an other region of the same frame. - - "Container": Format that encapsulates "Frames" and (when required) a - "Configuration Record" into a bitstream. + "Container": Format that encapsulates "Frames" (see Section 4.3) and + (when required) a "Configuration Record" into a bitstream. "Sample": The smallest addressable representation of a color - component or a luma component in a "Frame". Examples of sample are + component or a luma component in a "Frame". Examples of "Sample" are Luma, Blue Chrominance, Red Chrominance, Alpha, Red, Green, and Blue. + "Plane": A discrete component of a static image comprised of + "Samples" that represent a specific quantification of "Samples" of + that image. + "Pixel": The smallest addressable representation of a color in a - "Frame". It is composed of 1 or more samples. + "Frame". It is composed of 1 or more "Samples". "ESC": An ESCape symbol to indicate that the symbol to be stored is too large for normal storage and that an alternate storage method. 
"MSB": Most Significant Bit, the bit that can cause the largest change in magnitude of the symbol. "RCT": Reversible Color Transform, a near linear, exactly reversible integer transform that converts between RGB and YCbCr representations - of a Pixel. + of a "Pixel". "VLC": Variable Length Code, a code that maps source symbols to a variable number of bits. - "RGB": A reference to the method of storing the value of a Pixel by + "RGB": A reference to the method of storing the value of a "Pixel" by using three numeric values that represent Red, Green, and Blue. - "YCbCr": A reference to the method of storing the value of a Pixel by - using three numeric values that represent the luma of the Pixel (Y) - and the chrominance of the Pixel (Cb and Cr). YCbCr word is used for - historical reasons and currently references any color space relying - on 1 luma sample and 2 chrominance samples e.g. YCbCr, YCgCo or - ICtCp. Exact meaning of the three numeric values is unspecified. + "YCbCr": A reference to the method of storing the value of a "Pixel" + by using three numeric values that represent the luma of the "Pixel" + (Y) and the chrominance of the "Pixel" (Cb and Cr). YCbCr word is + used for historical reasons and currently references any color space + relying on 1 luma "Sample" and 2 chrominance "Samples" e.g. YCbCr, + YCgCo or ICtCp. Exact meaning of the three numeric values is + unspecified. "TBA": To Be Announced. Used in reference to the development of future iterations of the FFV1 specification. 2.2. Conventions 2.2.1. Pseudo-code The FFV1 bitstream is described in this document using pseudo-code. Note that the pseudo-code is used for clarity in order to illustrate the structure of FFV1 and not intended to specify any particular @@ -372,193 +372,208 @@ "byte_aligned( )" is true if "remaining_bits_in_bitstream( NumBytes )" is a multiple of 8, otherwise false. 2.2.9.3. 
get_bits "get_bits( i )" is the action to read the next "i" bits in the bitstream, from most significant bit to least significant bit, and to return the corresponding value. The pointer is increased by "i". -3. General Description +3. Sample Coding - Samples within a plane are coded in raster scan order (left->right, - top->bottom). Each sample is predicted by the median predictor from - samples in the same plane and the difference is stored see - Section 3.8. + For each "Slice" (as described in Section 4.4) of a "Frame", the + "Planes", "Lines", and "Samples" are coded in an order determined by + the "Color Space" (see Section 3.7). Each "Sample" is predicted by + the median predictor as described in Section 3.3 from other "Samples" + within the same "Plane" and the difference is stored using the method + described in Section 3.8. 3.1. Border - A border is assumed for each coded slice for the purpose of the - predictor and context according to the following rules: + A border is assumed for each coded "Slice" for the purpose of the + median predictor and context according to the following rules: - o one column of samples to the left of the coded slice is assumed as - identical to the samples of the leftmost column of the coded slice - shifted down by one row. The value of the topmost sample of the - column of samples to the left of the coded slice is assumed to be - "0" + o one column of "Samples" to the left of the coded slice is assumed + as identical to the "Samples" of the leftmost column of the coded + slice shifted down by one row. 
The value of the topmost "Sample" + of the column of "Samples" to the left of the coded slice is + assumed to be "0" - o one column of samples to the right of the coded slice is assumed - as identical to the samples of the rightmost column of the coded + o one column of "Samples" to the right of the coded slice is assumed + as identical to the "Samples" of the rightmost column of the coded slice - o an additional column of samples to the left of the coded slice and - two rows of samples above the coded slice are assumed to be "0" + o an additional column of "Samples" to the left of the coded slice + and two rows of "Samples" above the coded slice are assumed to be + "0" - The following table depicts a slice of samples "a,b,c,d,e,f,g,h,i" - along with its assumed border. + The following table depicts a slice of 9 "Samples" + "a,b,c,d,e,f,g,h,i" in a 3x3 arrangement along with its assumed + border. +---+---+---+---+---+---+---+---+ | 0 | 0 | | 0 | 0 | 0 | | 0 | +---+---+---+---+---+---+---+---+ | 0 | 0 | | 0 | 0 | 0 | | 0 | +---+---+---+---+---+---+---+---+ | | | | | | | | | +---+---+---+---+---+---+---+---+ | 0 | 0 | | a | b | c | | c | +---+---+---+---+---+---+---+---+ | 0 | a | | d | e | f | | f | +---+---+---+---+---+---+---+---+ | 0 | d | | g | h | i | | i | +---+---+---+---+---+---+---+---+ 3.2. Samples - Positions used for context and median predictor are: + Relative to any "Sample" "X", six other relatively positioned + "Samples" from the coded "Samples" and presumed border are identified + according to the labels used in the following diagram. The labels + for these relatively positioned "Samples" are used within the median + predictor and context. +---+---+---+---+ | | | T | | +---+---+---+---+ | |tl | t |tr | +---+---+---+---+ | L | l | X | | +---+---+---+---+ - "X" is the current processed Sample. The identifiers are made of the - first letters of the words Top, Left and Right. 
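The border rules of Section 3.1, the neighbour labels of Section 3.2, and the median predictor of Section 3.3 fit together in the short Python sketch below. It is non-normative and rests on a few assumptions. A plane holding one coded slice is stored as a list of rows, indexed as plane[y][x]. The helper names sample_at, neighbours, median_predict and median_predict_16bit_signed are hypothetical and are not taken from any FFV1 implementation.

```python
from typing import Dict, List

Plane = List[List[int]]  # one coded slice of one plane, indexed as plane[y][x]


def sample_at(plane: Plane, x: int, y: int) -> int:
    """Return the sample at (x, y), using the assumed border outside the slice."""
    width = len(plane[0])
    if y < 0:
        # Two rows above the slice are assumed to be 0.
        return 0
    if x == -1:
        # Left border column: the leftmost column shifted down by one row,
        # with 0 as its topmost value.
        return plane[y - 1][0] if y >= 1 else 0
    if x < -1:
        # The additional column further to the left is assumed to be 0.
        return 0
    if x >= width:
        # Right border column: a copy of the rightmost column.
        return plane[y][width - 1]
    return plane[y][x]


def neighbours(plane: Plane, x: int, y: int) -> Dict[str, int]:
    """Collect the six labelled samples around X = (x, y)."""
    return {
        "T": sample_at(plane, x, y - 2),
        "tl": sample_at(plane, x - 1, y - 1),
        "t": sample_at(plane, x, y - 1),
        "tr": sample_at(plane, x + 1, y - 1),
        "L": sample_at(plane, x - 2, y),
        "l": sample_at(plane, x - 1, y),
    }


def median3(a: int, b: int, c: int) -> int:
    return sorted((a, b, c))[1]


def median_predict(l: int, t: int, tl: int) -> int:
    """Default predictor: median(l, t, l + t - tl)."""
    return median3(l, t, l + t - tl)


def to_signed16(v: int) -> int:
    """Reinterpret an unsigned 16-bit value as two's complement."""
    return v - 65536 if v >= 32768 else v


def median_predict_16bit_signed(l: int, t: int, tl: int) -> int:
    """Exception for colorspace_type == 0, bits_per_raw_sample == 16 and
    coder_type 1 or 2: median(left16s, top16s, left16s + top16s - diag16s)."""
    left16s, top16s, diag16s = to_signed16(l), to_signed16(t), to_signed16(tl)
    return median3(left16s, top16s, left16s + top16s - diag16s)


if __name__ == "__main__":
    slice_abc = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # samples a..i as 1..9
    n = neighbours(slice_abc, 1, 1)  # X is sample "e"
    print(n)  # {'T': 0, 'tl': 1, 't': 2, 'tr': 3, 'L': 1, 'l': 4}
    print(median_predict(n["l"], n["t"], n["tl"]))  # 4
```

The example uses the 3x3 slice a..i from the border table, stored as 1..9. For "e", the neighbours come out as T=0, tl=a, t=b, tr=c, L=a and l=d, so the prediction is median(d, b, d + b - a).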
+ The labels for these relative "Samples" are made of the first letters + of the words Top, Left and Right. 3.3. Median Predictor - The prediction for any sample value at position "X" may be computed + The prediction for any "Sample" value at position "X" may be computed based upon the relative neighboring values of "l", "t", and "tl" via this equation: "median(l, t, l + t - tl)". Note, this prediction template is also used in [ISO.14495-1.1999] and [HuffYUV]. - Exception for the media predictor: if "colorspace_type == 0 && + Exception for the median predictor: if "colorspace_type == 0 && bits_per_raw_sample == 16 && ( coder_type == 1 || coder_type == 2 )", - the following media predictor MUST be used: + the following median predictor MUST be used: "median(left16s, top16s, left16s + top16s - diag16s)" where: left16s = l >= 32768 ? ( l - 65536 ) : l top16s = t >= 32768 ? ( t - 65536 ) : t diag16s = tl >= 32768 ? ( tl - 65536 ) : tl Background: a two's complement signed 16-bit signed integer was used - for storing sample values in all known implementations of FFV1 + for storing "Sample" values in all known implementations of FFV1 bitstream. So in some circumstances, the most significant bit was wrongly interpreted (used as a sign bit instead of the 16th bit of an unsigned integer). Note that when the issue is discovered, the only configuration of all known implementations being impacted is 16-bit YCbCr with no Pixel transformation with Range Coder coder, as other potentially impacted configurations (e.g. 15/16-bit JPEG2000-RCT with Range Coder coder, or 16-bit content with Golomb Rice coder) were implemented nowhere [ISO.15444-1.2016]. In the meanwhile, 16-bit JPEG2000-RCT with Range Coder coder was implemented without this issue in one implementation and validated by one conformance checker. It is expected (to be confirmed) to remove this exception for the - media predictor in the next version of the FFV1 bitstream. 
+ median predictor in the next version of the FFV1 bitstream. 3.4. Context - Relative to any sample "X", the Quantized Sample Differences "L-l", + Relative to any "Sample" "X", the Quantized Sample Differences "L-l", "l-tl", "tl-t", "T-t", and "t-tr" are used as context: context = Q_{0}[l - tl] + Q_{1}[tl - t] + Q_{2}[t - tr] + Q_{3}[L - l] + Q_{4}[T - t] If "context >= 0" then "context" is used and the difference between - the sample and its predicted value is encoded as is, else "-context" - is used and the difference between the sample and its predicted value - is encoded with a flipped sign. + the "Sample" and its predicted value is encoded as is, else + "-context" is used and the difference between the "Sample" and its + predicted value is encoded with a flipped sign. 3.5. Quantization Table Sets The FFV1 bitstream contains 1 or more Quantization Table Sets. Each - Quantization Table Set contains exactly 5 Quantization Tables, each - Quantization Table corresponding to 1 of the 5 Quantized Sample + Quantization Table Set contains exactly 5 Quantization Tables with + each Quantization Table corresponding to 1 of the 5 Quantized Sample Differences. For each Quantization Table, both the number of quantization steps and their distribution are stored in the FFV1 bitstream; each Quantization Table has exactly 256 entries, and the 8 least significant bits of the Quantized Sample Difference are used as index: Q_{j}[k] = quant_tables[i][j][k&255] In this formula, "i" is the Quantization Table Set index, "j" is the Quantized Table index, "k" the Quantized Sample Difference. 3.6. 
Quantization Table Set Indexes - For each plane of each slice, a Quantization Table Set is selected + For each "Plane" of each slice, a Quantization Table Set is selected from an index: - o For Y plane, "quant_table_set_index [ 0 ]" index is used + o For Y "Plane", "quant_table_set_index [ 0 ]" index is used - o For Cb and Cr planes, "quant_table_set_index [ 1 ]" index is used + o For Cb and Cr "Planes", "quant_table_set_index [ 1 ]" index is + used - o For Alpha plane, "quant_table_set_index [ (version <= 3 || + o For Alpha "Plane", "quant_table_set_index [ (version <= 3 || chroma_planes) ? 2 : 1 ]" index is used Background: in first implementations of FFV1 bitstream, the index for - Cb and Cr planes was stored even if it is not used (chroma_planes set - to 0), this index is kept for version <= 3 in order to keep + Cb and Cr "Planes" was stored even if it is not used (chroma_planes + set to 0), this index is kept for version <= 3 in order to keep compatibility with FFV1 bitstreams in the wild. 3.7. Color spaces FFV1 supports two color spaces: YCbCr and RGB. Both color spaces - allow an optional Alpha plane that can be used to code transparency + allow an optional Alpha "Plane" that can be used to code transparency data. + The FFV1 bitstream interleaves data in an order determined by the + color space. In YCbCr for each "Plane", each "Line" is coded from + top to bottom and for each "Line", each "Sample" is coded from left + to right. In JPEG2000-RCT for each "Line" from top to bottom, each + "Plane" is coded and for each "Plane", each "Sample" is encoded from + left to right. + 3.7.1. YCbCr - In YCbCr color space, the Cb and Cr planes are optional, but if used - then MUST be used together. Omitting the Cb and Cr planes codes the - frames in grayscale without color data. An FFV1 "Frame" using YCbCr - MUST use one of the following arrangements: + In YCbCr color space, the Cb and Cr "Planes" are optional, but if + used then MUST be used together. 
Omitting the Cb and Cr "Planes" + codes the frames in grayscale without color data. An FFV1 "Frame" + using YCbCr MUST use one of the following arrangements: o Y - o Y, Alpha o Y, Cb, Cr o Y, Cb, Cr, Alpha - The Y plane MUST be coded first. If the Cb and Cr planes are used - then they MUST be coded after the Y plane. If an Alpha - (transparency) plane is used, then it MUST be coded last. + The Y "Plane" MUST be coded first. If the Cb and Cr "Planes" are + used then they MUST be coded after the Y "Plane". If an Alpha + (transparency) "Plane" is used, then it MUST be coded last. 3.7.2. RGB JPEG2000-RCT is a Reversible Color Transform that codes RGB (red, - green, blue) planes losslessly in a modified YCbCr color space + green, blue) "Planes" losslessly in a modified YCbCr color space [ISO.15444-1.2016]. Reversible Pixel transformations between YCbCr and RGB use the following formulae. Cb=b-g Cr=r-g Y=g+(Cb+Cr)>>2 g=Y-(Cb+Cr)>>2 @@ -579,58 +594,59 @@ Y=b+(Cb+Cr)>>2 b=Y-(Cb+Cr)>>2 r=Cr+b g=Cb+b Background: At the time of this writing, in all known implementations of FFV1 bitstream, when bits_per_raw_sample was between 9 and 15 - inclusive and alpha_plane is 0, GBR planes were used as BGR planes - during both encoding and decoding. In the meanwhile, 16-bit + inclusive and alpha_plane is 0, GBR "Planes" were used as BGR + "Planes" during both encoding and decoding. In the meanwhile, 16-bit JPEG2000-RCT was implemented without this issue in one implementation and validated by one conformance checker. Methods to address this exception for the transform are under consideration for the next version of the FFV1 bitstream. - When FFV1 uses the JPEG2000-RCT, the horizontal lines are interleaved - to improve caching efficiency since it is most likely that the RCT - will immediately be converted to RGB during decoding. The - interleaved coding order is also Y, then Cb, then Cr, and then if - used Alpha. 
+ When FFV1 uses the JPEG2000-RCT, the horizontal "Lines" are + interleaved to improve caching efficiency since it is most likely + that the JPEG2000-RCT will immediately be converted to RGB during + decoding. The interleaved coding order is also Y, then Cb, then Cr, + and then if used Alpha. - As an example, a "Frame" that is two pixels wide and two pixels high, - could be comprised of the following structure: + As an example, a "Frame" that is two "Pixels" wide and two "Pixels" + high, could be comprised of the following structure: +------------------------+------------------------+ | Pixel[1,1] | Pixel[2,1] | | Y[1,1] Cb[1,1] Cr[1,1] | Y[2,1] Cb[2,1] Cr[2,1] | +------------------------+------------------------+ | Pixel[1,2] | Pixel[2,2] | | Y[1,2] Cb[1,2] Cr[1,2] | Y[2,2] Cb[2,2] Cr[2,2] | +------------------------+------------------------+ In JPEG2000-RCT, the coding order would be left to right and then top - to bottom, with values interleaved by lines and stored in this order: + to bottom, with values interleaved by "Lines" and stored in this + order: Y[1,1] Y[2,1] Cb[1,1] Cb[2,1] Cr[1,1] Cr[2,1] Y[1,2] Y[2,2] Cb[1,2] Cb[2,2] Cr[1,2] Cr[2,2] 3.8. Coding of the Sample Difference Instead of coding the n+1 bits of the Sample Difference with Huffman - or Range coding (or n+2 bits, in the case of RCT), only the n (or - n+1) least significant bits are used, since this is sufficient to - recover the original sample. In the equation below, the term "bits" - represents bits_per_raw_sample+1 for RCT or bits_per_raw_sample - otherwise: + or Range coding (or n+2 bits, in the case of JPEG2000-RCT), only the + n (or n+1, in the case of JPEG2000-RCT) least significant bits are + used, since this is sufficient to recover the original "Sample". In + the equation below, the term "bits" represents bits_per_raw_sample+1 + for JPEG2000-RCT or bits_per_raw_sample otherwise: coder_input = [(sample_difference + 2^(bits-1)) & (2^bits - 1)] - 2^(bits-1) 3.8.1. 
Range Coding Mode Early experimental versions of FFV1 used the CABAC Arithmetic coder from H.264 as defined in [ISO.14496-10.2014] but due to the uncertain patent/royalty situation, as well as its slightly worse performance, CABAC was replaced by a Range coder based on an algorithm defined by @@ -754,25 +770,25 @@ 210,211,212,213,215,215,216,217,218,219,220,220,222,223,224,225, 226,227,227,229,229,230,231,232,234,234,235,236,237,238,239,240, 241,242,243,244,245,246,247,248,248, 0, 0, 0, 0, 0, 0, 0, 3.8.1.6. Alternative State Transition Table The alternative state transition table has been built using iterative minimization of frame sizes and generally performs better than the - default. To use it, the coder_type MUST be set to 2 and the - difference to the default MUST be stored in the "Parameters", see - Section 4.1. The reference implementation of FFV1 in FFmpeg uses - this table by default at the time of this writing when Range coding - is used. + default. To use it, the coder_type (see Section 4.1.3) MUST be set + to 2 and the difference to the default MUST be stored in the + "Parameters", see Section 4.1. The reference implementation of FFV1 + in FFmpeg uses this table by default at the time of this writing when + Range coding is used. 0, 10, 10, 10, 10, 16, 16, 16, 28, 16, 16, 29, 42, 49, 20, 49, 59, 25, 26, 26, 27, 31, 33, 33, 33, 34, 34, 37, 67, 38, 39, 39, 40, 40, 41, 79, 43, 44, 45, 45, 48, 48, 64, 50, 51, 52, 88, 52, 53, 74, 55, 57, 58, 58, 74, 60,101, 61, 62, 84, 66, 66, 68, 69, 87, 82, 71, 97, 73, 73, 82, 75,111, 77, 94, 78, 87, 81, 83, 97, @@ -846,22 +861,22 @@ Run mode is entered when the context is 0 and left as soon as a non-0 difference is found. The level is identical to the predicted one. The run and the first different level are coded. 3.8.2.5. 
Run Length Coding The run value is encoded in 2 parts, the prefix part stores the more significant part of the run as well as adjusting the run_index that determines the number of bits in the less significant part of the run. The 2nd part of the value stores the less significant part of - the run as it is. The run_index is reset for each plane and slice to - 0. + the run as it is. The run_index is reset for each "Plane" and slice + to 0. pseudo-code | type --------------------------------------------------------------|----- log2_run[41]={ | 0, 0, 0, 0, 1, 1, 1, 1, | 2, 2, 2, 2, 3, 3, 3, 3, | 4, 4, 5, 5, 6, 6, 7, 7, | 8, 9,10,11,12,13,14,15, | 16,17,18,19,20,21,22,23, | 24, | @@ -888,58 +903,70 @@ 3.8.2.6. Level Coding Level coding is identical to the normal difference coding with the exception that the 0 value is removed as it cannot occur: if (diff>0) diff--; encode(diff); Note, this is different from JPEG-LS, which doesn't use prediction in run mode and uses a different encoding and context model for the last - difference On a small set of test samples the use of prediction + difference On a small set of test "Samples" the use of prediction slightly improved the compression rate. 4. Bitstream + An FFV1 bitstream is composed of a series of 1 or more "Frames" and + (when required) a "Configuration Record". + + Within the following sub-sections, pseudo-code is used to explain the + structure of each FFV1 bitstream component, as described in + Section 2.2.1. The following table lists symbols used to annotate + that pseudo-code in order to define the storage of the data + referenced in that line of pseudo-code. 
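The u(n) symbol in the table that follows is read with the get_bits( i ) action of Section 2.2.9.3. That action returns the next i bits, most significant bit first, as an unsigned value and advances the read position by i. Below is a minimal Python sketch of such a reader. It assumes the whole bitstream is available as an in-memory bytes object. The class and function names are hypothetical, and byte_aligned() here simply tests whether the number of remaining bits is a multiple of 8, as in the byte_aligned( ) definition.

```python
class BitReader:
    """MSB-first bit reader over an in-memory buffer (illustrative sketch)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # read position, in bits from the start of data

    def remaining_bits(self) -> int:
        return len(self.data) * 8 - self.pos

    def byte_aligned(self) -> bool:
        # True when the number of remaining bits is a multiple of 8.
        return self.remaining_bits() % 8 == 0

    def get_bits(self, i: int) -> int:
        """Read the next i bits, most significant bit first; advance by i."""
        if i < 0 or i > self.remaining_bits():
            raise ValueError("not enough bits left in the bitstream")
        value = 0
        for _ in range(i):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_u(reader: BitReader, n: int) -> int:
    """u(n): unsigned big endian integer using n bits."""
    return reader.get_bits(n)


if __name__ == "__main__":
    r = BitReader(bytes([0xA5, 0x3C]))
    print(read_u(r, 4))      # 10, the bits 1010
    print(r.byte_aligned())  # False, 12 bits remain
    print(read_u(r, 12))     # 1340, i.e. 0x53C
```

The sketch reads one bit per loop iteration for clarity. A practical reader would typically fetch whole bytes at a time, but the returned values would be the same.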
+ +--------+----------------------------------------------------------+ | Symbol | Definition | +--------+----------------------------------------------------------+ | u(n) | unsigned big endian integer using n bits | | sg | Golomb Rice coded signed scalar symbol coded with the | | | method described in Section 3.8.2 | | br | Range coded Boolean (1-bit) symbol with the method | | | described in Section 3.8.1.1 | | ur | Range coded unsigned scalar symbol coded with the method | | | described in Section 3.8.1.2 | | sr | Range coded signed scalar symbol coded with the method | | | described in Section 3.8.1.2 | +--------+----------------------------------------------------------+ The same context that is initialized to 128 is used for all fields in the header. The following MUST be provided by external means during initialization of the decoder: - "frame_pixel_width" is defined as "Frame" width in pixels. + "frame_pixel_width" is defined as "Frame" width in "Pixels". - "frame_pixel_height" is defined as "Frame" height in pixels. + "frame_pixel_height" is defined as "Frame" height in "Pixels". Default values at the decoder initialization phase: "ConfigurationRecordIsPresent" is set to 0. 4.1. Parameters - The "Parameters" section contains significant characteristics used - for all instances of "Frame". The pseudo-code below describes the - contents of the bitstream. + The "Parameters" section contains significant characteristics about + the decoding configuration used for all instances of "Frame" (in FFV1 + version 0 and 1) or the whole FFV1 bitstream (other versions), + including the stream version, color configuration, and quantization + tables. The pseudo-code below describes the contents of the + bitstream. pseudo-code | type --------------------------------------------------------------|----- Parameters( ) { | version | ur if (version >= 3) | micro_version | ur coder_type | ur if (coder_type > 1) | for (i = 1; i < 256; i++) | @@ -1033,87 +1060,89 @@ 4.1.4. 
state_transition_delta "state_transition_delta" specifies the Range coder custom state transition table. If state_transition_delta is not present in the FFV1 bitstream, all Range coder custom state transition table elements are assumed to be 0. 4.1.5. colorspace_type - "colorspace_type" specifies color space losslessly encoded, Pixel - transformation used by the encoder, as well as interleave method. + "colorspace_type" specifies the color space losslessly encoded, the + Pixel transformation used by the encoder, as well as interleave + method. +-------+---------------------+------------------+------------------+ | value | color space | transformation | interleave | | | losslessly encoded | | method | +-------+---------------------+------------------+------------------+ - | 0 | YCbCr | No Pixel | plane then line | - | | | transformation | | - | 1 | RGB | JPEG2000-RCT | line then plane | + | 0 | YCbCr | No Pixel | "Plane" then | + | | | transformation | "Line" | + | 1 | RGB | JPEG2000-RCT | "Line" then | + | | | | "Plane" | | Other | reserved for future | reserved for | reserved for | | | use | future use | future use | +-------+---------------------+------------------+------------------+ Restrictions: If "colorspace_type" is 1, then "chroma_planes" MUST be 1, "log2_h_chroma_subsample" MUST be 0, and "log2_v_chroma_subsample" MUST be 0. 4.1.6. chroma_planes - "chroma_planes" indicates if chroma (color) planes are present. + "chroma_planes" indicates if chroma (color) "Planes" are present. - +-------+-------------------------------+ + +-------+---------------------------------+ | value | presence | - +-------+-------------------------------+ - | 0 | chroma planes are not present | - | 1 | chroma planes are present | - +-------+-------------------------------+ + +-------+---------------------------------+ + | 0 | chroma "Planes" are not present | + | 1 | chroma "Planes" are present | + +-------+---------------------------------+ 4.1.7. 
bits_per_raw_sample - "bits_per_raw_sample" indicates the number of bits for each sample. + "bits_per_raw_sample" indicates the number of bits for each "Sample". Inferred to be 8 if not present. - +-------+---------------------------------+ + +-------+-----------------------------------+ | value | bits for each sample | - +-------+---------------------------------+ + +-------+-----------------------------------+ | 0 | reserved* | - | Other | the actual bits for each sample | - +-------+---------------------------------+ + | Other | the actual bits for each "Sample" | + +-------+-----------------------------------+ * Encoders MUST NOT store bits_per_raw_sample = 0 Decoders SHOULD accept and interpret bits_per_raw_sample = 0 as 8. 4.1.8. log2_h_chroma_subsample "log2_h_chroma_subsample" indicates the subsample factor, stored in powers to which the number 2 must be raised, between luma and chroma width ("chroma_width = 2^(-log2_h_chroma_subsample) * luma_width"). 4.1.9. log2_v_chroma_subsample "log2_v_chroma_subsample" indicates the subsample factor, stored in powers to which the number 2 must be raised, between luma and chroma height ("chroma_height=2^(-log2_v_chroma_subsample) * luma_height"). 4.1.10. alpha_plane - "alpha_plane" indicates if a transparency plane is present. + "alpha_plane" indicates if a transparency "Plane" is present. - +-------+-----------------------------------+ + +-------+-------------------------------------+ | value | presence | - +-------+-----------------------------------+ - | 0 | transparency plane is not present | - | 1 | transparency plane is present | - +-------+-----------------------------------+ + +-------+-------------------------------------+ + | 0 | transparency "Plane" is not present | + | 1 | transparency "Plane" is present | + +-------+-------------------------------------+ 4.1.11. num_h_slices "num_h_slices" indicates the number of horizontal elements of the slice raster. Inferred to be 1 if not present. 4.1.12. 
num_v_slices "num_v_slices" indicates the number of vertical elements of the slice @@ -1257,22 +1286,26 @@ versions 2 or less, the Matroska "CodecPrivate" Element SHOULD NOT be used. For FFV1 versions 3 or greater, the Matroska "CodecPrivate" Element MUST contain the FFV1 "Configuration Record" structure and no other data. See [Matroska] for more information about elements. "NumBytes" is defined as the "Element Data Size" of the "CodecPrivate" Element. 4.3. Frame + A "Frame" is an encoded representation of a complete static image. + The whole "Frame" is provided by the underlying container. + A "Frame" consists of the keyframe field, "Parameters" (if version - <=1), and a sequence of independent slices. + <=1), and a sequence of independent slices. The pseudo-code below + describes the contents of a "Frame". pseudo-code | type --------------------------------------------------------------|----- Frame( NumBytes ) { | keyframe | br if (keyframe && !ConfigurationRecordIsPresent) | Parameters( ) | while ( remaining_bits_in_bitstream( NumBytes ) ) | Slice( ) | } | @@ -1289,20 +1322,31 @@ | second slice footer | | --------------------------------------------------------------- | | ... | | --------------------------------------------------------------- | | last slice header | | last slice content | | last slice footer | +-----------------------------------------------------------------+ 4.4. Slice + + A "Slice" is an independent spatial sub-section of a "Frame" that is + encoded separately from any other region of the same "Frame". Using + more than one "Slice" per "Frame" makes it possible to take + advantage of multithreaded encoding and + decoding. + + A "Slice" consists of a "Slice Header" (when relevant), a "Slice + Content", and a "Slice Footer" (when relevant). The pseudo-code + below describes the contents of a "Slice".
+ pseudo-code | type --------------------------------------------------------------|----- Slice( ) { | if (version >= 3) | SliceHeader( ) | SliceContent( ) | if (coder_type == 0) | while (!byte_aligned()) | padding | u(1) if (version <= 1) { | @@ -1324,20 +1368,26 @@ Note: in case these bits are used in a later revision of this specification, any revision of this specification SHOULD avoid adding 40 bits of content after "SliceContent" for versions 0 and 1 of the bitstream. Background: due to some non-conforming encoders, some bitstreams were found with 40 extra bits corresponding to "error_status" and "slice_crc_parity"; a decoder conforming to the revised specification could not distinguish between a revised bitstream and a buggy bitstream. 4.5. Slice Header + + A "Slice Header" provides information about the decoding + configuration of the "Slice", such as its spatial position, size, and + aspect ratio. The pseudo-code below describes the contents of the + "Slice Header". + pseudo-code | type --------------------------------------------------------------|----- SliceHeader( ) { | slice_x | ur slice_y | ur slice_width - 1 | ur slice_height - 1 | ur for( i = 0; i < quant_table_set_index_count; i++ ) | quant_table_set_index [ i ] | ur picture_structure | ur @@ -1378,47 +1428,58 @@ "quant_table_set_index" indicates the Quantization Table Set index to select the Quantization Table Set and the initial states for the slice. Inferred to be 0 if not present. 4.5.7. picture_structure "picture_structure" specifies the temporal and spatial relationship - of each line of the "Frame". + of each "Line" of the "Frame". Inferred to be 0 if not present. +-------+-------------------------+ | value | picture structure used | +-------+-------------------------+ | 0 | unknown | | 1 | top field first | | 2 | bottom field first | | 3 | progressive | | Other | reserved for future use | +-------+-------------------------+ 4.5.8.
sar_num - "sar_num" specifies the sample aspect ratio numerator. + "sar_num" specifies the "Sample" aspect ratio numerator. Inferred to be 0 if not present. - MUST be 0 if sample aspect ratio is unknown. + A value of 0 means that the aspect ratio is unknown. + Encoders MUST write 0 if the "Sample" aspect ratio is unknown. + If "sar_den" is 0, decoders SHOULD ignore the encoded value and + consider that "sar_num" is 0. 4.5.9. sar_den - "sar_den" specifies the sample aspect ratio denominator. + "sar_den" specifies the "Sample" aspect ratio denominator. Inferred to be 0 if not present. - MUST be 0 if sample aspect ratio is unknown. + A value of 0 means that the aspect ratio is unknown. + Encoders MUST write 0 if the "Sample" aspect ratio is unknown. + If "sar_num" is 0, decoders SHOULD ignore the encoded value and + consider that "sar_den" is 0. 4.6. Slice Content + A "Slice Content" contains all "Line" elements that are part of the "Slice". + + Depending on the configuration, "Line" elements are ordered by + "Plane" then by row (YCbCr) or by row then by "Plane" (RGB). + pseudo-code | type --------------------------------------------------------------|----- SliceContent( ) { | if (colorspace_type == 0) { | for( p = 0; p < primary_color_count; p++ ) | for( y = 0; y < plane_pixel_height[ p ]; y++ ) | Line( p, y ) | } else if (colorspace_type == 1) { | for( y = 0; y < slice_pixel_height; y++ ) | for( p = 0; p < primary_color_count; p++ ) | @@ -1447,64 +1508,71 @@ Its value is "floor(( slice_y + slice_height ) * frame_pixel_height / num_v_slices) - slice_pixel_y". 4.6.4. slice_pixel_y "slice_pixel_y" is the slice vertical position in pixels. Its value is "floor(slice_y * frame_pixel_height / num_v_slices)". 4.7. Line + A "Line" is a list of the sample differences (relative to the + predictor) of primary color components. The pseudo-code below + describes the contents of the "Line".
+ pseudo-code | type --------------------------------------------------------------|----- Line( p, y ) { | if (colorspace_type == 0) { | for( x = 0; x < plane_pixel_width[ p ]; x++ ) | sample_difference[ p ][ y ][ x ] | } else if (colorspace_type == 1) { | for( x = 0; x < slice_pixel_width; x++ ) | sample_difference[ p ][ y ][ x ] | } | } | 4.7.1. plane_pixel_width - "plane_pixel_width[ p ]" is the width in pixels of plane p of the + "plane_pixel_width[ p ]" is the width in "Pixels" of "Plane" p of the slice. The values of "plane_pixel_width[ 0 ]" and "plane_pixel_width[ 1 + ( chroma_planes ? 2 : 0 ) ]" are "slice_pixel_width". - If "chroma_planes" is set to 1, the values of "plane_pixel_width[ 1 ]" and "plane_pixel_width[ 2 ]" are "ceil(slice_pixel_width / (1 << log2_h_chroma_subsample))". 4.7.2. slice_pixel_width - "slice_pixel_width" is the width in pixels of the slice. + "slice_pixel_width" is the width in "Pixels" of the slice. Its value is "floor(( slice_x + slice_width ) * frame_pixel_width / num_h_slices) - slice_pixel_x". 4.7.3. slice_pixel_x - "slice_pixel_x" is the slice horizontal position in pixels. + "slice_pixel_x" is the slice horizontal position in "Pixels". Its value is "floor(slice_x * frame_pixel_width / num_h_slices)". 4.7.4. sample_difference "sample_difference[ p ][ y ][ x ]" is the sample difference for - sample at plane "p", y position "y", and x position "x". The sample - value is computed based on prediction and context described in - Section 3.2. + "Sample" at "Plane" "p", y position "y", and x position "x". The + "Sample" value is computed based on the median predictor and context + described in Section 3.2. 4.8. Slice Footer - Note: slice footer is always byte aligned. + A "Slice Footer" provides information about slice size and + (optionally) parity. The pseudo-code below describes the contents of + the "Slice Footer". + + Note: the "Slice Footer" is always byte aligned.
pseudo-code | type --------------------------------------------------------------|----- SliceFooter( ) { | slice_size | u(24) if (ec) { | error_status | u(8) slice_crc_parity | u(32) } | } | @@ -1598,21 +1666,21 @@ "context_count[ i ]" indicates the count of contexts for Quantization Table Set "i". 5. Restrictions To ensure that fast multithreaded decoding is possible, starting with version 3 and if frame_pixel_width * frame_pixel_height is more than 101376, slice_width * slice_height MUST be less than or equal to num_h_slices * num_v_slices / 4. Note: 101376 is the frame size in - pixels of a 352x288 frame also known as CIF ("Common Intermediate + "Pixels" of a 352x288 frame, also known as the CIF ("Common Intermediate Format") frame size format. For each "Frame", each position in the slice raster MUST be filled by one and only one slice of the "Frame" (no missing slice positions, no overlapping slices). For each "Frame" with a keyframe value of 0, each slice MUST have the same values of slice_x, slice_y, slice_width, and slice_height as a slice in the previous "Frame". @@ -1625,21 +1693,21 @@ Implementations of the FFV1 codec need to take appropriate security considerations into account, as outlined in [RFC4732]. It is extremely important for the decoder to be robust against malicious payloads. Malicious payloads must not cause the decoder to overrun its allocated memory or to take an excessive amount of resources to decode. Although problems in encoders are typically rarer, the same applies to the encoder. Malicious video streams must not cause the encoder to misbehave because this would allow an attacker to attack transcoding gateways. Another frequent security problem in image and video - codecs is also to not check for integer overflows in Pixel count + codecs is failing to check for integer overflows in "Pixel" count computations, that is, allocating width * height without considering that the multiplication result may have overflowed the range of the arithmetic type.
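As a hedged illustration of the integer-overflow concern above, the following minimal C sketch tests every multiplication against SIZE_MAX before performing it, so a malicious width or height cannot wrap the allocation size. It assumes size_t arithmetic and uses hypothetical helper names (checked_frame_bytes, alloc_frame); it is not taken from the reference implementation [REFIMPL] and is not part of the FFV1 bitstream definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores width * height * bytes_per_sample in *out
 * and returns 0, or returns -1 if any factor is 0 or if any product
 * would exceed SIZE_MAX (i.e. would silently wrap around). */
static int checked_frame_bytes(uint32_t width, uint32_t height,
                               size_t bytes_per_sample, size_t *out)
{
    assert(out != NULL);
    size_t w = width;
    size_t h = height;

    if (w == 0 || h == 0 || bytes_per_sample == 0)
        return -1;
    /* For integers, w * h > SIZE_MAX  <=>  w > SIZE_MAX / h, so the
     * check is done by division, before any multiplication. */
    if (w > SIZE_MAX / h)
        return -1;
    size_t pixels = w * h;
    if (pixels > SIZE_MAX / bytes_per_sample)
        return -1;
    *out = pixels * bytes_per_sample;
    return 0;
}

/* Hypothetical wrapper: allocates a sample buffer only after the size
 * check succeeds; returns NULL on overflow or allocation failure. */
static void *alloc_frame(uint32_t width, uint32_t height,
                         size_t bytes_per_sample)
{
    size_t bytes;
    if (checked_frame_bytes(width, height, bytes_per_sample, &bytes) != 0)
        return NULL;
    return malloc(bytes);
}
```

For example, the 352x288 CIF frame mentioned in the Restrictions section yields 101376 bytes with 1 byte per sample, while dimensions such as UINT32_MAX x UINT32_MAX combined with a large per-sample size are rejected instead of producing a small, wrapped allocation. Checking by division keeps the sketch portable to both 32-bit and 64-bit size_t.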
The reference implementation [REFIMPL] contains no known buffer overflows or cases where a specially crafted packet or video segment could cause a significant increase in CPU load. The reference implementation [REFIMPL] was validated under the following conditions: @@ -1765,21 +1834,21 @@ See 11. References 11.1. Normative References [I-D.ietf-cellar-ffv1] Niedermayer, M., Rice, D., and J. Martinez, "FFV1 Video Coding Format Version 0, 1, and 3", draft-ietf-cellar- - ffv1-04 (work in progress), July 2018. + ffv1-05 (work in progress), September 2018. [ISO.15444-1.2016] International Organization for Standardization, "Information technology -- JPEG 2000 image coding system: Core coding system", October 2016. [ISO.9899.1990] International Organization for Standardization, "Programming languages - C", ISO Standard 9899, 1990.