linux-kernel - Re: [PATCH v2 1/2] media: docs-rst: Document memory-to-memory video decoder interface

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <27cd87f4-e917-905a-6cec-4323936498a1@xs4all.nl>
Date:   Wed, 23 Jan 2019 09:10:22 +0100
From:   Hans Verkuil <hverkuil@...all.nl>
To:     Tomasz Figa <tfiga@...omium.org>
Cc:     Linux Media Mailing List <linux-media@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Pawel Osciak <posciak@...omium.org>,
        Alexandre Courbot <acourbot@...omium.org>, kamil@...as.org,
        a.hajda@...sung.com, Kyungmin Park <kyungmin.park@...sung.com>,
        jtp.park@...sung.com, Philipp Zabel <p.zabel@...gutronix.de>,
        Tiffany Lin (林慧珊) 
        <tiffany.lin@...iatek.com>,
        Andrew-CT Chen (陳智迪) 
        <andrew-ct.chen@...iatek.com>,
        Stanimir Varbanov <stanimir.varbanov@...aro.org>,
        todor.tomov@...aro.org, nicolas@...fresne.ca,
        Paul Kocialkowski <paul.kocialkowski@...tlin.com>,
        Laurent Pinchart <laurent.pinchart@...asonboard.com>,
        dave.stevenson@...pberrypi.org,
        Ezequiel Garcia <ezequiel@...labora.com>,
        Maxime Jourdan <maxi.jourdan@...adoo.fr>
Subject: Re: [PATCH v2 1/2] media: docs-rst: Document memory-to-memory video
 decoder interface

On 01/23/2019 06:27 AM, Tomasz Figa wrote:
> On Tue, Jan 22, 2019 at 11:47 PM Hans Verkuil <hverkuil@...all.nl> wrote:
>>
>> On 01/22/19 11:02, Tomasz Figa wrote:
>>> On Mon, Nov 12, 2018 at 8:37 PM Hans Verkuil <hverkuil@...all.nl> wrote:
>>>>
>>>> Hi Tomasz,
>>>>
>>>> A general note for the stateful and stateless patches: they describe specific
>>>> use-cases of the more generic Codec Interface, and as such should be one
>>>> level deeper in the section hierarchy.
>>>
>>> I wonder what exactly this Codec Interface is. Is it a historical name
>>> for mem2mem? If so, perhaps it would make sense to rename it?
>>
>> Yeah, it should be renamed to "Video Memory-to-Memory Interface", and the
>> codecs are just specific instances of such an interface.
>>
> 
> Ack.
> 
>>>
>>>>
>>>> I.e. instead of being section 4.6/7/8:
>>>>
>>>> https://hverkuil.home.xs4all.nl/request-api/uapi/v4l/devices.html
>>>>
>>>> they should be 4.5.1/2/3.
>>>>
>>>
>>> FYI, the first RFC started like that, but it only made the spec
>>> difficult to navigate and the section numbers too long.
>>>
>>> Still, no strong opinion. I'm okay moving it there, if you think it's better.
>>
>> It should be moved and the interface name should be renamed. It makes a lot
>> more sense with those changes.
>>
>> I've posted a patch for this.
>>
> 
> Thanks. I've rebased on top of it.
> 
> [snip]
>>>>> +3.  Query the minimum number of buffers required for the ``CAPTURE`` queue via
>>>>> +    :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to use more
>>>>> +    buffers than the minimum required by hardware/format.
>>>>
>>>> Is this step optional or required? Can it change when a resolution change occurs?
>>>
>>> Probably not with a simple resolution change, but a case when a stream
>>> is changed on the fly would trigger what we call "resolution change"
>>> here, but what would effectively be a "source change" and that could
>>> include a change in the number of required CAPTURE buffers.
>>>
>>>> How does this relate to the checks for the minimum number of buffers that REQBUFS
>>>> does?
>>>
>>> The control returns the minimum that REQBUFS would allow, so the
>>> application can add few more buffers on top of that and improve the
>>> pipelining.
>>>
>>>>
>>>> The 'This is useful if' sentence suggests that it is optional, but I think that
>>>> sentence just confuses the issue.
>>>>
>>>
>>> It used to be optional and I didn't rephrase it after turning it into
>>> mandatory. How about:
>>>
>>>     This enables the client to request more buffers
>>>     than the minimum required by hardware/format and achieve better pipelining.
>>
>> Hmm, OK. It'll do, I guess. I never liked these MIN_BUFFERS controls, I wish they
>> would return something like the recommended number of buffers that will give you
>> decent performance.
>>
> 
> The problem here is that the kernel doesn't know what is decent for
> the application, since it doesn't know how the results of the decoding
> are used. Over-allocating would result to a waste of memory, which
> could then make it less than decent for memory-constrained
> applications.
> 
>>>
>>>>> +
>>>>> +    * **Required fields:**
>>>>> +
>>>>> +      ``id``
>>>>> +          set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
>>>>> +
>>>>> +    * **Return fields:**
>>>>> +
>>>>> +      ``value``
>>>>> +          minimum number of buffers required to decode the stream parsed in
>>>>> +          this initialization sequence.
>>>>> +
>>>>> +    .. note::
>>>>> +
>>>>> +       The minimum number of buffers must be at least the number required to
>>>>> +       successfully decode the current stream. This may for example be the
>>>>> +       required DPB size for an H.264 stream given the parsed stream
>>>>> +       configuration (resolution, level).
>>>>> +
>>>>> +    .. warning::
>>>>> +
>>>>> +       The value is guaranteed to be meaningful only after the decoder
>>>>> +       successfully parses the stream metadata. The client must not rely on the
>>>>> +       query before that happens.
>>>>> +
>>>>> +4.  **Optional.** Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
>>>>> +    the ``CAPTURE`` queue. Once the stream information is parsed and known, the
>>>>> +    client may use this ioctl to discover which raw formats are supported for
>>>>> +    given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
>>>>
>>>> Can the list returned here differ from the list returned in the 'Querying capabilities'
>>>> step? If so, then I assume it will always be a subset of what was returned in
>>>> the 'Querying' step?
>>>
>>> Depends on whether you're considering just VIDIOC_ENUM_FMT or also
>>> VIDIOC_G_FMT and VIDIOC_ENUM_FRAMESIZES.
>>>
>>> The initial VIDIOC_ENUM_FMT has no way to account for any resolution
>>> constraints of the formats, so the list would include all raw pixel
>>> formats that the decoder can handle with selected coded pixel format.
>>> However, the list can be further narrowed down by using
>>> VIDIOC_ENUM_FRAMESIZES, to restrict each raw format only to the
>>> resolutions it can handle.
>>>
>>> The VIDIOC_ENUM_FMT call in this sequence (after getting the stream
>>> information) would have the knowledge about the resolution, so the
>>> list returned here would only include the formats that can be actually
>>> handled. It should match the result of the initial query using both
>>> VIDIOC_ENUM_FMT and VIDIOC_ENUM_FRAMESIZES.
>>
>> Right, so this will be a subset of the initial query taking the resolution
>> into account.
>>
> 
> Do you think it could be worth adding a note about this?

Yes, it helps clarify things.

> 
> [snip]
>>>>> +Decoding
>>>>> +========
>>>>> +
>>>>> +This state is reached after the `Capture setup` sequence finishes succesfully.
>>>>> +In this state, the client queues and dequeues buffers to both queues via
>>>>> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following the standard
>>>>> +semantics.
>>>>> +
>>>>> +The contents of the source ``OUTPUT`` buffers depend on the active coded pixel
>>>>> +format and may be affected by codec-specific extended controls, as stated in
>>>>> +the documentation of each format.
>>>>> +
>>>>> +Both queues operate independently, following the standard behavior of V4L2
>>>>> +buffer queues and memory-to-memory devices. In addition, the order of decoded
>>>>> +frames dequeued from the ``CAPTURE`` queue may differ from the order of queuing
>>>>> +coded frames to the ``OUTPUT`` queue, due to properties of the selected coded
>>>>> +format, e.g. frame reordering.
>>>>> +
>>>>> +The client must not assume any direct relationship between ``CAPTURE``
>>>>> +and ``OUTPUT`` buffers and any specific timing of buffers becoming
>>>>> +available to dequeue. Specifically,
>>>>> +
>>>>> +* a buffer queued to ``OUTPUT`` may result in no buffers being produced
>>>>> +  on ``CAPTURE`` (e.g. if it does not contain encoded data, or if only
>>>>> +  metadata syntax structures are present in it),
>>>>> +
>>>>> +* a buffer queued to ``OUTPUT`` may result in more than 1 buffer produced
>>>>> +  on ``CAPTURE`` (if the encoded data contained more than one frame, or if
>>>>> +  returning a decoded frame allowed the decoder to return a frame that
>>>>> +  preceded it in decode, but succeeded it in the display order),
>>>>> +
>>>>> +* a buffer queued to ``OUTPUT`` may result in a buffer being produced on
>>>>> +  ``CAPTURE`` later into decode process, and/or after processing further
>>>>> +  ``OUTPUT`` buffers, or be returned out of order, e.g. if display
>>>>> +  reordering is used,
>>>>> +
>>>>> +* buffers may become available on the ``CAPTURE`` queue without additional
>>>>> +  buffers queued to ``OUTPUT`` (e.g. during drain or ``EOS``), because of the
>>>>> +  ``OUTPUT`` buffers queued in the past whose decoding results are only
>>>>> +  available at later time, due to specifics of the decoding process.
>>>>> +
>>>>> +.. note::
>>>>> +
>>>>> +   To allow matching decoded ``CAPTURE`` buffers with ``OUTPUT`` buffers they
>>>>> +   originated from, the client can set the ``timestamp`` field of the
>>>>> +   :c:type:`v4l2_buffer` struct when queuing an ``OUTPUT`` buffer. The
>>>>> +   ``CAPTURE`` buffer(s), which resulted from decoding that ``OUTPUT`` buffer
>>>>> +   will have their ``timestamp`` field set to the same value when dequeued.
>>>>> +
>>>>> +   In addition to the straighforward case of one ``OUTPUT`` buffer producing
>>
>> straighforward -> straightforward
>>
> 
> Ack.
> 
>>>>> +   one ``CAPTURE`` buffer, the following cases are defined:
>>>>> +
>>>>> +   * one ``OUTPUT`` buffer generates multiple ``CAPTURE`` buffers: the same
>>>>> +     ``OUTPUT`` timestamp will be copied to multiple ``CAPTURE`` buffers,
>>>>> +
>>>>> +   * multiple ``OUTPUT`` buffers generate one ``CAPTURE`` buffer: timestamp of
>>>>> +     the ``OUTPUT`` buffer queued last will be copied,
>>>>> +
>>>>> +   * the decoding order differs from the display order (i.e. the
>>>>> +     ``CAPTURE`` buffers are out-of-order compared to the ``OUTPUT`` buffers):
>>>>> +     ``CAPTURE`` timestamps will not retain the order of ``OUTPUT`` timestamps
>>>>> +     and thus monotonicity of the timestamps cannot be guaranteed.
>>
>> I think this last point should be rewritten. The timestamp is just a value that
>> is copied, there are no monotonicity requirements for m2m devices in general.
>>
> 
> Actually I just realized the last point might not even be achievable
> for some of the decoders (s5p-mfc, mtk-vcodec), as they don't report
> which frame originates from which bitstream buffer and the driver just
> picks the most recently consumed OUTPUT buffer to copy the timestamp
> from. (s5p-mfc actually "forgets" to set the timestamp in some cases
> too...)
> 
> I need to think a bit more about this.
> 
> [snip]
>>>>> +Seek
>>>>> +====
>>>>> +
>>>>> +Seek is controlled by the ``OUTPUT`` queue, as it is the source of coded data.
>>>>> +The seek does not require any specific operation on the ``CAPTURE`` queue, but
>>>>> +it may be affected as per normal decoder operation.
>>>>> +
>>>>> +1. Stop the ``OUTPUT`` queue to begin the seek sequence via
>>>>> +   :c:func:`VIDIOC_STREAMOFF`.
>>>>> +
>>>>> +   * **Required fields:**
>>>>> +
>>>>> +     ``type``
>>>>> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
>>>>> +
>>>>> +   * The decoder will drop all the pending ``OUTPUT`` buffers and they must be
>>>>> +     treated as returned to the client (following standard semantics).
>>>>> +
>>>>> +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
>>>>> +
>>>>> +   * **Required fields:**
>>>>> +
>>>>> +     ``type``
>>>>> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
>>>>> +
>>>>> +   * The decoder will start accepting new source bitstream buffers after the
>>>>> +     call returns.
>>>>> +
>>>>> +3. Start queuing buffers containing coded data after the seek to the ``OUTPUT``
>>>>> +   queue until a suitable resume point is found.
>>>>> +
>>>>> +   .. note::
>>>>> +
>>>>> +      There is no requirement to begin queuing coded data starting exactly
>>>>> +      from a resume point (e.g. SPS or a keyframe). Any queued ``OUTPUT``
>>>>> +      buffers will be processed and returned to the client until a suitable
>>>>> +      resume point is found.  While looking for a resume point, the decoder
>>>>> +      should not produce any decoded frames into ``CAPTURE`` buffers.
>>>>> +
>>>>> +      Some hardware is known to mishandle seeks to a non-resume point. Such an
>>>>> +      operation may result in an unspecified number of corrupted decoded frames
>>>>> +      being made available on the ``CAPTURE`` queue. Drivers must ensure that
>>>>> +      no fatal decoding errors or crashes occur, and implement any necessary
>>>>> +      handling and workarounds for hardware issues related to seek operations.
>>>>
>>>> Is there a requirement that those corrupted frames have V4L2_BUF_FLAG_ERROR set?
>>>> I.e., can userspace detect those currupted frames?
>>>>
>>>
>>> I think the question is whether the kernel driver can actually detect
>>> those corrupted frames. We can't guarantee reporting errors to the
>>> userspace, if the hardware doesn't actually report them.
>>>
>>> Could we perhaps keep this an open question and possibly address with
>>> some extension that could be an opt in for the decoders that can
>>> report errors?
>>
>> Hmm, how about: If the hardware can detect such corrupted decoded frames, then
>> it shall set V4L2_BUF_FLAG_ERROR.
>>
> 
> Sounds good to me.
> 
> Actually, let me add a paragraph about error handling in the decoding
> section, since it's a general problem, not limited to seeking. I can
> then refer to it from the Seek sequence.
> 
>>>
>>>>> +
>>>>> +   .. warning::
>>>>> +
>>>>> +      In case of the H.264 codec, the client must take care not to seek over a
>>>>> +      change of SPS/PPS. Even though the target frame could be a keyframe, the
>>>>> +      stale SPS/PPS inside decoder state would lead to undefined results when
>>>>> +      decoding. Although the decoder must handle such case without a crash or a
>>>>> +      fatal decode error, the client must not expect a sensible decode output.
>>>>> +
>>>>> +4. After a resume point is found, the decoder will start returning ``CAPTURE``
>>>>> +   buffers containing decoded frames.
>>>>> +
>>>>> +.. important::
>>>>> +
>>>>> +   A seek may result in the `Dynamic resolution change` sequence being
>>>>> +   iniitated, due to the seek target having decoding parameters different from
>>>>> +   the part of the stream decoded before the seek. The sequence must be handled
>>>>> +   as per normal decoder operation.
>>>>> +
>>>>> +.. warning::
>>>>> +
>>>>> +   It is not specified when the ``CAPTURE`` queue starts producing buffers
>>>>> +   containing decoded data from the ``OUTPUT`` buffers queued after the seek,
>>>>> +   as it operates independently from the ``OUTPUT`` queue.
>>>>> +
>>>>> +   The decoder may return a number of remaining ``CAPTURE`` buffers containing
>>>>> +   decoded frames originating from the ``OUTPUT`` buffers queued before the
>>>>> +   seek sequence is performed.
>>>>> +
>>>>> +   The ``VIDIOC_STREAMOFF`` operation discards any remaining queued
>>>>> +   ``OUTPUT`` buffers, which means that not all of the ``OUTPUT`` buffers
>>>>> +   queued before the seek sequence may have matching ``CAPTURE`` buffers
>>>>> +   produced.  For example, given the sequence of operations on the
>>>>> +   ``OUTPUT`` queue:
>>>>> +
>>>>> +     QBUF(A), QBUF(B), STREAMOFF(), STREAMON(), QBUF(G), QBUF(H),
>>>>> +
>>>>> +   any of the following results on the ``CAPTURE`` queue is allowed:
>>>>> +
>>>>> +     {A’, B’, G’, H’}, {A’, G’, H’}, {G’, H’}.
>>>>
>>>> Isn't it the case that if you would want to avoid that, then you should call
>>>> DECODER_STOP, wait for the last buffer on the CAPTURE queue, then seek and
>>>> call DECODER_START. If you do that, then you should always get {A’, B’, G’, H’}.
>>>> (basically following the Drain sequence).
>>>
>>> Yes, it is, but I think it depends on the application needs. Here we
>>> just give a primitive to change the place in the stream that's being
>>> decoded (or change the stream on the fly).
>>>
>>> Actually, with the timestamp copy, I guess we wouldn't even need to do
>>> the DECODER_STOP, as we could just discard the CAPTURE buffers until
>>> we get one that matches the timestamp of the first OUTPUT buffer after
>>> the seek.
>>>
>>>>
>>>> Admittedly, you typically want to do an instantaneous seek, so this is probably
>>>> not what you want to do normally.
>>>>
>>>> It might help to have this documented in a separate note.
>>>>
>>>
>>> The instantaneous seek is documented below. I'm not sure if there is
>>> any practical need to document the other case, but I could add a
>>> sentence like below to the warning above. What do you think?
>>>
>>>    The ``VIDIOC_STREAMOFF`` operation discards any remaining queued
>>>    ``OUTPUT`` buffers, which means that not all of the ``OUTPUT`` buffers
>>>    queued before the seek sequence may have matching ``CAPTURE`` buffers
>>>    produced.  For example, given the sequence of operations on the
>>>    ``OUTPUT`` queue:
>>>
>>>      QBUF(A), QBUF(B), STREAMOFF(), STREAMON(), QBUF(G), QBUF(H),
>>>
>>>    any of the following results on the ``CAPTURE`` queue is allowed:
>>
>> is allowed -> are allowed
>>
> 
> Only one can happen at given time, so I think singular is correct
> here? (i.e. Any [...] is allowed)

I think you are right.

> 
> [snip]
>>>>> diff --git a/Documentation/media/uapi/v4l/vidioc-g-fmt.rst b/Documentation/media/uapi/v4l/vidioc-g-fmt.rst
>>>>> index 3ead350e099f..0fc0b78a943e 100644
>>>>> --- a/Documentation/media/uapi/v4l/vidioc-g-fmt.rst
>>>>> +++ b/Documentation/media/uapi/v4l/vidioc-g-fmt.rst
>>>>> @@ -53,6 +53,13 @@ devices that is either the struct
>>>>>  member. When the requested buffer type is not supported drivers return
>>>>>  an ``EINVAL`` error code.
>>>>>
>>>>> +A stateful mem2mem decoder will not allow operations on the
>>>>> +``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
>>>>> +buffer type until the corresponding ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
>>>>> +``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE`` buffer type is configured. If such an
>>>>> +operation is attempted, drivers return an ``EACCES`` error code. Refer to
>>>>> +:ref:`decoder` for more details.
>>>>
>>>> This isn't right. EACCES is returned as long as the output format resolution is
>>>> unknown. If it is set explicitly, then this will work without an error.
>>
>> Ah, sorry, I phrased that poorly. Let me try again:
>>
>> This isn't right. EACCES is returned for CAPTURE operations as long as the
>> output format resolution is unknown or the CAPTURE format is explicitly set.
>> If the CAPTURE format is set explicitly, then this will work without an error.
> 
> We don't allow directly setting CAPTURE format explicitly either,
> because the driver wouldn't have anything to validate the format
> against. We allow the client to do it indirectly, by setting the width
> and height of the OUTPUT format, which unblocks the CAPTURE format
> operations, because the driver can then validate against the OUTPUT
> (coded) format.

You are completely right, just forget what I said.

Regards,

	Hans

> 
> Best regards,
> Tomasz
>