linux-kernel - Re: [PATCH] drm/fourcc: Add modifier definition for describing Verisilicon video framebuffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a8e06d9dcc664918d91fb6e7dd80692230df9b68.camel@collabora.com>
Date: Wed, 06 Nov 2024 14:49:24 -0500
From: Nicolas Dufresne <nicolas.dufresne@...labora.com>
To: Benjamin Gaignard <benjamin.gaignard@...labora.com>, 
	maarten.lankhorst@...ux.intel.com, mripard@...nel.org, tzimmermann@...e.de,
 	airlied@...il.com, simona@...ll.ch, laurentiu.palcu@....com,
 aisheng.dong@....com,  Lucas Stach <l.stach@...gutronix.de>, Philipp Zabel
 <p.zabel@...gutronix.de>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org, Collabora
 Kernel ML <kernel@...labora.com>
Subject: Re: [PATCH] drm/fourcc: Add modifier definition for describing
 Verisilicon video framebuffer

Hi,

Le mercredi 06 novembre 2024 à 16:53 +0100, Benjamin Gaignard a écrit :
> + nicolas

Thanks for the CC, I'm obviously watching kernel@...labora.com, I don't know why
it didn't make it to my mailbox.

I'm adding explicitly Lucas and Philipp, as I believe they can provide relevant
information here.

> 
> Le 06/11/2024 à 14:30, Benjamin Gaignard a écrit :
> > Verisilicon hardware video decoders can produce tiled (8x4 or 4x4)
> > and compressed video framebuffers.
> > It considerably reduces memory bandwidth while writing and reading
> > frames in memory.

I've seen for years this 8x4 references, but in reality, and I've implemented
software converters that works on all the VSI/Hantro drivers we have mainline,
there is no such thing as 8x4 tiled coming out of these chips.

Unless we have new evidence, and V4L2 patches enabling these formats, I don't
see any point of bringing what I believe is a TRM mistake, or an historical
format.

> > 
> > The underlying storage in NV12 (for 8-bit) or NV15 (for 10-bit).
> > 
> > Display controllers, like imx DCSS, could use these modifier definition
> > as input for overlay planes.
> > 
> > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@...labora.com>
> > ---
> > The original code comes from:
> > https://github.com/nxp-imx/linux-imx/commit/ab01b7fe82d5a11dfb533cfbd08c4dfa140815de
> > I have port it and modify DRM_FORMAT_MOD_VENDOR_VSI value.
> > 
> >   include/uapi/drm/drm_fourcc.h | 27 +++++++++++++++++++++++++++
> >   1 file changed, 27 insertions(+)
> > 
> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > index 78abd819fd62..31d09a98d0d7 100644
> > --- a/include/uapi/drm/drm_fourcc.h
> > +++ b/include/uapi/drm/drm_fourcc.h
> > @@ -421,6 +421,7 @@ extern "C" {
> >   #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
> >   #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
> >   #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
> > +#define DRM_FORMAT_MOD_VENDOR_VSI     0x0b
> >   
> >   /* add more to the end as needed */
> >   
> > @@ -1607,6 +1608,32 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
> >   #define AMD_FMT_MOD_CLEAR(field) \
> >   	(~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
> >   
> > +/* Verisilicon framebuffer modifiers */
> > +
> > +/*
> > + * Verisilicon 8x4 tiling layout
> > + *
> > + * This is G1 VPU tiled layout using tiles of 8x4 pixels in a row-major
> > + * layout.
> > + */
> > +#define DRM_FORMAT_MOD_VSI_G1_TILED fourcc_mod_code(VSI, 1)

I have code in GStreamer mainline that handle the tiled G1 output in software
and its 4x4, no doubt here ... 

> > +
> > +/*
> > + * Verisilicon 4x4 tiling layout
> > + *
> > + * This is G2 VPU tiled layout using tiles of 4x4 pixels in a row-major
> > + * layout.
> > + */
> > +#define DRM_FORMAT_MOD_VSI_G2_TILED fourcc_mod_code(VSI, 2)

... Meaning this split make no sense, G2 shares the same format in V4L2 and
works well in GStreamer software converters. In fact, in the NXP TRM, you should
notice that in the G1 section, they document G2 tile and G1 tile is ignored.
Perhaps the DCSS do implement some 8x4 tiling, but considering we don't have
evidence of anything producing that, we should probably not have it.

>From the documentation from NXP, G2 Tile is identical to
DRM_FORMAT_MOD_VIVANTE_TILED, so we really should reuse the existing definition.
If not, I'd really like to know in the text why. Running Etnaviv patched with
NV12 support (which I think got merged now), I can see that the combination is
supported (but haven't tested it though since its not mapped properly in
GStreamer yet):


| NV12         | NV12                    | external only |
|              | NV12:0x0600000000000001 | external only |
|              | NV12:0x0600000000000002 | external only |

Having a GL path is very efficient, but also gives compositors a composition
fallback which is needed to enable a format and guarantee it can be rendered at
all time.

> > +
> > +/*
> > + * Verisilicon 4x4 tiling with compression layout
> > + *
> > + * This is G2 VPU tiled layout using tiles of 4x4 pixels in a row-major
> > + * layout with compression.
> > + */
> > +#define DRM_FORMAT_MOD_VSI_G2_TILED_COMPRESSED fourcc_mod_code(VSI, 3)

This one needs definition, but the doc provided does not seem to match what I
can see in IMX8M TRM (see 14.2.2.3 Compressed format), here's a snippet, I don't
yet fully understand it. They only explain the storage, not the compression
technique, but this should be enough to understand its not a simple row-major
4x4 tiling.

   When compression is enabled, the picture is divided into CBS rows, which is
   further divided into continuous CBS groups. Each CBS group is composed of 16
   CBS. The luminance CBS is composed of 1 8x8 coded block (CB), which one
   chrominance CBS is composed of two 8x4 coded blocks, with cb CB first then cr
   CB following. The compression is performed to each CB in the CBS. The first
   CB's compressed data saved from the same offset of that CBS in raster scan
   buffer. And the compressed output of following CBs in the CBS row is saved
   continuously. That means there may be gaps between the compressed data of
   each CBS row.

Nicolas

> > +
> >   #if defined(__cplusplus)
> >   }
> >   #endif