lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <HE1PR06MB401167BD655F537D4136737AACB90@HE1PR06MB4011.eurprd06.prod.outlook.com>
Date:   Tue, 3 Sep 2019 20:13:08 +0000
From:   Jonas Karlman <jonas@...boo.se>
To:     Philipp Zabel <p.zabel@...gutronix.de>,
        Ezequiel Garcia <ezequiel@...labora.com>
CC:     Mauro Carvalho Chehab <mchehab@...nel.org>,
        Hans Verkuil <hverkuil@...all.nl>,
        Boris Brezillon <boris.brezillon@...labora.com>,
        Paul Kocialkowski <paul.kocialkowski@...tlin.com>,
        "linux-media@...r.kernel.org" <linux-media@...r.kernel.org>,
        "linux-rockchip@...ts.infradead.org" 
        <linux-rockchip@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 03/12] media: hantro: Fix H264 motion vector buffer offset

On 2019-09-03 12:58, Philipp Zabel wrote:
> Hi Jonas,
>
> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
>> A decoded 8-bit 4:2:0 frame need memory for up to 448 macroblocks
>> and is laid out in memory as follow:
> Do you mean "A decoded 8-bit 4:2:0 frame needs up to 448 bytes per
> macroblock"?
>
> A 1280x720 frame already consists of 3600 macroblocks (each 16x16 Y +
> 2x8x8 Cb,Cr).

You are correct, thanks for pointing out, I will change in a v2.

>
>> +-------------------+
>>> Y-plane   256 MBs |
> So that looks like it should be 256 bytes * number of macroblocks
> instead, same for the following two.

Ack.

>
>> +-------------------+
>>> UV-plane  128 MBs |
>> +-------------------+
>>> MV buffer  64 MBs |
>> +-------------------+
>>
>> The motion vector buffer offset is currently correct for 4:2:0 because
>> the extra space for motion vectors is overallocated with an extra 64 MBs.
>>
>> Wrong offset for both destination and motion vector buffer are used
>> for the bottom field of field encoded content, wrong offset is
>> also used for 4:0:0 (monochrome) content.
>>
>> Fix this by always setting the motion vector address to the expected
>> 384 MBs offset for 4:2:0 and 256 MBs offset for 4:0:0 content.
> Expected by whom? For example, could these be placed in separate buffers
> instead of appended to the VB2 allocated buffers?

From what I understand main and high profile decoding have hw constraints in that
the direct mode motion vectors buffer must be located continuously after the YUV buffer.

I also observed instances where the current requirement for profile_idc > 66 caused issues
for some streams, e.g. big_buck_bunny_1080p_H264_AAC_25fps_7200K.mp4

Because of this it was just easier to always configure the motion vector buffer address.

>
>> Also use correct destination and motion vector buffer offset
>> for the bottom field of field encoded content.
>>
>> While at it also extend the check for 4:0:0 (monochrome) to include an
>> additional check for High Profile (100).
>>
>> Fixes: dea0a82f3d22 ("media: hantro: Add support for H264 decoding on G1")
>> Signed-off-by: Jonas Karlman <jonas@...boo.se>
>> ---
>>  .../staging/media/hantro/hantro_g1_h264_dec.c | 33 +++++++++++--------
>>  1 file changed, 19 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> index 7ab534936843..159bd67e0a36 100644
>> --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
>> @@ -19,6 +19,9 @@
>>  #include "hantro_hw.h"
>>  #include "hantro_v4l2.h"
>>  
>> +#define MV_OFFSET_420	384
>> +#define MV_OFFSET_400	256
>> +
>>  static void set_params(struct hantro_ctx *ctx)
>>  {
>>  	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
>> @@ -49,8 +52,8 @@ static void set_params(struct hantro_ctx *ctx)
>>  	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL0);
>>  
>>  	/* Decoder control register 1. */
>> -	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(sps->pic_width_in_mbs_minus1 + 1) |
>> -	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(sps->pic_height_in_map_units_minus1 + 1) |
>> +	reg = G1_REG_DEC_CTRL1_PIC_MB_WIDTH(H264_MB_WIDTH(ctx->dst_fmt.width)) |
>> +	      G1_REG_DEC_CTRL1_PIC_MB_HEIGHT_P(H264_MB_HEIGHT(ctx->dst_fmt.height)) |
>>  	      G1_REG_DEC_CTRL1_REF_FRAMES(sps->max_num_ref_frames);
>>  	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL1);
>>  
>> @@ -79,7 +82,7 @@ static void set_params(struct hantro_ctx *ctx)
>>  		reg |= G1_REG_DEC_CTRL4_CABAC_E;
>>  	if (sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE)
>>  		reg |= G1_REG_DEC_CTRL4_DIR_8X8_INFER_E;
>> -	if (sps->chroma_format_idc == 0)
>> +	if (sps->profile_idc >= 100 && sps->chroma_format_idc == 0)
>>  		reg |= G1_REG_DEC_CTRL4_BLACKWHITE_E;
>>  	if (pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED)
>>  		reg |= G1_REG_DEC_CTRL4_WEIGHT_PRED_E;
>> @@ -233,6 +236,7 @@ static void set_buffers(struct hantro_ctx *ctx)
>>  	struct vb2_v4l2_buffer *src_buf, *dst_buf;
>>  	struct hantro_dev *vpu = ctx->dev;
>>  	dma_addr_t src_dma, dst_dma;
>> +	unsigned int offset = MV_OFFSET_420;
>>  
>>  	src_buf = hantro_get_src_buf(ctx);
>>  	dst_buf = hantro_get_dst_buf(ctx);
>> @@ -243,19 +247,20 @@ static void set_buffers(struct hantro_ctx *ctx)
>>  
>>  	/* Destination (decoded frame) buffer. */
>>  	dst_dma = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
>> +	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
>> +		dst_dma += ALIGN(ctx->dst_fmt.width, H264_MB_DIM);
> How does this work? Does userspace decode two fields into the same
> capture buffer and the hardware writes each field with a stride of 2
> lines? I suppose this corresponds to V4L2_FIELD_INTERLACED. Could this
> also be made to support V4L2_FIELD_SEQ_TB output?

Yes, both fields are decoded into the same capture buffer, top field to odd numbered lines
and bottom field to even numbered lines, so I guess this corresponds to V4L2_FIELD_INTERLACED.
This is also how the cedrus driver handles field decoding with Jernej's h264 patches.

I do not know if it is possible to configure the hw to decode into a V4L2_FIELD_SEQ_TB type output.

Regards,
Jonas

>
> regards
> Philipp

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ