Message-ID: <fe113f83-fbbd-4e3b-8b42-a4f50c7c7489@linaro.org>
Date: Mon, 16 Jun 2025 18:00:19 +0300
From: Vladimir Zapolskiy <vladimir.zapolskiy@...aro.org>
To: Bryan O'Donoghue <bryan.odonoghue@...aro.org>,
 Johan Hovold <johan@...nel.org>
Cc: Robert Foss <rfoss@...nel.org>, Todor Tomov <todor.too@...il.com>,
 Mauro Carvalho Chehab <mchehab@...nel.org>, Hans Verkuil
 <hverkuil@...all.nl>, Depeng Shao <quic_depengs@...cinc.com>,
 linux-media@...r.kernel.org, linux-arm-msm@...r.kernel.org,
 linux-kernel@...r.kernel.org, stable@...r.kernel.org,
 Johan Hovold <johan+linaro@...nel.org>
Subject: Re: [PATCH 2/2] media: qcom: camss: vfe: Fix registration sequencing
 bug

Hi Bryan.

On 6/16/25 17:09, Bryan O'Donoghue wrote:
> On 13/06/2025 10:13, Vladimir Zapolskiy wrote:
>>
>> Per se this concurrent execution shall not lead to the encountered bug,
> 
> What does that mean ? Please re-read the commit log, the analysis is all
> there.

Concurrent execution by itself is not a problem; moreover, it is a feature
of operating systems.

>> both an initialization of media entity pads by media_entity_pads_init()
>> and a registration of a v4l2 devnode inside msm_video_register() are
>> done under in a proper sequence, aren't they?
> 
> No, I clearly haven't explained this clearly enough in the commit log.
> 
> vfe0_rdi0 == /dev/video0 is complete. vfe0_rdi1 is not complete there is
> no /dev/video1 in user-space.

Please let me ask for a few improvements to the commit message of the next
version of the fix.

Information like "vfe0_rdi0 == /dev/video0" etc. above vaguely assumes
so much context that the statements become wrong; let's remove the
ambiguity instead of amplifying it.

> vfe_get() is called for an RDI in a VFE, camss_find_sensor_pad() assumes
> all RDIs are populated.
> 

This is a good and almost sufficient one-line problem description.

Still there is an issue: you mention the vfe_get() and
camss_find_sensor_pad() functions, however both of them are fine, and the
problem lies within the vfe_set_clock_rates() function. That is the exact
place in the driver code which iterates over all VFE lines as if all of
them were initialized.
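
To illustrate the window described above, here is a minimal userspace
model. It is not driver code: struct model_line, NUM_LINES and all the
model_*() helpers are hypothetical names invented for this sketch, and the
"walk" only stands in for any code that iterates over every VFE line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LINES 3 /* hypothetical number of VFE lines */

/* Hypothetical stand-in for a VFE line, not the real struct vfe_line. */
struct model_line {
	bool pads_ready;   /* pads of this line have been initialized */
	bool devnode_open; /* user space can already reach this line */
};

/* Interleaved registration: each line is initialized and exposed at once. */
static void model_register_interleaved(struct model_line *lines, size_t upto)
{
	assert(lines != NULL);
	for (size_t i = 0; i < upto && i < NUM_LINES; i++) {
		lines[i].pads_ready = true;
		lines[i].devnode_open = true;
	}
}

/* A walk over all lines is only safe once every line has its pads. */
static bool model_walk_all_lines_is_safe(const struct model_line *lines)
{
	for (size_t i = 0; i < NUM_LINES; i++)
		if (!lines[i].pads_ready)
			return false;
	return true;
}

/* True if user space can already trigger the walk through some devnode. */
static bool model_userspace_can_trigger(const struct model_line *lines)
{
	for (size_t i = 0; i < NUM_LINES; i++)
		if (lines[i].devnode_open)
			return true;
	return false;
}
```

After model_register_interleaved(lines, 1) the first devnode is reachable
while the walk over all lines is not yet safe; that is the window in
question.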

> We can't use any VFE mutex to synchronise this because
> 
> lock(vfe->mutex);
> lock(media->mutex);
> 
> and
> lock(media->mutex);
> lock(vfe->mutex);
> 
> happen.
> 
> So we can educate vfe_get() about the RDI it is operating on or we can
> flag that a VFE - all of it's subordinate RDIs are available.
> 
> I didn't much like teaching vfe_get() about which RDI index because the
> code looked ugly for 8916 you have to assume on one of the code paths
> that it always operates on RDI0, which is an invalid assumption.

vfe_get() and the mutexes are all a red herring: there is no problem with
vfe_get(), there is no problem with camss_find_sensor_pad(), and there
is no expectation of finding a proper fix in either of these two functions.

Johan and I pointed out how to fix the encountered issue properly.
Once again, please don't hesitate to ask questions if my short
explanations are unclear to you.

The fix is to expose any of the VFE line devnodes to userspace strictly
after the initialization of all media entity pads has completed. Do you
have an idea how to implement it, or should I help with it? It'd be
totally okay.
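
A rough userspace sketch of that ordering follows. It is only a model
under stated assumptions, not a proposed patch: struct sketch_line,
SKETCH_NUM_LINES and the sketch_*() helpers are hypothetical names, and
the two flags merely stand in for pad initialization and devnode
registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NUM_LINES 3 /* hypothetical number of VFE lines */

/* Hypothetical stand-in for a VFE line, not the real driver structure. */
struct sketch_line {
	bool pads_ready;   /* models completed pad initialization */
	bool devnode_open; /* models a devnode visible to user space */
};

static bool sketch_all_pads_ready(const struct sketch_line *lines)
{
	for (size_t i = 0; i < SKETCH_NUM_LINES; i++)
		if (!lines[i].pads_ready)
			return false;
	return true;
}

/* Refuse to expose a devnode while any line still lacks its pads. */
static int sketch_expose_devnode(struct sketch_line *lines, size_t i)
{
	assert(i < SKETCH_NUM_LINES);
	if (!sketch_all_pads_ready(lines))
		return -1;
	lines[i].devnode_open = true;
	return 0;
}

/* Two-phase registration: all pads first, only then any devnode. */
static int sketch_register_two_phase(struct sketch_line *lines)
{
	/* Phase 1: initialize the pads of every line. */
	for (size_t i = 0; i < SKETCH_NUM_LINES; i++)
		lines[i].pads_ready = true;

	/* Phase 2: user space gets access only after phase 1 is complete. */
	for (size_t i = 0; i < SKETCH_NUM_LINES; i++)
		if (sketch_expose_devnode(lines, i))
			return -1;
	return 0;
}
```

With this ordering there is no point in time at which a devnode is
reachable while some line is still uninitialized, so code iterating over
all lines needs no extra checks.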

> The other way to fix this is:
> 
> +++ b/drivers/media/platform/qcom/camss/camss.c
> @@ -2988,7 +2988,7 @@ struct media_pad *camss_find_sensor_pad(struct
> media_entity *entity)
> 
>           while (1) {
>                   pad = &entity->pads[0];
> -               if (!(pad->flags & MEDIA_PAD_FL_SINK))
> +               if (!pad || !(pad->flags & MEDIA_PAD_FL_SINK))
> 
> 
> But then you see that every other driver treats pad = &entity->pads[0]
> as always non-NULL.

There is another, expected way that causes no problems; see my comment
above.

There is no proven problem with the camss_find_sensor_pad() function, and
it should be left unmodified.

--
Best wishes,
Vladimir
