[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c90a5fd3-f52e-4103-a979-7f155733bb59@linaro.org>
Date: Fri, 13 Jun 2025 12:13:42 +0300
From: Vladimir Zapolskiy <vladimir.zapolskiy@...aro.org>
To: Bryan O'Donoghue <bryan.odonoghue@...aro.org>,
Robert Foss <rfoss@...nel.org>, Todor Tomov <todor.too@...il.com>,
Mauro Carvalho Chehab <mchehab@...nel.org>, Hans Verkuil
<hverkuil@...all.nl>, Depeng Shao <quic_depengs@...cinc.com>,
Hans Verkuil <hans.verkuil@...co.com>
Cc: linux-media@...r.kernel.org, linux-arm-msm@...r.kernel.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org,
Johan Hovold <johan+linaro@...nel.org>
Subject: Re: [PATCH 2/2] media: qcom: camss: vfe: Fix registration sequencing
bug
Hi Bryan.
On 6/12/25 11:07, Bryan O'Donoghue wrote:
> msm_vfe_register_entities loops through each Raw Data Interface input line.
> For each loop we add video device with its associated pads.
>
> Once a single /dev/video0 node has been populated it is possible for
Here is a typo, /dev/video0 should be replaced by something like /dev/videoX.
> camss_find_sensor_pad to run. This routine scans through a list of media
> entities taking a pointer pad = media_entity->pad[0] and assuming that
> pointer is always valid.
>
> It is possible for both the enumeration loop in msm_vfe_register_entities()
> and a call from user-space to run concurrently.
Here comes my insufficient understanding, please explain further.
Per se this concurrent execution shall not lead to the encountered bug,
both an initialization of media entity pads by media_entity_pads_init()
and a registration of a v4l2 devnode inside msm_video_register() are
done under in a proper sequence, aren't they?
From what I read there is no bug stated.
> Adding some deliberate sleep code into the loop in
> msm_vfe_register_entities() and constructing a user-space program to open
> every /dev/videoX node in a tight continuous loop, quickly shows the
> following error.
>
> [ 691.074558] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
> [ 691.074933] Call trace:
> [ 691.074935] camss_find_sensor_pad+0x74/0x114 [qcom_camss] (P)
> [ 691.074946] camss_get_pixel_clock+0x18/0x64 [qcom_camss]
> [ 691.074956] vfe_get+0xc0/0x54c [qcom_camss]
> [ 691.074968] vfe_set_power+0x58/0xf4c [qcom_camss]
> [ 691.074978] pipeline_pm_power_one+0x124/0x140 [videodev]
> [ 691.074986] pipeline_pm_power+0x70/0x100 [videodev]
> [ 691.074992] v4l2_pipeline_pm_use+0x54/0x90 [videodev]
> [ 691.074998] v4l2_pipeline_pm_get+0x14/0x20 [videodev]
> [ 691.075005] video_open+0x74/0xe0 [qcom_camss]
> [ 691.075014] v4l2_open+0xa8/0x124 [videodev]
> [ 691.075021] chrdev_open+0xb0/0x21c
> [ 691.075031] do_dentry_open+0x138/0x4c4
> [ 691.075040] vfs_open+0x2c/0xe8
> [ 691.075044] path_openat+0x6f0/0x10a0
> [ 691.075050] do_filp_open+0xa8/0x164
> [ 691.075054] do_sys_openat2+0x94/0x104
> [ 691.075058] __arm64_sys_openat+0x64/0xc0
> [ 691.075061] invoke_syscall+0x48/0x104
> [ 691.075069] el0_svc_common.constprop.0+0x40/0xe0
> [ 691.075075] do_el0_svc+0x1c/0x28
> [ 691.075080] el0_svc+0x30/0xcc
> [ 691.075085] el0t_64_sync_handler+0x10c/0x138
> [ 691.075088] el0t_64_sync+0x198/0x19c
>
> Taking the vfe->power_lock is not possible since
> v4l2_device_register_subdev takes the mdev->graph_lock. Later on fops->open
> takes the mdev->graph_lock followed by vfe_get() -> taking vfe->power_lock.
It's unclear what is the connection between the issue and a call to
v4l2_device_register_subdev(), the latter is related to /dev/v4l-subdevX
devnodes, but all way above the talk was about /dev/videoX devnodes, no?
> Introduce a simple enumeration_complete bool which is false initially and
> only set true once in our init routine after we complete enumeration.
It might be a fix (what is the bug actually? it's still left unexplained)
at the price of the machine state complification, a much better fix would
be not to create and expose a non-ready /dev/videoX devnode by calling
video_register_device() too early.
>
> If user-space tries to interact with the VFE before complete enumeration it
> will receive -EAGAIN.
It sounds like a critical change in the kernel to userspace ABI of open(2)
syscall for CAMSS V4L2 devnodes, unfortunately... EAGAIN could be received,
if open() is called with O_NONBLOCK flag, otherwise the syscall shall be
blocked.
I believe a completion of media device entities/pads registration before
creating a devnode should solve all the issues in a proper way.
> Cc: stable@...r.kernel.org
> Fixes: 4c98a5f57f90 ("media: camss: Add VFE files")
> Reported-by: Johan Hovold <johan+linaro@...nel.org>
> Closes: https://lore.kernel.org/all/Zwjw6XfVWcufMlqM@hovoldconsulting.com
> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@...aro.org>
--
Best wishes,
Vladimir
Powered by blists - more mailing lists