[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aE_hlGHkRZqFFacR@hovoldconsulting.com>
Date: Mon, 16 Jun 2025 11:19:16 +0200
From: Johan Hovold <johan@...nel.org>
To: Bryan O'Donoghue <bryan.odonoghue@...aro.org>
Cc: Robert Foss <rfoss@...nel.org>, Todor Tomov <todor.too@...il.com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Hans Verkuil <hverkuil@...all.nl>,
Depeng Shao <quic_depengs@...cinc.com>,
Vladimir Zapolskiy <vladimir.zapolskiy@...aro.org>,
Hans Verkuil <hans.verkuil@...co.com>, linux-media@...r.kernel.org,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org, Johan Hovold <johan+linaro@...nel.org>
Subject: Re: [PATCH 2/2] media: qcom: camss: vfe: Fix registration sequencing
bug
On Thu, Jun 12, 2025 at 09:07:16AM +0100, Bryan O'Donoghue wrote:
> msm_vfe_register_entities loops through each Raw Data Interface input line.
> For each loop we add video device with its associated pads.
>
> Once a single /dev/video0 node has been populated it is possible for
> camss_find_sensor_pad to run. This routine scans through a list of media
> entities taking a pointer pad = media_entity->pad[0] and assuming that
> pointer is always valid.
>
> It is possible for both the enumeration loop in msm_vfe_register_entities()
> and a call from user-space to run concurrently.
>
> Adding some deliberate sleep code into the loop in
> msm_vfe_register_entities() and constructing a user-space program to open
> every /dev/videoX node in a tight continuous loop, quickly shows the
> following error.
>
> [ 691.074558] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
> [ 691.074933] Call trace:
> [ 691.074935] camss_find_sensor_pad+0x74/0x114 [qcom_camss] (P)
> [ 691.074946] camss_get_pixel_clock+0x18/0x64 [qcom_camss]
> [ 691.074956] vfe_get+0xc0/0x54c [qcom_camss]
> [ 691.074968] vfe_set_power+0x58/0xf4c [qcom_camss]
> [ 691.074978] pipeline_pm_power_one+0x124/0x140 [videodev]
> [ 691.074986] pipeline_pm_power+0x70/0x100 [videodev]
> [ 691.074992] v4l2_pipeline_pm_use+0x54/0x90 [videodev]
> [ 691.074998] v4l2_pipeline_pm_get+0x14/0x20 [videodev]
> [ 691.075005] video_open+0x74/0xe0 [qcom_camss]
> [ 691.075014] v4l2_open+0xa8/0x124 [videodev]
> [ 691.075021] chrdev_open+0xb0/0x21c
> [ 691.075031] do_dentry_open+0x138/0x4c4
> [ 691.075040] vfs_open+0x2c/0xe8
> Taking the vfe->power_lock is not possible since
> v4l2_device_register_subdev takes the mdev->graph_lock. Later on fops->open
> takes the mdev->graph_lock followed by vfe_get() -> taking vfe->power_lock.
>
> Introduce a simple enumeration_complete bool which is false initially and
> only set true once in our init routine after we complete enumeration.
>
> If user-space tries to interact with the VFE before complete enumeration it
> will receive -EAGAIN.
As Vladimir also pointed out, this is at best just papering over the
issue.
You need to make sure the video device is not registered until it's
ready to be used. That is the bug that needs fixing.
Johan
Powered by blists - more mailing lists