[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251103081601.ybwv7jzpxd6mndki@hu-mojha-hyd.qualcomm.com>
Date: Mon, 3 Nov 2025 13:46:01 +0530
From: Mukesh Ojha <mukesh.ojha@....qualcomm.com>
To: Shivendra Pratap <shivendra.pratap@....qualcomm.com>
Cc: Bjorn Andersson <andersson@...nel.org>,
Bartosz Golaszewski <bartosz.golaszewski@...aro.org>,
Sebastian Reichel <sre@...nel.org>, Rob Herring <robh@...nel.org>,
Sudeep Holla <sudeep.holla@....com>,
Souvik Chakravarty <Souvik.Chakravarty@....com>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>, Andy Yan <andy.yan@...k-chips.com>,
Mark Rutland <mark.rutland@....com>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Arnd Bergmann <arnd@...db.de>, Konrad Dybcio <konradybcio@...nel.org>,
cros-qcom-dts-watchers@...omium.org, Vinod Koul <vkoul@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Florian Fainelli <florian.fainelli@...adcom.com>,
Moritz Fischer <moritz.fischer@...us.com>,
John Stultz <john.stultz@...aro.org>,
Matthias Brugger <matthias.bgg@...il.com>,
Krzysztof Kozlowski <krzk@...nel.org>,
Dmitry Baryshkov <dmitry.baryshkov@....qualcomm.com>,
Stephen Boyd <swboyd@...omium.org>,
Andre Draszik <andre.draszik@...aro.org>,
Kathiravan Thirumoorthy <kathiravan.thirumoorthy@....qualcomm.com>,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
devicetree@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-arm-msm@...r.kernel.org,
Elliot Berman <quic_eberman@...cinc.com>,
Srinivas Kandagatla <srini@...nel.org>
Subject: Re: [PATCH v16 01/14] power: reset: reboot-mode: Synchronize list
traversal
On Wed, Oct 29, 2025 at 11:18:52PM +0530, Shivendra Pratap wrote:
>
>
> On 10/28/2025 9:04 AM, Bjorn Andersson wrote:
> > On Wed, Oct 15, 2025 at 10:08:16AM +0530, Shivendra Pratap wrote:
> >> List traversals must be synchronized to prevent race conditions
> >> and data corruption. The reboot-mode list is not protected by a
> >> lock currently, which can lead to concurrent access and race.
> >
> > Is it a theoretical future race or something that we can hit in the
> > current implementation?
> >
> >>
> >> Introduce a mutex lock to guard all operations on the reboot-mode
> >> list and ensure thread-safe access. The change prevents unsafe
> >> concurrent access on reboot-mode list.
> >
> > I was under the impression that these lists where created during boot
> > and then used at some later point, which at best would bring a
> > theoretical window for a race... Reviewing the code supports my
> > understanding, but perhaps I'm missing something?
> >
> >>
> >> Fixes: 4fcd504edbf7 ("power: reset: add reboot mode driver")
> >> Fixes: ca3d2ea52314 ("power: reset: reboot-mode: better compatibility with DT (replace ' ,/')")
> >>
> >
> > Skip this empty line, please.
> >
> >
> > And given that you have fixes here, I guess this is a problem today. In
> > which case, this shouldn't have been carried for 16 versions - but have
> > sent and been merged on its own already.
> >
> > So please, if this is a real issue, start your commit message with a
> > descriptive problem description, to make it clear that this needs to be
> > merged yesterday - or drop the fixes.
> >
> >> Signed-off-by: Shivendra Pratap <shivendra.pratap@....qualcomm.com>
> >> ---
> >> drivers/power/reset/reboot-mode.c | 96 +++++++++++++++++++++------------------
> >> include/linux/reboot-mode.h | 4 ++
> >> 2 files changed, 57 insertions(+), 43 deletions(-)
> >>
> >> diff --git a/drivers/power/reset/reboot-mode.c b/drivers/power/reset/reboot-mode.c
> >> index fba53f638da04655e756b5f8b7d2d666d1379535..8fc3e14638ea757c8dc3808c240ff569cbd74786 100644
> >> --- a/drivers/power/reset/reboot-mode.c
> >> +++ b/drivers/power/reset/reboot-mode.c
> >> @@ -29,9 +29,11 @@ static unsigned int get_reboot_mode_magic(struct reboot_mode_driver *reboot,
> >> if (!cmd)
> >> cmd = normal;
> >>
> >> - list_for_each_entry(info, &reboot->head, list)
> >> - if (!strcmp(info->mode, cmd))
> >> - return info->magic;
> >> + scoped_guard(mutex, &reboot->rb_lock) {
> >> + list_for_each_entry(info, &reboot->head, list)
> >> + if (!strcmp(info->mode, cmd))
> >> + return info->magic;
> >> + }
> >>
> >> /* try to match again, replacing characters impossible in DT */
> >> if (strscpy(cmd_, cmd, sizeof(cmd_)) == -E2BIG)
> >> @@ -41,9 +43,11 @@ static unsigned int get_reboot_mode_magic(struct reboot_mode_driver *reboot,
> >> strreplace(cmd_, ',', '-');
> >> strreplace(cmd_, '/', '-');
> >>
> >> - list_for_each_entry(info, &reboot->head, list)
> >> - if (!strcmp(info->mode, cmd_))
> >> - return info->magic;
> >> + scoped_guard(mutex, &reboot->rb_lock) {
> >> + list_for_each_entry(info, &reboot->head, list)
> >> + if (!strcmp(info->mode, cmd_))
> >> + return info->magic;
> >> + }
> >>
> >> return 0;
> >> }
> >> @@ -78,46 +82,50 @@ int reboot_mode_register(struct reboot_mode_driver *reboot)
> >>
> >> INIT_LIST_HEAD(&reboot->head);
> >>
> >> - for_each_property_of_node(np, prop) {
> >> - if (strncmp(prop->name, PREFIX, len))
> >> - continue;
> >> -
> >> - info = devm_kzalloc(reboot->dev, sizeof(*info), GFP_KERNEL);
> >> - if (!info) {
> >> - ret = -ENOMEM;
> >> - goto error;
> >> - }
> >> -
> >> - if (of_property_read_u32(np, prop->name, &info->magic)) {
> >> - dev_err(reboot->dev, "reboot mode %s without magic number\n",
> >> - info->mode);
> >> - devm_kfree(reboot->dev, info);
> >> - continue;
> >> - }
> >> -
> >> - info->mode = kstrdup_const(prop->name + len, GFP_KERNEL);
> >> - if (!info->mode) {
> >> - ret = -ENOMEM;
> >> - goto error;
> >> - } else if (info->mode[0] == '\0') {
> >> - kfree_const(info->mode);
> >> - ret = -EINVAL;
> >> - dev_err(reboot->dev, "invalid mode name(%s): too short!\n",
> >> - prop->name);
> >> - goto error;
> >> + mutex_init(&reboot->rb_lock);
> >> +
> >> + scoped_guard(mutex, &reboot->rb_lock) {
> >
> > I don't see how this can race with anything, reboot_mode_register() is
> > supposed to be called from some probe function, with reboot_mode_driver
> > being a "local" object.
> >
> > The guard here "protects" &reboot->head, but that is not a shared
> > resources at this point.
> >
> >> + for_each_property_of_node(np, prop) {
> >> + if (strncmp(prop->name, PREFIX, len))
> >> + continue;
> >> +
> >> + info = devm_kzalloc(reboot->dev, sizeof(*info), GFP_KERNEL);
> >> + if (!info) {
> >> + ret = -ENOMEM;
> >> + goto error;
> >> + }
> >> +
> >> + if (of_property_read_u32(np, prop->name, &info->magic)) {
> >> + dev_err(reboot->dev, "reboot mode %s without magic number\n",
> >> + info->mode);
> >> + devm_kfree(reboot->dev, info);
> >> + continue;
> >> + }
> >> +
> >> + info->mode = kstrdup_const(prop->name + len, GFP_KERNEL);
> >> + if (!info->mode) {
> >> + ret = -ENOMEM;
> >> + goto error;
> >> + } else if (info->mode[0] == '\0') {
> >> + kfree_const(info->mode);
> >> + ret = -EINVAL;
> >> + dev_err(reboot->dev, "invalid mode name(%s): too short!\n",
> >> + prop->name);
> >> + goto error;
> >> + }
> >> +
> >> + list_add_tail(&info->list, &reboot->head);
> >> }
> >>
> >> - list_add_tail(&info->list, &reboot->head);
> >> - }
> >> -
> >> - reboot->reboot_notifier.notifier_call = reboot_mode_notify;
> >> - register_reboot_notifier(&reboot->reboot_notifier);
> >> + reboot->reboot_notifier.notifier_call = reboot_mode_notify;
> >> + register_reboot_notifier(&reboot->reboot_notifier);
> >
> > Once register_reboot_notifier() has been called, &reboot->head is
> > visible outside the specific driver instance.
> >
> > So, there's no reason to lock in reboot_mode_register().
> >
> >>
> >> - return 0;
> >> + return 0;
> >>
> >> error:
> >> - list_for_each_entry(info, &reboot->head, list)
> >> - kfree_const(info->mode);
> >> + list_for_each_entry(info, &reboot->head, list)
> >> + kfree_const(info->mode);
> >> + }
> >>
> >> return ret;
> >> }
> >> @@ -133,8 +141,10 @@ int reboot_mode_unregister(struct reboot_mode_driver *reboot)
> >>
> >> unregister_reboot_notifier(&reboot->reboot_notifier);
> >>
> >> - list_for_each_entry(info, &reboot->head, list)
> >> - kfree_const(info->mode);
> >> + scoped_guard(mutex, &reboot->rb_lock) {
> >
> > get_reboot_mode_magic() is only called from reboot_mode_notify(), which
> > is only invoked by blocking_notifier_call_chain().
> >
> > blocking_notifier_call_chain() takes a read semaphore.
> > unregister_reboot_notifier() take a write semaphore.
> >
> > So, if we're racing with a shutdown or reboot, I see two possible
> > things:
> >
> > 1) blocking_notifier_call_chain() happens first and calls
> > reboot_mode_notify(), blocking unregister_reboot_notifier(). Once it
> > returns, the unregister proceeds and we enter case #2
> >
> > 2) unregister_reboot_notifier() happens first (or after the
> > blocking_notifier_call_chain() returns). Our reboot object is removed
> > from the list and blocking_notifier_call_chain() will not invoke
> > reboot_mode_notify().
> >
> > In either case, the list has a single owner here.
> >
> >
> > As far as I can see, the only race left is if multiple concurrent calls
> > happens to blocking_notifier_call_chain(), the behavior of
> > reboot->write() might be undefined. But I think that is reasonable.
> >
> >
> > Please let me know if I'm missing something.
>
> Thank you for the detailed review. Tried to summarize below:
>
> The mutex lock was introduced in v13 following earlier discussions about
> whether the issue was a theoretical future race or something that could
> occur in the current implementation.
>
> At the time (prior to v13), we concluded that while no race condition was
> observable in the current code, there could be a potential in future
> changes or usage patterns — making it a theoretical concern.
>
> During review, there was further discussion around whether it's acceptable
> to leave the list unprotected simply because no race is currently suspected.
> The general consensus was that it's better practice to protect shared data
> structures like lists with a mutex to ensure correctness and future-proofing.
>
> Based on that feedback, the mutex lock introduced in v13.
I agree with Bjorn here, we do not need a lock here..
>
> Later in v15, the reboot-mode maintainer suggested that the patch should
> include a Fixes tag, which was incorporated accordingly.
>
> So both the mutex addition in v13 and the Fixes tag in v15 were made in
> response to upstream review comments.
>
> Need some guidance on how to take this forward or is it ok to protect all
> list operation, as done in this patch and keep the fixes tag as suggested
> in earlier reviews?
We should drop this patch.,
>
> thanks,
> Shivendra
--
-Mukesh Ojha
Powered by blists - more mailing lists