Message-ID: <CAJZ5v0j6u+kWxq8XAAhkoQjBAW4xqfu-XepAULx1c8syp6N8hw@mail.gmail.com>
Date: Thu, 22 Jan 2026 14:57:57 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Yicong Yang <yang.yicong@...oheart.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Greg KH <gregkh@...uxfoundation.org>, lenb@...nel.org, 
	tglx@...nel.org, dakr@...nel.org, akpm@...ux-foundation.org, 
	apatel@...tanamicro.com, pjw@...nel.org, palmer@...belt.com, 
	aou@...s.berkeley.edu, alex@...ti.fr, geshijian@...oheart.com, 
	weidong.wd@...oheart.com, linux-acpi@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org
Subject: Re: [PATCH 1/2] ACPI: scan: Use async schedule function for acpi_scan_clear_dep_fn

On Thu, Jan 22, 2026 at 2:48 PM Yicong Yang <yang.yicong@...oheart.com> wrote:
>
> On 1/22/26 9:21 PM, Rafael J. Wysocki wrote:
> > On Thu, Jan 22, 2026 at 1:43 PM Yicong Yang <yang.yicong@...oheart.com> wrote:
> >>
> >> On 1/22/26 6:29 PM, Greg KH wrote:
> >>> On Thu, Jan 22, 2026 at 03:34:45PM +0800, Yicong Yang wrote:
> >>>> The device object rescan in acpi_scan_clear_dep_fn() is scheduled
> >>>> on the system workqueue, which is not guaranteed to have finished
> >>>> before entering userspace. As a result, some key devices are
> >>>> missing when the init task tries to find them, e.g. console
> >>>> devices and root devices (PCIe NVMe, etc.). This issue is more
> >>>> likely to happen on RISC-V, since devices using GSI interrupts may
> >>>> depend on the APLIC and are only scanned in
> >>>> acpi_scan_clear_dep_queue() after the APLIC has been initialized.
> >>>>
> >>>> Fix this by scheduling acpi_scan_clear_dep_fn() via the async
> >>>> schedule functions rather than on the system workqueue. The
> >>>> deferred work is then synchronized by async_synchronize_full()
> >>>> before the init task runs.
> >>>>
> >>>> Update the comment as well.
> >>>>
> >>>> Signed-off-by: Yicong Yang <yang.yicong@...oheart.com>
> >>>> ---
> >>>>  drivers/acpi/scan.c | 35 ++++++++++++++++-------------------
> >>>>  1 file changed, 16 insertions(+), 19 deletions(-)
> >>>>
> >>>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> >>>> index 416d87f9bd10..bf0d8ba9ba19 100644
> >>>> --- a/drivers/acpi/scan.c
> >>>> +++ b/drivers/acpi/scan.c
> >>>> @@ -5,6 +5,7 @@
> >>>>
> >>>>  #define pr_fmt(fmt) "ACPI: " fmt
> >>>>
> >>>> +#include <linux/async.h>
> >>>>  #include <linux/module.h>
> >>>>  #include <linux/init.h>
> >>>>  #include <linux/slab.h>
> >>>> @@ -2365,39 +2366,35 @@ struct acpi_scan_clear_dep_work {
> >>>>      struct acpi_device *adev;
> >>>>  };
> >>>>
> >>>> -static void acpi_scan_clear_dep_fn(struct work_struct *work)
> >>>> +static void acpi_scan_clear_dep_fn(void *dev, async_cookie_t cookie)
> >>>>  {
> >>>> -    struct acpi_scan_clear_dep_work *cdw;
> >>>> -
> >>>> -    cdw = container_of(work, struct acpi_scan_clear_dep_work, work);
> >>>> +    struct acpi_device *adev = to_acpi_device(dev);
> >>>>
> >>>>      acpi_scan_lock_acquire();
> >>>> -    acpi_bus_attach(cdw->adev, (void *)true);
> >>>> +    acpi_bus_attach(adev, (void *)true);
> >>>>      acpi_scan_lock_release();
> >>>>
> >>>> -    acpi_dev_put(cdw->adev);
> >>>> -    kfree(cdw);
> >>>> +    acpi_dev_put(adev);
> >>>>  }
> >>>>
> >>>>  static bool acpi_scan_clear_dep_queue(struct acpi_device *adev)
> >>>>  {
> >>>> -    struct acpi_scan_clear_dep_work *cdw;
> >>>> -
> >>>>      if (adev->dep_unmet)
> >>>>              return false;
> >>>>
> >>>> -    cdw = kmalloc(sizeof(*cdw), GFP_KERNEL);
> >>>> -    if (!cdw)
> >>>> -            return false;
> >>>> -
> >>>> -    cdw->adev = adev;
> >>>> -    INIT_WORK(&cdw->work, acpi_scan_clear_dep_fn);
> >>>>      /*
> >>>> -     * Since the work function may block on the lock until the entire
> >>>> -     * initial enumeration of devices is complete, put it into the unbound
> >>>> -     * workqueue.
> >>>> +     * Async schedule the deferred acpi_scan_clear_dep_fn() since:
> >>>> +     * - acpi_bus_attach() needs to hold acpi_scan_lock which cannot
> >>>> +     *   be acquired under acpi_dep_list_lock (held here)
> >>>> +     * - the deferred work at boot stage is ensured to be finished
> >>>> +     *   before entering init task by the async_synchronize_full()
> >>>> +     *   barrier
> >>>> +     *
> >>>> +     * Use the _nocall variant since it returns on failure instead
> >>>> +     * of running the function synchronously.
> >>>>       */
> >>>> -    queue_work(system_dfl_wq, &cdw->work);
> >>>> +    if (!async_schedule_dev_nocall(acpi_scan_clear_dep_fn, &adev->dev))
> >>>> +            return false;
> >>>
> >>> This really feels wrong to me, you are taking a code path that has been
> >>> working for quite a while and changing it.  Perhaps your system ACPI
> >>> tables are the thing that is incorrect here?
> >>>
> >>> What exactly is the problem that you are seeing?  Why not start with
> >>> that and then we can work out how to solve that issue?
> >>>
> >>
> >> We met two issues here (as briefly mentioned in the commit message and
> >> cover letter):
> >> 1. kernel panic because userspace init cannot open a console. The
> >>    console device scan is queued on system_dfl_wq in the code above
> >>    and has not finished by the time the userspace init process runs,
> >>    so no console exists when init needs it.
> >> 2. entering the rescue shell because no root device (a PCIe NVMe in
> >>    our case) is found. Same reason as above: the PCIe host bridge
> >>    scan is queued there and only finishes after the init process has
> >>    started.
> >>
> >> These devices are created here because they both depend on the
> >> riscv-aplic irqchip to serve their interrupts (the console's wired
> >> interrupt and PCI's INTx interrupts), so to honor that dependency they
> >> are scanned and created only after riscv-aplic has initialized. The
> >> riscv-aplic is initialized in a device_initcall, which invokes the
> >> code above to scan and create these devices, and that is close to the
> >> time the userspace init process starts. Since system_dfl_wq is used
> >> here with no synchronization mechanism, the issues occur whenever
> >> userspace init runs before these devices are ready.
> >
> > Well, there is flush_workqueue(), isn't there?
> >
>
> sure, of course. I implemented it using async scheduling but am open to
> better alternatives. but..
>
> >> The previous solution [1] was to move the riscv-aplic initialization
> >> earlier, but conceptually the ordering still cannot be guaranteed.
> >> With async_schedule_dev_nocall() the queued work is finished before
> >> userspace init starts, since we wait for completion at
> >> async_synchronize_full() before executing the userspace init process.
> >>
> >> To solve the issue I think we should make these devices ready before
> >> entering userspace, and async scheduling is one way to do that. It
> >> also uses an unbound workqueue, but adds the needed synchronization.
> >> Any corrections or suggestions?
> >
> > You can flush the dedicated workqueue at the same spot where you do
> > async_synchronize_full(), can't you?
> >
>
> The async_synchronize_full() is already there in the init code right
> before entering userspace [1], so if we use async scheduling we don't
> need to add a synchronization point ourselves. With a dedicated
> workqueue we would need to call flush_workqueue() explicitly somewhere
> (presumably at the last initcall level, e.g. late_initcall_sync).
> Furthermore we would need to allocate the workqueue, and I think not
> all platforms need this piece of code.
>
> In my view there seems to be no need for a dedicated workqueue here,
> since async scheduling already satisfies the requirements. The only
> advantage of a dedicated workqueue would be fewer modifications to this
> piece of code; is that the consideration, and is it mandatory?
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/init/main.c?h=v6.19-rc6#n1580

So please add the above information to the changelog of your patch
(that is, why it is better to use async for this than to use a
dedicated workqueue in your view) because it is kind of relevant.

And how does the second patch in the series depend on this one?
