linux-kernel - Re: [PATCH 1/2] ACPI: scan: Use async schedule function for acpi_scan_clear_dep

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <6ac7b51b-376e-4fa9-a7a9-dc0f5da90f64@picoheart.com>
Date: Thu, 22 Jan 2026 21:48:27 +0800
From: "Yicong Yang" <yang.yicong@...oheart.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: <yang.yicong@...oheart.com>, "Greg KH" <gregkh@...uxfoundation.org>, 
	<lenb@...nel.org>, <tglx@...nel.org>, <dakr@...nel.org>, 
	<akpm@...ux-foundation.org>, <apatel@...tanamicro.com>, <pjw@...nel.org>, 
	<palmer@...belt.com>, <aou@...s.berkeley.edu>, <alex@...ti.fr>, 
	<geshijian@...oheart.com>, <weidong.wd@...oheart.com>, 
	<linux-acpi@...r.kernel.org>, <linux-kernel@...r.kernel.org>, 
	<linux-riscv@...ts.infradead.org>
Subject: Re: [PATCH 1/2] ACPI: scan: Use async schedule function for acpi_scan_clear_dep_fn

On 1/22/26 9:21 PM, Rafael J. Wysocki wrote:
> On Thu, Jan 22, 2026 at 1:43 PM Yicong Yang <yang.yicong@...oheart.com> wrote:
>>
>> On 1/22/26 6:29 PM, Greg KH wrote:
>>> On Thu, Jan 22, 2026 at 03:34:45PM +0800, Yicong Yang wrote:
>>>> The device object rescan in acpi_scan_clear_dep_fn is scheduled
>>>> in the system workqueue which is not guaranteed to be finished
>>>> before entering userspace. This will cause the problem that
>>>> some key devices are missed when the init task try to find them,
>>>> e.g. console devices and root devices (PCIe nvme, etc).
>>>> This issues is more possbile to happen on RISCV since these
>>>> devices using GSI interrupt may depend on APLIC and will be
>>>> scanned in acpi_scan_clear_dep_queue() after APLIC initialized.
>>>>
>>>> Fix this by scheduling the acpi_scan_clear_dep_queue() using async
>>>> schedule function rather than the system workqueue. The deferred
>>>> works will be synchronized by async_synchronize_full() before
>>>> entering init task.
>>>>
>>>> Update the comment as well.
>>>>
>>>> Signed-off-by: Yicong Yang <yang.yicong@...oheart.com>
>>>> ---
>>>>  drivers/acpi/scan.c | 35 ++++++++++++++++-------------------
>>>>  1 file changed, 16 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>>>> index 416d87f9bd10..bf0d8ba9ba19 100644
>>>> --- a/drivers/acpi/scan.c
>>>> +++ b/drivers/acpi/scan.c
>>>> @@ -5,6 +5,7 @@
>>>>
>>>>  #define pr_fmt(fmt) "ACPI: " fmt
>>>>
>>>> +#include <linux/async.h>
>>>>  #include <linux/module.h>
>>>>  #include <linux/init.h>
>>>>  #include <linux/slab.h>
>>>> @@ -2365,39 +2366,35 @@ struct acpi_scan_clear_dep_work {
>>>>      struct acpi_device *adev;
>>>>  };
>>>>
>>>> -static void acpi_scan_clear_dep_fn(struct work_struct *work)
>>>> +static void acpi_scan_clear_dep_fn(void *dev, async_cookie_t cookie)
>>>>  {
>>>> -    struct acpi_scan_clear_dep_work *cdw;
>>>> -
>>>> -    cdw = container_of(work, struct acpi_scan_clear_dep_work, work);
>>>> +    struct acpi_device *adev = to_acpi_device(dev);
>>>>
>>>>      acpi_scan_lock_acquire();
>>>> -    acpi_bus_attach(cdw->adev, (void *)true);
>>>> +    acpi_bus_attach(adev, (void *)true);
>>>>      acpi_scan_lock_release();
>>>>
>>>> -    acpi_dev_put(cdw->adev);
>>>> -    kfree(cdw);
>>>> +    acpi_dev_put(adev);
>>>>  }
>>>>
>>>>  static bool acpi_scan_clear_dep_queue(struct acpi_device *adev)
>>>>  {
>>>> -    struct acpi_scan_clear_dep_work *cdw;
>>>> -
>>>>      if (adev->dep_unmet)
>>>>              return false;
>>>>
>>>> -    cdw = kmalloc(sizeof(*cdw), GFP_KERNEL);
>>>> -    if (!cdw)
>>>> -            return false;
>>>> -
>>>> -    cdw->adev = adev;
>>>> -    INIT_WORK(&cdw->work, acpi_scan_clear_dep_fn);
>>>>      /*
>>>> -     * Since the work function may block on the lock until the entire
>>>> -     * initial enumeration of devices is complete, put it into the unbound
>>>> -     * workqueue.
>>>> +     * Async schedule the deferred acpi_scan_clear_dep_fn() since:
>>>> +     * - acpi_bus_attach() needs to hold acpi_scan_lock which cannot
>>>> +     *   be acquired under acpi_dep_list_lock (held here)
>>>> +     * - the deferred work at boot stage is ensured to be finished
>>>> +     *   before entering init task by the async_synchronize_full()
>>>> +     *   barrier
>>>> +     *
>>>> +     * Use _nocall variant since it'll return on failure instead of
>>>> +     * run the function synchronously.
>>>>       */
>>>> -    queue_work(system_dfl_wq, &cdw->work);
>>>> +    if (!async_schedule_dev_nocall(acpi_scan_clear_dep_fn, &adev->dev))
>>>> +            return false;
>>>
>>> This really feels wrong to me, you are taking a code path that has been
>>> working for quite a while and changing it.  Perhaps your system ACPI
>>> tables are the thing that is incorrect here?
>>>
>>> What exactly is the problem that you are seeing?  Why not start with
>>> that and then we can work out how to solve that issue?
>>>
>>
>> two issues here we met (as briefly mentioned in the commit and cover letter):
>> 1. kernel panic due to userspace init cannot have an opened console. the
>>    console device scanning is queued in the system_dfl_wq in above code
>>    and not finished by the time userspace init process running, thus by
>>    the time userspace init running, no console is created
>> 2. entering rescue shell due to no root devices (PCIe nvme in our case)
>>    found. same reason as above, the PCIe host bridge scanning is queued
>>    in above and finished after init process running.
>>
>> The reason why these devices are created here is because they both depend
>> on riscv-aplic irqchip to serve their interrupts (console's wired
>> interrupt and PCI's INTx interrupts) and in order to keep the dependency
>> these devices are scanned and created after riscv-aplic initialized. The
>> riscv-aplic is initialized in device_initcall and invoke above codes for
>> the scan/creation of these devices, it's close to the time running
>> userspace init process. Since system_dfl_wq is used here and no synchronized
>> mechanism, the issues will happen if userspace init runs before these devices
>> are ready.
> 
> Well, there is flush_workqueue(), isn't it there?
> 

sure, of course. I implement it using async schedule but open to any
better ones. but..

>> Previous solution [1] is to advance the initialization of riscv-aplic
>> earlier but the order still cannot be guaranteed conceptually.
>> With async_schedule_dev_nocal() the work queued is finished
>> before entering userspace init since we'll wait for completion at
>> async_synchronize_full() before executing userspace init process.
>>
>> To solve the issue I think we should make these devices ready before
>> entering userspace and async schedule is one way to make it. It's also
>> using an unbound workqueue but have additional synchronization. Any
>> corrections or suggestions?
> 
> You can flush the dedicated workqueue at the same spot where you do
> async_synchronize_full(), can't you?
> 

the async_synchronize_full() is already there in the init code right
before entering usersapce [1] so if we use async schedule we don't
need to add synchronize spot ourselves. For a dedicated workqueue we
need to call flush_workqueue() explicitly somewhere (I suppose to be
the last level initcall like late_initcall_sync). Furthermore we need
to allocate it and I think not all the platforms need this piece of
code.

It seems no need to have a dedicated workqueue for this since async
schedule can satisfy the needs already (from my view). Except less
modifications can be made for this piece of code if we use a dedicated
workqueue, is this the consideration and mandatory?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/init/main.c?h=v6.19-rc6#n1580

thanks.