[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <6ac7b51b-376e-4fa9-a7a9-dc0f5da90f64@picoheart.com>
Date: Thu, 22 Jan 2026 21:48:27 +0800
From: "Yicong Yang" <yang.yicong@...oheart.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: <yang.yicong@...oheart.com>, "Greg KH" <gregkh@...uxfoundation.org>,
<lenb@...nel.org>, <tglx@...nel.org>, <dakr@...nel.org>,
<akpm@...ux-foundation.org>, <apatel@...tanamicro.com>, <pjw@...nel.org>,
<palmer@...belt.com>, <aou@...s.berkeley.edu>, <alex@...ti.fr>,
<geshijian@...oheart.com>, <weidong.wd@...oheart.com>,
<linux-acpi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-riscv@...ts.infradead.org>
Subject: Re: [PATCH 1/2] ACPI: scan: Use async schedule function for acpi_scan_clear_dep_fn
On 1/22/26 9:21 PM, Rafael J. Wysocki wrote:
> On Thu, Jan 22, 2026 at 1:43 PM Yicong Yang <yang.yicong@...oheart.com> wrote:
>>
>> On 1/22/26 6:29 PM, Greg KH wrote:
>>> On Thu, Jan 22, 2026 at 03:34:45PM +0800, Yicong Yang wrote:
>>>> The device object rescan in acpi_scan_clear_dep_fn is scheduled
>>>> in the system workqueue which is not guaranteed to be finished
>>>> before entering userspace. This will cause the problem that
>>>> some key devices are missed when the init task try to find them,
>>>> e.g. console devices and root devices (PCIe nvme, etc).
>>>> This issues is more possbile to happen on RISCV since these
>>>> devices using GSI interrupt may depend on APLIC and will be
>>>> scanned in acpi_scan_clear_dep_queue() after APLIC initialized.
>>>>
>>>> Fix this by scheduling the acpi_scan_clear_dep_queue() using async
>>>> schedule function rather than the system workqueue. The deferred
>>>> works will be synchronized by async_synchronize_full() before
>>>> entering init task.
>>>>
>>>> Update the comment as well.
>>>>
>>>> Signed-off-by: Yicong Yang <yang.yicong@...oheart.com>
>>>> ---
>>>> drivers/acpi/scan.c | 35 ++++++++++++++++-------------------
>>>> 1 file changed, 16 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>>>> index 416d87f9bd10..bf0d8ba9ba19 100644
>>>> --- a/drivers/acpi/scan.c
>>>> +++ b/drivers/acpi/scan.c
>>>> @@ -5,6 +5,7 @@
>>>>
>>>> #define pr_fmt(fmt) "ACPI: " fmt
>>>>
>>>> +#include <linux/async.h>
>>>> #include <linux/module.h>
>>>> #include <linux/init.h>
>>>> #include <linux/slab.h>
>>>> @@ -2365,39 +2366,35 @@ struct acpi_scan_clear_dep_work {
>>>> struct acpi_device *adev;
>>>> };
>>>>
>>>> -static void acpi_scan_clear_dep_fn(struct work_struct *work)
>>>> +static void acpi_scan_clear_dep_fn(void *dev, async_cookie_t cookie)
>>>> {
>>>> - struct acpi_scan_clear_dep_work *cdw;
>>>> -
>>>> - cdw = container_of(work, struct acpi_scan_clear_dep_work, work);
>>>> + struct acpi_device *adev = to_acpi_device(dev);
>>>>
>>>> acpi_scan_lock_acquire();
>>>> - acpi_bus_attach(cdw->adev, (void *)true);
>>>> + acpi_bus_attach(adev, (void *)true);
>>>> acpi_scan_lock_release();
>>>>
>>>> - acpi_dev_put(cdw->adev);
>>>> - kfree(cdw);
>>>> + acpi_dev_put(adev);
>>>> }
>>>>
>>>> static bool acpi_scan_clear_dep_queue(struct acpi_device *adev)
>>>> {
>>>> - struct acpi_scan_clear_dep_work *cdw;
>>>> -
>>>> if (adev->dep_unmet)
>>>> return false;
>>>>
>>>> - cdw = kmalloc(sizeof(*cdw), GFP_KERNEL);
>>>> - if (!cdw)
>>>> - return false;
>>>> -
>>>> - cdw->adev = adev;
>>>> - INIT_WORK(&cdw->work, acpi_scan_clear_dep_fn);
>>>> /*
>>>> - * Since the work function may block on the lock until the entire
>>>> - * initial enumeration of devices is complete, put it into the unbound
>>>> - * workqueue.
>>>> + * Async schedule the deferred acpi_scan_clear_dep_fn() since:
>>>> + * - acpi_bus_attach() needs to hold acpi_scan_lock which cannot
>>>> + * be acquired under acpi_dep_list_lock (held here)
>>>> + * - the deferred work at boot stage is ensured to be finished
>>>> + * before entering init task by the async_synchronize_full()
>>>> + * barrier
>>>> + *
>>>> + * Use _nocall variant since it'll return on failure instead of
>>>> + * run the function synchronously.
>>>> */
>>>> - queue_work(system_dfl_wq, &cdw->work);
>>>> + if (!async_schedule_dev_nocall(acpi_scan_clear_dep_fn, &adev->dev))
>>>> + return false;
>>>
>>> This really feels wrong to me, you are taking a code path that has been
>>> working for quite a while and changing it. Perhaps your system ACPI
>>> tables are the thing that is incorrect here?
>>>
>>> What exactly is the problem that you are seeing? Why not start with
>>> that and then we can work out how to solve that issue?
>>>
>>
>> two issues here we met (as briefly mentioned in the commit and cover letter):
>> 1. kernel panic due to userspace init cannot have an opened console. the
>> console device scanning is queued in the system_dfl_wq in above code
>> and not finished by the time userspace init process running, thus by
>> the time userspace init running, no console is created
>> 2. entering rescue shell due to no root devices (PCIe nvme in our case)
>> found. same reason as above, the PCIe host bridge scanning is queued
>> in above and finished after init process running.
>>
>> The reason why these devices are created here is because they both depend
>> on riscv-aplic irqchip to serve their interrupts (console's wired
>> interrupt and PCI's INTx interrupts) and in order to keep the dependency
>> these devices are scanned and created after riscv-aplic initialized. The
>> riscv-aplic is initialized in device_initcall and invoke above codes for
>> the scan/creation of these devices, it's close to the time running
>> userspace init process. Since system_dfl_wq is used here and no synchronized
>> mechanism, the issues will happen if userspace init runs before these devices
>> are ready.
>
> Well, there is flush_workqueue(), isn't it there?
>
sure, of course. I implement it using async schedule but open to any
better ones. but..
>> Previous solution [1] is to advance the initialization of riscv-aplic
>> earlier but the order still cannot be guaranteed conceptually.
>> With async_schedule_dev_nocal() the work queued is finished
>> before entering userspace init since we'll wait for completion at
>> async_synchronize_full() before executing userspace init process.
>>
>> To solve the issue I think we should make these devices ready before
>> entering userspace and async schedule is one way to make it. It's also
>> using an unbound workqueue but have additional synchronization. Any
>> corrections or suggestions?
>
> You can flush the dedicated workqueue at the same spot where you do
> async_synchronize_full(), can't you?
>
the async_synchronize_full() is already there in the init code right
before entering usersapce [1] so if we use async schedule we don't
need to add synchronize spot ourselves. For a dedicated workqueue we
need to call flush_workqueue() explicitly somewhere (I suppose to be
the last level initcall like late_initcall_sync). Furthermore we need
to allocate it and I think not all the platforms need this piece of
code.
It seems no need to have a dedicated workqueue for this since async
schedule can satisfy the needs already (from my view). Except less
modifications can be made for this piece of code if we use a dedicated
workqueue, is this the consideration and mandatory?
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/init/main.c?h=v6.19-rc6#n1580
thanks.
Powered by blists - more mailing lists