Message-Id: <e30ee004-64f9-4003-8907-c407a76fc244@picoheart.com>
Date: Thu, 15 Jan 2026 16:31:59 +0800
From: "Yicong Yang" <yang.yicong@...oheart.com>
To: "Thomas Gleixner" <tglx@...nel.org>, 
	"Anup Patel" <apatel@...tanamicro.com>
Cc: <yang.yicong@...oheart.com>, <anup@...infault.org>, <pjw@...nel.org>, 
	<palmer@...belt.com>, <aou@...s.berkeley.edu>, <alex@...ti.fr>, 
	<linux-riscv@...ts.infradead.org>, <linux-kernel@...r.kernel.org>, 
	<geshijian@...oheart.com>, <weidong.wd@...oheart.com>, 
	"Greg Kroah-Hartman" <gregkh@...uxfoundation.org>, 
	"Rafael J. Wysocki" <rafael@...nel.org>, 
	"Danilo Krummrich" <dakr@...nel.org>
Subject: Re: [PATCH] irqchip/riscv-aplic: Register the driver prior to device creation

On 1/15/26 3:50 AM, Thomas Gleixner wrote:
> On Wed, Jan 14 2026 at 19:48, Yicong Yang wrote:
>> On 1/14/26 4:57 PM, Anup Patel wrote:
>>> On Wed, Jan 14, 2026 at 12:08 PM Yicong Yang <yang.yicong@...oheart.com> wrote:
>>>>
>>>> On RISC-V the APLIC serves part of the GSI interrupts, but unlike
>>>> other architectures it's initialized a bit late on ACPI based
>>>> system:
>>>> - the spec only mandates the report in DSDT (riscv-brs rule AML_100)
>>>>   so the APLIC is created as platform_device when scanning DSDT
>>>> - the driver is registered and initialize the device in device_initcall
>>>>   stage
>>>>
>>>> The creation of devices depends on APLIC is deferred after the APLIC
>>>> is initialized (when the driver calls acpi_dev_clear_dependencies),
>>>> not like most other devices which is created when scanning the DSDT.
>>>> The affected devices include those declare the dependency explicitly
>>>> by ACPI _DEP method and _PRT for PCIe host bridge and those require
>>>> their interrupts as GSI. Furthermore, the deferred creation is
>>>> performed in an async way (queued in the system_dfl_wq workqueue)
>>>> but all contend on the acpi_scan_lock.
> 
> The lock contention is irrelevant to the real underlying problem.
> 
>>>> Since the deferred device creation is asynchronous and will contend
>>>> for the same lock, the order and timing is not certain. And the time
>>>> is late enough for the device creation running parallel with the init
>>>> task. This will lead to below issues (also observed on our platforms):
>>>> - the console/tty device is created lately and sometimes it's not ready
>>>>   when init task check for its presence. the system will crash in the
>>>>   latter case since the init task always requires a valid console.
>>>> - the root device will be probed and registered lately (e.g. NVME,
>>>>   after the init task executed) and may run into the rescue shell if
>>>>   root device is not found.
> 
> And again, you _cannot_ solve this problem completely with initcall
> ordering;
> 
>    Deferred probing with delegation to work queues has the systemic
>    issue that there is no guarantee that all devices, which are required
>    to actually proceed to userspace, have been initialized at that
>    point.
> 
> Changing the initcall priority of a particular driver papers over the
> underlying problem to the extent that _you_ cannot observe it anymore,
> but that provides exactly _zero_ guarantee that it is correct under all
> circumstances. "Works for me" is the worst engineering principle as you
> might know already.
> 
> That said, I still refuse to take random initcall ordering patches
> unless somebody comes up with a coherent explanation of the actual
> guarantee.
> 

OK, I see the point and it sounds reasonable to me. Thanks.

> But before you start to come up with more fairy tales, let me come back
> to your two points from above:
> 
>>>> - the console/tty device is created lately and sometimes it's not ready
>>>>   when init task check for its presence. the system will crash in the
>>>>   latter case since the init task always requires a valid console.
> 
> I assume you want to say that console_on_rootfs() fails to open
> '/dev/console', right?
> 

right.

> That's obvious because console_on_rootfs() is invoked _before_
> async_synchronize_full() is invoked which ensures that all outstanding
> initialization work has been completed.
> 

Calling console_on_rootfs() before async_synchronize_full() does look
problematic, as you point out, but it is not the direct cause of my
issue. I agree that we should do the synchronization and make use of
async_synchronize_full().

I'll illustrate this below.

> The fix for this is obvious too and it's therefore bloody obvious that
> changing the init call priority of a random driver does not fix that at
> all, no?
> 
> But that's not sufficient, see below.
> 
>>>> - the root device will be probed and registered lately (e.g. NVME,
>>>>   after the init task executed) and may run into the rescue shell if
>>>>   root device is not found.
> 
> You completely fail to explain how outstanding initializations in work
> queues survive past the async_synchronize_full() synchronization
> point. You are merely describing random observations on your system, but
> you stopped right there without trying to decode the underlying root
> cause.
> 

For devices that depend on the APLIC, creation of the platform_device
(tty, PCIe root) is deferred until the APLIC driver calls
acpi_dev_clear_dependencies(). That call iterates the dependency list,
and acpi_scan_clear_dep_queue() [1] queues each device creation on
system_dfl_wq, so the subsequent driver probe also runs on
system_dfl_wq.

async_synchronize_full() synchronizes all the work in async_wq but not
the work in other workqueues. That is why async_synchronize_full() fails
to synchronize the creation and probing of these devices before the
init process starts.
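
To make the point concrete, here is a minimal user-space sketch of that
behaviour. It is only a model, not kernel code: every model_* name is
made up for illustration, the two queues stand in for async_wq and
system_dfl_wq, and it assumes the worst case where queued work runs
only when something drains its queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of the model_* names are kernel APIs. */
enum model_wq { MODEL_ASYNC_WQ, MODEL_DFL_WQ, MODEL_NR_WQ };

typedef void (*model_work_fn)(void);

#define MODEL_MAX_WORK 8

struct model_queue {
	model_work_fn work[MODEL_MAX_WORK];
	size_t pending;
};

static struct model_queue model_queues[MODEL_NR_WQ];
static bool model_console_created;

static void model_create_console(void)
{
	model_console_created = true;
}

/* Queue one work item; returns false if the queue is full. */
static bool model_queue_work(enum model_wq wq, model_work_fn fn)
{
	struct model_queue *q = &model_queues[wq];

	if (q->pending == MODEL_MAX_WORK)
		return false;
	q->work[q->pending++] = fn;
	return true;
}

/* Stand-in for the barrier: it drains the async queue and nothing else. */
static void model_async_synchronize_full(void)
{
	struct model_queue *q = &model_queues[MODEL_ASYNC_WQ];

	for (size_t i = 0; i < q->pending; i++)
		q->work[i]();
	q->pending = 0;
}

/*
 * Queue the console creation on the given queue, run the barrier, and
 * report whether the console exists when "userspace" would start.
 * Worst case assumed: queued work runs only when something drains it.
 */
static bool model_console_ready_after_barrier(enum model_wq wq)
{
	for (int i = 0; i < MODEL_NR_WQ; i++)
		model_queues[i].pending = 0;
	model_console_created = false;

	model_queue_work(wq, model_create_console);
	model_async_synchronize_full();
	return model_console_created;
}
```

In this model, model_console_ready_after_barrier(MODEL_DFL_WQ) returns
false while the same call with MODEL_ASYNC_WQ returns true, which
matches what I described above: the barrier only sees work queued on
the queue it drains.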

Please correct me if there's any mistake.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/scan.c?h=v6.19-rc5#n2400

> The root cause is:
> 
>  1) as I already said above that deferred probing does not provide any
>     guarantees at all.
> 
>  2) async_synchronize_full() is obviously not the barrier which it is
>     supposed to be (the misplaced console_on_rootfs() call aside).
> 
> That needs to be fixed at the conceptual level and not hacked around
> with "works for me" patches and fairy tale change logs.
> 

So, based on the above, if we use async_wq (via the async_schedule*
APIs) in acpi_scan_clear_dep_queue() to create these devices, the issue
should be solved: async_synchronize_full() acts as the barrier, so we
are sure to have these devices before entering userspace. This should
be a conceptually sound solution, and a quick test on our platform
shows that it solves the issue.

As for the order of console_on_rootfs() and async_synchronize_full():
although our issue is not directly caused by it, in theory it can cause
the same problem (the async probing may not have finished by the time
the console is opened), so it needs to be fixed as well, doesn't it?
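
The ordering concern can be sketched with a similar user-space model.
Again, this is not kernel code and all model_* names are hypothetical.
It assumes the console creation has been queued as async work and, as
the worst case, that the work has not run yet when the console is
opened.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of the model_* names are kernel APIs. */
static bool model_console_pending;	/* creation queued, not yet run */
static bool model_console_exists;

/* Queue the console creation as async work (worst case: it stays pending). */
static void model_queue_console_creation(void)
{
	model_console_pending = true;
	model_console_exists = false;
}

/* Stand-in for the barrier: completes any pending async work. */
static void model_async_synchronize_full(void)
{
	if (model_console_pending) {
		model_console_exists = true;
		model_console_pending = false;
	}
}

/* Stand-in for opening the console: 0 on success, -1 if it does not exist. */
static int model_console_on_rootfs(void)
{
	return model_console_exists ? 0 : -1;
}

/* Current order as discussed in this thread: console open, then barrier. */
static int model_boot_open_first(void)
{
	int ret;

	model_queue_console_creation();
	ret = model_console_on_rootfs();
	model_async_synchronize_full();
	return ret;
}

/* Swapped order: barrier first, then console open. */
static int model_boot_barrier_first(void)
{
	model_queue_console_creation();
	model_async_synchronize_full();
	return model_console_on_rootfs();
}
```

Here model_boot_open_first() returns -1 and model_boot_barrier_first()
returns 0. The model only covers the worst case; on a real system the
work may well finish in time, which is why this is hard to observe.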

Thanks.
