Message-ID: <877btkht2v.ffs@tglx>
Date: Wed, 14 Jan 2026 20:50:32 +0100
From: Thomas Gleixner <tglx@...nel.org>
To: Yicong Yang <yang.yicong@...oheart.com>, Anup Patel
<apatel@...tanamicro.com>
Cc: yang.yicong@...oheart.com, anup@...infault.org, pjw@...nel.org,
palmer@...belt.com, aou@...s.berkeley.edu, alex@...ti.fr,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
geshijian@...oheart.com, weidong.wd@...oheart.com, Greg Kroah-Hartman
<gregkh@...uxfoundation.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
Danilo Krummrich <dakr@...nel.org>
Subject: Re: [PATCH] irqchip/riscv-aplic: Register the driver prior to
device creation
On Wed, Jan 14 2026 at 19:48, Yicong Yang wrote:
> On 1/14/26 4:57 PM, Anup Patel wrote:
>> On Wed, Jan 14, 2026 at 12:08 PM Yicong Yang <yang.yicong@...oheart.com> wrote:
>>>
>>> On RISC-V the APLIC serves part of the GSI interrupts, but unlike
>>> other architectures it's initialized a bit late on ACPI based
>>> systems:
>>> - the spec only mandates the report in DSDT (riscv-brs rule AML_100)
>>> so the APLIC is created as platform_device when scanning DSDT
>>> - the driver is registered and initializes the device in the
>>> device_initcall stage
>>>
>>> The creation of devices that depend on the APLIC is deferred until
>>> after the APLIC is initialized (when the driver calls
>>> acpi_dev_clear_dependencies), unlike most other devices, which are
>>> created when scanning the DSDT. The affected devices include those
>>> that declare the dependency explicitly by the ACPI _DEP method and
>>> _PRT for PCIe host bridges, and those that require their interrupts
>>> as GSI. Furthermore, the deferred creation is performed in an async
>>> way (queued in the system_dfl_wq workqueue) but all contend on the
>>> acpi_scan_lock.
The lock contention is irrelevant to the real underlying problem.
>>> Since the deferred device creation is asynchronous and will contend
>>> for the same lock, the order and timing are not certain. And the time
>>> is late enough for the device creation to run in parallel with the init
>>> task. This leads to the issues below (also observed on our platforms):
>>> - the console/tty device is created late and sometimes it's not ready
>>> when the init task checks for its presence. The system will crash in the
>>> latter case since the init task always requires a valid console.
>>> - the root device will be probed and registered late (e.g. NVMe,
>>> after the init task is executed) and the system may run into the
>>> rescue shell if the root device is not found.
And again, you _cannot_ solve this problem completely with initcall
ordering.
Deferred probing with delegation to work queues has the systemic
issue that there is no guarantee that all devices, which are required
to actually proceed to userspace, have been initialized at that
point.
Changing the initcall priority of a particular driver papers over the
underlying problem to the extent that _you_ cannot observe it anymore,
but that provides exactly _zero_ guarantee that it is correct under all
circumstances. "Works for me" is the worst engineering principle as you
might know already.
That said, I still refuse to take random initcall ordering patches
unless somebody comes up with a coherent explanation of the actual
guarantee.
But before you start to come up with more fairy tales, let me come back
to your two points from above:
>>> - the console/tty device is created late and sometimes it's not ready
>>> when the init task checks for its presence. The system will crash in the
>>> latter case since the init task always requires a valid console.
I assume you want to say that console_on_rootfs() fails to open
'/dev/console', right?
That's obvious because console_on_rootfs() is invoked _before_
async_synchronize_full(), which ensures that all outstanding
initialization work has been completed.
The fix for this is obvious too and it's therefore bloody obvious that
changing the init call priority of a random driver does not fix that at
all, no?
But that's not sufficient, see below.
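To illustrate the ordering argument about console_on_rootfs(), here is a
minimal user-space sketch in C. It is not kernel code: struct boot_model
and the model_*() functions are hypothetical names invented for this
illustration, and the outstanding initialization work is reduced to a
single flag. It only shows that opening the console before the barrier
fails when the console is registered by outstanding work, while opening
it after the barrier succeeds.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct boot_model {
	bool console_registered;	/* set once the outstanding work has run */
	bool work_pending;		/* initialization work still outstanding */
};

/* Models the outstanding work: it registers the console when it runs. */
static void model_run_pending_work(struct boot_model *m)
{
	if (m->work_pending) {
		m->console_registered = true;
		m->work_pending = false;
	}
}

/* Models a barrier that completes all outstanding initialization work. */
static void model_synchronize_full(struct boot_model *m)
{
	model_run_pending_work(m);
}

/* Models opening the console: fails unless a console is registered. */
static int model_open_console(const struct boot_model *m)
{
	return m->console_registered ? 0 : -1;
}

/* Returns the result of the console open for the chosen ordering. */
int model_boot(bool barrier_first)
{
	struct boot_model m = { .console_registered = false, .work_pending = true };
	int ret;

	if (barrier_first) {
		model_synchronize_full(&m);
		ret = model_open_console(&m);
	} else {
		/* Console opened while the work is still outstanding. */
		ret = model_open_console(&m);
		model_synchronize_full(&m);
	}
	return ret;
}
```

In this model, model_boot(false) returns -1 and model_boot(true) returns
0. The sketch assumes the barrier really completes all outstanding work,
which is exactly the assumption questioned further down.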
>>> - the root device will be probed and registered late (e.g. NVMe,
>>> after the init task is executed) and the system may run into the
>>> rescue shell if the root device is not found.
You completely fail to explain how outstanding initializations in work
queues survive past the async_synchronize_full() synchronization
point. You are merely describing random observations on your system, but
you stopped right there without trying to decode the underlying root
cause.
The root cause is:
1) As I already said above, deferred probing does not provide any
guarantees at all.
2) async_synchronize_full() is obviously not the barrier which it is
supposed to be (the misplaced console_on_rootfs() call aside).
That needs to be fixed at the conceptual level and not hacked around
with "works for me" patches and fairy tale change logs.
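Point 2) can be pictured with another small user-space sketch in C.
Again, all names are hypothetical and nothing here is a kernel API. It
assumes, purely for illustration, that the deferred device creation sits
on a queue which the barrier does not track. Why the real barrier falls
short is precisely what still needs to be analyzed.

```c
#include <stdbool.h>

/* Hypothetical model: two independent places where init work can be pending. */
struct pending_work {
	int tracked_items;	/* work the barrier knows about */
	int untracked_items;	/* assumed: deferred creation queued elsewhere */
};

/* A barrier that only drains the work it tracks. */
static void model_barrier(struct pending_work *p)
{
	p->tracked_items = 0;
}

/* True only if no initialization work of any kind is left. */
static bool model_safe_to_start_init(const struct pending_work *p)
{
	return p->tracked_items == 0 && p->untracked_items == 0;
}

/* Runs the barrier, then reports whether proceeding to userspace is safe. */
bool model_boot_is_safe(int tracked_items, int untracked_items)
{
	struct pending_work p = { tracked_items, untracked_items };

	model_barrier(&p);
	return model_safe_to_start_init(&p);
}
```

Here model_boot_is_safe(3, 0) is true, but model_boot_is_safe(3, 1) is
false: as long as any required work can be pending outside of what the
barrier waits for, no initcall ordering changes the outcome.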
Thanks,
tglx