Message-Id: <20251125160731.4902-1-ni_liqiang@126.com>
Date: Wed, 26 Nov 2025 00:07:31 +0800
From: niliqiang <ni_liqiang@....com>
To: maz@...nel.org
Cc: ajones@...tanamicro.com,
anup@...infault.org,
apatel@...tanamicro.com,
atishp@...shpatra.org,
bjorn@...nel.org,
conor+dt@...nel.org,
dai.hualiang@....com.cn,
deng.weixian@....com.cn,
devicetree@...r.kernel.org,
frowand.list@...il.com,
guo.chang2@....com.cn,
hu.yuye@....com.cn,
krzysztof.kozlowski+dt@...aro.org,
linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org,
linux-riscv@...ts.infradead.org,
liu.qingtao2@....com.cn,
liu.wenhong35@....com.cn,
ni.liqiang@....com.cn,
ni_liqiang@....com,
palmer@...belt.com,
paul.walmsley@...ive.com,
robh+dt@...nel.org,
saravanak@...gle.com,
sunilvl@...tanamicro.com,
tglx@...utronix.de,
wu.jiabao@....com.cn
Subject: Re: [PATCH v16 6/9] irqchip: Add RISC-V advanced PLIC driver for direct-mode

Dear Marc and all concerned,

> > > > diff --git a/drivers/irqchip/irq-riscv-aplic-main.c b/drivers/irqchip/irq-riscv-aplic-main.c
> > > > +static const struct of_device_id aplic_match[] = {
> > > > + { .compatible = "riscv,aplic" },
> > > > + {}
> > > > +};
> > > > +
> > > > +static struct platform_driver aplic_driver = {
> > > > + .driver = {
> > > > + .name = "riscv-aplic",
> > > > + .of_match_table = aplic_match,
> > > > + },
> > > > + .probe = aplic_probe,
> > > > +};
> > > > +builtin_platform_driver(aplic_driver);
> > >
> > > Dear Anup Patel and all concerned,
> > >
> > > I am writing to inquire about the historical rationale behind defining the APLIC driver's
> > > initialization priority using builtin_platform_driver in the current implementation.
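> > >
> > > As far as I can tell from include/linux/platform_device.h and
> > > include/linux/device/driver.h, builtin_platform_driver() boils down to a
> > > device_initcall()-level registration, roughly (a simplified sketch of the
> > > macro expansion, not the verbatim kernel source):
> > >
> > > static int __init aplic_driver_init(void)
> > > {
> > > 	return platform_driver_register(&aplic_driver);
> > > }
> > > device_initcall(aplic_driver_init);
> > >
> > > So the "priority" we are asking about is effectively the
> > > device_initcall() level.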
> > >
> > > In our environment, we are encountering an issue where this priority level causes ACPI-based PCIe
> > > enumeration to be executed in the system_unbound_wq work queue. This parallel execution model
> > > results in PCIe devices being enumerated in an arbitrary order rather than strictly following the
> > > sequence defined in the ACPI DSDT table.
> > >
> > > The random enumeration order is adversely affecting customer experience, particularly in scenarios
> > > where device ordering is critical for proper system operation or application compatibility.
> > >
> > > We are considering modifying the APLIC driver's initialization priority to ensure PCIe enumeration
> > > occurs sequentially according to the DSDT specification. However, before proceeding with such
> > > changes, we wanted to consult with you regarding:
> > >
> > > 1. Were there specific technical considerations that led to the current priority selection?
> > > 2. Are there any potential side effects or broader impacts that we might have overlooked?
> > > 3. Would you support such a priority adjustment, or do you have alternative suggestions to
> > > address the enumeration order issue?
> > >
> > > We greatly appreciate your insights and expertise on this matter, as it will help us make an
> > > informed decision while maintaining system stability and compatibility.
> > >
> > > Thank you for your time and consideration.
> > >
> >
> > IRQ subsystem maintainers rejected the idea of relying on initcalls to
> > enforce probe order because initcalls do not guarantee ordering. The
> > Linux driver model instead ensures probe order through device
> > dependencies. Since PCI INTx depends on the APLIC being probed first,
> > the PCI host bridge probe cannot occur until after the APLIC probe
> > completes. This requirement and behavior are the same for both DT and
> > ACPI. In DT, the driver model uses fw_devlink to establish probe
> > ordering, while in ACPI this is handled through either an explicit _DEP
> > or, on RISC-V, the GSI mapping.
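> >
> > Conceptually, the ordering guarantee looks like this (a minimal sketch of
> > the device-link mechanism, not code from either driver; the function and
> > device pointers below are placeholders for illustration):
> >
> > #include <linux/device.h>
> >
> > static int example_order_bridge_after_aplic(struct device *bridge_dev,
> > 					    struct device *aplic_dev)
> > {
> > 	struct device_link *link;
> >
> > 	/*
> > 	 * The driver core ends up with a link like this (created from
> > 	 * fw_devlink on DT, or _DEP / the GSI mapping on ACPI). Until
> > 	 * the supplier (the APLIC) is bound, probing the consumer (the
> > 	 * host bridge) is deferred and retried later.
> > 	 */
> > 	link = device_link_add(bridge_dev, aplic_dev,
> > 			       DL_FLAG_AUTOREMOVE_CONSUMER);
> > 	return link ? 0 : -EINVAL;
> > }
> >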
> > Typically, this dependency appears in the DSDT only for the PCI host
> > bridge. Individual PCIe devices are enumerated through the standard PCI
> > scan once the host bridge has been probed. Therefore, I’m not sure what
> > you meant by a probe sequence defined in the DSDT for PCIe devices.
> >
> > Regards,
> > Sunil
>
> I understand the scenario you described with a single PCI host bridge, where devices are enumerated
> through standard PCIe scanning after the host bridge completes probing. However, in ARM and RISC-V
> architectures, systems often have multiple PCI host bridges. We're currently facing an issue in a
> 6-host-bridge system where all bridges depend on the APLIC driver. They must wait until the APLIC
> driver completes and calls acpi_dev_clear_dependencies() to resolve dependencies, after which they're
> sequentially added to the system_unbound_wq work queue. However, during execution in the work queue,
> these 6 host bridges undergo parallel enumeration, preventing them from following the order defined
> in the firmware's ACPI DSDT table. Specifically (a sketch of the queuing path follows the boot logs
> below):
> 1. The ACPI DSDT declares the 6 host bridges in a fixed sequence:
>
>    Device (PC06) {
>        Name (_HID, "PNP0A08")
>        Name (_CID, "PNP0A03")
>        Name (_UID, 0x6)
>        ...
>    }
>    Device (PC07) {
>        Name (_HID, "PNP0A08")
>        Name (_CID, "PNP0A03")
>        Name (_UID, 0x7)
>        ...
>    }
>    Device (PC08) {
>        Name (_HID, "PNP0A08")
>        Name (_CID, "PNP0A03")
>        Name (_UID, 0x8)
>        ...
>    }
>    ...
>    Device (PC11) {
>        Name (_HID, "PNP0A08")
>        Name (_CID, "PNP0A03")
>        Name (_UID, 0xB)
>        ...
>    }
>
> 2. But the OS enumerates them in a different order on each boot (first boot sequence ≠ second boot sequence):
> First boot: ~ # dmesg | grep -i "PCI Root"
> [ 8794.588531] ACPI: PCI Root Bridge [PC08] (domain 0006 [bus 80-ff])
> [ 8794.624478] ACPI: PCI Root Bridge [PC06] (domain 0005 [bus 00-ff])
> [ 8794.672741] ACPI: PCI Root Bridge [PC10] (domain 0008 [bus 00-ff])
> [ 8794.696680] ACPI: PCI Root Bridge [PC07] (domain 0006 [bus 00-7f])
> [ 8794.728234] ACPI: PCI Root Bridge [PC11] (domain 0009 [bus 00-ff])
> [ 8794.755098] ACPI: PCI Root Bridge [PC09] (domain 0007 [bus 00-ff])
> Second boot: ~ # dmesg | grep -i "PCI Root"
> [ 8794.588531] ACPI: PCI Root Bridge [PC09] (domain 0007 [bus 00-ff])
> [ 8794.624478] ACPI: PCI Root Bridge [PC06] (domain 0005 [bus 00-ff])
> [ 8794.672741] ACPI: PCI Root Bridge [PC08] (domain 0006 [bus 80-ff])
> [ 8794.696680] ACPI: PCI Root Bridge [PC11] (domain 0009 [bus 00-ff])
> [ 8794.728234] ACPI: PCI Root Bridge [PC07] (domain 0006 [bus 00-7f])
> [ 8794.755098] ACPI: PCI Root Bridge [PC10] (domain 0008 [bus 00-ff])
>
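> For reference, this is our (simplified) reading of the queuing path in
> drivers/acpi/scan.c; the snippet below is a condensed sketch, not the
> verbatim kernel source:
>
> /* Once a deferred consumer's last dependency is cleared via
>  * acpi_dev_clear_dependencies(), its enumeration is queued as a work
>  * item on system_unbound_wq. With 6 host bridges cleared back to back,
>  * the work items execute concurrently, so the scan order is not
>  * deterministic.
>  */
> static bool acpi_scan_clear_dep_queue(struct acpi_device *adev)
> {
> 	struct acpi_scan_clear_dep_work *cdw;
>
> 	cdw = kmalloc(sizeof(*cdw), GFP_KERNEL);
> 	if (!cdw)
> 		return false;
>
> 	cdw->adev = adev;
> 	INIT_WORK(&cdw->work, acpi_scan_clear_dep_fn);
> 	queue_work(system_unbound_wq, &cdw->work);
>
> 	return true;
> }
>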
> This creates a critical issue: when NVMe devices are connected to these host bridges, the
> unpredictable kernel scanning sequence causes device identifiers (e.g., /dev/nvme0n1, /dev/nvme1n1)
> to change across reboots. In server environments, such device naming instability is unacceptable as
> it breaks storage configuration reliability and consistency.
>
> So far, we've only observed this disorderly enumeration in RISC-V multi-host-bridge scenarios, where
> the APLIC dependency leads to enumeration via system_unbound_wq. We'd like to consult kernel experts:
> 1. Has the impact of enumeration disorder in multi-host-bridge scenarios been considered?
> 2. Are there viable solutions to address the random enumeration caused by system_unbound_wq?

Thank you for your previous suggestion regarding the use of udev. We have carefully implemented this
approach, but found that while udev can create symbolic links, it doesn't resolve the underlying
issue of device ordering in lsblk output, which remains important for our use case.

In our large-scale server deployment comprising hundreds of machines, the udev approach presents
several operational challenges:
1. Management Complexity: Maintaining per-machine udev rules across our entire infrastructure is not scalable.
2. Customer Workflow Dependencies: Our customers rely on consistent lsblk output and direct device
names for their operational workflows and automated scripts.
3. Scalability Limitations: We use standardized hardware configurations where consistent behavior
should be the default, rather than requiring per-machine customization.

We've noticed that PCIe enumeration order tends to vary across system reboots (for example: first
boot showed PC08->PC06->PC10->PC07->PC11->PC09, while second boot showed
PC09->PC06->PC08->PC11->PC07->PC10), even though the ACPI firmware consistently reports the root
bridge sequence as PC06->PC07->PC08->PC09->PC10->PC11.

In our testing, we found that adjusting the registration priority of the APLIC
driver (see the sketch below) seems to help ensure the interrupt controller
initializes before PCI enumeration, leading to more consistent device ordering.
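
The change we experimented with is roughly the following (a sketch only, not
a patch we are proposing as-is; arch_initcall() is simply the earlier level
we happened to test):

/* Instead of builtin_platform_driver(aplic_driver), which registers at
 * device_initcall() level, register the driver at an earlier initcall
 * level so the APLIC is bound before the ACPI PCI host bridge scan
 * begins.
 */
static int __init aplic_driver_init(void)
{
	return platform_driver_register(&aplic_driver);
}
arch_initcall(aplic_driver_init);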

We'd be grateful for your expert advice on a few questions:
1. Are there any dependencies or implications we might have missed with this priority adjustment?
2. Would you happen to know of any other kernel-level approaches that could help with enumeration ordering?
3. Are there any ongoing discussions about making enumeration more deterministic
for standardized server environments?

We fully understand there are good reasons behind the current design, but the variable ordering has
been somewhat challenging for our users. We'd appreciate any guidance you might offer on the best
way forward.

Thank you for your time and valuable insights.

Best regards,
Liqiang