[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAhSdy27gaVJaXBrx8GB+Xr4ZTvp8hd0Jg8JokzehgC-=5pOmA@mail.gmail.com>
Date: Wed, 4 Dec 2024 09:13:24 +0530
From: Anup Patel <anup@...infault.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Andrew Jones <ajones@...tanamicro.com>, iommu@...ts.linux.dev,
kvm-riscv@...ts.infradead.org, kvm@...r.kernel.org,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
tjeznach@...osinc.com, zong.li@...ive.com, joro@...tes.org, will@...nel.org,
robin.murphy@....com, atishp@...shpatra.org, alex.williamson@...hat.com,
paul.walmsley@...ive.com, palmer@...belt.com, aou@...s.berkeley.edu
Subject: Re: [RFC PATCH 01/15] irqchip/riscv-imsic: Use hierarchy to reach irq_set_affinity
On Wed, Dec 4, 2024 at 4:29 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Tue, Dec 03 2024 at 21:55, Thomas Gleixner wrote:
> > On Tue, Dec 03 2024 at 22:07, Anup Patel wrote:
> >> On Tue, Dec 3, 2024 at 7:23 PM Thomas Gleixner <tglx@...utronix.de> wrote:
> >>> Sorry, I missed that when reviewing the original IMSIC MSI support.
> >>>
> >>> The whole IMSIC MSI support can be moved over to MSI LIB which makes all
> >>> of this indirection go away and your intermediate domain will just fit
> >>> in.
> >>>
> >>> Uncompiled patch below. If that works, it needs to be split up properly.
> >>>
> >>> Note, this removes the setup of the irq_retrigger callback, but that's
> >>> fine because on hierarchical domains irq_chip_retrigger_hierarchy() is
> >>> invoked anyway. See try_retrigger().
> >>
> >> The IMSIC driver was merged one kernel release before common
> >> MSI LIB was merged.
> >
> > Ah indeed.
> >
> >> We should definitely update the IMSIC driver to use MSI LIB, I will
> >> try your suggested changes (below) and post a separate series.
> >
> > Pick up the delta patch I gave Andrew...
>
> As I was looking at something else MSI related I had a look at
> imsic_irq_set_affinity() again.
>
> It's actually required to have the message write in that function and
> not afterwards as you invoke imsic_vector_move() from that function.
>
> That's obviously not true for the remap case as that will not change the
> message address/data pair because the remap table entry is immutable -
> at least I assume so for my mental sanity sake :)
>
> But that brings me to a related question. How is this supposed to work
> with non-atomic message updates? PCI/MSI does not necessarily provide
> masking, and the write of the address/data pair is done in bits and
> pieces. So you can end up with an intermediate state seen by the device
> which ends up somewhere in interrupt nirvana space.
>
> See the dance in msi_set_affinity() and commit 6f1a4891a592
> ("x86/apic/msi: Plug non-maskable MSI affinity race") for further
> explanation.
>
> The way how the IMSIC driver works seems to be pretty much the same as
> the x86 APIC mess:
>
> @address is the physical address of the per CPU MSI target
> address and @data is the vector ID on that CPU.
>
> So the non-atomic update in case of non-maskable MSI suffers from the
> same problem. It works most of the time, but if it doesn't you might
> stare at the occasionally lost interrupt and the stale device in
> disbelief for quite a while :)
Yes, we have the same challenges as x86 APIC when changing
MSI affinity.
>
> I might be missing something which magically prevent that though :)
>
Your understanding is correct. In fact, the IMSIC msi_set_affinity()
handling is inspired from x86 APIC approach due to similarity in
the overall MSI controller.
The high-level idea of imsic_irq_set_affinity() is as follows:
1) Allocate new_vector (new CPU IMSIC address + new ID on that CPU)
2) Update the MSI address and data programmed in the device
based on new_vector (see imsic_msi_update_msg())
3) At this point the device points to the new_vector but old_vector
(old CPU IMSIC address + old ID on that CPU) is still enabled and
we might have received MSI on old_vector while we were busy
setting up a new_vector for the device. To address this, we call
imsic_vector_move().
4) The imsic_vector_move() marks the old_vector as being
moved and schedules a lazy timer on the old CPU.
5) The lazy timer expires on the old CPU and results in
__imsic_local_sync() being called on the old CPU.
6) If there was a pending MSI on the old vector then the
__imsic_local_sync() function injects an MSI to the
new_vector using an MMIO write.
It is very unlikely that an MSI from device will be dropped
(unless I am missing something) but the unsolved issue
is that handling of in-flight MSI received on the old_vector
during the MSI re-programming is delayed which may have
side effects on the device driver side.
I believe in the future RISC-V AIA v2.0 (whenever that
happens) will address the gaps in AIA v1.0 (like this one).
Regards,
Anup
Powered by blists - more mailing lists