Message-ID: <20251124104836.3685533-1-lrizzo@google.com>
Date: Mon, 24 Nov 2025 10:48:36 +0000
From: Luigi Rizzo <lrizzo@...gle.com>
To: jacob.jun.pan@...ux.intel.com, lrizzo@...gle.com, rizzo.unipi@...il.com,
seanjc@...gle.com, tglx@...utronix.de
Cc: a.manzanares@...sung.com, acme@...nel.org, ashok.raj@...el.com,
axboe@...nel.dk, baolu.lu@...ux.intel.com, bp@...en8.de,
dan.j.williams@...el.com, dave.hansen@...el.com, guang.zeng@...el.com,
helgaas@...nel.org, hpa@...or.com, iommu@...ts.linux.dev,
jim.harris@...sung.com, joro@...tes.org, kevin.tian@...el.com,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org, maz@...nel.org,
mingo@...hat.com, oliver.sang@...el.com, paul.e.luse@...el.com,
peterz@...radead.org, robert.hoo.linux@...il.com, robin.murphy@....com,
x86@...nel.org
Subject: Re: [PATCH v3 00/12] Coalesced Interrupt Delivery with posted MSI
I think there is an inherent race condition when intremap=posted_msi
and the IRQ subsystem resends pending interrupts via __apic_send_IPI().
In detail:
With intremap=posted_msi, vectors are only processed if the
corresponding bit in the PIR register is set.
Now say that, for whatever reason, the IRQ infrastructure intercepts
an interrupt and marks it as PENDING. handle_edge_irq() and many other
places in kernel/irq have sections of code like this:

	if (!irq_may_run(desc)) {
		desc->istate |= IRQS_PENDING;
		mask_ack_irq(desc);
		goto out_unlock;
	}

Then eventually check_irq_resend() will try to resend pending interrupts:

	desc->istate &= ~IRQS_PENDING;
	if (!try_retrigger(desc))
		err = irq_sw_resend(desc);

try_retrigger() on x86 eventually calls apic_retrigger_irq(), which
uses __apic_send_IPI(). Unfortunately the latter does not seem to
set the 'vector' bit in the PIR (nor does it send the POSTED_MSI
notification interrupt), thus potentially causing a lost interrupt
unless some other spontaneous interrupt arrives from the device.
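
The scenario above can be illustrated with a small user-space model. This
is only a sketch of the behavior as described in this message, not kernel
code: struct model_cpu, every model_* function, and the value of
MODEL_NOTIFICATION_VECTOR are hypothetical names and values introduced for
illustration. The model assumes that only the notification vector drains
the PIR, and that a device vector delivered directly while its PIR bit is
clear runs no handler.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_VECTORS          256
/* Arbitrary placeholder standing in for POSTED_MSI_NOTIFICATION_VECTOR. */
#define MODEL_NOTIFICATION_VECTOR 0xeb

/* Hypothetical per-CPU state: a PIR bitmap plus per-vector handler counts. */
struct model_cpu {
	uint64_t pir[MODEL_NR_VECTORS / 64];
	unsigned int handled[MODEL_NR_VECTORS];
};

/* Mark a vector as posted in the model's PIR. */
static void model_pir_set(struct model_cpu *cpu, unsigned int vec)
{
	cpu->pir[vec / 64] |= (uint64_t)1 << (vec % 64);
}

/* Notification handler: clear each set PIR bit and count it as handled. */
static void model_notification_handler(struct model_cpu *cpu)
{
	for (unsigned int vec = 0; vec < MODEL_NR_VECTORS; vec++) {
		uint64_t mask = (uint64_t)1 << (vec % 64);

		if (cpu->pir[vec / 64] & mask) {
			cpu->pir[vec / 64] &= ~mask;
			cpu->handled[vec]++;
		}
	}
}

/*
 * IPI delivery as assumed by this model: only the notification vector
 * triggers PIR processing. Any other vector arriving directly is dropped,
 * which is the lost-interrupt case described in the text.
 */
static void model_deliver_ipi(struct model_cpu *cpu, unsigned int vec)
{
	if (vec == MODEL_NOTIFICATION_VECTOR)
		model_notification_handler(cpu);
}

/* Resend by sending the device vector itself; the PIR is left untouched. */
void model_retrigger_plain(struct model_cpu *cpu, unsigned int vec)
{
	model_deliver_ipi(cpu, vec);
}

/* Resend by posting the vector in the PIR, then sending the notification. */
void model_retrigger_posted(struct model_cpu *cpu, unsigned int vec)
{
	model_pir_set(cpu, vec);
	model_deliver_ipi(cpu, MODEL_NOTIFICATION_VECTOR);
}
```

In this model, calling model_retrigger_plain() on a zero-initialized
struct model_cpu leaves handled[vec] at zero, while
model_retrigger_posted() runs the handler once and leaves the PIR empty.
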
I could verify the stall (by forcing the path that sets IRQS_PENDING),
and could verify that the patch below fixes the problem:

 static int apic_retrigger_irq(struct irq_data *irqd)
 {
 	struct apic_chip_data *apicd = apic_chip_data(irqd);
 	unsigned long flags;
+	uint vec;
 	raw_spin_lock_irqsave(&vector_lock, flags);
+	vec = apicd->vector;
+	if (posted_msi_supported() &&
+	    vec >= FIRST_EXTERNAL_VECTOR && vec < FIRST_SYSTEM_VECTOR) {
+		struct pi_desc *pid = per_cpu_ptr(&posted_msi_pi_desc, apicd->cpu);
+		set_bit(vec, (unsigned long *)pid->pir64);
+		__apic_send_IPI(apicd->cpu, POSTED_MSI_NOTIFICATION_VECTOR);
+	} else {
 		__apic_send_IPI(apicd->cpu, apicd->vector);
+	}
 	raw_spin_unlock_irqrestore(&vector_lock, flags);
 	return 1;
 }

Am I missing something? Is there a better fix?
cheers
luigi