Message-ID: <20231115125624.GF3818@noisy.programming.kicks-ass.net>
Date: Wed, 15 Nov 2023 13:56:24 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Jacob Pan <jacob.jun.pan@...ux.intel.com>
Cc: LKML <linux-kernel@...r.kernel.org>, X86 Kernel <x86@...nel.org>,
iommu@...ts.linux.dev, Thomas Gleixner <tglx@...utronix.de>,
Lu Baolu <baolu.lu@...ux.intel.com>, kvm@...r.kernel.org,
Dave Hansen <dave.hansen@...el.com>,
Joerg Roedel <joro@...tes.org>,
"H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
Ingo Molnar <mingo@...hat.com>,
Raj Ashok <ashok.raj@...el.com>,
"Tian, Kevin" <kevin.tian@...el.com>, maz@...nel.org,
seanjc@...gle.com, Robin Murphy <robin.murphy@....com>
Subject: Re: [PATCH RFC 09/13] x86/irq: Install posted MSI notification
handler
On Sat, Nov 11, 2023 at 08:16:39PM -0800, Jacob Pan wrote:
> +static __always_inline inline void handle_pending_pir(struct pi_desc *pid, struct pt_regs *regs)
> +{
__always_inline means that... (A)
> + int i, vec = FIRST_EXTERNAL_VECTOR;
> + u64 pir_copy[4];
> +
> + /*
> + * Make a copy of PIR which contains IRQ pending bits for vectors,
> + * then invoke IRQ handlers for each pending vector.
> + * If any new interrupts were posted while we are processing, will
> + * do again before allowing new notifications. The idea is to
> + * minimize the number of the expensive notifications if IRQs come
> + * in a high frequency burst.
> + */
> + for (i = 0; i < 4; i++)
> + pir_copy[i] = raw_atomic64_xchg((atomic64_t *)&pid->pir_l[i], 0);
> +
> + /*
> + * Ideally, we should start from the high order bits set in the PIR
> + * since each bit represents a vector. Higher order bit position means
> + * the vector has higher priority. But external vectors are allocated
> + * based on availability not priority.
> + *
> + * EOI is included in the IRQ handlers call to apic_ack_irq, which
> + * allows higher priority system interrupt to get in between.
> + */
> + for_each_set_bit_from(vec, (unsigned long *)&pir_copy[0], 256)
> + call_irq_handler(vec, regs);
> +
> +}
> +
> +/*
> + * Performance data shows that 3 is good enough to harvest 90+% of the benefit
> + * on high IRQ rate workload.
> + * Alternatively, could make this tunable, use 3 as default.
> + */
> +#define MAX_POSTED_MSI_COALESCING_LOOP 3
> +
> +/*
> + * For MSIs that are delivered as posted interrupts, the CPU notifications
> + * can be coalesced if the MSIs arrive in high frequency bursts.
> + */
> +DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
> +{
> + struct pt_regs *old_regs = set_irq_regs(regs);
> + struct pi_desc *pid;
> + int i = 0;
> +
> + pid = this_cpu_ptr(&posted_interrupt_desc);
> +
> + inc_irq_stat(posted_msi_notification_count);
> + irq_enter();
> +
> + while (i++ < MAX_POSTED_MSI_COALESCING_LOOP) {
> + handle_pending_pir(pid, regs);
> +
> + /*
> + * If there are new interrupts posted in PIR, do again. If
> + * nothing pending, no need to wait for more interrupts.
> + */
> + if (is_pir_pending(pid))
So this reads those same 4 words we xchg in handle_pending_pir(), right?
> + continue;
> + else
> + break;
> + }
> +
> + /*
> + * Clear outstanding notification bit to allow new IRQ notifications,
> + * do this last to maximize the window of interrupt coalescing.
> + */
> + pi_clear_on(pid);
> +
> + /*
> + * There could be a race of PI notification and the clearing of ON bit,
> + * process PIR bits one last time such that handling the new interrupts
> + * are not delayed until the next IRQ.
> + */
> + if (unlikely(is_pir_pending(pid)))
> + handle_pending_pir(pid, regs);
(A) ... we get _two_ copies of that thing in this function. Does that
make sense ?
> +
> + apic_eoi();
> + irq_exit();
> + set_irq_regs(old_regs);
> +}
> #endif /* X86_POSTED_MSI */
Would it not make more sense to write things something like:
bool handle_pending_pir()
{
	bool handled = false;
	u64 pir_copy[4];

	for (i = 0; i < 4; i++) {
		if (!pid->pir_l[i]) {
			pir_copy[i] = 0;
			continue;
		}

		pir_copy[i] = arch_xchg(&pid->pir_l[i], 0);
		handled = true;
	}

	if (!handled)
		return handled;

	for_each_set_bit()
		....

	return handled;
}

sysvec_posted_blah_blah()
{
	bool done = false;
	bool handled;

	for (;;) {
		handled = handle_pending_pir();
		if (done)
			break;

		if (!handled || ++loops > MAX_LOOPS) {
			pi_clear_on(pid);
			/* once more after clear_on */
			done = true;
		}
	}
}
Hmm?