[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240209094307.4e7eacd0@jacob-builder>
Date: Fri, 9 Feb 2024 09:43:07 -0800
From: Jacob Pan <jacob.jun.pan@...ux.intel.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: LKML <linux-kernel@...r.kernel.org>, X86 Kernel <x86@...nel.org>, Peter
Zijlstra <peterz@...radead.org>, iommu@...ts.linux.dev, Thomas Gleixner
<tglx@...utronix.de>, Lu Baolu <baolu.lu@...ux.intel.com>,
kvm@...r.kernel.org, Dave Hansen <dave.hansen@...el.com>, Joerg Roedel
<joro@...tes.org>, "H. Peter Anvin" <hpa@...or.com>, Borislav Petkov
<bp@...en8.de>, Ingo Molnar <mingo@...hat.com>, Paul Luse
<paul.e.luse@...el.com>, Dan Williams <dan.j.williams@...el.com>, Raj Ashok
<ashok.raj@...el.com>, "Tian, Kevin" <kevin.tian@...el.com>,
maz@...nel.org, seanjc@...gle.com, Robin Murphy <robin.murphy@....com>,
jacob.jun.pan@...ux.intel.com
Subject: Re: [PATCH 00/15] Coalesced Interrupt Delivery with posted MSI
Hi Jens,
On Thu, 8 Feb 2024 08:34:55 -0700, Jens Axboe <axboe@...nel.dk> wrote:
> Hi Jacob,
>
> I gave this a quick spin, using 4 gen2 optane drives. Basic test, just
> IOPS bound on the drive, and using 1 thread per drive for IO. Random
> reads, using io_uring.
>
> For reference, using polled IO:
>
> IOPS=20.36M, BW=9.94GiB/s, IOS/call=31/31
> IOPS=20.36M, BW=9.94GiB/s, IOS/call=31/31
> IOPS=20.37M, BW=9.95GiB/s, IOS/call=31/31
>
> which is abount 5.1M/drive, which is what they can deliver.
>
> Before your patches, I see:
>
> IOPS=14.37M, BW=7.02GiB/s, IOS/call=32/32
> IOPS=14.38M, BW=7.02GiB/s, IOS/call=32/31
> IOPS=14.38M, BW=7.02GiB/s, IOS/call=32/31
> IOPS=14.37M, BW=7.02GiB/s, IOS/call=32/32
>
> at 2.82M ints/sec. With the patches, I see:
>
> IOPS=14.73M, BW=7.19GiB/s, IOS/call=32/31
> IOPS=14.90M, BW=7.27GiB/s, IOS/call=32/31
> IOPS=14.90M, BW=7.27GiB/s, IOS/call=31/32
>
> at 2.34M ints/sec. So a nice reduction in interrupt rate, though not
> quite at the extent I expected. Booted with 'posted_msi' and I do see
> posted interrupts increasing in the PMN in /proc/interrupts,
>
The ints/sec reduction is not as high as I expected either, especially
at this high rate. Which means not enough coalescing going on to get the
performance benefits.
The opportunity of IRQ coalescing is also dependent on how long the
driver's hardirq handler executes. In the posted MSI demux loop, it does
not wait for more MSIs to come before existing the pending IRQ polling
loop. So if the hardirq handler finishes very quickly, it may not coalesce
as much. Perhaps, we need to find more "useful" work to do to maximize the
window for coalescing.
I am not familiar with optane driver, need to look into how its hardirq
handler work. I have only tested NVMe gen5 in terms of storage IO, i saw
30-50% ints/sec reduction at even lower IRQ rate (200k/sec).
> Probably want to fold this one in:
>
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index 8e09d40ea928..a289282f1cf9 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -393,7 +393,7 @@ void intel_posted_msi_init(void)
> * instead of:
> * read, xchg, read, xchg, read, xchg, read, xchg
> */
> -static __always_inline inline bool handle_pending_pir(u64 *pir, struct
> pt_regs *regs) +static __always_inline bool handle_pending_pir(u64 *pir,
> struct pt_regs *regs) {
> int i, vec = FIRST_EXTERNAL_VECTOR;
> unsigned long pir_copy[4];
>
Good catch! will do.
Thanks,
Jacob
Powered by blists - more mailing lists