lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9171c554-50d2-142b-96ae-1357952fce52@huawei.com>
Date:   Thu, 19 Mar 2020 12:31:49 +0000
From:   John Garry <john.garry@...wei.com>
To:     Marc Zyngier <maz@...nel.org>, <linux-kernel@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>
CC:     chenxiang <chenxiang66@...ilicon.com>,
        Zhou Wang <wangzhou1@...ilicon.com>,
        Ming Lei <ming.lei@...hat.com>,
        Jason Cooper <jason@...edaemon.net>,
        Thomas Gleixner <tglx@...utronix.de>,
        "luojiaxing@...wei.com" <luojiaxing@...wei.com>,
        Will Deacon <will@...nel.org>,
        Robin Murphy <robin.murphy@....com>
Subject: Re: [PATCH v3 0/2] irqchip/gic-v3-its: Balance LPI affinity across
 CPUs

On 16/03/2020 11:54, Marc Zyngier wrote:
> When mapping a LPI, the ITS driver picks the first possible
> affinity, which is in most cases CPU0, assuming that if
> that's not suitable, someone will come and set the affinity
> to something more interesting.
> 
> It apparently isn't the case, and people complain of poor
> performance when many interrupts are glued to the same CPU.
> So let's place the interrupts by finding the "least loaded"
> CPU (that is, the one that has the fewer LPIs mapped to it).
> So called 'managed' interrupts are an interesting case where
> the affinity is actually dictated by the kernel itself, and
> we should honor this.
> 
> * From v2:
>    - Split accounting from CPU selection
>    - Track managed and unmanaged interrupts separately
> 
> Marc Zyngier (2):
>    irqchip/gic-v3-its: Track LPI distribution on a per CPU basis
>    irqchip/gic-v3-its: Balance initial LPI affinity across CPUs
> 
>   drivers/irqchip/irq-gic-v3-its.c | 153 +++++++++++++++++++++++++------
>   1 file changed, 127 insertions(+), 26 deletions(-)
> 

Hi Marc,

Initial results look good. We have 3x NVMe drives now, as opposed to 2x 
previously, which is better for this test.

Before: ~1.3M IOPs fio read
After: ~1.8M IOPs fio read

So a ~50% gain in throughput.

We also did try NVMe with nvme.use_threaded_interrupts=1. As you may 
remember, the NVMe interrupt handling can cause lockups, as they handle 
all completions in interrupt context by default.

Before: ~1.2M IOPs fio read
After: ~1.2M IOPs fio read

So they were about the same. I would have hoped for an improvement here, 
considering before we would have all the per-queue threaded handlers 
running on the single CPU handling the hard irq.

But we will retest all this tomorrow, so please consider these 
provisional for now.

Thanks to Luo Jiaxing for testing.

Cheers,
john

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ