Message-ID: <alpine.DEB.2.21.1803091531000.1364@nanos.tec.linutronix.de>
Date:   Fri, 9 Mar 2018 16:08:19 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Ming Lei <ming.lei@...hat.com>
cc:     Artem Bityutskiy <dedekind1@...il.com>,
        Jens Axboe <axboe@...nel.dk>,
        Christoph Hellwig <hch@...radead.org>,
        linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
        Laurence Oberman <loberman@...hat.com>
Subject: Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online
 CPUs as far as possible

On Fri, 9 Mar 2018, Ming Lei wrote:
> On Fri, Mar 09, 2018 at 11:08:54AM +0100, Thomas Gleixner wrote:
> > > > So my understanding is that these irq patches are enhancements and not bug
> > > > fixes. I'll queue them for 4.17 then.
> > > 
> > > Wrt. this IO hang issue, these patches aren't a bug fix, but they may
> > > fix a performance regression[1] on some systems caused by 84676c1f21
> > > ("genirq/affinity: assign vectors to all possible CPUs").
> > > 
> > > [1] https://marc.info/?l=linux-block&m=152050347831149&w=2
> > 
> > Hmm. The patches are rather large for urgent and possibly for backporting.
> > Is there a simpler way to address that performance issue?
> 
> I haven't thought of a simpler solution. The problem is that the number of
> active MSI-X vectors is decreased a lot by commit 84676c1f21.

It's reduced in cases where the number of possible CPUs is way larger than
the number of online CPUs.

Now, if you look at the number of present CPUs on such systems it's
probably the same as the number of online CPUs.

It only differs on machines which support physical hotplug, but that's not
the normal case. Those systems are more special and less widespread.

So the obvious simple fix for this regression is to spread out the
vectors across present CPUs and not across possible CPUs.
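Roughly, that boils down to picking the spread base from the present mask
rather than the possible mask. A minimal sketch of the idea (the helper name
below is made up for illustration; this is not a tested patch against
kernel/irq/affinity.c):

#include <linux/cpumask.h>

/*
 * Hypothetical helper: the mask the affinity spreading code would iterate
 * over. Returning cpu_present_mask instead of cpu_possible_mask keeps the
 * vectors on CPUs which actually exist instead of reserving them for
 * hotplug CPUs which may never show up.
 */
static const struct cpumask *irq_spread_base_mask(void)
{
	return cpu_present_mask;
}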

I'm not sure if there is a clear indicator whether physical hotplug is
supported or not, but the ACPI folks (x86) and architecture maintainers
should be able to answer that question. I have a machine which says:

   smpboot: Allowing 128 CPUs, 96 hotplug CPUs

There is definitely no way to hotplug anything on that machine, and sure
enough the existing spread algorithm wastes vectors to no end.
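As a crude illustration of such an indicator, the gap between the possible
and the present count could serve as a first-order heuristic (sketch only;
whether that is reliable is exactly the question for the ACPI/arch folks):

#include <linux/cpumask.h>

/*
 * Heuristic sketch: treat possible > present as "physical hotplug
 * headroom". On the box above that would be 128 possible CPUs with only
 * 32 of them present, although nothing can ever be plugged in.
 */
static bool physical_hotplug_possible(void)
{
	return num_possible_cpus() > num_present_cpus();
}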

Sure, then there is virt, which can pretend to have a gazillion possible
hotpluggable CPUs, but virt is an insanity on its own. Though someone might
come up with reasonable heuristics for that as well.

Thoughts?

Thanks,

	tglx
