Date:	Sat, 04 Jul 2009 05:19:28 -0400
From:	Jeff Garzik <jeff@...zik.org>
To:	Andi Kleen <andi@...stfloor.org>
CC:	Arjan van de Ven <arjan@...radead.org>,
	Matthew Wilcox <matthew@....cx>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-kernel@...r.kernel.org,
	"Styner, Douglas W" <douglas.w.styner@...el.com>,
	Chinang Ma <chinang.ma@...el.com>,
	"Prickett, Terry O" <terry.o.prickett@...el.com>,
	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	Eric.Moore@....com, DL-MPTFusionLinux@....com,
	NetDev <netdev@...r.kernel.org>
Subject: Re: >10% performance degradation since 2.6.18

Andi Kleen wrote:
>> for networking, especially for incoming data such as new connections,
>> that isn't the case.. that's more or less randomly (well hash based)
>> distributed.
> 
> Ok. Still binding them all to a single CPU all is quite dumb. It
> makes MSI-X quite useless and probably even harmful.
> 
> We don't default to socket power saving for normal scheduling either, 
> but only when you specify a special knob. I don't see why interrupts
> should be different.

In the pre-MSI-X days, you'd have cachelines bouncing all over the place 
if you distributed networking interrupts across CPUs, particularly given 
that NAPI would run some things on a single CPU anyway.

Today, machines are faster, we have multiple interrupts per device, and 
we have multiple RX/TX queues.  I would be interested to see hard 
numbers (as opposed to guesses) about various new ways to distribute 
interrupts across CPUs.

What's the best setup for power usage?
What's the best setup for performance?
Are they the same?
Is it optimal for the interrupt for socket $X to occur on the same 
CPU where the app is running?
If yes, how to best handle when the scheduler moves app to another CPU?
Should we reprogram the NIC hardware flow steering mechanism at that point?
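For anyone wanting to experiment with these questions by hand, the 
affinity side is already scriptable today via /proc/irq. A minimal 
sketch (the IRQ number 42 and CPU 3 are made-up examples; the actual 
vector numbers come from /proc/interrupts on your box):

```shell
# Compute the hex CPU bitmap for pinning an IRQ to one CPU.
# CPU 3 -> bit 3 set -> mask 0x8.
cpu=3
mask=$(printf '%x' $((1 << cpu)))
echo "$mask"    # prints: 8

# As root, one would then write the mask to the (hypothetical) IRQ 42,
# e.g. a NIC RX queue's MSI-X vector found in /proc/interrupts:
#   echo "$mask" > /proc/irq/42/smp_affinity
```

That only covers static pinning; it says nothing about chasing a 
migrating app, which is exactly where the measurement questions above 
begin.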

Interesting questions, and I hope we'd see some hard number comparisons 
before solutions start flowing into the kernel.

	Jeff




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
