Date:	Fri, 16 Apr 2010 15:37:07 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	jamal <hadi@...erus.ca>
Cc:	Andi Kleen <andi@...stfloor.org>, Changli Gao <xiaosuo@...il.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Rick Jones <rick.jones2@...com>,
	David Miller <davem@...emloft.net>, therbert@...gle.com,
	netdev@...r.kernel.org, robert@...julf.net
Subject: Re: rps performance WAS(Re: rps: question)

On Fri, Apr 16, 2010 at 09:27:35AM -0400, jamal wrote:
> On Fri, 2010-04-16 at 09:15 +0200, Andi Kleen wrote:
> 
> > > resched IPI, apparently. But it is absolutely async, and its IRQ
> > > handler is lighter.
> > 
> > It shouldn't be a lot lighter than the new fancy "queued smp_call_function"
> > that has been in the tree for a few releases, so it would surprise me if it
> > made much difference. In the old days, when there was only a single lock
> > for smp_call_function(), perhaps...
> 
> So you are saying that the old implementation of IPI (likely what I
> tried pre-NAPI and as recently as 2-3 years ago) was bad because of a
> single lock?

Yes.

The old implementation of smp_call_function. Also, in the really old
days there was no smp_call_function_single(), so you tended to broadcast.

Jens did a lot of work on this for the IPI implementation behind his block
device work.
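
For reference, the synchronous API is small; a minimal sketch (the helper
here is made up, not from any real driver) looks like this, with the
callback running in interrupt context on the target CPU:

	#include <linux/smp.h>

	static void remote_work(void *info)
	{
		/* Runs on the target CPU, in interrupt context. */
		unsigned int *counter = info;
		(*counter)++;
	}

	static void poke_cpu(int cpu)
	{
		unsigned int counter = 0;

		/*
		 * wait=1: block until remote_work() has finished on @cpu,
		 * which is what keeps the on-stack @counter valid for the
		 * whole call -- the "ack" discussed below.
		 */
		smp_call_function_single(cpu, remote_work, &counter, 1);
	}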

> On IPIs:
> Is anyone familiar with what is going on with Nehalem? Why is it this
> good? I expect things will get a lot nastier with other hardware like
> Xeon-based boxes, or even Nehalem with RPS going across QPI.

Nehalem is just fast. I don't know why it's fast in your specific
case; it might simply be because it has lots of bandwidth everywhere.
Atomic operations are also faster than on previous Intel CPUs.


> Here's why I think IPIs are bad, please correct me if I am wrong:
> - they are synchronous, i.e. an IPI issuer has to wait for an ACK (which
> is in the form of an IPI).

In the hardware there's no ack, but in the Linux implementation there
usually is (because the sender needs to know when it can free the stack
state used to pass the information).

However, there's now also support for queued IPIs via a special API
(I believe Tom is using that).
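
The queued pattern replaces the on-stack state with a long-lived
descriptor, so the sender doesn't have to wait at all. A sketch of how
that looks with kernel/smp.c's API (the names here are made up; see
__smp_call_function_single() for the real entry point):

	#include <linux/smp.h>

	/*
	 * Long-lived (e.g. per-CPU) descriptor: nothing on the sender's
	 * stack needs to outlive the call.
	 */
	static struct call_single_data my_csd;

	static void remote_kick(void *info)
	{
		/* Runs on the remote CPU in interrupt context. */
	}

	static void my_csd_setup(void)
	{
		my_csd.func = remote_kick;
		my_csd.info = NULL;
	}

	static void kick_cpu_async(int cpu)
	{
		/*
		 * wait=0: queue the descriptor on the remote CPU's call
		 * list and return immediately; no ack is waited for. The
		 * caller must not re-queue my_csd while it is still
		 * pending.
		 */
		__smp_call_function_single(cpu, &my_csd, 0);
	}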

> - data cache has to be synced to main memory
> - the instruction pipeline is flushed

At least on Nehalem, the data transfer can often go through the cache.

IPIs involve APIC accesses, which are not very fast (so overall it's
far more than a pipeline's worth of work), but it's still not an
incredibly expensive operation.

There's also x2APIC now, which should be slightly faster, but it's
likely not in your Nehalem (it's only in the high-end Xeon versions).

> Do you know of any specs I could read up on that would tell me a little more?

If you're just interested in IPI and cache-line transfer performance,
it's probably best to just measure it.
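
Even something as crude as this module sketch would give you a ballpark
figure for the synchronous round trip (assumes CPU 1 is online; for real
numbers you'd also want to pin the measuring thread and repeat the run):

	#include <linux/module.h>
	#include <linux/smp.h>
	#include <linux/ktime.h>

	static void nop_func(void *info)
	{
		/* Empty payload: we only want the IPI round-trip cost. */
	}

	static int __init ipi_bench_init(void)
	{
		ktime_t t0, t1;
		int i;

		if (!cpu_online(1))
			return -ENODEV;

		t0 = ktime_get();
		for (i = 0; i < 1000; i++)
			/* wait=1 makes each iteration a full round trip. */
			smp_call_function_single(1, nop_func, NULL, 1);
		t1 = ktime_get();

		pr_info("IPI round trip: ~%lld ns\n",
			ktime_to_ns(ktime_sub(t1, t0)) / 1000);
		return 0;
	}

	static void __exit ipi_bench_exit(void)
	{
	}

	module_init(ipi_bench_init);
	module_exit(ipi_bench_exit);
	MODULE_LICENSE("GPL");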

Some general information is always in the Intel optimization guide.

-Andi
-- 
ak@...ux.intel.com -- Speaking for myself only.
