Message-ID: <49219D36.5020801@cosmosbay.com>
Date:	Mon, 17 Nov 2008 17:35:02 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	David Miller <davem@...emloft.net>, rjw@...k.pl,
	linux-kernel@...r.kernel.org, kernel-testers@...r.kernel.org,
	cl@...ux-foundation.org, efault@....de, a.p.zijlstra@...llo.nl,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Stephen Hemminger <shemminger@...tta.com>
Subject: Re: [Bug #11308] tbench regression on each kernel release from	2.6.22
 -> 2.6.28

Ingo Molnar wrote:
> * Eric Dumazet <dada1@...mosbay.com> wrote:
> 
>>> It all looks like pure old-fashioned straight overhead in the 
>>> networking layer to me. Do we still touch the same global cacheline 
>>> for every localhost packet we process? Anything like that would 
>>> show up big time.
>> Yes we do; I find it strange that we don't see dst_release() in your
>> NMI profile.
>>
>> I posted a patch ( commit 5635c10d976716ef47ae441998aeae144c7e7387 
>> net: make sure struct dst_entry refcount is aligned on 64 bytes) (in 
>> net-next-2.6 tree) to properly align struct dst_entry refcounter and 
>> got 4% speedup on tbench on my machine.
> 
> Ouch, +4% from a one-liner networking change? That's a _huge_ speedup 
> compared to the things we were after in scheduler land. A lot of 
> scheduler folks worked hard to squeeze the last 1-2% out of the 
> scheduler fastpath (which was not trivial at all). The _full_ 
> scheduler accounts for only about 7% of the total system overhead here 
> on a 16-way box...

4% on my machine, but apparently my machine is rather special (see the
oprofile thread), so maybe its CPUs have a particularly hard time with a
contended cache line.

It definitely needs more testing on other machines.

Maybe you'll discover the patch is bad on your machines; this is why it's
in net-next-2.6.

> 
> So why should we be handling this anything but a plain networking 
> performance regression/weakness? The localhost scalability bottleneck 
> has been reported a _long_ time ago.
> 

The struct dst_entry problem was already discovered a _long_ time ago,
and was solved at that time.

(commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
Thu, 13 Mar 2008 05:52:37 +0000 (22:52 -0700)
[NET]: Fix tbench regression in 2.6.25-rc1)

Then, a gremlin came and broke the thing.

There are many contended cache lines in the system; we can do our best
to try to make them disappear, but that's not always possible.

Another contended cache line is the rwlock in iptables.
I remember Stephen had a patch to make the thing use RCU.
