linux-kernel - Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28

lists.openwall.net: Open Source and information security mailing list archives

Date:	Mon, 17 Nov 2008 18:33:10 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	David Miller <davem@...emloft.net>, rjw@...k.pl,
	linux-kernel@...r.kernel.org, kernel-testers@...r.kernel.org,
	cl@...ux-foundation.org, efault@....de, a.p.zijlstra@...llo.nl,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Stephen Hemminger <shemminger@...tta.com>
Subject: Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28

Ingo Molnar wrote:
> * Ingo Molnar <mingo@...e.hu> wrote:
> 
>>> 4% on my machine, but apparently my machine is sooooo special (see 
>>> oprofile thread), so maybe its CPUs have a hard time playing with 
>>> a contended cache line.
>>>
>>> It definitely needs more testing on other machines.
>>>
>>> Maybe you'll discover the patch is bad on your machines; this is why 
>>> it's in net-next-2.6.
>> OK, I'll try it on my testbox too, to check whether it has any effect; 
>> find below the port to -git.
> 
> It gives a small speedup of ~1% on my box:
> 
>    before:      Throughput 3437.65 MB/sec 64 procs
>    after:       Throughput 3473.99 MB/sec 64 procs

Strange, I get 2350 MB/sec on my 8-CPU box with "tbench 8".
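As a quick check of the ~1% figure quoted above, the relative change between the two 64-process runs can be computed directly. This is a minimal C sketch; the helper names `percent_change` and `within_noise` are hypothetical, and the noise band passed to `within_noise` is an assumed parameter, since the thread gives no measured tbench noise figure.

```c
/* Relative change between two tbench throughput results, in percent.
 * A positive value means the "after" kernel is faster. */
static double percent_change(double before_mb_s, double after_mb_s)
{
    return (after_mb_s - before_mb_s) / before_mb_s * 100.0;
}

/* Hypothetical helper: report a change as inconclusive when its magnitude
 * is below an assumed run-to-run noise band (in percent). The band must be
 * measured on the machine in question; it is not given in this thread. */
static int within_noise(double change_pct, double noise_pct)
{
    double magnitude = change_pct < 0 ? -change_pct : change_pct;
    return magnitude < noise_pct;
}
```

For the numbers above, `percent_change(3437.65, 3473.99)` gives about 1.06%, so whether the result is conclusive depends entirely on how wide the natural noise band of repeated runs is on that box.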

> 
> ... although that's still a bit close to the natural tbench noise 
> range, so it's not conclusive and not a smoking gun IMO.
> 
> But I think this change might just be papering over what, in my 
> opinion, is the real scalability problem with this workload: there's 
> a single localhost route/dst/device that millions of packets are 
> squeezed through every second:

Yes, this point was mentioned on netdev a while back.

> 
>  phoenix:~> ifconfig lo
>  lo        Link encap:Local Loopback  
>            inet addr:127.0.0.1  Mask:255.0.0.0
>            UP LOOPBACK RUNNING  MTU:16436  Metric:1
>            RX packets:258001524 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:258001524 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:0 
>            RX bytes:679809512144 (633.1 GiB)  TX bytes:679809512144 (633.1 GiB)
> 
> There does not seem to be any per-CPU-ness in localhost networking: 
> it has a globally single-threaded rx/tx queue AFAICS, even if both the 
> client and server tasks are on the same CPU. How is that supposed to 
> perform well? (But I might be missing something.)

Stephen had a patch for this one too, but its results were also within tbench noise:

http://kerneltrap.org/mailarchive/linux-netdev/2008/11/5/3926034
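Ingo's point about missing per-CPU-ness, like the contended cache line mentioned at the top of the thread, comes down to many CPUs writing to the same shared data. The usual remedy is to give each CPU its own copy, padded to a full cache line, and to sum the copies only when a total is needed. The C sketch below shows that layout in plain userspace C. It is not the loopback driver's code or Stephen's patch; the names `pcpu_slot`, `pcpu_counter`, `pcpu_inc`, `pcpu_sum` and `NR_CPUS_SKETCH` are hypothetical, and the 64-byte line size is an assumption. Inside the kernel this would be built on its own per-CPU primitives, such as `alloc_percpu()` and `per_cpu_ptr()`.

```c
#define NR_CPUS_SKETCH 8   /* hypothetical CPU count, e.g. an 8-CPU box */
#define CACHE_LINE     64  /* assumed cache-line size in bytes */

/* One counter slot per CPU. The alignment specifier pads each slot to a
 * full cache line, so two CPUs updating their own slots never write to
 * the same line. */
struct pcpu_slot {
    _Alignas(CACHE_LINE) unsigned long packets;
};

struct pcpu_counter {
    struct pcpu_slot slot[NR_CPUS_SKETCH];
};

/* Fast path: a CPU dirties only its own slot. This sketch assumes each
 * slot is written only by its owning CPU; there is no locking. */
static void pcpu_inc(struct pcpu_counter *c, int cpu)
{
    c->slot[cpu].packets++;
}

/* Slow path: a reader walks every slot to build the total. */
static unsigned long pcpu_sum(const struct pcpu_counter *c)
{
    unsigned long total = 0;

    for (int i = 0; i < NR_CPUS_SKETCH; i++)
        total += c->slot[i].packets;
    return total;
}
```

The trade-off is that writers stop bouncing one shared cache line between CPUs, while the reader pays for visiting every slot. Real kernel code must also keep the task from being preempted and migrated between looking up its CPU number and doing the increment.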


> 
> What kind of test system do you have? One with P4-style Xeon CPUs, 
> perhaps, where dirty-cacheline cache misses to DRAM are particularly 
> expensive?

It's an HP BL460c G1.

Dual quad-core Intel E5450 CPUs @ 3.00GHz

So 8 logical CPUs. My bench was "tbench 8".

