Date:	Mon, 17 Nov 2008 18:25:49 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	David Miller <davem@...emloft.net>, rjw@...k.pl,
	linux-kernel@...r.kernel.org, kernel-testers@...r.kernel.org,
	cl@...ux-foundation.org, efault@....de, a.p.zijlstra@...llo.nl,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Stephen Hemminger <shemminger@...tta.com>
Subject: Re: [Bug #11308] tbench regression on each kernel release from
	2.6.22 -> 2.6.28


* Ingo Molnar <mingo@...e.hu> wrote:

> > 4% on my machine, but apparently my machine is sooooo special (see 
> > oprofile thread), so maybe its cpus have a hard time playing with 
> > a contended cache line.
> >
> > It definitely needs more testing on other machines.
> >
> > Maybe you'll discover the patch is bad on your machines; this is why 
> > it's in net-next-2.6
> 
> ok, i'll try it on my testbox too, to check whether it has any effect 
> - find below the port to -git.

it gives a small speedup of ~1% on my box:

   before:      Throughput 3437.65 MB/sec 64 procs
   after:       Throughput 3473.99 MB/sec 64 procs

... although that's still close to the natural tbench noise range, so 
it's not conclusive and not a smoking gun IMO.
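(That works out to (3473.99 - 3437.65) / 3437.65 ~= 1.06%.)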

But i think this change might just be papering over the real 
scalability problem this workload has: there's a single localhost 
route/dst/device that millions of packets per second get squeezed 
through:

 phoenix:~> ifconfig lo
 lo        Link encap:Local Loopback  
           inet addr:127.0.0.1  Mask:255.0.0.0
           UP LOOPBACK RUNNING  MTU:16436  Metric:1
           RX packets:258001524 errors:0 dropped:0 overruns:0 frame:0
           TX packets:258001524 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0 
           RX bytes:679809512144 (633.1 GiB)  TX bytes:679809512144 (633.1 GiB)

There does not seem to be any per-CPU-ness in localhost networking - 
it has a globally single-threaded rx/tx queue AFAICS, even when the 
client and the server task are on the same CPU - how is that supposed 
to perform well? (But i might be missing something.)
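
To illustrate what i mean, here is a purely illustrative userspace 
sketch (toy code, not the kernel's loopback path; the thread and 
iteration counts are arbitrary). It compares N threads all bumping one 
shared, atomically updated counter - think of the stats/refcount of 
the single lo dst/device - against each thread bumping a counter in 
its own cacheline, which is roughly what per-CPU state buys you:

/*
 * contended.c - illustrative only, not kernel code.
 * build: gcc -O2 -pthread contended.c -o contended   (add -lrt on older glibc)
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 8			/* arbitrary */
#define ITERS    10000000UL		/* arbitrary */

/* one cacheline that every thread fights over */
static unsigned long shared_ctr __attribute__((aligned(64)));

/* one cacheline per thread, no fighting */
struct slot {
	volatile unsigned long ctr;	/* volatile so the loop isn't folded away */
	char pad[64 - sizeof(unsigned long)];
} __attribute__((aligned(64)));

static struct slot slots[NTHREADS];

static void *bump_shared(void *arg)
{
	unsigned long i;

	(void)arg;
	for (i = 0; i < ITERS; i++)
		__sync_fetch_and_add(&shared_ctr, 1);	/* cacheline ping-pong */
	return NULL;
}

static void *bump_private(void *arg)
{
	struct slot *slot = arg;
	unsigned long i;

	for (i = 0; i < ITERS; i++)
		slot->ctr++;				/* stays local to this CPU */
	return NULL;
}

static void run(void *(*fn)(void *), const char *name)
{
	pthread_t tid[NTHREADS];
	struct timespec t0, t1;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, fn, &slots[i]);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%-9s %.3f sec\n", name,
	       (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

int main(void)
{
	run(bump_shared,  "shared:");
	run(bump_private, "per-cpu:");
	return 0;
}

On a multi-core box the shared-counter variant is typically several 
times slower, purely from the cacheline bouncing between CPUs - and 
that is the kind of cost a single global rx/tx structure pays once 
every packet on the box has to go through it.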

What kind of test system do you have - one with P4-style Xeon CPUs 
perhaps, where dirty-cacheline cache misses to DRAM were particularly 
expensive?

	Ingo
