Message-ID: <20081117172549.GA27974@elte.hu>
Date: Mon, 17 Nov 2008 18:25:49 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Eric Dumazet <dada1@...mosbay.com>
Cc: David Miller <davem@...emloft.net>, rjw@...k.pl,
linux-kernel@...r.kernel.org, kernel-testers@...r.kernel.org,
cl@...ux-foundation.org, efault@....de, a.p.zijlstra@...llo.nl,
Linus Torvalds <torvalds@...ux-foundation.org>,
Stephen Hemminger <shemminger@...tta.com>
Subject: Re: [Bug #11308] tbench regression on each kernel release from
2.6.22 -> 2.6.28
* Ingo Molnar <mingo@...e.hu> wrote:
> > 4% on my machine, but apparently my machine is sooooo special (see
> > oprofile thread), so maybe its cpus have a hard time playing with
> > a contended cache line.
> >
> > It definitely needs more testing on other machines.
> >
> > Maybe you'll discover the patch is bad on your machines; this is why
> > it's in net-next-2.6
>
> ok, i'll try it on my testbox too, to check whether it has any effect
> - find below the port to -git.
it gives a small speedup of ~1% on my box:
before: Throughput 3437.65 MB/sec 64 procs
after: Throughput 3473.99 MB/sec 64 procs
... although that's still close to the natural tbench noise range,
so it's not conclusive and not a smoking gun IMO.
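As a quick check of the "~1%" figure, here is a minimal sketch that computes the relative change from the two throughput numbers quoted above. The helper name `relative_speedup` is only illustrative, and the calculation says nothing about run-to-run noise, which would need repeated tbench runs to estimate.

```python
def relative_speedup(before: float, after: float) -> float:
    """Return the fractional change going from `before` to `after`."""
    return (after - before) / before


# tbench throughput in MB/sec with 64 procs, as quoted in the message
before_mb_s = 3437.65
after_mb_s = 3473.99

gain = relative_speedup(before_mb_s, after_mb_s)
print(f"speedup: {gain:.2%}")  # speedup: 1.06%
```

A gain of about 36 MB/sec on roughly 3400 MB/sec is consistent with the "~1%" estimate.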
But i think this change might just be papering over the real
scalability problem that this workload has, namely that there's
a single localhost route/dst/device that millions of packets are
squeezed through every second:
  phoenix:~> ifconfig lo
  lo        Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
            UP LOOPBACK RUNNING  MTU:16436  Metric:1
            RX packets:258001524 errors:0 dropped:0 overruns:0 frame:0
            TX packets:258001524 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:679809512144 (633.1 GiB)  TX bytes:679809512144 (633.1 GiB)
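To put the loopback counters in perspective, the following rough sketch derives the mean bytes per packet and re-checks the GiB figure from the numbers in the ifconfig output above. The function names are illustrative, and the counters are cumulative since the interface came up, so they may include traffic other than tbench.

```python
def mean_packet_size(total_bytes: int, packets: int) -> float:
    """Average number of bytes carried per packet."""
    return total_bytes / packets


def to_gib(total_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return total_bytes / 2**30


# RX counters of `lo` as shown in the ifconfig output (TX is identical)
rx_packets = 258_001_524
rx_bytes = 679_809_512_144

print(f"mean packet size: {mean_packet_size(rx_bytes, rx_packets):.1f} bytes")
print(f"total received:   {to_gib(rx_bytes):.1f} GiB")
```

The mean comes out at roughly 2635 bytes per packet, far below the 16436-byte loopback MTU, so the device is moving a very large number of fairly small packets.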
There does not seem to be any per-CPU-ness in localhost networking -
it has a globally single-threaded rx/tx queue AFAICS even if both the
client and server tasks are on the same CPU - how is that supposed to
perform well? (but i might be missing something)
What kind of test system do you have - perhaps one with P4-style Xeon
CPUs, where dirty-cacheline cache misses to DRAM are particularly
expensive?
Ingo