Message-ID: <395864833.20090430014946@gemenii.ro>
Date: Thu, 30 Apr 2009 01:49:46 +0300
From: Calin Velea <calin.velea@...enii.ro>
To: Radu Rendec <radu.rendec@...s.ro>
CC: Jarek Poplawski <jarkao2@...il.com>,
Jesper Dangaard Brouer <hawk@...u.dk>,
Denys Fedoryschenko <denys@...p.net.lb>,
netdev <netdev@...r.kernel.org>
Subject: Re: htb parallelism on multi-core platforms
Wednesday, April 29, 2009, 7:21:11 PM, you wrote:
> I finally managed to disable NAPI on e1000e - apparently it can only be
> done on the "official" Intel driver (downloaded from their website), by
> compiling with "make CFLAGS_EXTRA=-DE1000E_NO_NAPI". This doesn't seem
> to be available in the (2.6.29) kernel driver.
> With NAPI disabled, 4 (of 8) cores go to 100% (instead of only one), but
> overall throughput *decreases* from ~110K pps (with NAPI) to ~80K pps.
> This makes sense, since h/w interrupt is much more time consuming than
> polling (that's the whole idea behind NAPI anyway).
> Radu Rendec
I tested with e1000 only, on a single quad-core CPU - the L2 cache was
shared between the cores.
With 8 cores, I suppose you have 2 quad-core CPUs. If the cores actually
in use belong to different physical CPUs, they do not share an L2 cache -
maybe this could explain the performance drop in your case.
Or there may be another explanation...
Anyway - coming back to David Miller's words:
"HTB acts upon global state, so anything that goes into a particular
device's HTB ruleset is going to be single threaded.
There really isn't any way around this."
It could be that the only way to get more throughput is to increase the
number of devices you are shaping on. You could split the IP space into 4
groups and direct the traffic to 4 IMQ devices with 4 iptables rules:
-d 0.0.0.0/2 -j IMQ --todev 0
-d 64.0.0.0/2 -j IMQ --todev 1
-d 128.0.0.0/2 -j IMQ --todev 2
-d 192.0.0.0/2 -j IMQ --todev 3
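A minimal Python sketch of the even 4-way split, assuming the rules go into
the mangle table's PREROUTING chain (the message only gives the match and
target parts). The helper names are hypothetical. IMQ's --todev option takes
the device number, so imq0 is 0, imq1 is 1, and so on. The standard
ipaddress module derives the four /2 prefixes and shows which device a given
destination would land on.

```python
import ipaddress

# Hypothetical helper: split the whole IPv4 space into 2**bits equal,
# disjoint prefixes, one per IMQ device (bits=2 gives the four /2 groups).
def split_ipv4_space(bits=2):
    whole = ipaddress.ip_network("0.0.0.0/0")
    return list(whole.subnets(prefixlen_diff=bits))

# Build one iptables rule per prefix. The mangle table and the PREROUTING
# chain are assumptions; the i-th prefix is sent to device imq<i>.
def imq_rules(prefixes, chain="PREROUTING"):
    return [
        f"iptables -t mangle -A {chain} -d {net} -j IMQ --todev {i}"
        for i, net in enumerate(prefixes)
    ]

# Which IMQ device a destination address ends up on. The prefixes are
# disjoint and cover everything, so exactly one of them matches.
def imq_device_for(dst, prefixes):
    addr = ipaddress.ip_address(dst)
    for i, net in enumerate(prefixes):
        if addr in net:
            return i
    raise ValueError(f"{dst} is not covered by any prefix")

if __name__ == "__main__":
    prefixes = split_ipv4_space(2)
    for rule in imq_rules(prefixes):
        print(rule)
    print("10.1.2.3 ->", f"imq{imq_device_for('10.1.2.3', prefixes)}")
```

Because the split is only on the top bits of the destination address, each
rule is a single prefix match, which keeps the per-packet iptables cost low.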
Or you can customize the split depending on the traffic distribution.
An ipset nethash match can also be used.
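One possible way to customize the split, sketched in Python under stated
assumptions: measure the load per destination prefix (pps, for example),
then assign prefixes to devices with a greedy heuristic that places the
heaviest prefix first, always on the currently least-loaded device. The
heuristic and the helper name are assumptions for illustration, not
something proposed in this thread; the prefixes and loads in the usage line
are made up. Each device's prefix list could then be matched by its own
iptables rules or by an ipset nethash set.

```python
import heapq

# Hypothetical helper: distribute prefixes over n_devices IMQ devices so
# the summed load per device is roughly balanced (greedy, largest first).
def balance_prefixes(load_by_prefix, n_devices=4):
    # Min-heap of (current total load, device index); ties go to the
    # lower device index.
    heap = [(0, dev) for dev in range(n_devices)]
    heapq.heapify(heap)
    assignment = {dev: [] for dev in range(n_devices)}
    # Heaviest prefixes first, each onto the least-loaded device so far.
    ordered = sorted(load_by_prefix.items(), key=lambda kv: kv[1], reverse=True)
    for prefix, load in ordered:
        total, dev = heapq.heappop(heap)
        assignment[dev].append(prefix)
        heapq.heappush(heap, (total + load, dev))
    loads = {dev: total for total, dev in heap}
    return assignment, loads

if __name__ == "__main__":
    # Made-up per-prefix loads, split over 2 devices for brevity.
    example = {
        "10.0.0.0/8": 50,
        "172.16.0.0/12": 30,
        "100.64.0.0/10": 25,
        "192.168.0.0/16": 20,
        "198.18.0.0/15": 10,
    }
    assignment, loads = balance_prefixes(example, n_devices=2)
    for dev in sorted(assignment):
        print(f"imq{dev}: {assignment[dev]} load={loads[dev]}")
```

The greedy placement does not guarantee an optimal split, but it is cheap to
rerun whenever the traffic distribution shifts.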
The 4 devices can share the same HTB ruleset; only the relevant parts
of it will match on each device.
You should test with 4 flows that use all the devices simultaneously and
see what the aggregate throughput is.
The performance gained through parallelism might be a lot higher than the
added overhead of the iptables and/or ipset nethash matching. Anyway - this is
more of a "hack" than a clean solution :)
P.S.: the latest IMQ at http://www.linuximq.net/ is for 2.6.26, so you will need to test with that kernel.
--
Best regards,
Calin mailto:calin.velea@...enii.ro