Message-ID: <4AAFE4B7.50606@gmail.com>
Date:	Tue, 15 Sep 2009 21:02:15 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Christoph Lameter <cl@...ux-foundation.org>
CC:	netdev@...r.kernel.org
Subject: Re: UDP regression with packets rates < 10k per sec

Christoph Lameter wrote:
> On Tue, 15 Sep 2009, Eric Dumazet wrote:
> 
>> Once I understood that my 2.6.31 kernel had many more features than 2.6.22, I tuned
>> it to:
>>
>> - Let cpus run at full speed (3GHz instead of 2GHz): before tuning, 2.6.31 was
>> using the "ondemand" governor and my cpus were running at 2GHz, while they were
>> running at 3GHz with my 2.6.22 config
> 
> My kernel did not have support for any governors compiled in.
> 
>> - Don't let cpus enter C2/C3 wait states (idle=mwait)
> 
> Ok. Trying idle=mwait.
> 
>> - Correctly affine the ethX irq to one cpu (2.6.22 was running the ethX irq on one
>>  cpu, while on 2.6.31, irqs were distributed to all online cpus)
> 
> Interrupts of both 2.6.22 and 2.6.31 go to cpu 0. Does it matter for
> loopback?

No, of course not: loopback triggers the softirq on the local cpu, so there is
no special affinity setup to respect.
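
For reference, the three tuning steps quoted above map to something like this
on my box (sysfs/procfs paths as on 2.6.31; irq 28 is only an example, check
/proc/interrupts for the real number):

	# 1) run all cpus at full speed: switch from "ondemand" to "performance"
	for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
	do
		echo performance > $g
	done

	# 2) avoid deep C2/C3 states: boot with "idle=mwait" appended to the
	#    kernel command line in the boot loader

	# 3) affine the eth0 irq to cpu0 (cpu mask 0x1)
	echo 1 > /proc/irq/28/smp_affinity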

> 
>> Then, your mcast test gives same results, at 10pps, 100pps, 1000pps, 10000pps
> 
> loopback via mcast -Ln1 -r <rate>  (times in usec per message)
> 
> 		10pps	100pps	1000pps	10000pps
> 2.6.22(32bit)	7.36	7.28	7.15	7.16
> 2.6.31(64bit)	9.28	10.27	9.70	9.79
> 
> What a difference. Now the initial latency rampup for 2.6.31 is gone. So
> even w/o governors the kernel does something to increase the latencies.
> 
> We sacrificed 2 - 3 microseconds per message to kernel features, bloat and
> 64 bitness?

Well, I don't know, I mainly use 32bit kernels, but yes, using 64bit has a cost:
skbs, for example, are bigger, and sockets are bigger, so we touch more cache
lines per transaction...
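
A toy example shows the effect (not the real sk_buff, of course, just a
pointer-heavy struct, to illustrate how objects grow with pointer size):

	$ cat > sz.c <<'EOF'
	#include <stdio.h>
	/* toy struct : pointers and a long, like many kernel objects */
	struct obj { void *next, *prev; long len; void *data; };
	int main(void) { printf("sizeof(struct obj) = %zu\n", sizeof(struct obj)); return 0; }
	EOF
	$ gcc -m32 sz.c -o sz32 && ./sz32
	sizeof(struct obj) = 16
	$ gcc -m64 sz.c -o sz64 && ./sz64
	sizeof(struct obj) = 32

Twice the size, so roughly twice the cache lines touched for the same object.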


You could precisely compute the number of cycles per transaction with the
"perf" tools (only on 2.6.31), comparing 64bit and 32bit kernels: bench
100000 pps for example and count the number of perf counter irqs per second.
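
Something like this (assuming the mcast binary from your test is in the
current directory; event names as in the 2.6.31 perf tool):

	# count cycles for the whole run, then divide by the number of
	# messages sent to get cycles per transaction
	perf stat -e cycles -e cache-misses ./mcast -Ln1 -r 100000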