Message-ID: <4BDB1E46.6050106@athenacr.com>
Date: Fri, 30 Apr 2010 14:15:34 -0400
From: Brian Bloniarz <bmb@...enacr.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: Thomas Gleixner <tglx@...utronix.de>,
Stephen Hemminger <shemminger@...tta.com>,
netdev@...r.kernel.org, Andi Kleen <andi@...stfloor.org>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: OFT - reserving CPU's for networking
Eric Dumazet wrote:
> On Thursday, 29 April 2010 at 21:19 +0200, Thomas Gleixner wrote:
>
>> Say thanks to Intel/AMD for providing us with timers which stop in
>> lower c-states.
>>
>> Not much we can do about the broadcast lock when several cores are
>> going idle and we need to set up a global timer to work around the
>> "lapic timer stops in C2/C3" issue.
>>
>> Simply put, C-state timer broadcasting does not scale. And it was
>> never meant to scale. It's a workaround for laptops to have
>> functional NOHZ.
>>
>> There are several ways to work around that on larger machines:
>>
>> - Restrict c-states
>> - Disable NOHZ and highres timers
>> - idle=poll is definitely the worst of all possible solutions
>>
>>> I keep getting asked about taking some cores away from the clock and
>>> scheduler to be reserved just for network processing. Seeing this
>>> kind of stuff makes me wonder if maybe that isn't a half-bad idea.
>> This comes up every few months, and we have pointed out several times
>> what needs to be done to make this work w/o these weird hacks which
>> put a core offline and then start some magic undebuggable binary blob
>> on it. We have not seen anyone working on this, but the "set cores
>> aside and let them do X" idea seems to stick in people's heads.
>>
>> Seriously, that's not a solution. It's going to be some hacked-up
>> nightmare which is completely unmaintainable.
>>
>> Aside from that, I seriously doubt that you can do networking w/o
>> time and timers.
>>
>
> Thanks a lot!
>
> booting with processor.max_cstate=1 solves the problem
>
> (I already had a CONFIG_NO_HZ=n config, but highres timers enabled)
>
> Even with a _carefully_ chosen crazy configuration (receiving a packet
> on one cpu, then transferring it to another cpu, with a full 16x16
> matrix involved), generating 700,000 IPIs per second on the machine
> seems fine now.

FYI you can also restrict c-states at runtime with PM QoS:
Documentation/power/pm_qos_interface.txt
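
To see which latencies apply on a given box, here's a minimal sketch
that lists each state's exit latency (assuming the standard cpuidle
sysfs layout; the set of states varies by machine):

#!/usr/bin/python
# List the exit latency of each cpuidle state on cpu0.
import glob

for d in sorted(glob.glob("/sys/devices/system/cpu/cpu0/cpuidle/state*")):
    name = open(d + "/name").read().strip()
    latency = open(d + "/latency").read().strip()
    print("%s: %s usec" % (name, latency))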

On my machine, /sys/devices/system/cpu/cpu0/cpuidle/state2/latency
is 205usec, so configuring a PM QoS request for <= 205usec latency
should prevent that state from being entered:

#!/usr/bin/python
import os
import signal
import struct

latency_rec_usec = 100

# Writing a latency bound (in usec) to /dev/cpu_dma_latency registers
# a PM QoS request; it stays in effect as long as the file descriptor
# is held open.
f = os.open("/dev/cpu_dma_latency", os.O_WRONLY)
os.write(f, struct.pack("=i", latency_rec_usec))

# Block forever; the request is dropped when the process exits.
signal.pause()
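
The request only lasts as long as /dev/cpu_dma_latency is held open,
which is why the script blocks in signal.pause() instead of exiting.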