Message-ID: <49C78BE0.9090107@fujitsu-siemens.com>
Date:	Mon, 23 Mar 2009 14:17:20 +0100
From:	Martin Wilck <martin.wilck@...itsu-siemens.com>
To:	Corey Minyard <minyard@....org>
CC:	Greg KH <greg@...ah.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"openipmi-developer@...ts.sourceforge.net" 
	<openipmi-developer@...ts.sourceforge.net>
Subject: Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)

Hi Corey, hi Greg, hi all,

First of all, I need to apologize: _the first patch I sent was 
broken_. The attached patch should work better.

I did some benchmarking with this patch. In short:

1. The kipmid_max_busy parameter is a tunable that behaves reasonably. 
2. Low values of this parameter use up almost as little CPU as the 
"force_kipmid=0" case, but perform better.
3. It is important to distinguish cases with and without CPU load.
4. Offering this tunable as a way to balance the maximum CPU load of 
kipmid against performance appears to be worthwhile for many users.
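
To illustrate the idea (the attached patch is authoritative; the helper 
below is only a sketch with made-up names, written against the current 
ktime API): whenever the state machine returns SI_SM_CALL_WITH_DELAY, 
kipmid keeps polling only until a busy budget of kipmid_max_busy 
microseconds is used up, and then yields the CPU:

	/* Sketch only - see the attached patch for the real code. */
	static unsigned int kipmid_max_busy;	/* module parameter, us */

	static int kipmid_keep_spinning(enum si_sm_result res,
					ktime_t *busy_until)
	{
		if (kipmid_max_busy == 0 || res != SI_SM_CALL_WITH_DELAY) {
			*busy_until = 0;	/* not in a busy period */
			return 0;
		}
		if (*busy_until == 0) {		/* busy period starts now */
			*busy_until = ktime_add_us(ktime_get(),
						   kipmid_max_busy);
			return 1;
		}
		if (ktime_after(ktime_get(), *busy_until)) {
			*busy_until = 0;	/* budget exhausted */
			return 0;		/* caller must really sleep */
		}
		return 1;			/* keep busy-waiting */
	}

	/* in the kipmid thread loop: */
	if (kipmid_keep_spinning(smi_result, &busy_until))
		schedule();			/* stay runnable */
	else
		schedule_timeout_interruptible(1); /* give the CPU away */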

Now the details ... The benchmark was a script using ipmitool to read 
all SDRs and all SEL events from the BMC 10 times in a loop. This takes 
22 s with the default driver (using nearly 100% CPU), and almost 30x 
longer without kipmid (force_kipmid=0). The "CPU busy cycles" in the 
tables below were calculated from oprofile CPU_CLK_UNHALTED counts; the 
"kipmid CPU%" values are the output of "ps -eo pcpu". The tested kernel 
was an Enterprise Linux kernel with HZ=1000.

"Results without load"
         "elapsed(s)"    "elapsed (rel.)"        "kipmid CPU% (ps)" 
  "CPU busy cycles (%)"
"default       "        22      1       32      103.15
"force_kipmid=0"        621     28.23   0       12.7
"kipmid_max_busy=5000"  21      0.95    34      100.16
"kipmid_max_busy=2000"  22      1       34      94.04
"kipmid_max_busy=1000"  27      1.23    25      26.89
"kipmid_max_busy=500"   24      1.09    0       69.44
"kipmid_max_busy=200"   42      1.91    0       46.72
"kipmid_max_busy=100"   68      3.09    0       17.87
"kipmid_max_busy=50"    101     4.59    0       22.91
"kipmid_max_busy=20"    163     7.41    0       19.98
"kipmid_max_busy=10"    213     9.68    0       13.19

As expected, kipmid_max_busy > 1000 has almost no effect (at HZ=1000, 
one tick is 1000 us). kipmid_max_busy=500 saves about 30% of the busy 
cycles while losing only ~10% performance. With kipmid_max_busy=10, 
performance is still 3x better than switching kipmid off entirely, at 
almost the same amount of CPU busy cycles. Note that the %CPU displayed 
by "ps", "top" etc. drops to 0 for kipmid_max_busy < 1000, i.e. below 
one tick. This is an artefact of the CPU time being sampled only at 
timer interrupts. But it will also make user complaints about kipmid 
drop to 0 - think about it ;-)
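
A hypothetical userspace analogue shows the effect: a process that 
spins for half a tick right after waking up and then sleeps again is 
almost never running when the timer interrupt samples CPU time, so 
"ps"/"top" charge it ~0% although it burns real cycles (this assumes 
sleep wakeups and cputime sampling are both tick-aligned):

	#include <time.h>

	static long now_us(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec * 1000000L + ts.tv_nsec / 1000;
	}

	int main(void)
	{
		struct timespec ms = { 0, 1000000 };	/* 1 ms */

		for (;;) {
			long deadline = now_us() + 500;

			while (now_us() < deadline)
				;			/* spin ~500 us */
			nanosleep(&ms, NULL);		/* sleep past the tick */
		}
	}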

I took another run with a system under 100% CPU load by other processes. 
Now there is hardly any performance difference any more. As expected,
the kipmid runs are all only slightly faster than the interrupt-driven 
run which isn't affected by the CPU load. In this case, recording the 
CPU load from kipmid makes no sense (it is ~0 anyway).

         "elapsed(s)"    "elapsed (rel.)"        "kipmid CPU% (ps)"
"Results with 100% CPU load"
"default       "        500     22.73
"force_kipmid=0"        620     28.18
"kipmid_max_busy=1000"  460     20.91
"kipmid_max_busy=500"   500     22.73
"kipmid_max_busy=200"   530     24.09
"kipmid_max_busy=100"   570     25.91


As I said initially, these are results taken on a single system. On this 
system the KCS response times (from start to end of the 
SI_SM_CALL_WITH_DELAY loop) are between 200 and 2000 us:

us	% of waits finished by this time
200	0%
400	21%
600	39%
800	44%
1000	55%
1200	89%
1400	94%
1600	97%
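
(These numbers can be collected by timestamping the loop; a 
hypothetical instrumentation of the driver, not part of the attached 
patch, could look like this:)

	/* Time one busy-wait period, i.e. the span during which the
	 * state machine keeps returning SI_SM_CALL_WITH_DELAY, and log
	 * its length in microseconds. */
	ktime_t t0 = ktime_get();

	while (smi_event_handler(smi_info, 0) == SI_SM_CALL_WITH_DELAY)
		schedule();

	pr_debug("KCS response time: %lld us\n",
		 ktime_to_us(ktime_sub(ktime_get(), t0)));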

This may well be different on other systems, depending on the BMC, the 
number of sensors, etc. Therefore I think this should remain a tunable, 
because finding an optimal value for arbitrary systems will be hard. Of 
course, the ipmi driver could implement some sort of self-tuning logic, 
but that would be over-engineered for my taste. kipmid_max_busy gives 
HW vendors a chance to determine an optimal value for a given system 
and pass a corresponding recommendation on to users.
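
Such a recommendation could then simply be applied at driver load time, 
e.g. (assuming the parameter keeps the name used here):

	modprobe ipmi_si kipmid_max_busy=500

or with an equivalent "options ipmi_si kipmid_max_busy=500" line in 
modprobe.conf.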

Best regards
Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@...itsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html

View attachment "ipmi_si_max_busy-fixed-2.6.29-rc8.diff" of type "text/x-patch" (3299 bytes)
