Date:	Thu, 30 Jul 2009 21:43:00 -0600
From:	Robert Hancock <hancockrwd@...il.com>
To:	Andreas Mohr <andi@...as.de>
CC:	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>,
	Corrado Zoccolo <czoccolo@...il.com>,
	LKML <linux-kernel@...r.kernel.org>, linux-acpi@...r.kernel.org
Subject: Re: Dynamic configure max_cstate

On 07/28/2009 04:11 AM, Andreas Mohr wrote:
> Hi,
>
> On Tue, Jul 28, 2009 at 05:00:35PM +0800, Zhang, Yanmin wrote:
>> I tried different clocksources. For example, I could get a better (30%) result with
>> hpet. With hpet, CPU utilization is about 5~8%; the function hpet_read uses too much CPU
>> time. With tsc, CPU utilization is about 2~3%. I think higher CPU utilization causes fewer
>> C-state transitions.
>>
>> With idle=poll, the result is about 10% better than with hpet. With idle=poll,
>> I didn't find any difference in results among the clocksources.
>
> IOW, this seems to clearly point to ACPI Cx causing it.
>
> Both Corrado and I have been thinking that one should try skipping all
> bigger-latency ACPI Cx states whenever there's an ongoing I/O request where an
> immediate reply interrupt is expected.
>
> I've been investigating this a bit, and the relevant places would perhaps include
>    . kernel/pm_qos_params.c
>    . drivers/cpuidle/governors/menu.c (which acts on the ACPI _cx state
> structs as configured by drivers/acpi/processor_idle.c)
>    . and e.g. the wait_for_completion_timeout() part in drivers/ata/libata-core.c
>      (or other sources in case of other disk I/O mechanisms)
>
> One way to do some quick (and dirty!!) testing would be to set a flag
> before calling wait_for_completion_timeout() and testing for this flag in
> drivers/cpuidle/governors/menu.c and then skip deeper Cx states
> conditionally.
>
> As a very quick test, I ran a
>     while :; do :; done
> loop in a shell reniced to 19 (to keep my CPU out of ACPI idle),
> but the bonnie -s 100 results, which initially looked promising, turned
> out to be inconsistent. The proper way to test this would be idle=poll.
> My test system was an Athlon XP with /proc/acpi/processor/CPU0/power
> latencies of 000 and 100 (the maximum allowed value, BTW) for C1/C2.
>
> If the wait_for_completion_timeout() flag testing turns out to help,
> then one might want to use the pm_qos infrastructure to indicate
> these conditions. However, it might be too bloated for such a
> purpose; a relatively simple (read: fast) boolean flag mechanism
> could be better.
>
> Plus one could then create a helper function which figures out a
> "pretty fast" Cx state (independent of specific latency times!).
> But when introducing this mechanism, take care not to ignore the
> requirements defined by pm_qos settings!
>
> Oh, and about the places which submit I/O requests where one would have to
> flag this: are they in any way correlated with the scheduler I/O wait
> value? Would the I/O wait mechanism be a place to more easily and centrally
> indicate that we're waiting for a request to come back in "very soon"?
> OTOH, I/O requests may have vastly different expected delays,
> so only I/O replies expected to arrive soon should be flagged;
> otherwise we'd be wasting lots of ACPI deep idle opportunities.

Did the results show a big difference in performance between a maximum
of C2 and a maximum of C3? The thing with C3 is that it will likely
interfere with bus-master DMA activity: the CPU has to wake up at least
partially before the SATA controller can complete DMA operations, which
will likely stall the controller for some period of time. There would be
an argument for avoiding deep C-states that can't handle snooping while
I/O is in progress and DMA will occur shortly.