Date:	Tue, 28 Jul 2009 15:47:13 -0400 (EDT)
From:	Len Brown <lenb@...nel.org>
To:	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
Cc:	LKML <linux-kernel@...r.kernel.org>, linux-acpi@...r.kernel.org,
	yakui_zhao <yakui.zhao@...el.com>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: Dynamic configure max_cstate

> When running a fio workload, I found that the CPU C-state sometimes
> has a big impact on the result. fio is mostly a disk I/O workload
> that doesn't spend much time on the CPU, so the CPU switches to
> C2/C3 frequently and the latency is high.
> 
> If I start the kernel with idle=poll or processor.max_cstate=1,
> the result is quite good. Consider a scenario where the machine is
> busy in the daytime and idle at night. Could we add a dynamic
> configuration interface for processor.max_cstate, or something
> similar, in sysfs, so that user applications could change
> max_cstate dynamically? For example, we could add a new
> parameter to cpuidle_governor->select to mark the
> highest C-state.

max_cstate is a debug param.  It isn't a run-time API and never will be.
User-space shouldn't need to know or care about C-states,
and if it appears it needs to, then we have a bug we need to fix.

The interface in Documentation/power/pm_qos_interface.txt
is supposed to handle this.  Though if the underlying code
is not noticing IO interrupts, then it can't help.
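
As a rough illustration of that interface: pm_qos_interface.txt
describes a /dev/cpu_dma_latency device to which a process writes a
32-bit latency bound in microseconds, with the request held for as
long as the file descriptor stays open. A minimal Python sketch,
assuming that behaviour. The helper name and its `path` parameter are
made up for illustration (the parameter exists only so the helper can
be tried against an ordinary file).

```python
import os
import struct

# Device node described in Documentation/power/pm_qos_interface.txt.
PM_QOS_CPU_DMA_LATENCY = "/dev/cpu_dma_latency"


def request_cpu_latency(max_latency_us, path=PM_QOS_CPU_DMA_LATENCY):
    """Hypothetical helper: register a CPU DMA latency bound (microseconds).

    Returns the open file descriptor. The request is assumed to stay in
    force only while the descriptor is open, so the caller must keep it
    and close it when the latency-sensitive work is done.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # The bound is written as a native-endian 32-bit signed integer.
        os.write(fd, struct.pack("=i", max_latency_us))
    except OSError:
        os.close(fd)
        raise
    return fd
```

A benchmark wrapper would call request_cpu_latency(20) as root before
starting fio, keep the descriptor for the duration of the run, and
os.close() it afterwards so deeper C-states are allowed again.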

Another thing to look at is processor.latency_factor
which you can change at run-time in
/sys/module/processor/parameters/latency_factor

We multiply the advertised exit latency by this
before deciding to enter a C-state.  The concept
is that ACPI reports a performance number, but what
we really want is a power break-even.  Anyway,
we know the default multiple is too low, and will be
raising it shortly.
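
To make the multiplication concrete, here is a small sketch of the
decision as described above: a state is eligible only if its
advertised exit latency times the factor fits within the expected
idle time. The state table, the numbers, and the function names are
invented for illustration. This is not the kernel's code.

```python
from pathlib import Path

# Run-time tunable named above.
LATENCY_FACTOR_PARAM = Path("/sys/module/processor/parameters/latency_factor")


def read_latency_factor(param=LATENCY_FACTOR_PARAM):
    """Read the current multiplier from sysfs (hypothetical helper)."""
    return int(Path(param).read_text().strip())


def pick_cstate(states, predicted_idle_us, latency_factor):
    """Return the deepest state whose scaled exit latency fits the idle time.

    states: list of (name, exit_latency_us), ordered shallowest first.
    The shallowest state is always allowed as a fallback.
    """
    chosen = states[0][0]
    for name, exit_latency_us in states[1:]:
        if exit_latency_us * latency_factor <= predicted_idle_us:
            chosen = name
    return chosen


# Illustrative exit latencies in microseconds, not taken from real hardware.
EXAMPLE_STATES = [("C1", 1), ("C2", 20), ("C3", 100)]
```

With these made-up numbers and a predicted idle period of 300 us, a
factor of 2 still allows C3 (200 <= 300), a factor of 6 falls back to
C2 (600 > 300 but 120 <= 300), and a factor of 20 leaves only C1.
Raising the factor therefore makes deep states rarer without
forbidding them outright.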

Of course if the current code is not predicting any
IO interrupts on your IO-only workload, this, like
pm_qos, will not help.

cheers,
-Len Brown, Intel Open Source Technology Center