Message-ID: <3164861.qsE2XkMKPC@aspire.rjw.lan>
Date:   Thu, 09 Mar 2017 16:32:23 +0100
From:   "Rafael J. Wysocki" <rjw@...ysocki.net>
To:     Jonathan Corbet <corbet@....net>
Cc:     Viresh Kumar <viresh.kumar@...aro.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Documentation <linux-doc@...r.kernel.org>
Subject: Re: [PATCH] cpufreq: User/admin documentation update and consolidation

+linux-doc (sorry for omitting it in the first place)

On Thursday, March 09, 2017 04:28:32 PM Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> 
> The user/admin documentation of cpufreq is badly outdated.  It
> contains stale and/or inaccurate information along with things
> that are not particularly useful.  Also, some of the important
> pieces are missing from it.
> 
> For this reason, add a new user/admin document for cpufreq
> containing current information to admin-guide and drop the old
> outdated .txt documents it is replacing.
> 
> Since there will be more PM documents in admin-guide going forward,
> create a separate directory for them and put the cpufreq document
> in there right away.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> Acked-by: Viresh Kumar <viresh.kumar@...aro.org>
> ---
> 
> Hi Jon,
> 
> This hasn't changed since it was sent last time as an RFC
> (https://patchwork.kernel.org/patch/9583783/) and it has not received any
> comments since then either, so from my perspective it is good to go.
> 
> Please apply.
> 
> Thanks,
> Rafael
> 
> ---
>  Documentation/admin-guide/index.rst      |    1 
>  Documentation/admin-guide/pm/cpufreq.rst |  700 +++++++++++++++++++++++++++++++
>  Documentation/admin-guide/pm/index.rst   |   15 
>  Documentation/cpu-freq/boost.txt         |   93 ----
>  Documentation/cpu-freq/governors.txt     |  301 -------------
>  Documentation/cpu-freq/index.txt         |    7 
>  Documentation/cpu-freq/user-guide.txt    |  226 ----------
>  7 files changed, 716 insertions(+), 627 deletions(-)
> 
> Index: linux-pm/Documentation/admin-guide/pm/cpufreq.rst
> ===================================================================
> --- /dev/null
> +++ linux-pm/Documentation/admin-guide/pm/cpufreq.rst
> @@ -0,0 +1,700 @@
> +.. |struct cpufreq_policy| replace:: :c:type:`struct cpufreq_policy <cpufreq_policy>`
> +
> +=======================
> +CPU Performance Scaling
> +=======================
> +
> +::
> +
> + Copyright (c) 2017 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> +
> +The Concept of CPU Performance Scaling
> +======================================
> +
> +The majority of modern processors are capable of operating in a number of
> +different clock frequency and voltage configurations, often referred to as
> +Operating Performance Points or P-states (in ACPI terminology).  As a rule,
> +the higher the clock frequency and the higher the voltage, the more instructions
> +can be retired by the CPU over a unit of time, but also the higher the clock
> +frequency and the higher the voltage, the more energy is consumed over a unit of
> +time (or the more power is drawn) by the CPU in the given P-state.  Therefore
> +there is a natural tradeoff between the CPU capacity (the number of instructions
> +that can be executed over a unit of time) and the power drawn by the CPU.
> +
> +In some situations it is desirable or even necessary to run a program as fast
> +as possible and then there is no reason to use any P-states different from the
> +highest one (i.e. the highest-performance frequency/voltage configuration
> +available).  In some other cases, however, it may not be necessary to execute
> +instructions so quickly and maintaining the highest available CPU capacity for a
> +relatively long time without utilizing it entirely may be regarded as wasteful.
> +It also may not be physically possible to maintain maximum CPU capacity for too
> +long for thermal or power supply capacity reasons or similar.  To cover those
> +cases, there are hardware interfaces allowing CPUs to be switched between
> +different frequency/voltage configurations or (in the ACPI terminology) to be
> +put into different P-states.
> +
> +Typically, they are used along with algorithms to estimate the required CPU
> +capacity, so as to decide which P-states to put the CPUs into.  Of course, since
> +the utilization of the system generally changes over time, that has to be done
> +repeatedly on a regular basis.  The activity by which this happens is referred
> +to as CPU performance scaling or CPU frequency scaling (because it involves
> +adjusting the CPU clock frequency).
> +
> +
> +CPU Performance Scaling in Linux
> +================================
> +
> +The Linux kernel supports CPU performance scaling by means of the ``CPUFreq``
> +(CPU Frequency scaling) subsystem that consists of three layers of code: the
> +core, scaling governors and scaling drivers.
> +
> +The ``CPUFreq`` core provides the common code infrastructure and user space
> +interfaces for all platforms that support CPU performance scaling.  It defines
> +the basic framework in which the other components operate.
> +
> +Scaling governors implement algorithms to estimate the required CPU capacity.
> +As a rule, each governor implements one, possibly parametrized, scaling
> +algorithm.
> +
> +Scaling drivers talk to the hardware.  They provide scaling governors with
> +information on the available P-states (or P-state ranges in some cases) and
> +access platform-specific hardware interfaces to change CPU P-states as requested
> +by scaling governors.
> +
> +In principle, all available scaling governors can be used with every scaling
> +driver.  That design is based on the observation that the information used by
> +performance scaling algorithms for P-state selection can be represented in a
> +platform-independent form in the majority of cases, so it should be possible
> +to use the same performance scaling algorithm implemented in exactly the same
> +way regardless of which scaling driver is used.  Consequently, the same set of
> +scaling governors should be suitable for every supported platform.
> +
> +However, that observation may not hold for performance scaling algorithms
> +based on information provided by the hardware itself, for example through
> +feedback registers, as that information is typically specific to the hardware
> +interface it comes from and may not be easily represented in an abstract,
> +platform-independent way.  For this reason, ``CPUFreq`` allows scaling drivers
> +to bypass the governor layer and implement their own performance scaling
> +algorithms.  That is done by the ``intel_pstate`` scaling driver.
> +
> +
> +``CPUFreq`` Policy Objects
> +==========================
> +
> +In some cases the hardware interface for P-state control is shared by multiple
> +CPUs.  That is, for example, the same register (or set of registers) is used to
> +control the P-state of multiple CPUs at the same time and writing to it affects
> +all of those CPUs simultaneously.
> +
> +Sets of CPUs sharing hardware P-state control interfaces are represented by
> +``CPUFreq`` as |struct cpufreq_policy| objects.  For consistency,
> +|struct cpufreq_policy| is also used when there is only one CPU in the given
> +set.
> +
> +The ``CPUFreq`` core maintains a pointer to a |struct cpufreq_policy| object for
> +every CPU in the system, including CPUs that are currently offline.  If multiple
> +CPUs share the same hardware P-state control interface, all of the pointers
> +corresponding to them point to the same |struct cpufreq_policy| object.
> +
> +``CPUFreq`` uses |struct cpufreq_policy| as its basic data type and the design
> +of its user space interface is based on the policy concept.
> +
> +
> +CPU Initialization
> +==================
> +
> +First of all, a scaling driver has to be registered for ``CPUFreq`` to work.
> +It is only possible to register one scaling driver at a time, so the scaling
> +driver is expected to be able to handle all CPUs in the system.
> +
> +The scaling driver may be registered before or after CPU registration.  If
> +CPUs are registered earlier, the driver core invokes the ``CPUFreq`` core to
> +take note of all of the already registered CPUs during the registration of the
> +scaling driver.  In turn, if any CPUs are registered after the registration of
> +the scaling driver, the ``CPUFreq`` core will be invoked to take note of them
> +at their registration time.
> +
> +In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it
> +has not seen so far as soon as it is ready to handle that CPU.  [Note that the
> +logical CPU may be a physical single-core processor, or a single core in a
> +multicore processor, or a hardware thread in a physical processor or processor
> +core.  In what follows "CPU" always means "logical CPU" unless explicitly stated
> +otherwise and the word "processor" is used to refer to the physical part
> +possibly including multiple logical CPUs.]
> +
> +Once invoked, the ``CPUFreq`` core checks if the policy pointer is already set
> +for the given CPU and if so, it skips the policy object creation.  Otherwise,
> +a new policy object is created and initialized, which involves the creation of
> +a new policy directory in ``sysfs``, and the policy pointer corresponding to
> +the given CPU is set to the new policy object's address in memory.
> +
> +Next, the scaling driver's ``->init()`` callback is invoked with the policy
> +pointer of the new CPU passed to it as the argument.  That callback is expected
> +to initialize the performance scaling hardware interface for the given CPU (or,
> +more precisely, for the set of CPUs sharing the hardware interface it belongs
> +to, represented by its policy object) and, if the policy object it has been
> +called for is new, to set parameters of the policy, like the minimum and maximum
> +frequencies supported by the hardware, the table of available frequencies (if
> +the set of supported P-states is not a continuous range), and the mask of CPUs
> +that belong to the same policy (including both online and offline CPUs).  That
> +mask is then used by the core to populate the policy pointers for all of the
> +CPUs in it.
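
Side note for readers of the thread (not part of the patch): the policy
pointer bookkeeping described above can be modeled roughly as follows.  This
is only a toy sketch in Python.  The Policy class, toy_driver_init() and the
assumption that CPUs share an interface in pairs are all made up for
illustration; the real code is C and operates on struct cpufreq_policy.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Toy stand-in for struct cpufreq_policy (only the fields used here)."""
    cpus: set = field(default_factory=set)  # online and offline CPUs in the policy
    min_khz: int = 0
    max_khz: int = 0


def toy_driver_init(policy, cpu):
    """Hypothetical ->init(): pretend CPUs share a P-state interface in pairs."""
    first = cpu - (cpu % 2)
    policy.cpus = {first, first + 1}
    policy.min_khz, policy.max_khz = 800_000, 2_400_000


def core_add_cpu(policy_of, cpu, driver_init=toy_driver_init):
    """Take note of `cpu`: reuse its policy if the pointer is set, else create one."""
    if policy_of.get(cpu) is not None:
        return policy_of[cpu]          # pointer already set: skip policy creation
    policy = Policy()
    driver_init(policy, cpu)           # the driver fills in limits and the CPU mask
    for sibling in policy.cpus:        # the core populates pointers from the mask
        policy_of[sibling] = policy
    return policy
```

With this model, registering CPU 0 also sets the pointer for CPU 1, so a later
core_add_cpu() call for CPU 1 returns the same object without creating
anything.
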
> +
> +The next major initialization step for a new policy object is to attach a
> +scaling governor to it (to begin with, that is the default scaling governor
> +determined by the kernel configuration, but it may be changed later
> +via ``sysfs``).  First, a pointer to the new policy object is passed to the
> +governor's ``->init()`` callback which is expected to initialize all of the
> +data structures necessary to handle the given policy and, possibly, to add
> +a governor ``sysfs`` interface to it.  Next, the governor is started by
> +invoking its ``->start()`` callback.
> +
> +That callback is expected to register per-CPU utilization update callbacks for
> +all of the online CPUs belonging to the given policy with the CPU scheduler.
> +The utilization update callbacks will be invoked by the CPU scheduler on
> +important events, like task enqueue and dequeue, on every iteration of the
> +scheduler tick or generally whenever the CPU utilization may change (from the
> +scheduler's perspective).  They are expected to carry out computations needed
> +to determine the P-state to use for the given policy going forward and to
> +invoke the scaling driver to make changes to the hardware in accordance with
> +the P-state selection.  The scaling driver may be invoked directly from
> +scheduler context or asynchronously, via a kernel thread or workqueue, depending
> +on the configuration and capabilities of the scaling driver and the governor.
> +
> +Similar steps are taken for policy objects that are not new, but were "inactive"
> +previously, meaning that all of the CPUs belonging to them were offline.  The
> +only practical difference in that case is that the ``CPUFreq`` core will attempt
> +to use the scaling governor previously used with the policy that became
> +"inactive" (and is re-initialized now) instead of the default governor.
> +
> +In turn, if a previously offline CPU is being brought back online, but some
> +other CPUs sharing the policy object with it are online already, there is no
> +need to re-initialize the policy object at all.  In that case, it only is
> +necessary to restart the scaling governor so that it can take the new online CPU
> +into account.  That is achieved by invoking the governor's ``->stop()`` and
> +``->start()`` callbacks, in this order, for the entire policy.
> +
> +As mentioned before, the ``intel_pstate`` scaling driver bypasses the scaling
> +governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
> +Consequently, if ``intel_pstate`` is used, scaling governors are not attached to
> +new policy objects.  Instead, the driver's ``->setpolicy()`` callback is invoked
> +to register per-CPU utilization update callbacks for each policy.  These
> +callbacks are invoked by the CPU scheduler in the same way as for scaling
> +governors, but in the ``intel_pstate`` case they both determine the P-state to
> +use and change the hardware configuration accordingly in one go from scheduler
> +context.
> +
> +The policy objects created during CPU initialization and other data structures
> +associated with them are torn down when the scaling driver is unregistered
> +(which happens when the kernel module containing it is unloaded, for example) or
> +when the last CPU belonging to the given policy is unregistered.
> +
> +
> +Policy Interface in ``sysfs``
> +=============================
> +
> +During the initialization of the kernel, the ``CPUFreq`` core creates a
> +``sysfs`` directory (kobject) called ``cpufreq`` under
> +:file:`/sys/devices/system/cpu/`.
> +
> +That directory contains a ``policyX`` subdirectory (where ``X`` represents an
> +integer number) for every policy object maintained by the ``CPUFreq`` core.
> +Each ``policyX`` directory is pointed to by ``cpufreq`` symbolic links
> +under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer
> +that may be different from the one represented by ``X``) for all of the CPUs
> +associated with (or belonging to) the given policy.  The ``policyX`` directories
> +in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific
> +attributes (files) to control ``CPUFreq`` behavior for the corresponding policy
> +objects (that is, for all of the CPUs associated with them).
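
Side note (not part of the patch): a rough sketch of how user space could walk
that layout.  The function name and the `root` parameter are hypothetical;
only the directory and symbolic link names are taken from the text above.

```python
import os
import re


def cpu_to_policy(root="/sys/devices/system/cpu"):
    """Map each cpuY directory to the policyX name its cpufreq link points to."""
    mapping = {}
    for entry in sorted(os.listdir(root)):
        if not re.fullmatch(r"cpu\d+", entry):
            continue                   # skip cpufreq/, cpuidle/ and similar
        link = os.path.join(root, entry, "cpufreq")
        if os.path.islink(link):       # CPUs without a policy have no link
            mapping[entry] = os.path.basename(os.path.realpath(link))
    return mapping
```

CPUs that share a policy show up with the same policyX value in the result,
and X need not match any of the Y numbers.
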
> +
> +Some of those attributes are generic.  They are created by the ``CPUFreq`` core
> +and their behavior generally does not depend on what scaling driver is in use
> +and what scaling governor is attached to the given policy.  Some scaling drivers
> +also add driver-specific attributes to the policy directories in ``sysfs`` to
> +control policy-specific aspects of driver behavior.
> +
> +The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/`
> +are the following:
> +
> +``affected_cpus``
> +	List of online CPUs belonging to this policy (i.e. sharing the hardware
> +	performance scaling interface represented by the ``policyX`` policy
> +	object).
> +
> +``bios_limit``
> +	If the platform firmware (BIOS) tells the OS to apply an upper limit to
> +	CPU frequencies, that limit will be reported through this attribute (if
> +	present).
> +
> +	The existence of the limit may be a result of some (often unintentional)
> +	BIOS settings, restrictions coming from a service processor or other
> +	BIOS/HW-based mechanisms.
> +
> +	This does not cover ACPI thermal limitations which can be discovered
> +	through a generic thermal driver.
> +
> +	This attribute is not present if the scaling driver in use does not
> +	support it.
> +
> +``cpuinfo_max_freq``
> +	Maximum possible operating frequency the CPUs belonging to this policy
> +	can run at (in kHz).
> +
> +``cpuinfo_min_freq``
> +	Minimum possible operating frequency the CPUs belonging to this policy
> +	can run at (in kHz).
> +
> +``cpuinfo_transition_latency``
> +	The time it takes to switch the CPUs belonging to this policy from one
> +	P-state to another, in nanoseconds.
> +
> +	If unknown or if known to be so high that the scaling driver does not
> +	work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`)
> +	will be returned by reads from this attribute.
> +
> +``related_cpus``
> +	List of all (online and offline) CPUs belonging to this policy.
> +
> +``scaling_available_governors``
> +	List of ``CPUFreq`` scaling governors present in the kernel that can
> +	be attached to this policy or (if the ``intel_pstate`` scaling driver is
> +	in use) list of scaling algorithms provided by the driver that can be
> +	applied to this policy.
> +
> +	[Note that some governors are modular and it may be necessary to load a
> +	kernel module for the governor held by it to become available and be
> +	listed by this attribute.]
> +
> +``scaling_cur_freq``
> +	Current frequency of all of the CPUs belonging to this policy (in kHz).
> +
> +	For the majority of scaling drivers, this is the frequency of the last
> +	P-state requested by the driver from the hardware using the scaling
> +	interface provided by it, which may or may not reflect the frequency
> +	the CPU is actually running at (due to hardware design and other
> +	limitations).
> +
> +	Some scaling drivers (e.g. ``intel_pstate``) attempt to provide
> +	information more precisely reflecting the current CPU frequency through
> +	this attribute, but that still may not be the exact current CPU
> +	frequency as seen by the hardware at the moment.
> +
> +``scaling_driver``
> +	The scaling driver currently in use.
> +
> +``scaling_governor``
> +	The scaling governor currently attached to this policy or (if the
> +	``intel_pstate`` scaling driver is in use) the scaling algorithm
> +	provided by the driver that is currently applied to this policy.
> +
> +	This attribute is read-write and writing to it will cause a new scaling
> +	governor to be attached to this policy or a new scaling algorithm
> +	provided by the scaling driver to be applied to it (in the
> +	``intel_pstate`` case), as indicated by the string written to this
> +	attribute (which must be one of the names listed by the
> +	``scaling_available_governors`` attribute described above).
> +
> +``scaling_max_freq``
> +	Maximum frequency the CPUs belonging to this policy are allowed to be
> +	running at (in kHz).
> +
> +	This attribute is read-write and writing a string representing an
> +	integer to it will cause a new limit to be set (it must not be lower
> +	than the value of the ``scaling_min_freq`` attribute).
> +
> +``scaling_min_freq``
> +	Minimum frequency the CPUs belonging to this policy are allowed to be
> +	running at (in kHz).
> +
> +	This attribute is read-write and writing a string representing a
> +	non-negative integer to it will cause a new limit to be set (it must not
> +	be higher than the value of the ``scaling_max_freq`` attribute).
> +
> +``scaling_setspeed``
> +	This attribute is functional only if the `userspace`_ scaling governor
> +	is attached to the given policy.
> +
> +	It returns the last frequency requested by the governor (in kHz) or can
> +	be written to in order to set a new frequency for the policy.
> +
> +
> +Generic Scaling Governors
> +=========================
> +
> +``CPUFreq`` provides generic scaling governors that can be used with all
> +scaling drivers.  As stated before, each of them implements a single, possibly
> +parametrized, performance scaling algorithm.
> +
> +Scaling governors are attached to policy objects and different policy objects
> +can be handled by different scaling governors at the same time (although that
> +may lead to suboptimal results in some cases).
> +
> +The scaling governor for a given policy object can be changed at any time with
> +the help of the ``scaling_governor`` policy attribute in ``sysfs``.
> +
> +Some governors expose ``sysfs`` attributes to control or fine-tune the scaling
> +algorithms implemented by them.  Those attributes, referred to as governor
> +tunables, can be either global (system-wide) or per-policy, depending on the
> +scaling driver in use.  If the driver requires governor tunables to be
> +per-policy, they are located in a subdirectory of each policy directory.
> +Otherwise, they are located in a subdirectory under
> +:file:`/sys/devices/system/cpu/cpufreq/`.  In either case the name of the
> +subdirectory containing the governor tunables is the name of the governor
> +providing them.
> +
> +``performance``
> +---------------
> +
> +When attached to a policy object, this governor causes the highest frequency,
> +within the ``scaling_max_freq`` policy limit, to be requested for that policy.
> +
> +The request is made once at the time the governor for the policy is set to
> +``performance`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
> +policy limits change after that.
> +
> +``powersave``
> +-------------
> +
> +When attached to a policy object, this governor causes the lowest frequency,
> +within the ``scaling_min_freq`` policy limit, to be requested for that policy.
> +
> +The request is made once at the time the governor for the policy is set to
> +``powersave`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
> +policy limits change after that.
> +
> +``userspace``
> +-------------
> +
> +This governor does not do anything by itself.  Instead, it allows user space
> +to set the CPU frequency for the policy it is attached to by writing to the
> +``scaling_setspeed`` attribute of that policy.
> +
> +``schedutil``
> +-------------
> +
> +This governor uses CPU utilization data available from the CPU scheduler.  It
> +generally is regarded as a part of the CPU scheduler, so it can access the
> +scheduler's internal data structures directly.
> +
> +It runs entirely in scheduler context, although in some cases it may need to
> +invoke the scaling driver asynchronously when it decides that the CPU frequency
> +should be changed for a given policy (that depends on whether or not the driver
> +is capable of changing the CPU frequency from scheduler context).
> +
> +The actions of this governor for a particular CPU depend on the scheduling class
> +invoking its utilization update callback for that CPU.  If it is invoked by the
> +RT or deadline scheduling classes, the governor will increase the frequency to
> +the allowed maximum (that is, the ``scaling_max_freq`` policy limit).  In turn,
> +if it is invoked by the CFS scheduling class, the governor will use the
> +Per-Entity Load Tracking (PELT) metric for the root control group of the
> +given CPU as the CPU utilization estimate (see the `Per-entity load tracking`_
> +LWN.net article for a description of the PELT mechanism).  Then, the new
> +CPU frequency to apply is computed in accordance with the formula
> +
> +	f = 1.25 * ``f_0`` * ``util`` / ``max``
> +
> +where ``util`` is the PELT number, ``max`` is the theoretical maximum of
> +``util``, and ``f_0`` is either the maximum possible CPU frequency for the given
> +policy (if the PELT number is frequency-invariant), or the current CPU frequency
> +(otherwise).
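
Side note (not part of the patch): the formula above, written out as a small
Python sketch.  The function and parameter names are hypothetical, integer
arithmetic is used with 1.25 expressed as 5/4, and clamping the result to
the scaling_max_freq limit is an assumption made here, not something the
text states.

```python
def schedutil_next_freq(util, max_util, f0_khz, scaling_max_khz, rt_or_dl=False):
    """Frequency request following f = 1.25 * f_0 * util / max (in kHz)."""
    if rt_or_dl:
        return scaling_max_khz         # RT/deadline: go to the allowed maximum
    freq = 5 * f0_khz * util // (4 * max_util)  # 1.25 == 5/4, integer math
    return min(freq, scaling_max_khz)  # assumption: respect the policy limit
```

Because of the 1.25 factor, a utilization of 80% of ``max`` already maps to
``f_0``, which leaves some headroom before the CPU is fully utilized.
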
> +
> +This governor also employs a mechanism allowing it to temporarily bump up the
> +CPU frequency for tasks that have been waiting on I/O most recently, called
> +"IO-wait boosting".  That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag
> +is passed by the scheduler to the governor callback, which causes the frequency
> +to go up to the allowed maximum immediately and then draw back to the value
> +returned by the above formula over time.
> +
> +This governor exposes only one tunable:
> +
> +``rate_limit_us``
> +	Minimum time (in microseconds) that has to pass between two consecutive
> +	runs of governor computations (default: 1000 times the scaling driver's
> +	transition latency).
> +
> +	The purpose of this tunable is to reduce the scheduler context overhead
> +	of the governor which might be excessive without it.
> +
> +This governor generally is regarded as a replacement for the older `ondemand`_
> +and `conservative`_ governors (described below), as it is simpler and more
> +tightly integrated with the CPU scheduler, its overhead in terms of CPU context
> +switches and similar is less significant, and it uses the scheduler's own CPU
> +utilization metric, so in principle its decisions should not contradict the
> +decisions made by the other parts of the scheduler.
> +
> +``ondemand``
> +------------
> +
> +This governor uses CPU load as a CPU frequency selection metric.
> +
> +In order to estimate the current CPU load, it measures the time elapsed between
> +consecutive invocations of its worker routine and computes the fraction of that
> +time in which the given CPU was not idle.  The ratio of the non-idle (active)
> +time to the total CPU time is taken as an estimate of the load.
> +
> +If this governor is attached to a policy shared by multiple CPUs, the load is
> +estimated for all of them and the greatest result is taken as the load estimate
> +for the entire policy.
> +
> +The worker routine of this governor has to run in process context, so it is
> +invoked asynchronously (via a workqueue) and CPU P-states are updated from
> +there if necessary.  As a result, the scheduler context overhead from this
> +governor is minimal, but it causes additional CPU context switches to happen
> +relatively often and the CPU P-state updates triggered by it can be relatively
> +irregular.  Also, it affects its own CPU load metric by running code that
> +reduces the CPU idle time (even though the CPU idle time is only reduced very
> +slightly by it).
> +
> +It generally selects CPU frequencies proportional to the estimated load, so that
> +the value of the ``cpuinfo_max_freq`` policy attribute corresponds to the load of
> +1 (or 100%), and the value of the ``cpuinfo_min_freq`` policy attribute
> +corresponds to the load of 0, unless the load exceeds a (configurable)
> +speedup threshold, in which case it will go straight for the highest frequency
> +it is allowed to use (the ``scaling_max_freq`` policy limit).
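
Side note (not part of the patch): a minimal Python sketch of the load
estimate and frequency selection described above.  The names are hypothetical,
loads are fractions between 0 and 1, and the threshold has to be passed in
explicitly.  Clamping of the proportional result to the policy limits is not
modeled.

```python
def estimate_load(busy_us, elapsed_us):
    """Fraction of the sampling interval in which the CPU was not idle."""
    return busy_us / elapsed_us if elapsed_us > 0 else 0.0


def ondemand_target(loads, cpuinfo_min_khz, cpuinfo_max_khz,
                    scaling_max_khz, up_threshold):
    """Pick a frequency (kHz) for a policy from the per-CPU load estimates."""
    load = max(loads)                  # the busiest CPU decides for the policy
    if load > up_threshold:
        return scaling_max_khz         # go straight to the allowed maximum
    # Proportional selection: load 0 -> cpuinfo_min_freq, load 1 -> cpuinfo_max_freq.
    return int(cpuinfo_min_khz + load * (cpuinfo_max_khz - cpuinfo_min_khz))
```

For a policy with two CPUs at 25% and 50% load, a range of 800000 to 2400000
kHz and a threshold that is not exceeded, the sketch picks the midpoint of the
range, because only the greater of the two loads counts.
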
> +
> +This governor exposes the following tunables:
> +
> +``sampling_rate``
> +	This is how often the governor's worker routine should run, in
> +	microseconds.
> +
> +	Typically, it is set to values of the order of 10000 (10 ms).  Its
> +	default value is equal to the value of ``cpuinfo_transition_latency``
> +	for each policy this governor is attached to (but since the unit here
> +	is greater by 1000, this means that the time represented by
> +	``sampling_rate`` is 1000 times greater than the transition latency by
> +	default).
> +
> +	If this tunable is per-policy, the following shell command sets the time
> +	represented by it to be 750 times as high as the transition latency::
> +
> +		# echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
> +
> +
> +``min_sampling_rate``
> +	The minimum value of ``sampling_rate``.
> +
> +	Equal to 10000 (10 ms) if :c:macro:`CONFIG_NO_HZ_COMMON` and
> +	:c:data:`tick_nohz_active` are both set or to 20 times the value of
> +	:c:data:`jiffies` in microseconds otherwise.
> +
> +``up_threshold``
> +	If the estimated CPU load is above this value (in percent), the governor
> +	will set the frequency to the maximum value allowed for the policy.
> +	Otherwise, the selected frequency will be proportional to the estimated
> +	CPU load.
> +
> +``ignore_nice_load``
> +	If set to 1 (default 0), it will cause the CPU load estimation code to
> +	treat the CPU time spent on executing tasks with "nice" levels greater
> +	than 0 as CPU idle time.
> +
> +	This may be useful if there are tasks in the system that should not be
> +	taken into account when deciding what frequency to run the CPUs at.
> +	Then, to make that happen it is sufficient to increase the "nice" level
> +	of those tasks above 0 and set this attribute to 1.
> +
> +``sampling_down_factor``
> +	Temporary multiplier, between 1 (default) and 100 inclusive, to apply to
> +	the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.
> +
> +	This causes the next execution of the governor's worker routine (after
> +	setting the frequency to the allowed maximum) to be delayed, so the
> +	frequency stays at the maximum level for a longer time.
> +
> +	Frequency fluctuations in some bursty workloads may be avoided this way
> +	at the cost of additional energy spent on maintaining the maximum CPU
> +	capacity.
> +
> +``powersave_bias``
> +	Reduction factor to apply to the original frequency target of the
> +	governor (including the maximum value used when the ``up_threshold``
> +	value is exceeded by the estimated CPU load) or sensitivity threshold
> +	for the AMD frequency sensitivity powersave bias driver
> +	(:file:`drivers/cpufreq/amd_freq_sensitivity.c`), between 0 and 1000
> +	inclusive.
> +
> +	If the AMD frequency sensitivity powersave bias driver is not loaded,
> +	the effective frequency to apply is given by
> +
> +		f * (1 - ``powersave_bias`` / 1000)
> +
> +	where f is the governor's original frequency target.  The default value
> +	of this attribute is 0 in that case.
> +
> +	If the AMD frequency sensitivity powersave bias driver is loaded, the
> +	value of this attribute is 400 by default and it is used in a different
> +	way.
> +
> +	On Family 16h (and later) AMD processors there is a mechanism to get a
> +	measured workload sensitivity, between 0 and 100% inclusive, from the
> +	hardware.  That value can be used to estimate how the performance of the
> +	workload running on a CPU will change in response to frequency changes.
> +
> +	The performance of a workload with the sensitivity of 0 (memory-bound or
> +	IO-bound) is not expected to increase at all as a result of increasing
> +	the CPU frequency, whereas workloads with the sensitivity of 100%
> +	(CPU-bound) are expected to perform much better if the CPU frequency is
> +	increased.
> +
> +	If the workload sensitivity is less than the threshold represented by
> +	the ``powersave_bias`` value, the sensitivity powersave bias driver
> +	will cause the governor to select a frequency lower than its original
> +	target, so as to avoid over-provisioning workloads that will not benefit
> +	from running at higher CPU frequencies.
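
Side note (not part of the patch): the two pieces of arithmetic in the
tunables above, as a Python sketch with hypothetical function names.  The
first mirrors the unit conversion behind the shell command for
``sampling_rate`` (latency in nanoseconds, result in microseconds).  The
second covers only the case in which the AMD frequency sensitivity driver is
not loaded.

```python
def sampling_rate_us(transition_latency_ns, multiplier):
    """sampling_rate (us) representing `multiplier` times the latency (ns)."""
    return transition_latency_ns * multiplier // 1000


def apply_powersave_bias(target_khz, powersave_bias):
    """Effective frequency f * (1 - powersave_bias / 1000), in integer kHz."""
    if not 0 <= powersave_bias <= 1000:
        raise ValueError("powersave_bias must be between 0 and 1000")
    return target_khz * (1000 - powersave_bias) // 1000
```

For a transition latency of 10000 ns, a multiplier of 750 gives a
``sampling_rate`` of 7500 us, that is 7.5 ms, which is 750 times 10 us.
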
> +
> +``conservative``
> +----------------
> +
> +This governor uses CPU load as a CPU frequency selection metric.
> +
> +It estimates the CPU load in the same way as the `ondemand`_ governor described
> +above, but the CPU frequency selection algorithm implemented by it is different.
> +
> +Namely, it avoids changing the frequency significantly over short time
> +intervals, as such changes may not suit systems with limited power supply
> +capacity (e.g. battery-powered ones).  To achieve that, it changes the frequency
> +in relatively small steps, one step at a time, up or down, depending on whether
> +or not a (configurable) threshold has been exceeded by the estimated CPU load.
> +
> +This governor exposes the following tunables:
> +
> +``freq_step``
> +	Frequency step in percent of the maximum frequency the governor is
> +	allowed to set (the ``scaling_max_freq`` policy limit), between 0 and
> +	100 (5 by default).
> +
> +	This is how much the frequency is allowed to change in one go.  Setting
> +	it to 0 will cause the default frequency step (5 percent) to be used
> +	and setting it to 100 effectively causes the governor to periodically
> +	switch the frequency between the ``scaling_min_freq`` and
> +	``scaling_max_freq`` policy limits.
> +
> +``down_threshold``
> +	Threshold value (in percent, 20 by default) used to determine the
> +	frequency change direction.
> +
> +	If the estimated CPU load is greater than this value, the frequency will
> +	go up (by ``freq_step``).  If the load is less than this value (and the
> +	``sampling_down_factor`` mechanism is not in effect), the frequency will
> +	go down.  Otherwise, the frequency will not be changed.
> +
> +``sampling_down_factor``
> +	Frequency decrease deferral factor, between 1 (default) and 10
> +	inclusive.
> +
> +	It effectively causes the frequency to go down ``sampling_down_factor``
> +	times slower than it ramps up.
> +
> +
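
An illustrative aside on the ``freq_step`` arithmetic described above (not part of the patch).  This is a minimal sketch which assumes frequencies in kHz, as reported by the ``scaling_*_freq`` files, and integer rounding down.  The function names are hypothetical and the governor's real code may round or clamp differently.

```python
DEFAULT_FREQ_STEP = 5  # percent, used when freq_step is set to 0


def freq_step_khz(freq_step, scaling_max_freq_khz):
    """Size of one frequency step in kHz for a given freq_step percentage."""
    if not 0 <= freq_step <= 100:
        raise ValueError("freq_step must be between 0 and 100")
    if freq_step == 0:
        # A value of 0 selects the default step of 5 percent.
        freq_step = DEFAULT_FREQ_STEP
    return scaling_max_freq_khz * freq_step // 100


def next_frequency(current_khz, direction, freq_step,
                   scaling_min_freq_khz, scaling_max_freq_khz):
    """Move one step up (direction=+1) or down (direction=-1).

    The result is clamped to the policy limits.
    """
    step = freq_step_khz(freq_step, scaling_max_freq_khz)
    target = current_khz + direction * step
    return max(scaling_min_freq_khz, min(scaling_max_freq_khz, target))


if __name__ == "__main__":
    print(freq_step_khz(5, 2_000_000))
    print(next_frequency(1_000_000, +1, 100, 800_000, 2_000_000))
```

With ``freq_step`` set to 100 a single step spans the whole range, so every change lands on one of the two policy limits, which matches the behavior described for that setting.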
> +Frequency Boost Support
> +=======================
> +
> +Background
> +----------
> +
> +Some processors support a mechanism to raise the operating frequency of some
> +cores in a multicore package temporarily (and above the sustainable frequency
> +threshold for the whole package) under certain conditions, for example if the
> +whole chip is not fully utilized and below its intended thermal or power budget.
> +
> +Different names are used by different vendors to refer to this functionality.
> +For Intel processors it is referred to as "Turbo Boost", AMD calls it
> +"Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on.
> +As a rule, it also is implemented differently by different vendors.  The simple
> +term "frequency boost" is used here for brevity to refer to all of those
> +implementations.
> +
> +The frequency boost mechanism may be either hardware-based or software-based.
> +If it is hardware-based (e.g. on x86), the decision to trigger the boosting is
> +made by the hardware (although in general it requires the hardware to be put
> +into a special state in which it can control the CPU frequency within certain
> +limits).  If it is software-based (e.g. on ARM), the scaling driver decides
> +whether or not to trigger boosting and when to do that.
> +
> +The ``boost`` File in ``sysfs``
> +-------------------------------
> +
> +This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
> +the "boost" setting for the whole system.  It is not present if the underlying
> +scaling driver does not support the frequency boost mechanism (or supports it,
> +but provides a driver-specific interface for controlling it, like
> +``intel_pstate``).
> +
> +If the value in this file is 1, the frequency boost mechanism is enabled.  This
> +means that either the hardware can be put into states in which it is able to
> +trigger boosting (in the hardware-based case), or the software is allowed to
> +trigger boosting (in the software-based case).  It does not mean that boosting
> +is actually in use at the moment on any CPUs in the system.  It only means a
> +permission to use the frequency boost mechanism (which still may never be used
> +for other reasons).
> +
> +If the value in this file is 0, the frequency boost mechanism is disabled and
> +cannot be used at all.
> +
> +The only values that can be written to this file are 0 and 1.
> +
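
An illustrative aside on using the ``boost`` file from user space (not part of the patch).  This is a minimal sketch which assumes sysfs is mounted at /sys and that the caller has sufficient privileges for writing.  The helper names are hypothetical.

```python
from pathlib import Path

# Location of the global knob as described above.
BOOST_PATH = Path("/sys/devices/system/cpu/cpufreq/boost")


def boost_enabled(path=BOOST_PATH):
    """Return True or False for 1 or 0, or None if the file is not present.

    A missing file means the scaling driver does not support boost or
    uses a driver-specific interface instead.
    """
    try:
        return path.read_text().strip() == "1"
    except FileNotFoundError:
        return None


def set_boost(enabled, path=BOOST_PATH):
    """Write 1 or 0, the only values the file accepts."""
    path.write_text("1" if enabled else "0")


if __name__ == "__main__":
    state = boost_enabled()
    if state is None:
        print("no global boost knob on this system")
    else:
        print("boost permitted" if state else "boost disabled")
```

Reading 1 only means that boosting is permitted, not that any CPU is running above its sustainable frequency at that moment.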
> +Rationale for Boost Control Knob
> +--------------------------------
> +
> +The frequency boost mechanism is generally intended to help to achieve optimum
> +CPU performance on time scales below software resolution (e.g. below the
> +scheduler tick interval) and it is demonstrably suitable for many workloads, but
> +it may lead to problems in certain situations.
> +
> +For this reason, many systems make it possible to disable the frequency boost
> +mechanism in the platform firmware (BIOS) setup, but that requires the system to
> +be restarted for the setting to be adjusted as desired, which may not be
> +practical at least in some cases.  For example:
> +
> +  1. Boosting means overclocking the processor, although under controlled
> +     conditions.  Generally, the processor's energy consumption increases
> +     as a result of increasing its frequency and voltage, even temporarily.
> +     That may not be desirable on systems that switch to power sources of
> +     limited capacity, such as batteries, so the ability to disable the boost
> +     mechanism while the system is running may help there (but that depends on
> +     the workload too).
> +
> +  2. In some situations deterministic behavior is more important than
> +     performance or energy consumption (or both) and the ability to disable
> +     boosting while the system is running may be useful then.
> +
> +  3. To examine the impact of the frequency boost mechanism itself, it is useful
> +     to be able to run tests with and without boosting, preferably without
> +     restarting the system in the meantime.
> +
> +  4. Reproducible results are important when running benchmarks.  Since
> +     the boosting functionality depends on the load of the whole package,
> +     single-thread performance may vary because of it, which may sometimes
> +     lead to unreproducible results.  That can be avoided by disabling the
> +     frequency boost mechanism before running benchmarks sensitive to that
> +     issue.
> +
> +Legacy AMD ``cpb`` Knob
> +-----------------------
> +
> +The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to
> +the global ``boost`` one.  It is used for disabling/enabling the "Core
> +Performance Boost" feature of some AMD processors.
> +
> +If present, that knob is located in every ``CPUFreq`` policy directory in
> +``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called
> +``cpb``, which suggests a more fine-grained control interface.  The actual
> +implementation, however, works on a system-wide basis, so setting that knob
> +for one policy causes the same value to be set for all of the other policies
> +at the same time.
> +
> +That knob is still supported on AMD processors that support its underlying
> +hardware feature, but it may be configured out of the kernel (via the
> +:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option) and the global
> +``boost`` knob is present regardless.  Thus it is always possible to use the
> +``boost`` knob instead of the ``cpb`` one, which is highly recommended, as that
> +is more consistent with what all of the other systems do (and the ``cpb`` knob
> +may not be supported any more in the future).
> +
> +The ``cpb`` knob is never present for any processors without the underlying
> +hardware feature (e.g. all Intel ones), even if the
> +:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option is set.
> +
> +
> +.. _Per-entity load tracking: https://lwn.net/Articles/531853/
> Index: linux-pm/Documentation/admin-guide/pm/index.rst
> ===================================================================
> --- /dev/null
> +++ linux-pm/Documentation/admin-guide/pm/index.rst
> @@ -0,0 +1,15 @@
> +================
> +Power Management
> +================
> +
> +.. toctree::
> +   :maxdepth: 2
> +
> +   cpufreq
> +
> +.. only::  subproject and html
> +
> +   Indices
> +   =======
> +
> +   * :ref:`genindex`
> Index: linux-pm/Documentation/admin-guide/index.rst
> ===================================================================
> --- linux-pm.orig/Documentation/admin-guide/index.rst
> +++ linux-pm/Documentation/admin-guide/index.rst
> @@ -60,6 +60,7 @@ configure specific aspects of kernel beh
>     mono
>     java
>     ras
> +   pm/index
>  
>  .. only::  subproject and html
>  
> Index: linux-pm/Documentation/cpu-freq/boost.txt
> ===================================================================
> --- linux-pm.orig/Documentation/cpu-freq/boost.txt
> +++ /dev/null
> @@ -1,93 +0,0 @@
> -Processor boosting control
> -
> -	- information for users -
> -
> -Quick guide for the impatient:
> ---------------------
> -/sys/devices/system/cpu/cpufreq/boost
> -controls the boost setting for the whole system. You can read and write
> -that file with either "0" (boosting disabled) or "1" (boosting allowed).
> -Reading or writing 1 does not mean that the system is boosting at this
> -very moment, but only that the CPU _may_ raise the frequency at it's
> -discretion.
> ---------------------
> -
> -Introduction
> --------------
> -Some CPUs support a functionality to raise the operating frequency of
> -some cores in a multi-core package if certain conditions apply, mostly
> -if the whole chip is not fully utilized and below it's intended thermal
> -budget. The decision about boost disable/enable is made either at hardware
> -(e.g. x86) or software (e.g ARM).
> -On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core",
> -in technical documentation "Core performance boost". In Linux we use
> -the term "boost" for convenience.
> -
> -Rationale for disable switch
> -----------------------------
> -
> -Though the idea is to just give better performance without any user
> -intervention, sometimes the need arises to disable this functionality.
> -Most systems offer a switch in the (BIOS) firmware to disable the
> -functionality at all, but a more fine-grained and dynamic control would
> -be desirable:
> -1. While running benchmarks, reproducible results are important. Since
> -   the boosting functionality depends on the load of the whole package,
> -   single thread performance can vary. By explicitly disabling the boost
> -   functionality at least for the benchmark's run-time the system will run
> -   at a fixed frequency and results are reproducible again.
> -2. To examine the impact of the boosting functionality it is helpful
> -   to do tests with and without boosting.
> -3. Boosting means overclocking the processor, though under controlled
> -   conditions. By raising the frequency and the voltage the processor
> -   will consume more power than without the boosting, which may be
> -   undesirable for instance for mobile users. Disabling boosting may
> -   save power here, though this depends on the workload.
> -
> -
> -User controlled switch
> -----------------------
> -
> -To allow the user to toggle the boosting functionality, the cpufreq core
> -driver exports a sysfs knob to enable or disable it. There is a file:
> -/sys/devices/system/cpu/cpufreq/boost
> -which can either read "0" (boosting disabled) or "1" (boosting enabled).
> -The file is exported only when cpufreq driver supports boosting.
> -Explicitly changing the permissions and writing to that file anyway will
> -return EINVAL.
> -
> -On supported CPUs one can write either a "0" or a "1" into this file.
> -This will either disable the boost functionality on all cores in the
> -whole system (0) or will allow the software or hardware to boost at will
> -(1).
> -
> -Writing a "1" does not explicitly boost the system, but just allows the
> -CPU to boost at their discretion. Some implementations take external
> -factors like the chip's temperature into account, so boosting once does
> -not necessarily mean that it will occur every time even using the exact
> -same software setup.
> -
> -
> -AMD legacy cpb switch
> ----------------------
> -The AMD powernow-k8 driver used to support a very similar switch to
> -disable or enable the "Core Performance Boost" feature of some AMD CPUs.
> -This switch was instantiated in each CPU's cpufreq directory
> -(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb".
> -Though the per CPU existence hints at a more fine grained control, the
> -actual implementation only supported a system-global switch semantics,
> -which was simply reflected into each CPU's file. Writing a 0 or 1 into it
> -would pull the other CPUs to the same state.
> -For compatibility reasons this file and its behavior is still supported
> -on AMD CPUs, though it is now protected by a config switch
> -(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created,
> -even with the config option set.
> -This functionality is considered legacy and will be removed in some future
> -kernel version.
> -
> -More fine grained boosting control
> -----------------------------------
> -
> -Technically it is possible to switch the boosting functionality at least
> -on a per package basis, for some CPUs even per core. Currently the driver
> -does not support it, but this may be implemented in the future.
> Index: linux-pm/Documentation/cpu-freq/governors.txt
> ===================================================================
> --- linux-pm.orig/Documentation/cpu-freq/governors.txt
> +++ /dev/null
> @@ -1,301 +0,0 @@
> -     CPU frequency and voltage scaling code in the Linux(TM) kernel
> -
> -
> -		         L i n u x    C P U F r e q
> -
> -		      C P U F r e q   G o v e r n o r s
> -
> -		   - information for users and developers -
> -
> -
> -		    Dominik Brodowski  <linux@...do.de>
> -            some additions and corrections by Nico Golde <nico@...lde.de>
> -		Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> -		   Viresh Kumar <viresh.kumar@...aro.org>
> -
> -
> -
> -   Clock scaling allows you to change the clock speed of the CPUs on the
> -    fly. This is a nice method to save battery power, because the lower
> -            the clock speed, the less power the CPU consumes.
> -
> -
> -Contents:
> ----------
> -1.   What is a CPUFreq Governor?
> -
> -2.   Governors In the Linux Kernel
> -2.1  Performance
> -2.2  Powersave
> -2.3  Userspace
> -2.4  Ondemand
> -2.5  Conservative
> -2.6  Schedutil
> -
> -3.   The Governor Interface in the CPUfreq Core
> -
> -4.   References
> -
> -
> -1. What Is A CPUFreq Governor?
> -==============================
> -
> -Most cpufreq drivers (except the intel_pstate and longrun) or even most
> -cpu frequency scaling algorithms only allow the CPU frequency to be set
> -to predefined fixed values.  In order to offer dynamic frequency
> -scaling, the cpufreq core must be able to tell these drivers of a
> -"target frequency". So these specific drivers will be transformed to
> -offer a "->target/target_index/fast_switch()" call instead of the
> -"->setpolicy()" call. For set_policy drivers, all stays the same,
> -though.
> -
> -How to decide what frequency within the CPUfreq policy should be used?
> -That's done using "cpufreq governors".
> -
> -Basically, it's the following flow graph:
> -
> -CPU can be set to switch independently	 |	   CPU can only be set
> -      within specific "limits"		 |       to specific frequencies
> -
> -                                 "CPUfreq policy"
> -		consists of frequency limits (policy->{min,max})
> -  		     and CPUfreq governor to be used
> -			 /		      \
> -			/		       \
> -		       /		       the cpufreq governor decides
> -		      /			       (dynamically or statically)
> -		     /			       what target_freq to set within
> -		    /			       the limits of policy->{min,max}
> -		   /			            \
> -		  /				     \
> -	Using the ->setpolicy call,		 Using the ->target/target_index/fast_switch call,
> -	    the limits and the			  the frequency closest
> -	     "policy" is set.			  to target_freq is set.
> -						  It is assured that it
> -						  is within policy->{min,max}
> -
> -
> -2. Governors In the Linux Kernel
> -================================
> -
> -2.1 Performance
> ----------------
> -
> -The CPUfreq governor "performance" sets the CPU statically to the
> -highest frequency within the borders of scaling_min_freq and
> -scaling_max_freq.
> -
> -
> -2.2 Powersave
> --------------
> -
> -The CPUfreq governor "powersave" sets the CPU statically to the
> -lowest frequency within the borders of scaling_min_freq and
> -scaling_max_freq.
> -
> -
> -2.3 Userspace
> --------------
> -
> -The CPUfreq governor "userspace" allows the user, or any userspace
> -program running with UID "root", to set the CPU to a specific frequency
> -by making a sysfs file "scaling_setspeed" available in the CPU-device
> -directory.
> -
> -
> -2.4 Ondemand
> -------------
> -
> -The CPUfreq governor "ondemand" sets the CPU frequency depending on the
> -current system load. Load estimation is triggered by the scheduler
> -through the update_util_data->func hook; when triggered, cpufreq checks
> -the CPU-usage statistics over the last period and the governor sets the
> -CPU accordingly.  The CPU must have the capability to switch the
> -frequency very quickly.
> -
> -Sysfs files:
> -
> -* sampling_rate:
> -
> -  Measured in uS (10^-6 seconds), this is how often you want the kernel
> -  to look at the CPU usage and to make decisions on what to do about the
> -  frequency.  Typically this is set to values of around '10000' or more.
> -  It's default value is (cmp. with users-guide.txt): transition_latency
> -  * 1000.  Be aware that transition latency is in ns and sampling_rate
> -  is in us, so you get the same sysfs value by default.  Sampling rate
> -  should always get adjusted considering the transition latency to set
> -  the sampling rate 750 times as high as the transition latency in the
> -  bash (as said, 1000 is default), do:
> -
> -  $ echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
> -
> -* sampling_rate_min:
> -
> -  The sampling rate is limited by the HW transition latency:
> -  transition_latency * 100
> -
> -  Or by kernel restrictions:
> -  - If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
> -  - If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is
> -    used, the limits depend on the CONFIG_HZ option:
> -    HZ=1000: min=20000us  (20ms)
> -    HZ=250:  min=80000us  (80ms)
> -    HZ=100:  min=200000us (200ms)
> -
> -  The highest value of kernel and HW latency restrictions is shown and
> -  used as the minimum sampling rate.
> -
> -* up_threshold:
> -
> -  This defines what the average CPU usage between the samplings of
> -  'sampling_rate' needs to be for the kernel to make a decision on
> -  whether it should increase the frequency.  For example when it is set
> -  to its default value of '95' it means that between the checking
> -  intervals the CPU needs to be on average more than 95% in use to then
> -  decide that the CPU frequency needs to be increased.
> -
> -* ignore_nice_load:
> -
> -  This parameter takes a value of '0' or '1'. When set to '0' (its
> -  default), all processes are counted towards the 'cpu utilisation'
> -  value.  When set to '1', the processes that are run with a 'nice'
> -  value will not count (and thus be ignored) in the overall usage
> -  calculation.  This is useful if you are running a CPU intensive
> -  calculation on your laptop that you do not care how long it takes to
> -  complete as you can 'nice' it and prevent it from taking part in the
> -  deciding process of whether to increase your CPU frequency.
> -
> -* sampling_down_factor:
> -
> -  This parameter controls the rate at which the kernel makes a decision
> -  on when to decrease the frequency while running at top speed. When set
> -  to 1 (the default) decisions to reevaluate load are made at the same
> -  interval regardless of current clock speed. But when set to greater
> -  than 1 (e.g. 100) it acts as a multiplier for the scheduling interval
> -  for reevaluating load when the CPU is at its top speed due to high
> -  load. This improves performance by reducing the overhead of load
> -  evaluation and helping the CPU stay at its top speed when truly busy,
> -  rather than shifting back and forth in speed. This tunable has no
> -  effect on behavior at lower speeds/lower CPU loads.
> -
> -* powersave_bias:
> -
> -  This parameter takes a value between 0 to 1000. It defines the
> -  percentage (times 10) value of the target frequency that will be
> -  shaved off of the target. For example, when set to 100 -- 10%, when
> -  ondemand governor would have targeted 1000 MHz, it will target
> -  1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0
> -  (disabled) by default.
> -
> -  When AMD frequency sensitivity powersave bias driver --
> -  drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter
> -  defines the workload frequency sensitivity threshold in which a lower
> -  frequency is chosen instead of ondemand governor's original target.
> -  The frequency sensitivity is a hardware reported (on AMD Family 16h
> -  Processors and above) value between 0 to 100% that tells software how
> -  the performance of the workload running on a CPU will change when
> -  frequency changes. A workload with sensitivity of 0% (memory/IO-bound)
> -  will not perform any better on higher core frequency, whereas a
> -  workload with sensitivity of 100% (CPU-bound) will perform better
> -  higher the frequency. When the driver is loaded, this is set to 400 by
> -  default -- for CPUs running workloads with sensitivity value below
> -  40%, a lower frequency is chosen. Unloading the driver or writing 0
> -  will disable this feature.
> -
> -
> -2.5 Conservative
> -----------------
> -
> -The CPUfreq governor "conservative", much like the "ondemand"
> -governor, sets the CPU frequency depending on the current usage.  It
> -differs in behaviour in that it gracefully increases and decreases the
> -CPU speed rather than jumping to max speed the moment there is any load
> -on the CPU. This behaviour is more suitable in a battery powered
> -environment.  The governor is tweaked in the same manner as the
> -"ondemand" governor through sysfs with the addition of:
> -
> -* freq_step:
> -
> -  This describes what percentage steps the cpu freq should be increased
> -  and decreased smoothly by.  By default the cpu frequency will increase
> -  in 5% chunks of your maximum cpu frequency.  You can change this value
> -  to anywhere between 0 and 100 where '0' will effectively lock your CPU
> -  at a speed regardless of its load whilst '100' will, in theory, make
> -  it behave identically to the "ondemand" governor.
> -
> -* down_threshold:
> -
> -  Same as the 'up_threshold' found for the "ondemand" governor but for
> -  the opposite direction.  For example when set to its default value of
> -  '20' it means that if the CPU usage needs to be below 20% between
> -  samples to have the frequency decreased.
> -
> -* sampling_down_factor:
> -
> -  Similar functionality as in "ondemand" governor.  But in
> -  "conservative", it controls the rate at which the kernel makes a
> -  decision on when to decrease the frequency while running in any speed.
> -  Load for frequency increase is still evaluated every sampling rate.
> -
> -
> -2.6 Schedutil
> --------------
> -
> -The "schedutil" governor aims at better integration with the Linux
> -kernel scheduler.  Load estimation is achieved through the scheduler's
> -Per-Entity Load Tracking (PELT) mechanism, which also provides
> -information about the recent load [1].  This governor currently does
> -load based DVFS only for tasks managed by CFS. RT and DL scheduler tasks
> -are always run at the highest frequency.  Unlike all the other
> -governors, the code is located under the kernel/sched/ directory.
> -
> -Sysfs files:
> -
> -* rate_limit_us:
> -
> -  This contains a value in microseconds. The governor waits for
> -  rate_limit_us time before reevaluating the load again, after it has
> -  evaluated the load once.
> -
> -For an in-depth comparison with the other governors refer to [2].
> -
> -
> -3. The Governor Interface in the CPUfreq Core
> -=============================================
> -
> -A new governor must register itself with the CPUfreq core using
> -"cpufreq_register_governor". The struct cpufreq_governor, which has to
> -be passed to that function, must contain the following values:
> -
> -governor->name - A unique name for this governor.
> -governor->owner - .THIS_MODULE for the governor module (if appropriate).
> -
> -plus a set of hooks to the functions implementing the governor's logic.
> -
> -The CPUfreq governor may call the CPU processor driver using one of
> -these two functions:
> -
> -int cpufreq_driver_target(struct cpufreq_policy *policy,
> -                                 unsigned int target_freq,
> -                                 unsigned int relation);
> -
> -int __cpufreq_driver_target(struct cpufreq_policy *policy,
> -                                   unsigned int target_freq,
> -                                   unsigned int relation);
> -
> -target_freq must be within policy->min and policy->max, of course.
> -What's the difference between these two functions? When your governor is
> -in a direct code path of a call to governor callbacks, like
> -governor->start(), the policy->rwsem is still held in the cpufreq core,
> -and there's no need to lock it again (in fact, this would cause a
> -deadlock). So use __cpufreq_driver_target only in these cases. In all
> -other cases (for example, when there's a "daemonized" function that
> -wakes up every second), use cpufreq_driver_target to take policy->rwsem
> -before the command is passed to the cpufreq driver.
> -
> -4. References
> -=============
> -
> -[1] Per-entity load tracking: https://lwn.net/Articles/531853/
> -[2] Improvements in CPU frequency management: https://lwn.net/Articles/682391/
> -
> Index: linux-pm/Documentation/cpu-freq/user-guide.txt
> ===================================================================
> --- linux-pm.orig/Documentation/cpu-freq/user-guide.txt
> +++ /dev/null
> @@ -1,226 +0,0 @@
> -     CPU frequency and voltage scaling code in the Linux(TM) kernel
> -
> -
> -		         L i n u x    C P U F r e q
> -
> -			     U S E R   G U I D E
> -
> -
> -		    Dominik Brodowski  <linux@...do.de>
> -
> -
> -
> -   Clock scaling allows you to change the clock speed of the CPUs on the
> -    fly. This is a nice method to save battery power, because the lower
> -            the clock speed, the less power the CPU consumes.
> -
> -
> -Contents:
> ----------
> -1. Supported Architectures and Processors
> -1.1 ARM and ARM64
> -1.2 x86
> -1.3 sparc64
> -1.4 ppc
> -1.5 SuperH
> -1.6 Blackfin
> -
> -2. "Policy" / "Governor"?
> -2.1 Policy
> -2.2 Governor
> -
> -3. How to change the CPU cpufreq policy and/or speed
> -3.1 Preferred interface: sysfs
> -
> -
> -
> -1. Supported Architectures and Processors
> -=========================================
> -
> -1.1 ARM and ARM64
> ------------------
> -
> -Almost all ARM and ARM64 platforms support CPU frequency scaling.
> -
> -1.2 x86
> --------
> -
> -The following processors for the x86 architecture are supported by cpufreq:
> -
> -AMD Elan - SC400, SC410
> -AMD mobile K6-2+
> -AMD mobile K6-3+
> -AMD mobile Duron
> -AMD mobile Athlon
> -AMD Opteron
> -AMD Athlon 64
> -Cyrix Media GXm
> -Intel mobile PIII and Intel mobile PIII-M on certain chipsets
> -Intel Pentium 4, Intel Xeon
> -Intel Pentium M (Centrino)
> -National Semiconductors Geode GX
> -Transmeta Crusoe
> -Transmeta Efficeon
> -VIA Cyrix 3 / C3
> -various processors on some ACPI 2.0-compatible systems [*]
> -And many more
> -
> -[*] Only if "ACPI Processor Performance States" are available
> -to the ACPI<->BIOS interface.
> -
> -
> -1.3 sparc64
> ------------
> -
> -The following processors for the sparc64 architecture are supported by
> -cpufreq:
> -
> -UltraSPARC-III
> -
> -
> -1.4 ppc
> --------
> -
> -Several "PowerBook" and "iBook2" notebooks are supported.
> -
> -
> -1.5 SuperH
> -----------
> -
> -All SuperH processors supporting rate rounding through the clock
> -framework are supported by cpufreq.
> -
> -1.6 Blackfin
> -------------
> -
> -The following Blackfin processors are supported by cpufreq:
> -
> -BF522, BF523, BF524, BF525, BF526, BF527, Rev 0.1 or higher
> -BF531, BF532, BF533, Rev 0.3 or higher
> -BF534, BF536, BF537, Rev 0.2 or higher
> -BF561, Rev 0.3 or higher
> -BF542, BF544, BF547, BF548, BF549, Rev 0.1 or higher
> -
> -
> -2. "Policy" / "Governor" ?
> -==========================
> -
> -Some CPU frequency scaling-capable processor switch between various
> -frequencies and operating voltages "on the fly" without any kernel or
> -user involvement. This guarantees very fast switching to a frequency
> -which is high enough to serve the user's needs, but low enough to save
> -power.
> -
> -
> -2.1 Policy
> -----------
> -
> -On these systems, all you can do is select the lower and upper
> -frequency limit as well as whether you want more aggressive
> -power-saving or more instantly available processing power.
> -
> -
> -2.2 Governor
> -------------
> -
> -On all other cpufreq implementations, these boundaries still need to
> -be set. Then, a "governor" must be selected. Such a "governor" decides
> -what speed the processor shall run within the boundaries. One such
> -"governor" is the "userspace" governor. This one allows the user - or
> -a yet-to-implement userspace program - to decide what specific speed
> -the processor shall run at.
> -
> -
> -3. How to change the CPU cpufreq policy and/or speed
> -====================================================
> -
> -3.1 Preferred Interface: sysfs
> -------------------------------
> -
> -The preferred interface is located in the sysfs filesystem. If you
> -mounted it at /sys, the cpufreq interface is located in a subdirectory
> -"cpufreq" within the cpu-device directory
> -(e.g. /sys/devices/system/cpu/cpu0/cpufreq/ for the first CPU).
> -
> -affected_cpus :			List of Online CPUs that require software
> -				coordination of frequency.
> -
> -cpuinfo_cur_freq :		Current frequency of the CPU as obtained from
> -				the hardware, in KHz. This is the frequency
> -				the CPU actually runs at.
> -
> -cpuinfo_min_freq :		this file shows the minimum operating
> -				frequency the processor can run at(in kHz) 
> -
> -cpuinfo_max_freq :		this file shows the maximum operating
> -				frequency the processor can run at(in kHz) 
> -
> -cpuinfo_transition_latency	The time it takes on this CPU to
> -				switch between two frequencies in nano
> -				seconds. If unknown or known to be
> -				that high that the driver does not
> -				work with the ondemand governor, -1
> -				(CPUFREQ_ETERNAL) will be returned.
> -				Using this information can be useful
> -				to choose an appropriate polling
> -				frequency for a kernel governor or
> -				userspace daemon. Make sure to not
> -				switch the frequency too often
> -				resulting in performance loss.
> -
> -related_cpus :			List of Online + Offline CPUs that need software
> -				coordination of frequency.
> -
> -scaling_available_frequencies : List of available frequencies, in KHz.
> -
> -scaling_available_governors :	this file shows the CPUfreq governors
> -				available in this kernel. You can see the
> -				currently activated governor in
> -
> -scaling_cur_freq :		Current frequency of the CPU as determined by
> -				the governor and cpufreq core, in KHz. This is
> -				the frequency the kernel thinks the CPU runs
> -				at.
> -
> -scaling_driver :		this file shows what cpufreq driver is
> -				used to set the frequency on this CPU
> -
> -scaling_governor,		and by "echoing" the name of another
> -				governor you can change it. Please note
> -				that some governors won't load - they only
> -				work on some specific architectures or
> -				processors.
> -
> -scaling_min_freq and
> -scaling_max_freq		show the current "policy limits" (in
> -				kHz). By echoing new values into these
> -				files, you can change these limits.
> -				NOTE: when setting a policy you need to
> -				first set scaling_max_freq, then
> -				scaling_min_freq.
> -
> -scaling_setspeed		This can be read to get the currently programmed
> -				value by the governor. This can be written to
> -				change the current frequency for a group of
> -				CPUs, represented by a policy. This is supported
> -				currently only by the userspace governor.
> -
> -bios_limit :			If the BIOS tells the OS to limit a CPU to
> -				lower frequencies, the user can read out the
> -				maximum available frequency from this file.
> -				This typically can happen through (often not
> -				intended) BIOS settings, restrictions
> -				triggered through a service processor or other
> -				BIOS/HW based implementations.
> -				This does not cover thermal ACPI limitations
> -				which can be detected through the generic
> -				thermal driver.
> -
> -If you have selected the "userspace" governor which allows you to
> -set the CPU operating frequency to a specific value, you can read out
> -the current frequency in
> -
> -scaling_setspeed.		By "echoing" a new frequency into this
> -				you can change the speed of the CPU,
> -				but only within the limits of
> -				scaling_min_freq and scaling_max_freq.
> Index: linux-pm/Documentation/cpu-freq/index.txt
> ===================================================================
> --- linux-pm.orig/Documentation/cpu-freq/index.txt
> +++ linux-pm/Documentation/cpu-freq/index.txt
> @@ -21,8 +21,6 @@ Documents in this directory:
>  
>  amd-powernow.txt -	AMD powernow driver specific file.
>  
> -boost.txt -		Frequency boosting support.
> -
>  core.txt	-	General description of the CPUFreq core and
>  			of CPUFreq notifiers.
>  
> @@ -32,17 +30,12 @@ cpufreq-nforce2.txt -	nVidia nForce2 pla
>  
>  cpufreq-stats.txt -	General description of sysfs cpufreq stats.
>  
> -governors.txt	-	What are cpufreq governors and how to
> -			implement them?
> -
>  index.txt	-	File index, Mailing list and Links (this document)
>  
>  intel-pstate.txt -	Intel pstate cpufreq driver specific file.
>  
>  pcc-cpufreq.txt -	PCC cpufreq driver specific file.
>  
> -user-guide.txt	-	User Guide to CPUFreq
> -
>  
>  Mailing List
>  ------------
> 
