[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54DB1A17.9000706@linux.vnet.ibm.com>
Date: Wed, 11 Feb 2015 14:30:07 +0530
From: Preeti U Murthy <preeti@...ux.vnet.ibm.com>
To: Steven Noonan <steven@...inklabs.net>
CC: Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>, jacob.jun.pan@...el.com,
Arjan van de Ven <arjan@...ux.intel.com>,
Linux Kernel mailing List <linux-kernel@...r.kernel.org>,
Frédéric Weisbecker <fweisbec@...il.com>,
frederic@...nel.org, Daniel Lezcano <daniel.lezcano@...aro.org>,
Amit Kucheria <amit.kucheria@...aro.org>,
Eduardo Valentin <edubezval@...il.com>,
Viresh Kumar <viresh.kumar@...aro.org>, rui.zhang@...el.com
Subject: Re: [PATCH V2] idle/intel_powerclamp: Redesign idle injection to
use bandwidth control mechanism
On 02/09/2015 11:44 PM, Steven Noonan wrote:
> On Mon, Feb 9, 2015 at 9:56 AM, Steven Noonan <steven@...inklabs.net> wrote:
>> On Mon, Feb 9, 2015 at 3:51 AM, Preeti U Murthy
>> <preeti@...ux.vnet.ibm.com> wrote:
>>> Hi Steven,
>>>
>>> On 02/09/2015 01:02 PM, Steven Noonan wrote:
>>>> On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy
>>>> <preeti@...ux.vnet.ibm.com> wrote:
>>>>> The powerclamp driver injects idle periods to stay within the thermal constraints.
>>>>> The driver does a fake idle by spawning per-cpu threads that call the mwait
>>>>> instruction. This behavior of fake idle can confuse the other kernel subsystems.
>>>>> For instance it calls into the nohz tick handlers, which are meant to be called
>>>>> only by the idle thread. It sets the state of the tick in each cpu to idle and
>>>>> stops the tick, when there are tasks on the runqueue. As a result the callers of
>>>>> idle_cpu()/ tick_nohz_tick_stopped() see different states of the cpu; while the
>>>>> former thinks that the cpu is busy, the latter thinks that it is idle. The outcome
>>>>> may be inconsistency in the scheduler/nohz states which can lead to serious
>>>>> consequences. One of them was reported on this thread:
>>>>> https://lkml.org/lkml/2014/12/11/365.
>>>>>
>>>>> Thomas posted out a patch to disable the powerclamp driver from calling into the
>>>>> tick nohz code which has taken care of the above regression for the moment. However
>>>>> powerclamp driver as a result, will not be able to inject idle periods due to the
>>>>> presence of periodic ticks. With the current design of fake idle, we cannot move
>>>>> towards a better solution.
>>>>> https://lkml.org/lkml/2014/12/18/169
>>>>>
>>>>> This patch aims at removing the concept of fake idle and instead makes the cpus
>>>>> truly idle by throttling the runqueues during the idle injection periods. The situation
>>>>> is in fact very similar to throttling of cfs_rqs when they exceed their bandwidths.
>>>>> The idle injection metrics can be mapped to the bandwidth control metrics 'quota' and
>>>>> 'period' to achieve the same result. When the powerclamping is begun or when the
>>>>> clamping controls have been modified, the bandwidth for the root task group is set.
>>>>> The 'quota' will be the amount of time that the system needs to be busy and 'period'
>>>>> will be the sum of this busy duration and the idle duration as calculated by the driver.
>>>>> This gets rid of per-cpu kthreads, control cpu, hotplug notifiers and clamping mask since
>>>>> the thread starting powerclamping will set the bandwidth and throttling of all cpus will
>>>>> automatically fall in place. None of the other cpus need be bothered about this. This
>>>>> simplifies the design of the driver.
>>>>>
>>>>> Of course this is only if the idle injection metrics can be conveniently transformed
>>>>> into bandwidth control metrics. There are a couple of other primary concerns around if
>>>>> doing the below two in this patch is valid.
>>>>> a. This patch exports the functions to set the quota and period of task groups.
>>>>> b. This patch removes the constraint of not being able to set the root task grp's bandwidth.
>>>>>
>>>>> Signed-off-by: Preeti U Murthy <preeti@...ux.vnet.ibm.com>
>>>>
>>>> This doesn't compile.
>>>
>>> Thanks for reporting this! I realized that I had not compiled in the powerclamp driver
>>> as a module while compile testing it. I was focusing on the issues with the design and
>>> failed to cross verify this. Apologies for the inconvenience.
>>>
>>> Find the diff compile tested below.
>>>
>>> I also realized that clamp_cpus() that sets the bandwidth cannot be called from
>>> multiple places. Currently I am calling it from end_powerclamp(), when the user changes the
>>> idle clamping duration and from a queued timer. This will require synchronization between
>>> callers which is not really called for. The queued wakeup_timer alone can re-evaluate the
>>> clamping metrics after every throttle-unthrottle period and this should suffice as far
>>> as I can see. Thoughts ?
>>
>> Hmm, I've had two system lockups so far while running a kernel with
>> intel_powerclamp loaded. Both times it slowly ground to a halt and
>> processes piled up...
Hmm I see. I am not surprised because this patch is not complete yet.
The idea was to gain suggestions and review around the approach first
before I went ahead to making it robust. Let me ease the creases that I
found and repost this patch for you to test. Thanks a lot for the
testing efforts! :)
Regards
Preeti U Murthy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists