[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4CBF116F.6090201@linux.vnet.ibm.com>
Date: Wed, 20 Oct 2010 21:27:35 +0530
From: Trinabh Gupta <trinabh@...ux.vnet.ibm.com>
To: Arjan van de Ven <arjan@...ux.intel.com>
CC: Venkatesh Pallipadi <venki@...gle.com>, peterz@...radead.org,
lenb@...nel.org, suresh.b.siddha@...el.com,
benh@...nel.crashing.org, linux-kernel@...r.kernel.org,
ak@...ux.intel.com
Subject: Re: [RFC V1] cpuidle: add idle routine registration and cleanup pm_idle
pointer
On 10/20/2010 08:48 PM, Arjan van de Ven wrote:
> On 10/20/2010 8:12 AM, Trinabh Gupta wrote:
>>
>>
>> On 10/20/2010 12:31 AM, Trinabh Gupta wrote:
>>>
>>>
>>> On 10/20/2010 12:19 AM, Venkatesh Pallipadi wrote:
>>>> On Tue, Oct 19, 2010 at 11:38 AM, Arjan van de Ven
>>>> <arjan@...ux.intel.com> wrote:
>>>>> On 10/19/2010 11:36 AM, Trinabh Gupta wrote:
>>>>>>
>>>>>> The core of the kernel's idle routine on x86 presently depends on an
>>>>>> exported pm_idle function pointer that is unmanaged and causing
>>>>>> hazard to various subsystems when they save and restore it.
>>>>>> The first problem is that this exported pointer can be
>>>>>> modified/flipped
>>>>>> by any subsystem. There is no tracking or notification mechanism.
>>>>>> Secondly and more importantly, various subsystems save the value of
>>>>>> this pointer, flip it and later restore to the saved value. There is
>>>>>> no guarantee that the saved value is still valid. The problem has
>>>>>> been discussed in [2,3] and Peter Zijlstra suggested removing pm_idle
>>>>>> and implementing a list based registration [1].
>>>>>>
>>>>>> This patch is an initial RFC implementation for x86 architecture
>>>>>> only. This framework can be generalised for other archs and also
>>>>>> include the current cpuidle framework for managing multiple idle
>>>>>> routines.
>>>>>>
>>>>>> Tests done with the patch:
>>>>>> ------------------------
>>>>>> 1. Build (on 2.6.36-rc7) and booted on x86 with C1E as deepest idle
>>>>>> state and current_idle was selected to be mwait_idle.
>>>>>>
>>>>>> 2. Build (on 2.5.36-rc8) and booted on x86 (Nehalem) with ACPI C3 as
>>>>>> deepest sleep state. The current_idle was selected to be
>>>>>> cpuidle_idle_call which is the cpuidle subsystem that will further
>>>>>> select idle routines from {C1,C2,C3}.
>>>>>>
>>>>>> Future implementation will try to eliminate this hirearchy and have
>>>>>> a single registration and menu/idle cpuidle governor for selection
>>>>>> of idle routine.
>>>>>
>>>>>
>>>>> looks like you're duplicating the cpuidle subsystem
>>>>>
>>>>> how about biting the bullet and just always and only use the cpuidle
>>>>> subsystem for all idle on x86 ?
>>>>>
>>>>
>>>> I agree with Arjan.
>>>> If we have a default_cpuidle driver which parses idle= params, handles
>>>> various mwait quirks in x86 process*.c and registers with cpuidle, we
>>>> can then always call cpuidle idle routine on x86.
>>>
>>> This wouldn't duplicate code. It would move parts/functionality of
>>> cpuidle into the kernel, keeping governors alone as modules.
>>>
>>> If we directly call cpuidle_idle_call() then this may be too much
>>> overhead for architectures that have single idle routine i.e cases where
>>> cpuidle is not used and will be seen as a bloat. (c.f goals 4a,b).
>>
>> Hi Venki, Arjan
>>
>> Building cpuidle into the kernel would add ~7KB for everyone, even
>> x86 architectures having only single idle state/routine.
>
> but now you're duplicating this functionality adding code for everyone.
I do not intend to duplicate code. I am trying to do what Andi
suggested i.e make the registration part lightweight and *move*
it into the kernel so that we can do away with problems such as
exported pm_idle pointer. This implementation is an *intermediate*
step.
>
> 99.999% of the people today run cpuidle... (especially embedded x86
> where they really care about power)
> all x86 going forward also has > 1 idle option anyway.
>
> and you're adding and extra layer in the middle that just duplicates the
> layer that's in use in practice above it.
As I said, this would not duplicate rather split things - minimal
registration part within the kernel (so that problems like
pm_idle are resolved) and governors as module.
-Trinabh
>
> seriously, this sounds like the wrong tradeoff to make.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists