[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <54814810.6030302@oracle.com>
Date: Fri, 05 Dec 2014 13:52:16 +0800
From: ethan zhao <ethan.zhao@...cle.com>
To: Linda Knippers <linda.knippers@...com>
CC: Kristen Carlson Accardi <kristen@...ux.intel.com>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
dirk.j.brandewie@...el.com, viresh.kumar@...aro.org,
corbet@....net, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
ethan.kernel@...il.com
Subject: Re: [PATCH 2/2 V7] intel_pstate: add kernel parameter to force loading
on Sun X86 servers.
Linda,
On 2014/12/5 12:56, Linda Knippers wrote:
> Hi Ethan,
>
> On 12/4/2014 10:38 PM, ethan zhao wrote:
>> Linda,
>>
>> On 2014/12/5 7:03, Linda Knippers wrote:
>>> On 12/4/2014 5:38 PM, Kristen Carlson Accardi wrote:
>>>> On Thu, 04 Dec 2014 23:10:58 +0100
>>>> "Rafael J. Wysocki" <rjw@...ysocki.net> wrote:
>>>>
>>>>> On Thursday, December 04, 2014 11:07:31 AM Ethan Zhao wrote:
>>>>>> To force loading on Oracle Sun X86 servers, provide one kernel command line
>>>>>> parameter
>>>>>>
>>>>>> intel_pstate = ora_force
>>>>> I would suggest to change the name of the option to "oracle_force" or
>>>>> "sun_force"
>>>>> for clarity.
>>>>>
>>>>> Anyway, I need an ACK from Kristen if this patch is to be applied.
>>>>>
>>>>>> For those who be aware of the risk of no power capping capabily working and
>>>>>> try to get better performance with this driver.
>>>>>>
>>>>>> Signed-off-by: Ethan Zhao <ethan.zhao@...cle.com>
>>>>>> ---
>>>>>> v2: change to hardware vendor specific naming parameter.
>>>>>> v4: refine code and doc.
>>>>>> v5&v6: fix a typo in doc.
>>>>>> v7: change enum PCC to PPC.
>>>>>>
>>>>>> Documentation/kernel-parameters.txt | 5 +++++
>>>>>> drivers/cpufreq/intel_pstate.c | 6 +++++-
>>>>>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/Documentation/kernel-parameters.txt
>>>>>> b/Documentation/kernel-parameters.txt
>>>>>> index 479f332..7d0983e 100644
>>>>>> --- a/Documentation/kernel-parameters.txt
>>>>>> +++ b/Documentation/kernel-parameters.txt
>>>>>> @@ -1446,6 +1446,11 @@ bytes respectively. Such letter suffixes can also be
>>>>>> entirely omitted.
>>>>>> disable
>>>>>> Do not enable intel_pstate as the default
>>>>>> scaling driver for the supported processors
>>>>>> + ora_force
>>>>>> + Force loading intel_pstate on Oracle Sun Servers(X86).
>>>>>> + only for those who be aware of the risk of no power capping
>>>>>> + capability working and try to get better performance with this
>>>>>> + driver.
>>>>> That is not sufficiently clear. What does "risk of no power capping capability
>>>>> working" mean, in particular?
>>>>>
>>>>>> intremap= [X86-64, Intel-IOMMU]
>>>>>> on enable Interrupt Remapping (default)
>>>>>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>>>>>> index 1bb62ca..2654e13 100644
>>>>>> --- a/drivers/cpufreq/intel_pstate.c
>>>>>> +++ b/drivers/cpufreq/intel_pstate.c
>>>>>> @@ -866,6 +866,7 @@ static struct cpufreq_driver intel_pstate_driver = {
>>>>>> };
>>>>>> static int __initdata no_load;
>>>>>> +static unsigned int ora_force;
>>>>>> static int intel_pstate_msrs_not_valid(void)
>>>>>> {
>>>>>> @@ -1003,7 +1004,8 @@ static bool intel_pstate_platform_pwr_mgmt_exists(void)
>>>>>> case PSS:
>>>>>> return intel_pstate_no_acpi_pss();
>>>>>> case PPC:
>>>>>> - return intel_pstate_has_acpi_ppc();
>>>>>> + return intel_pstate_has_acpi_ppc() &&
>>>>>> + (!ora_force);
>>>>>> }
>>>>>> }
>>>>>> @@ -1078,6 +1080,8 @@ static int __init intel_pstate_setup(char *str)
>>>>>> if (!strcmp(str, "disable"))
>>>>>> no_load = 1;
>>>>>> + if (!strcmp(str, "ora_force"))
>>>>>> + ora_force = 1;
>>>>>> return 0;
>>>>>> }
>>>>>> early_param("intel_pstate", intel_pstate_setup);
>>>>> And can anyone please remind me what was wrong with a "force" option that would
>>>>> work for everyone, not just Oracle/Sun?
>>>>>
>>>> That was my suggestion as well (i.e. a parameter to bypass the vendor
>>>> checks), but Linda didn't like it. My personal opinion is that unless
>>>> it's generic, I don't really feel like having a force option solely for
>>>> oracle. I'm not convinced you want this for production machines, and I
>>>> think for debug purposes I don't want a vendor specific param.
>>> I'd be happy with it if it somehow disabled what the platform is doing,
>>> but it doesn't. I don't see the point of forcing intel_pstate if you
>>> can't force the platform to stop doing power management at the same time.
>>> Even if it's for test/debug purposes, I'm not sure what you're testing
>>> when you have dueling power management.
>> Most of the power management functions is done by SP(service processor) on Sun
>> X86
>> servers, the 'force' parameter is not supposed to disable whole platform
>> working I think,
>> with intel_pstate, it doesn't do CPU power capping issued via _PPC
>> notification. but all
>> other rest parts of the power management still work. There is no scene as HP
>> proliant OS
>> mode that OS could control everything(sorry, I don't know Proliant Architecture).
>>
>> So at least, it doesn't make sense to Oracle Sun X86 servers, provide an OS
>> option to stop
>> all PM functions even disable ACPI at all.
>>
>> If the users could be aware of that the power capping doesn't work with CPUs.
>> they could
>> load intel_pstate driver, though there may be faulty in SP . they still could
>> monitor and
>> manage the power consumption of other parts in the server.
>>
>> Perhaps this is what we would test/have tested with intel_pstate.
>>
>> There is a public manual about PM command in Sun server SP may could help you
>> to understand
>> the difference.
>> https://docs.oracle.com/cd/E19121-01/sf.x4150/820-6412-12/820-6412-12.pdf
> I've tried to put the pieces together so tell me if I've got this right.
Sorry, I have a little trouble to express complex thing with English,
or something I don't
understand them thoroughly makes thing worse :>.
> Under normal circumstances, the Oracle platform wants the OS to do normal
> power management (p-state and c-state management) using the ACPI information
> that the firmware provides.
OS does the action that selects the State with the information from
ACPI and governor /users
> The firwmare or SP will potentially change the
> ACPI information on the fly for things like power capping. So normally, you
> would want the acpi_cpufreq driver. If the intel_pstate driver is loaded,
> then that's going to disregard the ACPI information, uncluding the changes
> that the firmware or SP may make when power capping.
Definitely right.
> There is no case where
> the firmware or SP will try to manage pstates or cstates itself. Is that right?
Yup,
I heard of some new CPUs could manage the states by itself ?
>
> If that's right, then I can see how the force option could make sense for
> your platforms. Sorry it took me so long to get this part.
>
> HP platforms are different. On our platforms, the platform is configurable
> and customers can choose to have the firmware manage p-states, in which case
> the pcc_cpufreq driver will be loaded to allow the OS to provide hints, or
> to have the OS provide manage p-states, in which case the intel-pstate
> driver will be loaded. In our case, forcing the intel-pstate driver if
> the platform is configured to have the firmware manage the p-states means
> that both are trying to manage the power. I don't think that ever makes
> sense. If the admin wants intel-pstate, its easy to configure the platform
> through the iLO or the BIOS/UEFI setup so that the OS manages the p-states.
>
> If the force option only works if the platform exposes _PPC, then it would
> work with Oracle platforms and not work with HP platforms. That gives us
> what we want and is also note necessarily vendor specific. And the good
> news is that's actually how your recent patches work.
So it is not necessary to bypass HP checking code and just rename the
'force' to a generic name
it should works right ?
>
> If the description said something about forcing the intel-pstate driver in
> place of the acpi_cpufreq driver, assuming the processor is supported by
> the intel-pstate driver, then I think we're good with a generic sounding
> boot option. It should also be clear that someone using force would
> not get the intel-pstate driver on older processors.
if the old CPU is not supported, the driver will report "ENODEV" error.
> There's no way
> to force that. And you could put in whatever warnings you want about
> other features, such as power capping, potentially being disabled if
> the force option is used.
>
> Does this make sense to everyone? It finally does to me. :-)
Thanks to your clarification. it is crystal clear for both HP &
Oracle X86 servers.
>
> -- ljk
>
>
>
>>> The description would need to be different too since I think on
>>> ProLiant, power capping can happen at any time, even if the
>>> system is in OS control mode and the intel_pstate driver is
>>> loaded.
>> Does that mean only the CPU power capping not work ? If so, they work the same
>> way.
>>> Can anyone suggest a description for a force option that would
>>> make sense generically?
>> the 'force' option means CPU power capping (frequency limited) not work to all,
>> right ?
>>
>> Thanks,
>> Ethan
>>> -- ljk
>>>
>>>
>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists