Message-ID: <2b3ce9e9-e805-1b8d-86c3-c8f498a4d3dd@intel.com>
Date: Wed, 16 Oct 2019 13:56:17 +0800
From: "Yin, Fengwei" <fengwei.yin@...el.com>
To: David Laight <David.Laight@...LAB.COM>,
"rjw@...ysocki.net" <rjw@...ysocki.net>,
"lenb@...nel.org" <lenb@...nel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] ACPI / processor_idle: use ndelay instead of io port
access for wait
Hi David,
On 10/15/2019 7:48 PM, David Laight wrote:
> From: Yin Fengwei
>> Sent: 15 October 2019 09:04
>> In function acpi_idle_do_entry(), an ioport access is used for dummy
>> wait to guarantee hardware behavior. But it could trigger unnecessary
>> vmexit in virtualization environment.
>>
>> If we run linux as guest and export all available native C state to
>> guest, we did see many PM timer access triggered VMexit when guest
>> enter deeper C state in our environment (We used ACRN hypervisor
>> instead of kvm or xen which has PM timer emulated and exports all
>> native C state to guest).
>>
>> According to the original comments of this part of code, io port
>> access is only for dummy wait. We could use busy wait instead of io
>> port access to guarantee hardware behavior and avoid unnecessary
>> VMexit.
>
> You need some hard synchronisation instruction(s) after the inb()
> and before any kind of delay to ensure your delay code is executed
> after the inb() completes.
>
> I'm pretty sure that inb() is only synchronised with memory reads.
Thanks a lot for the comments.
I didn't find a common serializing-instruction API in the kernel (only
memory barriers, which are used to order memory accesses). On Intel
x86, cpuid could be used as a serializing instruction, but it's not
suitable for common code here. Do you have any suggestions?
>
> ...
>> + /* profiling the time used for dummy wait op */
>> + ktime_get_real_ts64(&ts0);
>> + inl(acpi_gbl_FADT.xpm_timer_block.address);
>> + ktime_get_real_ts64(&ts1);
>
> That could be dominated by the cost of ktime_get_real_ts64().
> It also need synchronising instructions.
I did some testing. ktime_get_real_ts64() takes much less time than an
io port access.
The test code is:
1.
    local_irq_save(flag);
    ktime_get_real_ts64(&ts0);
    inl(acpi_gbl_FADT.xpm_timer_block.address);
    ktime_get_real_ts64(&ts1);
    local_irq_restore(flag);
2.
    local_irq_save(flag);
    ktime_get_real_ts64(&ts0);
    ktime_get_real_ts64(&ts1);
    local_irq_restore(flag);
The delta in case 1 is about 500000ns, and the delta in case 2 is about
2000ns. The data was collected on an Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz.
So I suppose the impact of ktime_get_real_ts64() is small.
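For anyone who wants to repeat the comparison outside the kernel, a rough
user-space sketch of the same two cases follows. It is not the code used
for the numbers above. It assumes POSIX clock_gettime() with
CLOCK_MONOTONIC in place of ktime_get_real_ts64(), does not disable
interrupts, adds no serializing instruction, and uses hypothetical names
(ts_delta_ns, avg_section_ns, probe_fn) that do not exist in the kernel.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from *start to *end. */
static int64_t ts_delta_ns(const struct timespec *start,
                           const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Hypothetical stand-in for the operation being profiled
 * (the inl() on the PM timer port in the patch). */
typedef void (*probe_fn)(void);

/*
 * Average duration in ns of one timed section over `rounds` runs.
 * Passing op == NULL times two back-to-back timestamp reads (case 2);
 * passing a function times timestamp + op + timestamp (case 1).
 */
static int64_t avg_section_ns(probe_fn op, int rounds)
{
    struct timespec t0, t1;
    int64_t total = 0;

    assert(rounds > 0);
    for (int i = 0; i < rounds; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (op)
            op();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total += ts_delta_ns(&t0, &t1);
    }
    return total / rounds;
}
```

Subtracting avg_section_ns(NULL, n) from avg_section_ns(op, n) gives an
estimate of the cost of op alone, which is the same reasoning as
comparing case 1 with case 2.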
Regards
Yin, Fengwei
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>