Message-ID: <PH0PR11MB48244A3109FA7A060AB5280ACD729@PH0PR11MB4824.namprd11.prod.outlook.com>
Date: Thu, 25 Aug 2022 11:08:14 +0000
From: "Mi, Dapeng1" <dapeng1.mi@...el.com>
To: David Laight <David.Laight@...LAB.COM>,
"rafael@...nel.org" <rafael@...nel.org>,
"daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
"pbonzini@...hat.com" <pbonzini@...hat.com>
CC: "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"zhenyuw@...ux.intel.com" <zhenyuw@...ux.intel.com>
Subject: RE: [PATCH] KVM: x86: use TPAUSE to replace PAUSE in halt polling
> From: David Laight <David.Laight@...LAB.COM>
> Sent: Wednesday, August 24, 2022 10:08 PM
> To: Mi, Dapeng1 <dapeng1.mi@...el.com>; rafael@...nel.org;
> daniel.lezcano@...aro.org; pbonzini@...hat.com
> Cc: linux-pm@...r.kernel.org; linux-kernel@...r.kernel.org;
> kvm@...r.kernel.org; zhenyuw@...ux.intel.com
> Subject: RE: [PATCH] KVM: x86: use TPAUSE to replace PAUSE in halt polling
>
> From: Dapeng Mi
> > Sent: 24 August 2022 10:11
> >
> > TPAUSE is a new instruction on Intel processors which can instruct the
> > processor to enter a power/performance optimized state. Halt polling
> > uses the PAUSE instruction while waiting for the vCPU to be woken up.
> > The polling time could be long and cause extra power consumption in
> > some cases.
> >
> > Use TPAUSE to replace the PAUSE instruction in halt polling to get
> > better power saving and performance.
>
> What is the effect on wakeup latency?
> Quite often that is far more important than a bit of power saving.
In theory, the added wakeup latency should be less than 1us, so I expected the impact to be minimal. I ran two scheduling benchmarks, hackbench and schbench, and did not see any obvious performance impact from this change.

Both benchmarks were run on the host while an FIO workload was running in a Linux VM at the same time. FIO triggers a large number of HLT VM-exits, which in turn trigger halt polling, so the results show how TPAUSE affects performance.

Here is the hackbench and schbench data on an Intel ADL platform.

Hackbench     base     TPAUSE   %delta
Group-1       0.056    0.052     7.14%
Group-4       0.165    0.164     0.61%
Group-8       0.313    0.309     1.28%
Group-16      0.834    0.842    -0.96%
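
The %delta column is consistent with (base - TPAUSE) / base, so a positive value means the TPAUSE run was faster. A small sketch for recomputing it, assuming that formula (`pct_delta` and `print_row` are names made up for this example):

```c
#include <stdio.h>

/* Relative change of TPAUSE versus base, in percent.
 * Positive means the TPAUSE value is lower than base. */
static double pct_delta(double base, double tpause)
{
	return (base - tpause) / base * 100.0;
}

/* Print one row: name, base, TPAUSE, %delta. */
static void print_row(const char *name, double base, double tpause)
{
	printf("%-10s %8.3f %8.3f %8.2f%%\n",
	       name, base, tpause, pct_delta(base, tpause));
}
```

For example, `print_row("Group-16", 0.834, 0.842)` gives a negative delta, matching the one row where TPAUSE was slightly slower.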

Schbench - Latency percentiles (usec)
                     base     TPAUSE
./schbench -m 1
    50.0th             15         13
    99.0th            221        203
./schbench -m 2
    50.0th             26         23
    99.0th          16368      16544
./schbench -m 4
    50.0th             56         60
    99.0th          33984      34112

The schbench results are not very stable from run to run, but the data is at the same level in both cases.

> The automatic entry of sleep states is a PITA already.
> Block 30 RT threads in cv_wait() and then do cv_broadcast().
> Use ftrace to see just how long it takes the last thread to wake up.
I think this test is similar to hackbench and schbench, so it should give similar results.

In any case, performance and power are a tradeoff; it depends on which side we consider more important.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1
> 1PT, UK Registration No: 1397386 (Wales)