[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190918125138.GB12534@in.ibm.com>
Date: Wed, 18 Sep 2019 18:21:38 +0530
From: Gautham R Shenoy <ego@...ux.vnet.ibm.com>
To: Michael Ellerman <mpe@...erman.id.au>
Cc: "Gautham R. Shenoy" <ego@...ux.vnet.ibm.com>,
Nathan Lynch <nathanl@...ux.ibm.com>,
Nicholas Piggin <npiggin@...il.com>,
Tyrel Datwyler <tyreld@...ux.ibm.com>,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
Kamalesh Babulal <kamaleshb@...ibm.com>,
"Naveen N . Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
Subject: Re: [PATCH 0/2] pseries/hotplug: Change the default behaviour of
cede_offline
On Wed, Sep 18, 2019 at 03:14:15PM +1000, Michael Ellerman wrote:
> "Gautham R. Shenoy" <ego@...ux.vnet.ibm.com> writes:
> > From: "Gautham R. Shenoy" <ego@...ux.vnet.ibm.com>
> >
> > Currently on Pseries Linux Guests, the offlined CPU can be put to one
> > of the following two states:
> > - Long term processor cede (also called extended cede)
> > - Returned to the Hypervisor via RTAS "stop-self" call.
> >
> > This is controlled by the kernel boot parameter "cede_offline=on/off".
> >
> > By default the offlined CPUs enter extended cede.
>
> Since commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") (Nov 2009)
>
> Which you wrote :)
Mea Culpa! I forgot to include the "Fixes commit 3aa565f53c39" into
Patch 1 of the series.
>
> Why was that wrong?
It was wrong from the definition of what PHYP considers as
"not-active" CPU. From the point of view of that hypervisor, a CPU is
not-active iff it is in RTAS "stop-self". Thus if a CPU is offline via
extended cede, and not using any cycles, it is still considered to be
active, by PHYP. This causes PURR accounting is broken.
>
> > The PHYP hypervisor considers CPUs in extended cede to be "active"
> > since the CPUs are still under the control fo the Linux Guests. Hence, when we change the
> > SMT modes by offlining the secondary CPUs, the PURR and the RWMR SPRs
> > will continue to count the values for offlined CPUs in extended cede
> > as if they are online.
> >
> > One of the expectations with PURR is that the for an interval of time,
> > the sum of the PURR increments across the online CPUs of a core should
> > equal the number of timebase ticks for that interval.
> >
> > This is currently not the case.
>
> But why does that matter? It's just some accounting stuff, does it
> actually break something meaningful?
As Naveen mentioned, it breaks lparstat which the customers are using
for capacity planning. Unfortunately we discovered this 10 years after
the feature was written.
>
> Also what does this do to the latency of CPU online/offline.
It will have a slightly higher latency compared to extended cede,
since it involves an additional rtas-call for both the start and
stopping of CPU. Will measure the exact difference and post it in the
next version.
> And what does this do on KVM?
KVM doesn't seem to depend on the state of the offline VCPU as it has
an explicit way of signalling whether a CPU is online or not, via
KVM_REG_PPC_ONLINE. In commit 7aa15842c15f ("KVM: PPC: Book3S HV: Set
RWMR on POWER8 so PURR/SPURR count correctly") we use this KVM reg to
update the count of online vCPUs in a core, and use this count to set
the RWMR correctly before dispatching the core.
So, this patchset doesn't affect KVM.
>
>
> > In the following data (Generated using
> > https://github.com/gautshen/misc/blob/master/purr_tb.py):
> >
> >
> > delta tb = tb ticks elapsed in 1 second.
> > delta purr = sum of PURR increments on online CPUs of that core in 1
> > second
> >
> > SMT=off
> > ===========================================
> > Core delta tb(apprx) delta purr
> > ===========================================
> > core00 [ 0] 512000000 69883784
> > core01 [ 8] 512000000 88782536
> > core02 [ 16] 512000000 94296824
> > core03 [ 24] 512000000 80951968
>
> Showing the expected value in another column would make this much
> clearer.
Thanks. Will update the testcase to call out the expected value.
>
> cheers
>
--
Thanks and Regards
gautham.
Powered by blists - more mailing lists