Message-ID: <aQwwn4Z2aYDJlH9T@linux.ibm.com>
Date: Thu, 6 Nov 2025 10:52:39 +0530
From: Srikar Dronamraju <srikar@...ux.ibm.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
Ben Segall <bsegall@...gle.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...nel.org>, Juri Lelli <juri.lelli@...hat.com>,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
Mel Gorman <mgorman@...e.de>, Michael Ellerman <mpe@...erman.id.au>,
Nicholas Piggin <npiggin@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH v2 2/2] powerpc/smp: Disable steal from updating CPU
capacity
* Vincent Guittot <vincent.guittot@...aro.org> [2025-11-03 09:46:26]:
> On Wed, 29 Oct 2025 at 09:32, Srikar Dronamraju <srikar@...ux.ibm.com> wrote:
> > * Vincent Guittot <vincent.guittot@...aro.org> [2025-10-29 08:43:34]:
> > > On Wed, 29 Oct 2025 at 07:09, Srikar Dronamraju <srikar@...ux.ibm.com> wrote:
> > > >
> > > IIUC, the migration is triggered by the reduced capacity case when
> > > there is 1 task on the CPU
> >
> > Thanks Vincent for taking a look at the change.
> >
> > Yes. Let's assume we have 3 threads running on 6 vCPUs backed by 2 Physical
> > cores. So only 3 vCPUs (0,1,2) would be busy and the other 3 (3,4,5) will be
> > idle. The busy vCPUs will start seeing steal time of around 33%
> > because they can't run completely on the Physical CPU. Without the change,
> > they will start seeing their capacity decrease, while the idle vCPUs (3,4,5)
> > will have their capacity intact. So when the scheduler switches the 3
> > tasks to the idle vCPUs, the newly busy vCPUs (3,4,5) will start seeing steal
> > and hence see their CPU capacity drop, while the newly idle vCPUs (0,1,2)
> > will see their capacity increase since their steal time reduces. Hence the
> > tasks will be migrated again.
>
> Thanks for the details
> This is probably even more visible when vCPUs are not pinned to separate CPUs
If the workload runs on vCPUs pinned to CPUs belonging to the same core, then
yes, steal may be less visible. However, if the workload runs unpinned, or
runs on vCPUs pinned to CPUs belonging to different cores, then it's
more visible.
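
To put rough numbers on this, here is a toy model of the expected steal. It
is only a sketch: it assumes the hypervisor dispatches whole vCores and
time-slices the physical cores evenly among the busy vCores, and the function
name and its parameters are made up for illustration, not taken from any
kernel or hypervisor interface.

```python
import math

def expected_steal(busy_vcpus, smt, physical_cores, packed):
    """Toy estimate of the steal fraction seen by each busy vCPU.

    Assumption: the hypervisor schedules at core granularity and shares
    the physical cores evenly among all busy vCores.
    """
    if packed:
        # Busy vCPUs fill the SMT siblings of one vCore before using the next.
        busy_vcores = math.ceil(busy_vcpus / smt)
    else:
        # Each busy vCPU sits on a different vCore (assumes enough vCores).
        busy_vcores = busy_vcpus
    if busy_vcores <= physical_cores:
        return 0.0
    return 1.0 - physical_cores / busy_vcores

# 2 threads, SMT 2, 1 physical core
print(expected_steal(2, 2, 1, packed=True))   # siblings of the same vCore
print(expected_steal(2, 2, 1, packed=False))  # two different vCores
```

Under these assumptions, two threads on siblings of one vCore see no steal,
while the same two threads on different vCores each see 50%.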
> > >
> > > > can repeat continuously, resulting in ping-pong behavior between SMT
> > > > siblings.
> > >
> > > Does it mean that the vCPU generates its own steal time or is it
> > > > because other vCPUs are already running on the other CPU and they
> > > > start to steal time on the sibling vCPU
> >
> > There are other vCPUs running and sharing the same Physical CPU, and hence
> > these vCPUs are seeing steal time.
> >
> > >
> > > >
> > > > To avoid migrations solely triggered by steal time, disable steal from
> > > > updating CPU capacity when running in shared processor mode.
> > >
> > > You are disabling the steal time accounting only for your arch. Does
> > > it mean that only powerpc are impacted by this effect ?
> >
> > On PowerVM, the hypervisor schedules at a core granularity. So in the above
> > scenario, if we assume SMT to be 2, then we have 3 vCores and 1 Physical
> > core. So even if 2 threads are running, they would be scheduled on 2 vCores
> > and hence we would start seeing 50% steal. So this steal accounting is more
> > predominant on Shared LPARs running on PowerVM.
> >
> > However we can use this same mechanism on other architectures too since the
> > framework is arch independent.
> >
> > Does this clarify?
>
> yes, thanks
> I see 2 problems in your use case, the idle cpu doesn't have steal
> time even if the host cpu on which it will run, is already busy with
> other things
> and with not pinned vcpu, we can't estimate what will be the steal
> time on the target host
> And I don't see a simple way other than disabling steal time
>
Yes, we can neither have steal time for an idle sibling nor estimate
the steal time for the target CPU. Thanks for acknowledging the problem.
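
For anyone who wants to see the ping-pong in isolation, the effect can be
reproduced with a small toy simulation. This is not the scheduler's actual
load-balancing logic: the averaging step, the migration margin and all the
names below are assumptions chosen only to mimic the 6 vCPU / 3 task example,
with busy vCPUs seeing about 33% steal and idle vCPUs seeing none.

```python
FULL = 1024  # nominal CPU capacity, same value as SCHED_CAPACITY_SCALE

def count_migrations(steal_scales_capacity, ticks=100, steal=1 / 3, margin=1.17):
    """Count task migrations in a toy 6-vCPU guest running 3 tasks.

    steal_scales_capacity: whether steal time reduces a busy vCPU's capacity.
    margin: hypothetical threshold; a task moves when the idle vCPU's
            capacity exceeds its current vCPU's capacity by this factor.
    """
    capacity = [float(FULL)] * 6
    busy = [0, 1, 2]  # vCPUs currently running one task each
    idle = [3, 4, 5]
    migrations = 0
    for _ in range(ticks):
        # Only busy vCPUs observe steal; idle ones drift back to full capacity.
        for cpu in range(6):
            if steal_scales_capacity and cpu in busy:
                target = FULL * (1 - steal)
            else:
                target = FULL
            capacity[cpu] += (target - capacity[cpu]) / 2  # simple averaging
        # Move a task whenever its paired idle vCPU looks sufficiently better.
        for i in range(3):
            src, dst = busy[i], idle[i]
            if capacity[dst] > capacity[src] * margin:
                busy[i], idle[i] = dst, src
                migrations += 1
    return migrations

print("steal scales capacity:", count_migrations(True))
print("steal ignored:        ", count_migrations(False))
```

In this model the tasks keep bouncing between the two sets of vCPUs while
steal feeds into capacity, and never move once it is ignored.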
--
Thanks and Regards
Srikar Dronamraju