[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZvQo9gJgTlZ3nB03@freefall.freebsd.org>
Date: Wed, 25 Sep 2024 15:15:02 +0000
From: Suleiman Souhlal <ssouhlal@...ebsd.org>
To: David Woodhouse <dwmw2@...radead.org>
Cc: Suleiman Souhlal <suleiman@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>, joelaf@...gle.com,
vineethrp@...gle.com, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, Srikar Dronamraju <srikar@...ux.ibm.com>,
Sean Christopherson <seanjc@...gle.com>
Subject: Re: [PATCH v2] sched: Don't try to catch up excess steal time.
On Wed, Sep 25, 2024 at 03:26:56PM +0100, David Woodhouse wrote:
> On Wed, 2024-09-25 at 13:25 +0000, Suleiman Souhlal wrote:
> > >
> > > I'm still not utterly convinced this is the right thing to do, though.
> > > It means you will constantly undermeasure the accounting of steal time
> > > as you discard the excess each time.
> > >
> > > The underlying bug here is that we are sampling the steal time and the
> > > time slice at *different* times. This update_rq_clock_task() function
> > > could be called with a calculated 'delta' argument... and then
> > > experience a large amount of steal time *before* calling
> > > paravirt_steal_clock(), which is how we end up in the situation where
> > > the calculated steal time exceeds the running time of the previous
> > > task.
> > >
> > > Which task *should* that steal time be accounted to? At the moment it
> > > ends up being accounted to the next task to run — which seems to make
> > > sense to me. In the situation I just described, we can consider the
> > > time stolen in update_rq_clock_task() itself to have been stolen from
> > > the *incoming* task, not the *outgoing* one. But that seems to be what
> > > you're objecting to?
> >
> > This is a good description of the problem, except that the time stolen
> > in update_rq_clock_task() itself is actually being stolen from the
> > outgoing task. This is because we are still trying to calculate how long
> > it ran for (update_curr()), and time hasn't started ticking for the
> > incoming task yet. We haven't set the incoming task's exec_start with the
> > new clock_task time yet.
> >
> > So, in my opinion, it's wrong to give that time to the incoming task.
>
> That makes sense. That steal time is actually stolen from *neither*
> task, since it's after the 'end' timestamp of the outgoing task, and
> before the 'start' timestamp of the incoming task.
>
> So where *should* it be accounted?
>
> Or is it actually correct to drop it completely?
>
> If you can make a coherent case for the fact that dropping it is really
> the right thing to do (not *just* that it doesn't belong to the
> outgoing task, which is the case you make in your existing commit
> message), then I suppose I'm OK with your patch as-is.
> >
Yes, that's a good way to put it: The excess steal time isn't actually
being stolen from anyone.
And since it's not being stolen from anyone, isn't the right thing to do
to drop it?
There might still be extra steal time that doesn't exceed the current
'delta' from the race between reading the two values, that would still
be erroneously accounted to the outgoing task, which this patch doesn't
address, but we know that any steal > delta is from this race and should
be dropped.
-- Suleiman
Powered by blists - more mailing lists