lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240820105036.39fb9bb7@gandalf.local.home>
Date: Tue, 20 Aug 2024 10:50:36 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Srikar Dronamraju <srikar@...ux.ibm.com>
Cc: Suleiman Souhlal <suleiman@...gle.com>, Ingo Molnar <mingo@...hat.com>,
 Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann
 <dietmar.eggemann@....com>, Ben Segall <bsegall@...gle.com>, Mel Gorman
 <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Paolo Bonzini
 <pbonzini@...hat.com>, joelaf@...gle.com, vineethrp@...gle.com,
 linux-kernel@...r.kernel.org, kvm@...r.kernel.org, ssouhlal@...ebsd.org
Subject: Re: [PATCH] sched: Don't try to catch up excess steal time.

On Tue, 20 Aug 2024 15:15:55 +0530
Srikar Dronamraju <srikar@...ux.ibm.com> wrote:

> * Suleiman Souhlal <suleiman@...gle.com> [2024-08-06 20:11:57]:
> 
> > When steal time exceeds the measured delta when updating clock_task, we
> > currently try to catch up the excess in future updates.
> > However, this results in inaccurate run times for the future clock_task
> > measurements, as they end up getting additional steal time that did not
> > actually happen, from the previous excess steal time being paid back.
> > 
> > For example, suppose a task in a VM runs for 10ms and had 15ms of steal
> > time reported while it ran. clock_task rightly doesn't advance. Then, a
> > different task runs on the same rq for 10ms without any time stolen.
> > Because of the current catch up mechanism, clock_sched inaccurately ends
> > up advancing by only 5ms instead of 10ms even though there wasn't any
> > actual time stolen. The second task is getting charged for less time
> > than it ran, even though it didn't deserve it.
> > In other words, tasks can end up getting more run time than they should
> > actually get.
> > 
> > So, we instead don't make future updates pay back past excess stolen time.

In other words, If one task had more time stolen from it than it had run,
the excess time is removed from the next task even though it ran for its
entire slot?

I'm curious, how does a task get queued on the run queue if 100% of it's
time was stolen? That is, how did it get queued if the vCPU wasn't running?


> > 
> > Signed-off-by: Suleiman Souhlal <suleiman@...gle.com>
> > ---
> >  kernel/sched/core.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index bcf2c4cc0522..42b37da2bda6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -728,13 +728,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
> >  #endif
> >  #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> >  	if (static_key_false((&paravirt_steal_rq_enabled))) {
> > -		steal = paravirt_steal_clock(cpu_of(rq));
> > +		u64 prev_steal;
> > +
> > +		steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
> >  		steal -= rq->prev_steal_time_rq;
> >  
> >  		if (unlikely(steal > delta))
> >  			steal = delta;
> >  
> > -		rq->prev_steal_time_rq += steal;
> > +		rq->prev_steal_time_rq = prev_steal;
> >  		delta -= steal;
> >  	}
> >  #endif  
> 
> 
> Agree with the change.
> 
> Probably, we could have achieved by just moving a line above
> Something like this?
> 
> #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> 	if (static_key_false((&paravirt_steal_rq_enabled))) {
> 		steal = paravirt_steal_clock(cpu_of(rq));
> 		steal -= rq->prev_steal_time_rq;
> 		rq->prev_steal_time_rq += steal;
> 
> 		if (unlikely(steal > delta))
> 			steal = delta;
> 
> 		delta -= steal;
> 	}
> #endif

Yeah, that is probably a nicer way of doing the same thing.

Suleiman, care to send a v2?

-- Steve


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ