linux-kernel - Re: [PATCH 2/6] posix-cpu-timers: Don't start process wide cputime counter if timer is disabled

Message-ID: <20210616105116.GA801071@lothringen>
Date:   Wed, 16 Jun 2021 12:51:16 +0200
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        "Eric W . Biederman" <ebiederm@...ssion.com>,
        Oleg Nesterov <oleg@...hat.com>, Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH 2/6] posix-cpu-timers: Don't start process wide cputime
 counter if timer is disabled

On Wed, Jun 16, 2021 at 10:51:21AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 04, 2021 at 01:31:55PM +0200, Frederic Weisbecker wrote:
> > If timer_settime() is called with a 0 expiration on a timer that is
> > already disabled, the process wide cputime counter will be started
> > and won't ever get a chance to be stopped by stop_process_timer() since
> > no timer is actually armed to be processed.
> > 
> > This process wide counter might bring some performance hit due to the
> > concurrent atomic additions at the thread group scope.
> > 
> > The following snippet is enough to trigger the issue.
> > 
> > 	void trigger_process_counter(void)
> > 	{
> > 		timer_t id;
> > 		struct itimerspec val = { };
> > 
> > 		timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
> > 		timer_settime(id, TIMER_ABSTIME, &val, NULL);
> > 		timer_delete(id);
> > 	}
> > 
> > So make sure we don't needlessly start it.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> > Cc: Oleg Nesterov <oleg@...hat.com>
> > Cc: Thomas Gleixner <tglx@...utronix.de>
> > Cc: Peter Zijlstra (Intel) <peterz@...radead.org>
> > Cc: Ingo Molnar <mingo@...nel.org>
> > Cc: Eric W. Biederman <ebiederm@...ssion.com>
> > ---
> >  kernel/time/posix-cpu-timers.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> > index aa52fc85dbcb..132fd56fb1cd 100644
> > --- a/kernel/time/posix-cpu-timers.c
> > +++ b/kernel/time/posix-cpu-timers.c
> > @@ -632,10 +632,15 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
> >  	 * times (in arm_timer).  With an absolute time, we must
> >  	 * check if it's already passed.  In short, we need a sample.
> >  	 */
> > -	if (CPUCLOCK_PERTHREAD(timer->it_clock))
> > +	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
> >  		val = cpu_clock_sample(clkid, p);
> > -	else
> > -		val = cpu_clock_sample_group(clkid, p, true);
> > +	} else {
> > +		/*
> > +		 * Sample group but only start the process wide cputime counter
> > +		 * if the timer is to be enabled.
> > +		 */
> > +		val = cpu_clock_sample_group(clkid, p, !!new_expires);
> > +	}
> 
> The cpu_timer_enqueue() is in arm_timer() and the condition for calling
> that is:
> 
>   'new_expires != 0 && val < new_expires'
> 
> Which is not the same as the one you add.

There are two different things here:

1) the threadgroup cputime counter, activated by
   cpu_clock_sample_group(clkid, p, true)

2) the expiration set (+ the callback enqueued) in arm_timer()

The issue here is that we go through 1) but not through 2).
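
For illustration, here is a self-contained userspace version of the
reproducer quoted above. It is only a sketch: the error handling and the int
return value are additions not present in the original snippet, and it
assumes a Linux system where CLOCK_PROCESS_CPUTIME_ID timers are available
(older glibc may need -lrt at link time). The zero expiration passed to
timer_settime() is what takes path 1) without ever reaching path 2).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>

/*
 * Create a process wide CPU timer, "set" it with a zero expiration
 * (which leaves it disabled), then delete it.
 * Returns 0 on success, -1 if any of the calls failed.
 * The return value and error paths are illustrative additions.
 */
int trigger_process_counter(void)
{
	timer_t id;
	struct itimerspec val = { 0 };	/* it_value == 0: timer stays disabled */

	if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id) != 0)
		return -1;

	if (timer_settime(id, TIMER_ABSTIME, &val, NULL) != 0) {
		timer_delete(id);
		return -1;
	}

	return timer_delete(id) == 0 ? 0 : -1;
}
```

Calling trigger_process_counter() once from any thread of the process is
enough; nothing is visible from userspace other than the calls succeeding.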

> 
> I'm thinking the fundamental problem here is the disconnect between
> cpu_timer_enqueue() and pct->timers_active ?

You're right, that's the core issue. But what prevents the two from being
fundamentally connected is a circular dependency: we need to know the
threadgroup cputime before arming the timer, but we would need to know
whether we arm the timer before starting the threadgroup cputime counter.

To sum up, the current sequence is:

* fetch the threadgroup cputime AND start the whole threadgroup counter

* arm the timer if it isn't zero and it hasn't yet expired

While the ideal sequence should be:

* fetch the threadgroup cputime (without starting the whole threadgroup counter
  yet)

* arm the timer if it isn't zero and it hasn't yet expired

* iff we armed the timer, start the whole threadgroup counter

But that means re-iterating over the whole threadgroup and atomically
updating the group counter with each task's time.
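
The sequences above can be compared with a small toy model. This is a sketch
only, not kernel code: struct toy_group and every toy_* function are
hypothetical names, the group cputime is a plain integer, and "starting the
counter" is reduced to setting a flag. The arming condition is the one quoted
from arm_timer(): 'new_expires != 0 && val < new_expires'.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the thread group state; NOT the kernel's structures. */
struct toy_group {
	uint64_t cputime;	/* pretend accumulated group CPU time */
	bool counter_running;	/* models the process wide cputime counter */
	bool timer_armed;	/* models the timer being enqueued */
};

/* Models cpu_clock_sample_group(clkid, p, start): sample, optionally start. */
static uint64_t toy_sample_group(struct toy_group *g, bool start)
{
	if (start)
		g->counter_running = true;
	return g->cputime;
}

/* Models the arming condition: new_expires != 0 && val < new_expires. */
static bool toy_arm(struct toy_group *g, uint64_t val, uint64_t new_expires)
{
	g->timer_armed = (new_expires != 0 && val < new_expires);
	return g->timer_armed;
}

/* Current sequence: always start the counter, then maybe arm. */
static void toy_set_current(struct toy_group *g, uint64_t new_expires)
{
	uint64_t val = toy_sample_group(g, true);

	toy_arm(g, val, new_expires);
}

/* Patched sequence: start the counter only if new_expires is non-zero. */
static void toy_set_patched(struct toy_group *g, uint64_t new_expires)
{
	uint64_t val = toy_sample_group(g, new_expires != 0);

	toy_arm(g, val, new_expires);
}

/* Ideal sequence: sample without starting, start iff the timer got armed. */
static void toy_set_ideal(struct toy_group *g, uint64_t new_expires)
{
	uint64_t val = toy_sample_group(g, false);

	if (toy_arm(g, val, new_expires))
		g->counter_running = true;	/* the costly step in practice */
}

/* True when the counter is left running with no armed timer to stop it. */
static bool toy_leaked(const struct toy_group *g)
{
	return g->counter_running && !g->timer_armed;
}
```

In this toy model, toy_set_current() leaks the counter for a zero expiration,
toy_set_patched() avoids that case but still leaks when a non-zero expiration
is already in the past (the mismatch between the two conditions), and
toy_set_ideal() never leaks. What the model hides is the cost of the last
step, which is the re-iteration over the threadgroup mentioned above.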