linux-kernel - Re: [PATCH 08/37] cputime: Convert task/group cputime to nsecs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170128115740.GA688@redhat.com>
Date:   Sat, 28 Jan 2017 12:57:40 +0100
From:   Stanislaw Gruszka <sgruszka@...hat.com>
To:     Frederic Weisbecker <fweisbec@...il.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Tony Luck <tony.luck@...el.com>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Mackerras <paulus@...ba.org>,
        Ingo Molnar <mingo@...nel.org>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Rik van Riel <riel@...hat.com>,
        Martin Schwidefsky <schwidefsky@...ibm.com>
Subject: Re: [PATCH 08/37] cputime: Convert task/group cputime to nsecs

Hi Frederic and sorry for late comment.

On Sun, Jan 22, 2017 at 07:19:44PM +0100, Frederic Weisbecker wrote:
> Now that most cputime readers use the transition API which return the
> task cputime in old style cputime_t, we can safely store the cputime in
> nsecs. This will eventually make cputime statistics less opaque and more
> granular. Back and forth convertions between cputime_t and nsecs in order
> to deal with cputime_t random granularity won't be needed anymore.
<snip>
> -	cputime_t utime;
> -	cputime_t stime;
> +	u64 utime;
> +	u64 stime;
>  	unsigned long long sum_exec_runtime;
>  };
<snip>
> @@ -134,7 +134,7 @@ void account_user_time(struct task_struct *p, cputime_t cputime)
>  	int index;
>  
>  	/* Add user time to process. */
> -	p->utime += cputime;
> +	p->utime += cputime_to_nsecs(cputime);
>  	account_group_user_time(p, cputime);
<snip>
> +void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
>  {
>  	*ut = p->utime;
>  	*st = p->stime;
> o }

On 32 bit architectures 64bit store/load is not atomic and if not
protected - 64bit variables can be mangled. I do not see any protection
(lock) between utime/stime store and load in the patch and seems that
{u/s}time store/load can be performed at the same time. Though problem
is very very improbable it still can happen at least theoretically when
lower and upper 32 bits are changed at the same time i.e. process
{u,s}time become near to multiple of 2**32 nsec (aprox: 4sec) and
64bit {u,s}time is stored and loaded at the same time on different
cpus. As said this is very improbable situation, but eventually could
be possible on long lived processes.

BTW we have already similar problem with sum_exec_runtime. I posted
some patches to solve the problem, but non of them was good:
- https://lkml.org/lkml/2016/9/1/172
  this one slow down scheduler hot path's and Peter hates it.

- https://lkml.org/lkml/2016/9/6/305
  this one was fine for Peter, but I dislike it for taking
  task_rq_lock() and do not push it forward.

I considering fixing problem of sum_exec_runtime possible mangling
by using prev_sum_exec_runtime:

u64 read_sum_exec_runtime(struct task_struct *t)
{
       u64 ns, prev_ns;
 
       do {
               prev_ns = READ_ONCE(t->se.prev_sum_exec_runtime);
               ns = READ_ONCE(t->se.sum_exec_runtime);
       } while (ns < prev_ns || ns > (prev_ns + U32_MAX));
 
       return ns;
}

This should work based on fact that prev_sum_exec_runtime and
sum_exec_runtime are not modified and stored at the same time, so only
one of those variabled can be mangled. Though I need to think about 
correctnes of that a bit more.

Stanislaw