linux-kernel - Re: [PATCH 1/3] sched/cputime: Improve scalability of times()/clock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160901100733.GA6388@redhat.com>
Date:   Thu, 1 Sep 2016 12:07:34 +0200
From:   Stanislaw Gruszka <sgruszka@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org,
        Giovanni Gherdovich <ggherdovich@...e.cz>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Mike Galbraith <mgalbraith@...e.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Rik van Riel <riel@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH 1/3] sched/cputime: Improve scalability of
 times()/clock_gettime() on 32 bit cpus

On Thu, Sep 01, 2016 at 11:49:06AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 01, 2016 at 11:27:42AM +0200, Stanislaw Gruszka wrote:
> > My previous commit:
> > 
> >   a1eb1411b4e4 ("sched/cputime: Improve scalability by not accounting thread group tasks pending runtime")
> > 
> > helped to achieve good performance of SYS_times() and
> > SYS_clock_gettimes(CLOCK_PROCESS_CPUTIME_ID) on 64 bit architectures.
> > However taking task_rq_lock() when reading t->se.sum_exec_runtime on
> > 32 bit architectures still make those syscalls slow.
> > 
> > The reason why we take the lock is to make 64bit sum_exec_runtime
> > variable consistent. While a inconsistency scenario is very very unlike,
> > I assume it still may happen at least on some 32 bit architectures.
> > 
> > To protect the variable I introduced new seqcount lock. Performance
> > improvements on machine with 32 cores (32-bit cpus) measured by
> > benchmarks described in commit:
> 
> No,.. running 32bit kernels on a machine with 32 cores is insane, full
> stop.

I agree with that. But I also run this the benchmark on 4 cores
armv7l and see good improvements there too.

> You're now making rather hot paths slower to benefit a rather slow path,
> that too is backwards.

Ok, you have right, I made update_curr() slower (a bit I think, since
this new seqcount primitive should be in the same cache line as other
things).

But do we don't care about inconsistency of accessing of 64 bit variable
on 32 bit processors (see patch 3) ? I know this is unlikely scenario
to get inconsistency, but I assume it's still possible, or not?

If not, I can get rid of read_sum_exec_runtime() and just read
sum_exec_runtime without task_rq_lock() protection on 
thread_group_cputime() . That would make the benchmark happy. 

Stanislaw