linux-kernel - Re: [PATCH 1/3] sched/cputime: Improve scalability of times()/clock

Message-ID: <20160901102925.GR10153@twins.programming.kicks-ass.net>
Date:   Thu, 1 Sep 2016 12:29:25 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Stanislaw Gruszka <sgruszka@...hat.com>
Cc:     linux-kernel@...r.kernel.org,
        Giovanni Gherdovich <ggherdovich@...e.cz>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Mike Galbraith <mgalbraith@...e.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Rik van Riel <riel@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH 1/3] sched/cputime: Improve scalability of
 times()/clock_gettime() on 32 bit cpus

On Thu, Sep 01, 2016 at 12:07:34PM +0200, Stanislaw Gruszka wrote:
> On Thu, Sep 01, 2016 at 11:49:06AM +0200, Peter Zijlstra wrote:
> > You're now making rather hot paths slower to benefit a rather slow path,
> > that too is backwards.
> 
> OK, you are right: I made update_curr() slower (only a bit, I think,
> since this new seqcount primitive should be in the same cache line as
> other things).

seqcount adds 2 smp_wmb() calls, which on ARM are not free (FWIW, it is
possible to do this with just 1).
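
For readers unfamiliar with the pattern, here is a minimal userspace sketch of a seqcount-protected 64-bit value, written with C11 atomics rather than the kernel primitives. The names (struct seq_u64, seq_u64_write(), seq_u64_read()) are made up for illustration, a single writer is assumed, and the two release fences only stand in for the two smp_wmb() calls on the write side; this is not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical seqcount-protected 64-bit value, split into two halves. */
struct seq_u64 {
	atomic_uint seq;        /* odd while a write is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Single writer assumed: concurrent writers would need their own lock. */
static void seq_u64_write(struct seq_u64 *s, uint64_t v)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);   /* write barrier 1 */

	atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);

	atomic_thread_fence(memory_order_release);   /* write barrier 2 */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_relaxed);
}

/* Readers retry until they see the same even sequence before and after. */
static uint64_t seq_u64_read(struct seq_u64 *s)
{
	unsigned int start, end;
	uint32_t lo, hi;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((start & 1) || start != end);

	return ((uint64_t)hi << 32) | lo;
}
```

The point of the sketch is the write side: every update pays for two ordering points around the data stores, and that cost lands in whatever hot path does the update.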

> But don't we care about inconsistent access to a 64-bit variable on
> 32-bit processors (see patch 3)? I know an inconsistency is an unlikely
> scenario, but I assume it's still possible, or not?

It's actually quite possible. We've observed it a fair few times. On
32-bit, a 64-bit variable is accessed with two 32-bit stores/loads, and
getting interleaved data is quite possible.
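
To make the interleaving concrete, here is a small deterministic sketch (single-threaded, so it only models the race rather than reproducing it). The helper names are hypothetical, and which half gets stored first is an assumption; in practice that depends on the compiler and architecture.

```c
#include <stdint.h>

/* Hypothetical model: a 64-bit counter kept as two 32-bit words. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

/* What a reader sees when it loads both halves. */
static uint64_t split_load(const struct split_u64 *v)
{
	return ((uint64_t)v->hi << 32) | v->lo;
}

/* Reader runs after the writer has stored only the low half. */
static uint64_t torn_after_lo_store(uint64_t old_val, uint64_t new_val)
{
	struct split_u64 v = { (uint32_t)old_val, (uint32_t)(old_val >> 32) };

	v.lo = (uint32_t)new_val;       /* first 32-bit store */
	return split_load(&v);          /* reader interleaves here */
}

/* Reader runs after the writer has stored only the high half. */
static uint64_t torn_after_hi_store(uint64_t old_val, uint64_t new_val)
{
	struct split_u64 v = { (uint32_t)old_val, (uint32_t)(old_val >> 32) };

	v.hi = (uint32_t)(new_val >> 32);   /* first 32-bit store */
	return split_load(&v);              /* reader interleaves here */
}
```

With an update from 0x00000000FFFFFFFF to 0x0000000100000000, the first helper returns 0 and the second returns 0x00000001FFFFFFFF. Neither is the old or the new value, and the second is off by almost 2^32.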

> If not, I can get rid of read_sum_exec_runtime() and just read
> sum_exec_runtime without task_rq_lock() protection in
> thread_group_cputime(). That would make the benchmark happy.

I think this benchmark is misguided. Just accept that O(nr_threads) is
expensive, same as with process-wide itimers; just don't use them when
you care about performance.
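
As a rough illustration of where the O(nr_threads) comes from, here is a simplified userspace model of a process-wide runtime query. struct thread and group_runtime() are made-up names and not the kernel's data structures; the only point is that every query has to visit every thread in the group.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread accounting record. */
struct thread {
	uint64_t sum_exec_runtime;   /* runtime of this thread, in ns */
	struct thread *next;         /* next thread in the same group */
};

/* Process-wide runtime: one pass over all threads on every call. */
static uint64_t group_runtime(const struct thread *head)
{
	uint64_t total = 0;
	const struct thread *t;

	for (t = head; t != NULL; t = t->next)
		total += t->sum_exec_runtime;

	return total;
}
```

Any per-thread synchronization needed for a consistent read is paid once per thread inside that loop, so the cost of one call grows with the thread count.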