Date:   Thu, 1 Sep 2016 12:29:25 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Stanislaw Gruszka <sgruszka@...hat.com>
Cc:     linux-kernel@...r.kernel.org,
        Giovanni Gherdovich <ggherdovich@...e.cz>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Mike Galbraith <mgalbraith@...e.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Rik van Riel <riel@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH 1/3] sched/cputime: Improve scalability of
 times()/clock_gettime() on 32 bit cpus

On Thu, Sep 01, 2016 at 12:07:34PM +0200, Stanislaw Gruszka wrote:
> On Thu, Sep 01, 2016 at 11:49:06AM +0200, Peter Zijlstra wrote:
> > You're now making rather hot paths slower to benefit a rather slow path,
> > that too is backwards.
> 
> Ok, you are right, I made update_curr() slower (only a bit, I think, since
> this new seqcount primitive should be in the same cache line as other
> things).

seqcount adds two smp_wmb() calls, which are not free on ARM (FWIW, it is
possible to do this with just one).
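A minimal user-space sketch of the seqcount pattern under discussion, written with C11 atomics rather than the kernel's seqcount_t API. The struct and function names (seq_u64, seq_u64_write, seq_u64_read) are hypothetical, and a single writer is assumed (serialized externally, for example by the runqueue lock). The release fence and the final release store mark the two places where write_seqcount_begin() and write_seqcount_end() issue their smp_wmb(); the one-barrier variant mentioned above is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: a 64-bit value stored as two 32-bit halves, as on a
 * 32-bit CPU, protected by a sequence count. */
struct seq_u64 {
    _Atomic uint32_t seq;   /* even: stable, odd: update in progress */
    _Atomic uint32_t lo;
    _Atomic uint32_t hi;
};

/* Writer side; assumes writers are already serialized elsewhere. */
static void seq_u64_write(struct seq_u64 *s, uint64_t v)
{
    uint32_t seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    assert((seq & 1) == 0);  /* no other update may be in flight */

    /* Begin: odd count tells readers an update is in progress. */
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    /* Barrier #1 (smp_wmb() in write_seqcount_begin()): odd count
     * is ordered before the data stores. */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
    atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);

    /* Barrier #2 (smp_wmb() in write_seqcount_end()): data stores are
     * ordered before the count becomes even again. */
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retries until it sees a stable, unchanged count. */
static uint64_t seq_u64_read(struct seq_u64 *s)
{
    uint32_t seq0, seq1, lo, hi;

    do {
        seq0 = atomic_load_explicit(&s->seq, memory_order_acquire);
        lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
        hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        seq1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((seq0 & 1) || seq0 != seq1);

    return ((uint64_t)hi << 32) | lo;
}
```

Readers never block the writer; they just retry. That is why the barrier cost lands on the hot update path (update_curr()) while the slow reader path gets cheaper.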

> But don't we care about inconsistent access to 64-bit variables
> on 32-bit processors (see patch 3)? I know inconsistency is unlikely,
> but I assume it's still possible, or not?

It's actually quite possible; we've observed it a fair few times. On a
32-bit CPU a 64-bit variable is accessed as two 32-bit stores/loads, so a
reader can easily get interleaved data: one half from before an update
and the other half from after it.
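A deterministic sketch of such a torn read, replaying one unlucky interleaving step by step without real threads. The names (split_u64, torn_read_demo) are hypothetical and the 32-bit access pattern is simulated explicitly. With a nanosecond counter like sum_exec_runtime, the low half wraps past 2^32 roughly every 4.3 seconds of runtime, which is exactly the case shown.

```c
#include <stdint.h>

/* Simulates a 64-bit variable that a 32-bit CPU accesses as two
 * separate 32-bit halves. */
struct split_u64 {
    uint32_t lo;
    uint32_t hi;
};

static uint64_t combine(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Reader loads the low half, the writer then stores both halves,
 * and the reader finally loads the high half. */
static uint64_t torn_read_demo(uint64_t old_val, uint64_t new_val)
{
    struct split_u64 v = { (uint32_t)old_val, (uint32_t)(old_val >> 32) };

    uint32_t lo = v.lo;                  /* reader: sees old low half  */
    v.lo = (uint32_t)new_val;            /* writer: first 32-bit store  */
    v.hi = (uint32_t)(new_val >> 32);    /* writer: second 32-bit store */
    uint32_t hi = v.hi;                  /* reader: sees new high half  */

    return combine(hi, lo);
}
```

For example, torn_read_demo(0xFFFFFFFF, 0x100000000) yields 0x1FFFFFFFF: neither the old nor the new value, and about 4.3 seconds ahead of the true runtime.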

> If not, I can get rid of read_sum_exec_runtime() and just read
> sum_exec_runtime without task_rq_lock() protection in
> thread_group_cputime(). That would make the benchmark happy.

I think this benchmark is misguided. Just accept that O(nr_threads) is
expensive, the same as with the process-wide itimer, and simply don't use
them when you care about performance.
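A simplified sketch of why the process-wide query scales with nr_threads: it has to visit every thread in the group and add up its runtime. The structures and the function name (thread, thread_group, group_cputime_sketch) are hypothetical stand-ins, not the kernel's task_struct or thread_group_cputime(); the real code walks the thread list under RCU and, on 32-bit, must also protect each 64-bit read against tearing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread accounting: CPU time in nanoseconds. */
struct thread {
    uint64_t sum_exec_runtime;
};

/* Hypothetical thread group: a flat array instead of a linked list. */
struct thread_group {
    struct thread *threads;
    size_t nr_threads;
};

/* One read per thread: the cost of each process-wide query grows
 * linearly with nr_threads, and any per-read protection (lock or
 * seqcount retry) is paid once per thread as well. */
static uint64_t group_cputime_sketch(const struct thread_group *g)
{
    uint64_t total = 0;

    for (size_t i = 0; i < g->nr_threads; i++)
        total += g->threads[i].sum_exec_runtime;

    return total;
}
```

Making each per-thread read cheaper only shrinks the constant factor; the loop itself stays O(nr_threads).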
