Date:   Thu, 1 Sep 2016 12:07:34 +0200
From:   Stanislaw Gruszka <sgruszka@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org,
        Giovanni Gherdovich <ggherdovich@...e.cz>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Mike Galbraith <mgalbraith@...e.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Rik van Riel <riel@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH 1/3] sched/cputime: Improve scalability of
 times()/clock_gettime() on 32 bit cpus

On Thu, Sep 01, 2016 at 11:49:06AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 01, 2016 at 11:27:42AM +0200, Stanislaw Gruszka wrote:
> > My previous commit:
> > 
> >   a1eb1411b4e4 ("sched/cputime: Improve scalability by not accounting thread group tasks pending runtime")
> > 
> > helped to achieve good performance of SYS_times() and
> > SYS_clock_gettime(CLOCK_PROCESS_CPUTIME_ID) on 64 bit architectures.
> > However, taking task_rq_lock() when reading t->se.sum_exec_runtime on
> > 32 bit architectures still makes those syscalls slow.
> > 
> > The reason we take the lock is to keep the 64bit sum_exec_runtime
> > variable consistent. While an inconsistency scenario is very, very unlikely,
> > I assume it may still happen, at least on some 32 bit architectures.
> > 
> > To protect the variable I introduced a new seqcount lock. Performance
> > improvements on a machine with 32 cores (32-bit cpus), measured by the
> > benchmarks described in commit:
> 
> No,.. running 32bit kernels on a machine with 32 cores is insane, full
> stop.

I agree with that. But I also ran the benchmark on a 4-core armv7l
machine and saw good improvements there too.

> You're now making rather hot paths slower to benefit a rather slow path,
> that too is backwards.

OK, you are right: I made update_curr() slower. Only slightly, I think,
since the new seqcount should live in the same cache line as other
fields update_curr() already writes.

But don't we care about inconsistent reads of a 64 bit variable on
32 bit processors (see patch 3)? I know getting an inconsistent value is
an unlikely scenario, but I assume it is still possible. Or is it not?

If it is not, I can get rid of read_sum_exec_runtime() and just read
sum_exec_runtime without task_rq_lock() protection in
thread_group_cputime(). That would make the benchmark happy.

Stanislaw
