Message-ID: <20160901102925.GR10153@twins.programming.kicks-ass.net>
Date: Thu, 1 Sep 2016 12:29:25 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Stanislaw Gruszka <sgruszka@...hat.com>
Cc: linux-kernel@...r.kernel.org,
Giovanni Gherdovich <ggherdovich@...e.cz>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Mike Galbraith <mgalbraith@...e.de>,
Paolo Bonzini <pbonzini@...hat.com>,
Rik van Riel <riel@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Wanpeng Li <wanpeng.li@...mail.com>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH 1/3] sched/cputime: Improve scalability of
 times()/clock_gettime() on 32 bit cpus

On Thu, Sep 01, 2016 at 12:07:34PM +0200, Stanislaw Gruszka wrote:
> On Thu, Sep 01, 2016 at 11:49:06AM +0200, Peter Zijlstra wrote:
> > You're now making rather hot paths slower to benefit a rather slow path,
> > that too is backwards.
>
> OK, you're right, I made update_curr() slower (only a bit, I think, since
> this new seqcount primitive should be in the same cache line as other
> things).

seqcount adds two smp_wmb()s, which on ARM are not free (it is possible to
do it with just one, FWIW).
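For readers unfamiliar with the pattern, here is a minimal userspace sketch of a seqcount-protected 64-bit value, written with C11 atomics rather than the kernel's seqcount API. The names seq_u64, seq_u64_write() and seq_u64_read() are hypothetical, and the release fences only approximate smp_wmb(). It shows where the two write-side barriers sit; it does not attempt the single-barrier variant.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a seqcount-protected u64. */
struct seq_u64 {
	atomic_uint seq;	/* odd while a write is in progress */
	atomic_uint lo, hi;	/* the 64-bit value as two 32-bit halves */
};

/* Single writer assumed; the kernel serializes writers by other means. */
static void seq_u64_write(struct seq_u64 *s, uint64_t v)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* barrier 1, roughly smp_wmb() */
	atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* barrier 2, roughly smp_wmb() */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_relaxed);
}

/* Retry until both halves were read under the same even sequence number. */
static uint64_t seq_u64_read(struct seq_u64 *s)
{
	unsigned int start;
	uint32_t lo, hi;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 atomic_load_explicit(&s->seq, memory_order_relaxed) != start);

	return ((uint64_t)hi << 32) | lo;
}
```

Every update pays for both fences on the write side, which is the cost being objected to for a hot path like update_curr().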
> But don't we care about inconsistent access to a 64-bit variable
> on 32-bit processors (see patch 3)? I know inconsistency is an unlikely
> scenario, but I assume it's still possible, or not?

It's actually quite possible. We've observed it a fair few times. A 64-bit
variable is accessed as two 32-bit stores/loads, and getting interleaved
data is quite possible.
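To make the interleaving concrete, here is a small deterministic sketch. It does not race real threads; it simulates a reader that is interrupted between its two 32-bit loads by a complete writer update. The struct split_u64 and both helpers are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* A 64-bit counter as a 32-bit CPU sees it: two separate words. */
struct split_u64 {
	uint32_t lo, hi;
};

/* Writer: two 32-bit stores. */
static void split_store(struct split_u64 *s, uint64_t v)
{
	s->lo = (uint32_t)v;
	s->hi = (uint32_t)(v >> 32);
}

/*
 * Reader: two 32-bit loads, with a full writer update simulated
 * between them. The result mixes the old low word with the new
 * high word.
 */
static uint64_t torn_read(struct split_u64 *s, uint64_t new_value)
{
	uint32_t lo = s->lo;		/* reader: first load (old value) */

	split_store(s, new_value);	/* writer runs in between */

	uint32_t hi = s->hi;		/* reader: second load (new value) */

	return ((uint64_t)hi << 32) | lo;
}
```

Starting from 0x00000000FFFFFFFF and updating to 0x0000000100000000, the reader sees 0x00000001FFFFFFFF, which is neither the old nor the new value and is off by almost 2^32.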
> If not, I can get rid of read_sum_exec_runtime() and just read
> sum_exec_runtime without task_rq_lock() protection in
> thread_group_cputime(). That would make the benchmark happy.

I think this benchmark is misguided. Just accept that O(nr_threads) is
expensive, same as with process-wide itimers; just don't use them when you
care about performance.