Message-ID: <20181011073133.GZ5663@hirez.programming.kicks-ass.net>
Date: Thu, 11 Oct 2018 09:31:33 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Eric Dumazet <edumazet@...gle.com>
Cc: linux-kernel <linux-kernel@...r.kernel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] x86/tsc: use real seqcount_latch in cyc2ns_read_begin()
On Wed, Oct 10, 2018 at 05:33:36PM -0700, Eric Dumazet wrote:
> While looking at native_sched_clock() disassembly I had
> the surprise to see the compiler (gcc 7.3 here) had
> optimized out the loop, meaning the code is broken.
>
> Using the documented and approved API not only fixes the bug,
> it also makes the code more readable.
>
> Replacing five this_cpu_read() by one this_cpu_ptr() makes
> the generated code smaller.
Does not for me; that is, the resulting asm is actually larger.
You're quite right the loop went missing; no idea wth that compiler is
smoking (gcc-8.2 for me). In order to eliminate that loop it needs to
think that two consecutive loads of this_cpu_read(cyc2ns.seq.sequence)
will return the same value. But this_cpu_read() is an asm() statement,
it _should_ not assume such.
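For illustration, the shape of the retry loop in question can be sketched in
plain userspace C. This is not the kernel code: latch_sketch,
latch_read_sketch() and LOAD_ONCE_U32() are made-up names, and the volatile
cast merely stands in for a load the compiler is not allowed to merge with an
earlier one. A real implementation also needs the read barriers that the
seqcount API provides; they are left out here.

```c
#include <stdint.h>

/* Hypothetical stand-in for a load the compiler must actually perform. */
#define LOAD_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/* Hypothetical two-copy ("latch") layout: seq & 1 selects the stable copy. */
struct latch_data_sketch {
	uint32_t mul;
	uint32_t shift;
	uint64_t offset;
};

struct latch_sketch {
	uint32_t seq;
	struct latch_data_sketch data[2];
};

/*
 * Copy out a consistent snapshot. If the two loads of ->seq were merged
 * into one, the loop condition would be constant false and the retry
 * would vanish. Memory barriers are deliberately omitted in this
 * single-threaded sketch.
 */
static void latch_read_sketch(struct latch_sketch *l,
			      struct latch_data_sketch *out)
{
	uint32_t seq;

	do {
		seq = LOAD_ONCE_U32(l->seq);
		*out = l->data[seq & 1];
	} while (seq != LOAD_ONCE_U32(l->seq));
}
```

With seq odd the reader copies data[1], with seq even data[0]; the second
load of seq is what lets it notice an update that raced with the copy.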
We assume that this_cpu_read() implies READ_ONCE() in a number of
locations, so this really should not happen.
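As a rough sketch of what "implies READ_ONCE()" means for an asm-based
accessor: GCC documents that an asm statement with outputs and no volatile
qualifier may be discarded or moved out of a loop if the optimizer believes
it always produces the same result, whereas asm volatile is not treated that
way. The helper names below are hypothetical, and this is a userspace
illustration only, not the kernel's percpu macros.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/*
 * Plain asm with an output: the compiler may treat two identical
 * invocations with unchanged inputs as producing the same value.
 */
static inline uint32_t load_plain_asm_sketch(const uint32_t *p)
{
	uint32_t v;

	__asm__("movl %1, %0" : "=r" (v) : "m" (*p));
	return v;
}

/* asm volatile: each invocation is kept, so each call is a fresh load. */
static inline uint32_t load_once_asm_sketch(const uint32_t *p)
{
	uint32_t v;

	__asm__ __volatile__("movl %1, %0" : "=r" (v) : "m" (*p));
	return v;
}
#else
/* Fallback for other architectures: ordinary load vs. volatile load. */
static inline uint32_t load_plain_asm_sketch(const uint32_t *p)
{
	return *p;
}

static inline uint32_t load_once_asm_sketch(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}
#endif
```

Both helpers return the same value in single-threaded code; the difference
only shows up in what the optimizer is permitted to do with repeated calls.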
The reason it was written using this_cpu_read() is so that it can use
%gs: prefixed instructions and avoid ever loading that percpu offset and
doing manual address computation.
Let me prod at this with a sharp stick.