linux-kernel - Re: [PATCH] x86/tsc: use real seqcount_latch in cyc2ns_read

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181011073133.GZ5663@hirez.programming.kicks-ass.net>
Date:   Thu, 11 Oct 2018 09:31:33 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] x86/tsc: use real seqcount_latch in cyc2ns_read_begin()

On Wed, Oct 10, 2018 at 05:33:36PM -0700, Eric Dumazet wrote:
> While looking at native_sched_clock() disassembly I had
> the surprise to see the compiler (gcc 7.3 here) had
> optimized out the loop, meaning the code is broken.
> 
> Using the documented and approved API not only fixes the bug,
> it also makes the code more readable.
> 
> Replacing five this_cpu_read() by one this_cpu_ptr() makes
> the generated code smaller.

Does not for me, that is, the resulting asm is actually larger

You're quite right the loop went missing; no idea wth that compiler is
smoking (gcc-8.2 for me). In order to eliminate that loop it needs to
think that two consecutive loads of this_cpu_read(cyc2ns.seq.sequence)
will return the same value. But this_cpu_read() is an asm() statement,
it _should_ not assume such.

We assume that this_cpu_read() implies READ_ONCE() in a number of
locations, this really should not happen.

The reason it was written using this_cpu_read() is so that it can use
%gs: prefixed instructions and avoid ever loading that percpu offset and
doing manual address computation.

Let me prod at this with a sharp stick.