Message-ID: <20181011073133.GZ5663@hirez.programming.kicks-ass.net>
Date:   Thu, 11 Oct 2018 09:31:33 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] x86/tsc: use real seqcount_latch in cyc2ns_read_begin()

On Wed, Oct 10, 2018 at 05:33:36PM -0700, Eric Dumazet wrote:
> While looking at native_sched_clock() disassembly I had
> the surprise to see the compiler (gcc 7.3 here) had
> optimized out the loop, meaning the code is broken.
> 
> Using the documented and approved API not only fixes the bug,
> it also makes the code more readable.
> 
> Replacing five this_cpu_read() by one this_cpu_ptr() makes
> the generated code smaller.

It does not for me; the resulting asm is actually larger.

You're quite right the loop went missing; no idea wth that compiler is
smoking (gcc-8.2 for me). In order to eliminate that loop it needs to
think that two consecutive loads of this_cpu_read(cyc2ns.seq.sequence)
will return the same value. But this_cpu_read() is an asm() statement,
it _should_ not assume such.

We assume in a number of places that this_cpu_read() implies
READ_ONCE(); this really should not happen.
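
To make the failure mode concrete, here is a minimal userspace sketch of a
latch-style read-retry loop. It is not the kernel code: READ_ONCE_SKETCH,
barrier_sketch and the struct and field names are hypothetical stand-ins,
and a real concurrent reader would need the kernel's own primitives and
memory barriers. The point is only that the retry test is meaningful when
both loads of the sequence word are forced to happen; if the compiler may
treat the second load as equal to the first, the loop folds away.

```c
#include <stdint.h>

/* Hypothetical stand-in for READ_ONCE(): the volatile access makes the
 * compiler emit a fresh load each time instead of reusing an old value. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Compiler-only barrier (assumption: GCC/Clang inline asm syntax). */
#define barrier_sketch() __asm__ __volatile__("" ::: "memory")

/* Illustrative layout only; not the kernel's struct cyc2ns. */
struct cyc2ns_data_sketch {
    uint32_t mul;
    uint32_t shift;
    uint64_t offset;
};

struct cyc2ns_sketch {
    unsigned int seq;                   /* low bit selects the live copy */
    struct cyc2ns_data_sketch data[2];  /* latch: two copies of the data */
};

/* Copy out the snapshot selected by seq, retrying if seq changed. */
static void cyc2ns_read_sketch(struct cyc2ns_sketch *c,
                               struct cyc2ns_data_sketch *out)
{
    unsigned int seq, idx;

    do {
        seq = READ_ONCE_SKETCH(c->seq);
        barrier_sketch();
        idx = seq & 1;
        out->mul    = READ_ONCE_SKETCH(c->data[idx].mul);
        out->shift  = READ_ONCE_SKETCH(c->data[idx].shift);
        out->offset = READ_ONCE_SKETCH(c->data[idx].offset);
        barrier_sketch();
        /* Second load of seq: must not be merged with the first one. */
    } while (seq != READ_ONCE_SKETCH(c->seq));
}
```

Replacing the two READ_ONCE_SKETCH(c->seq) accesses with plain c->seq and
dropping the barriers gives the compiler licence to conclude that the loop
condition is always false, which matches the symptom described above.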

The reason it was written using this_cpu_read() is so that it can use
%gs: prefixed instructions and avoid ever loading that percpu offset and
doing manual address computation.
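
A rough userspace analogy, for anyone who wants to look at the codegen
difference outside the kernel: thread-local storage on x86-64 Linux is
addressed relative to %fs in much the same way per-CPU data is addressed
relative to %gs. This is only an analogy under that assumption; the names
below are hypothetical, and the exact instructions depend on compiler,
flags and TLS model. Typically, in a plain executable, the direct read
compiles to a single segment-prefixed load, while handing out a pointer
forces the thread base to be loaded and the address computed by hand.

```c
#include <stdint.h>

/* Hypothetical per-thread data standing in for per-CPU data. */
struct percpu_sketch {
    unsigned int seq;
    uint64_t offset;
};

static __thread struct percpu_sketch tls_sketch = { .seq = 2, .offset = 42 };

/* Direct access: analogous to this_cpu_read(); the compiler can use a
 * segment-relative load without ever forming the address. */
__attribute__((noinline))
static unsigned int read_seq_direct_sketch(void)
{
    return tls_sketch.seq;
}

/* Pointer access: analogous to this_cpu_ptr(); the address has to be
 * materialized (segment base plus offset) before it can be returned. */
__attribute__((noinline))
static struct percpu_sketch *get_ptr_sketch(void)
{
    return &tls_sketch;
}
```

Comparing the disassembly of the two functions (for example with
objdump -d) shows the trade-off: one prefixed load per access versus one
address computation followed by ordinary loads.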

Let me prod at this with a sharp stick.
