[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200625085745.GD117543@hirez.programming.kicks-ass.net>
Date: Thu, 25 Jun 2020 10:57:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Nick Desaulniers <ndesaulniers@...gle.com>
Cc: Sami Tolvanen <samitolvanen@...gle.com>,
Masahiro Yamada <masahiroy@...nel.org>,
Will Deacon <will@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Paul E. McKenney" <paulmck@...nel.org>,
Kees Cook <keescook@...omium.org>,
clang-built-linux <clang-built-linux@...glegroups.com>,
Kernel Hardening <kernel-hardening@...ts.openwall.com>,
linux-arch <linux-arch@...r.kernel.org>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
Linux Kbuild mailing list <linux-kbuild@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, linux-pci@...r.kernel.org,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>
Subject: Re: [PATCH 00/22] add support for Clang LTO
On Thu, Jun 25, 2020 at 10:24:33AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2020 at 10:03:13AM +0200, Peter Zijlstra wrote:
> > I'm sure Will will respond, but the basic issue is the trainwreck C11
> > made of dependent loads.
> >
> > Anyway, here's a link to the last time this came up:
> >
> > https://lore.kernel.org/linux-arm-kernel/20171116174830.GX3624@linux.vnet.ibm.com/
>
> Another good read:
>
> https://lore.kernel.org/lkml/20150520005510.GA23559@linux.vnet.ibm.com/
>
> and having (partially) re-read that, I now worry intensily about things
> like latch_tree_find(), cyc2ns_read_begin, __ktime_get_fast_ns().
>
> It looks like kernel/time/sched_clock.c uses raw_read_seqcount() which
> deviates from the above patterns by, for some reason, using a primitive
> that includes an extra smp_rmb().
>
> And this is just the few things I could remember off the top of my head,
> who knows what else is out there.
As an example, let us consider __ktime_get_fast_ns(), the critical bit
is:
seq = raw_read_seqcount_latch(&tkf->seq);
tkr = tkf->base + (seq & 0x01);
now = tkr->base;
And we hard rely on that being a dependent load, so:
LOAD seq, (tkf->seq)
LOAD tkr, tkf->base
AND seq, 1
MUL seq, sizeof(tk_read_base)
ADD tkr, seq
LOAD now, (tkr->base)
Such that we obtain 'now' as a direct dependency on 'seq'. This ensures
the loads are ordered.
A compiler can wreck this by translating it into something like:
LOAD seq, (tkf->seq)
LOAD tkr, tkf->base
AND seq, 1
CMP seq, 0
JE 1f
ADD tkr, sizeof(tk_read_base)
1:
LOAD now, (tkr->base)
Because now the machine can speculate and load now before seq, breaking
the ordering.
Powered by blists - more mailing lists