linux-kernel - RE: [PATCH 01/29] iov_iter: Switch to using a table of operations


Message-ID: <3ba98abf0ddb4f16af7166db201fe9c1@AcuMS.aculab.com>
Date:   Sun, 22 Nov 2020 22:34:37 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Linus Torvalds' <torvalds@...ux-foundation.org>,
        David Howells <dhowells@...hat.com>
CC:     Pavel Begunkov <asml.silence@...il.com>,
        Matthew Wilcox <willy@...radead.org>,
        Jens Axboe <axboe@...nel.dk>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-block <linux-block@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 01/29] iov_iter: Switch to using a table of operations

From: Linus Torvalds
> Sent: 22 November 2020 19:22
> Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations
> 
> On Sun, Nov 22, 2020 at 5:33 AM David Howells <dhowells@...hat.com> wrote:
> >
> > I don't know enough about how spectre v2 works to say if this would be a
> > problem for the ops-table approach, but wouldn't it also affect the chain of
> > conditional branches that we currently use, since it's branch-prediction
> > based?
> 
> No, regular conditional branches aren't a problem. Yes, they may
> mispredict, but outside of a few very rare cases that we handle
> specially, that's not an issue.
> 
> Why? Because they always mispredict to one or the other side, so the
> code flow may be mis-predicted, but it is fairly controlled.
> 
> In contrast, an indirect jump can mispredict the target, and branch
> _anywhere_, and the attack vectors can poison the BTB (branch target
> buffer), so our mitigation for that is that every single indirect
> branch isn't predicted at all (using "retpoline").
> 
> So a conditional branch takes zero cycles when predicted (and most
> will predict quite well). And as David Laight pointed out a compiler
> can also turn a series of conditional branches into a tree, means that
> N conditional branches basically only needs log2(N) conditionals
> executed.

The compiler can convert a switch statement into a branch tree.
But I don't think it can convert the 'if chain' in the current code
to one.
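
A minimal sketch of the distinction, assuming (this is not stated in
the mail) that the 'if chain' tests one flag bit per branch in a fixed
order. All names and values below are hypothetical and are not the
kernel's definitions. The switch works on a dense enum, so the compiler
is free to lower it to a compare tree or a jump table. The chain has to
keep its tests in source order, because more than one bit could be set.

```c
/* Hypothetical iterator kinds, numbered densely (illustration only). */
enum demo_kind { DEMO_IOVEC, DEMO_KVEC, DEMO_BVEC, DEMO_PIPE, DEMO_DISCARD };

/* Hypothetical flag encoding of the same kinds, one bit each. */
#define DEMO_F_IOVEC   (1u << 0)
#define DEMO_F_KVEC    (1u << 1)
#define DEMO_F_BVEC    (1u << 2)
#define DEMO_F_PIPE    (1u << 3)
#define DEMO_F_DISCARD (1u << 4)

/* Switch on a dense value: may become a branch tree or a jump table. */
static unsigned int demo_step_switch(enum demo_kind kind)
{
	switch (kind) {
	case DEMO_IOVEC:   return 1;
	case DEMO_KVEC:    return 2;
	case DEMO_BVEC:    return 4;
	case DEMO_PIPE:    return 8;
	case DEMO_DISCARD: return 16;
	}
	return 0;
}

/* Chain of bit tests: evaluated in order, first matching bit wins. */
static unsigned int demo_step_if_chain(unsigned int flags)
{
	if (flags & DEMO_F_IOVEC)
		return 1;
	if (flags & DEMO_F_KVEC)
		return 2;
	if (flags & DEMO_F_BVEC)
		return 4;
	if (flags & DEMO_F_PIPE)
		return 8;
	if (flags & DEMO_F_DISCARD)
		return 16;
	return 0;
}
```

Both functions return the same value for a single kind; they differ
only in how much freedom the compiler has when laying out the branches.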

There is also the problem that some x86 CPUs can't predict branches
if too many of them fall in the same cache line (or similar).

> In contrast, with retpoline in place, an indirect branch will
> basically always take something like 25-30 cycles, because it always
> mispredicts.

I also wonder whether a retpoline trashes the return stack optimisation.
(If that is ever really a significant gain for real functions.)
 
...
> So this is not in any way "indirect branches are bad". It's more of a
> "indirect branches really aren't necessarily better than a couple of
> conditionals, and _may_ be much worse".

Even without retpolines, the jump table is likely to be a data-cache
miss (and maybe a TLB miss) unless you are running hot-cache.
That is probably an extra cache miss on top of the I-cache ones.
It is even worse if you end up with the jump table near the code,
since the data cache line and TLB might never be shared.

So a very short switch statement is likely to be better as
conditional jumps anyway.
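
As a rough illustration of the two shapes being compared, the sketch
below dispatches the same operation once through a table of function
pointers and once through a two-case switch. The types and functions
are hypothetical and much simpler than anything in the patch series.
The table variant has to load a pointer from data before it can branch
indirectly; the switch variant needs only a compare and a direct call.

```c
#include <stddef.h>

/* Hypothetical per-kind operations (illustration only). */
static size_t demo_advance_linear(size_t pos, size_t n)
{
	return pos + n;
}

static size_t demo_advance_discard(size_t pos, size_t n)
{
	(void)n;	/* a discarding iterator never moves */
	return pos;
}

struct demo_ops {
	size_t (*advance)(size_t pos, size_t n);
};

static const struct demo_ops demo_linear_ops = { .advance = demo_advance_linear };
static const struct demo_ops demo_discard_ops = { .advance = demo_advance_discard };

enum demo_type { DEMO_LINEAR, DEMO_DISCARDING };

struct demo_iter {
	enum demo_type type;		/* used by the switch variant */
	const struct demo_ops *ops;	/* used by the table variant */
	size_t pos;
};

/* Table variant: data load of the function pointer, then an indirect call. */
static size_t demo_advance_via_table(struct demo_iter *it, size_t n)
{
	it->pos = it->ops->advance(it->pos, n);
	return it->pos;
}

/* Switch variant: a short run of conditional branches and direct calls. */
static size_t demo_advance_via_switch(struct demo_iter *it, size_t n)
{
	switch (it->type) {
	case DEMO_LINEAR:
		it->pos = demo_advance_linear(it->pos, n);
		break;
	case DEMO_DISCARDING:
		it->pos = demo_advance_discard(it->pos, n);
		break;
	}
	return it->pos;
}
```

Which of the two is faster depends on the CPU, the mitigations in use
and the cache state; the sketch only shows where the extra data load
and the indirect branch come from.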

> For example, look at this gcc bugzilla:
> 
>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
> 
> which basically is about the compiler generating a jump table (is a
> single indirect branch) vs a series of conditional branches. With
> retpoline, the cross-over point is basically when you need to have
> over 10 conditional branches - and because of the log2(N) behavior,
> that's around a thousand cases!
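
The arithmetic behind the "thousand cases" figure quoted above can be
checked with a small helper. It assumes the compiler builds a balanced
compare tree, so N cases need about ceil(log2(N)) conditionals on any
one path. The function name is made up for this sketch.

```c
/*
 * Number of compare levels in a balanced decision tree over n_cases,
 * i.e. ceil(log2(n_cases)). Returns 0 for zero or one case.
 */
static unsigned int demo_tree_depth(unsigned long n_cases)
{
	unsigned int depth = 0;
	unsigned long v;

	if (n_cases <= 1)
		return 0;

	/* Count the bits needed to represent n_cases - 1. */
	for (v = n_cases - 1; v != 0; v >>= 1)
		depth++;

	return depth;
}
```

With this helper, 1024 cases give a depth of 10, which matches the
quoted cross-over of roughly ten conditional branches.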

That was a hot-cache test.
Cold-cache is likely to favour the retpoline a little sooner.
(And the retpoline (probably) won't be (much) worse than the
mispredicted indirect jump.)

I do wonder how much of the kernel actually runs hot-cache?
Except for parts that explicitly run things in bursts.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)