Message-ID: <CAMe9rOotJ+s0nR58Si3F_X8V6OcZKB8+q8+wORQ7C1YT1Nx+DQ@mail.gmail.com>
Date:   Thu, 28 Feb 2019 10:09:43 -0800
From:   "H.J. Lu" <hjl.tools@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     David Woodhouse <dwmw2@...radead.org>,
        Ingo Molnar <mingo@...nel.org>, bjorn.topel@...el.com,
        David Miller <davem@...emloft.net>, brouer@...hat.com,
        magnus.karlsson@...el.com, Andy Lutomirski <luto@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Borislav Petkov <bp@...en8.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, ast@...nel.org,
        linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/build] x86, retpolines: Raise limit for generating
 indirect calls from switch-case

On Thu, Feb 28, 2019 at 9:58 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 02/28/2019 05:25 PM, H.J. Lu wrote:
> > On Thu, Feb 28, 2019 at 8:18 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> >> On 02/28/2019 01:53 PM, H.J. Lu wrote:
> >>> On Thu, Feb 28, 2019 at 3:27 AM David Woodhouse <dwmw2@...radead.org> wrote:
> >>>> On Thu, 2019-02-28 at 03:12 -0800, tip-bot for Daniel Borkmann wrote:
> >>>>> Commit-ID:  ce02ef06fcf7a399a6276adb83f37373d10cbbe1
> >>>>> Gitweb:     https://git.kernel.org/tip/ce02ef06fcf7a399a6276adb83f37373d10cbbe1
> >>>>> Author:     Daniel Borkmann <daniel@...earbox.net>
> >>>>> AuthorDate: Thu, 21 Feb 2019 23:19:41 +0100
> >>>>> Committer:  Thomas Gleixner <tglx@...utronix.de>
> >>>>> CommitDate: Thu, 28 Feb 2019 12:10:31 +0100
> >>>>>
> >>>>> x86, retpolines: Raise limit for generating indirect calls from switch-case
> >>>>>
> >>>>> From networking side, there are numerous attempts to get rid of indirect
> >>>>> calls in fast-path wherever feasible in order to avoid the cost of
> >>>>> retpolines, for example, just to name a few:
> >>>>>
> >>>>>   * 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
> >>>>>   * aaa5d90b395a ("net: use indirect call wrappers at GRO network layer")
> >>>>>   * 028e0a476684 ("net: use indirect call wrappers at GRO transport layer")
> >>>>>   * 356da6d0cde3 ("dma-mapping: bypass indirect calls for dma-direct")
> >>>>>   * 09772d92cd5a ("bpf: avoid retpoline for lookup/update/delete calls on maps")
> >>>>>   * 10870dd89e95 ("netfilter: nf_tables: add direct calls for all builtin expressions")
> >>>>>   [...]
> >>>>>
> >>>>> Recent work on XDP from Björn and Magnus additionally found that manually
> >>>>> transforming the XDP return code switch statement with more than 5 cases
> >>>>> into if-else combination would result in a considerable speedup in XDP
> >>>>> layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled
> >>>>> builds.
> >>>>
> >>>> +HJL
> >>>>
> >>>> This is a GCC bug, surely? It should know how expensive each
> >>>> instruction is, and choose which to use accordingly. That should be
> >>>> true even when the indirect branch "instruction" is a retpoline, and
> >>>> thus enormously expensive.
> >>>>
> >>>> I believe this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952 so
> >>>> please at least reference that bug, and be prepared to turn this hack
> >>>> off when GCC is fixed.
> >>>
> >>> We couldn't find a testcase to show jump table with indirect branch
> >>> is slower than direct branches.
> >>
> >> Ok, I've just checked https://github.com/marxin/microbenchmark/tree/retpoline-table
> >> with the below on top.
> >>
> >>  Makefile | 6 +++---
> >>  switch.c | 2 +-
> >>  test.c   | 6 ++++--
> >>  3 files changed, 8 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/Makefile b/Makefile
> >> index bd83233..ea81520 100644
> >> --- a/Makefile
> >> +++ b/Makefile
> >> @@ -1,16 +1,16 @@
> >>  CC=gcc
> >>  CFLAGS=-g -I.
> >> -CFLAGS+=-O2 -mindirect-branch=thunk
> >> +CFLAGS+=-O2 -mindirect-branch=thunk-inline -mindirect-branch-register
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > Does slowdown show up only with -mindirect-branch=thunk-inline?
>
> Not really, numbers are in similar range / outcome. Additionally, I also tried
> on a bit bigger machine (Xeon Gold 5120 this time). First is thunk-inline, second
> is thunk, and third is w/o raising limit for comparison; first test (from last
> mail) on that machine:

Please re-open:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952

with new info.

-- 
H.J.
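
Illustration (not part of the original message): the quoted commit text describes rewriting a switch statement with more than 5 cases as an if-else combination so that retpoline-enabled builds avoid an indirect branch. A minimal sketch of that transformation follows. The verdict codes and function names are hypothetical and are not the kernel's XDP code. Whether the switch actually becomes a jump table depends on the compiler and its flags, and a compiler is free to lower either form differently.

```c
/* Hypothetical verdict codes, for illustration only. */
enum verdict { V_ABORTED, V_DROP, V_PASS, V_TX, V_REDIRECT, V_LOG, V_COUNT };

/* Dense switch with more than 5 cases: a candidate for a jump table,
 * which is dispatched through an indirect branch. */
int handle_switch(int v)
{
    switch (v) {
    case V_ABORTED:  return -1;
    case V_DROP:     return 0;
    case V_PASS:     return 1;
    case V_TX:       return 2;
    case V_REDIRECT: return 3;
    case V_LOG:      return 4;
    default:         return -2;
    }
}

/* Same mapping written as an if-else chain: each test is a direct
 * conditional branch when compiled as written. */
int handle_if_else(int v)
{
    if (v == V_ABORTED)
        return -1;
    else if (v == V_DROP)
        return 0;
    else if (v == V_PASS)
        return 1;
    else if (v == V_TX)
        return 2;
    else if (v == V_REDIRECT)
        return 3;
    else if (v == V_LOG)
        return 4;
    return -2;
}

/* Returns 1 if both forms agree on every code, including the
 * out-of-range values -1 and V_COUNT; returns 0 otherwise. */
int handlers_agree(void)
{
    for (int v = -1; v <= V_COUNT; v++)
        if (handle_switch(v) != handle_if_else(v))
            return 0;
    return 1;
}
```

To compare the generated code, one could compile this with the flags from the quoted Makefile diff (-O2 -mindirect-branch=thunk-inline -mindirect-branch-register) and inspect the assembly of both functions for an indirect jump or thunk.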
