linux-kernel - Re: Percpu variables, benchmarking, and performance weirdness

Message-ID: <20191220103420.6f9304ab@carbon>
Date:   Fri, 20 Dec 2019 10:34:20 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Björn Töpel <bjorn.topel@...il.com>
Cc:     bpf <bpf@...r.kernel.org>, brouer@...hat.com,
        LKML <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
        Christoph Lameter <cl@...ux.com>,
        Dennis Zhou <dennis@...nel.org>
Subject: Re: Percpu variables, benchmarking, and performance weirdness

On Fri, 20 Dec 2019 09:25:43 +0100
Björn Töpel <bjorn.topel@...il.com> wrote:

> I've been doing some benchmarking with AF_XDP, and more specifically
> the bpf_xdp_redirect_map() helper and xdp_do_redirect(). One thing that
> puzzles me is that the percpu-variable accesses stand out.
> 
> I did a horrible hack that just accesses a regular global variable,
> instead of the percpu struct bpf_redirect_info, and got a performance
> boost from 22.7 Mpps to 23.8 Mpps with the rxdrop scenario from
> xdpsock.

Yes, this is a 2 ns per-packet overhead, which is annoying in an XDP context:
 (1/22.7 - 1/23.8) * 1000 = 2 ns
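
For anyone who wants to redo the arithmetic with their own numbers, here is a
minimal sketch of the conversion from packet rate to per-packet time. The
function names are made up for illustration; only the 22.7 and 23.8 Mpps
figures come from the measurement quoted above.

```c
#include <assert.h>

/* Time spent per packet, in nanoseconds, at a rate given in Mpps.
 * 1 Mpps is 1e6 packets per second, so one packet takes 1000/mpps ns. */
static double ns_per_packet(double mpps)
{
	assert(mpps > 0.0);
	return 1000.0 / mpps;
}

/* Extra per-packet cost of the slower configuration, in nanoseconds. */
static double overhead_ns(double slow_mpps, double fast_mpps)
{
	return ns_per_packet(slow_mpps) - ns_per_packet(fast_mpps);
}
```

Calling overhead_ns(22.7, 23.8) gives roughly 2.04 ns, i.e. about 44.05 ns
per packet with the percpu access versus about 42.02 ns without it.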

> Has anyone else seen this?

Yes, I see it all the time...

> So, my question to the uarch/percpu folks out there: Why are percpu
> accesses (%gs segment register) more expensive than regular global
> variables in this scenario?

I'm also VERY interested in knowing the answer to the above question!
(Adding LKML to reach more people)
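
To make the two access patterns being compared concrete, here is a rough
userspace analogy, not the kernel code. It assumes a C11 compiler and uses a
_Thread_local variable as a stand-in for a percpu variable (on x86-64 Linux,
userspace thread-local storage is addressed relative to %fs, while the kernel
uses %gs for percpu data). The struct and function names are hypothetical,
and this sketch says nothing about where the 2 ns actually goes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in, NOT the kernel's struct bpf_redirect_info. */
struct redirect_info {
	unsigned int ifindex;
	unsigned int flags;
};

/* Plain global: addressed directly (the "horrible hack" variant). */
static struct redirect_info global_ri;

/* Thread-local: addressed relative to a segment base, which is the
 * userspace analogue of a percpu variable. */
static _Thread_local struct redirect_info tls_ri;

static struct redirect_info *get_global_ri(void)
{
	return &global_ri;
}

static struct redirect_info *get_tls_ri(void)
{
	return &tls_ri;
}

/* Same store in both cases; only the way the address is formed differs. */
static void record_target(struct redirect_info *ri, unsigned int ifindex)
{
	assert(ri != NULL);
	ri->ifindex = ifindex;
	ri->flags = 0;
}
```

Comparing the generated assembly for get_global_ri() and get_tls_ri() is a
cheap way to see how the address computation differs before measuring
anything.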


> One way around that is changing BPF_PROG_RUN, and BPF_CALL_x to pass a
> context (struct bpf_redirect_info) explicitly, and access that instead
> of doing percpu access. That would be a pretty churny patch, and
> before doing that it would be nice to understand why percpu stands out
> performance-wise.
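
The explicit-context idea described above could look roughly like the sketch
below. This is a userspace illustration under stated assumptions: struct
redirect_ctx, helper_redirect(), do_redirect() and run_once() are hypothetical
names, and nothing here reflects the real BPF_PROG_RUN or BPF_CALL_x macros.
The point is only that the caller owns the state and hands a pointer to both
the helper and the redirect step, so neither needs a percpu lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-invocation state, owned by the caller. */
struct redirect_ctx {
	unsigned int ifindex;
	unsigned int flags;
	int pending;	/* set once a helper has requested a redirect */
};

/* Helper records the redirect target in the context it was handed. */
static void helper_redirect(struct redirect_ctx *ctx,
			    unsigned int ifindex, unsigned int flags)
{
	assert(ctx != NULL);
	ctx->ifindex = ifindex;
	ctx->flags = flags;
	ctx->pending = 1;
}

/* Redirect step consumes the same context.
 * Returns the target ifindex, or -1 if no redirect was requested. */
static int do_redirect(struct redirect_ctx *ctx)
{
	assert(ctx != NULL);
	if (!ctx->pending)
		return -1;
	ctx->pending = 0;
	return (int)ctx->ifindex;
}

/* One "program run": the context lives on the caller's stack. */
static int run_once(unsigned int ifindex)
{
	struct redirect_ctx ctx = { 0 };

	helper_redirect(&ctx, ifindex, 0);
	return do_redirect(&ctx);
}
```

The churn mentioned above comes from the fact that every helper and every
call site would need the extra pointer argument threaded through.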

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer