Message-ID: <CANn89iLSVPwjEi2ZFUCqUXV05LqY3Jx5qbQCEiayo_s-UZHcAw@mail.gmail.com>
Date: Tue, 22 Jan 2019 14:40:30 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Tejun Heo <tj@...nel.org>
Cc: Vlad Buslov <vladbu@...lanox.com>, Dennis Zhou <dennis@...nel.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Yevgeny Kliteynik <kliteyn@...lanox.com>,
Yossef Efraim <yossefe@...lanox.com>,
Maor Gottlieb <maorg@...lanox.com>
Subject: Re: tc filter insertion rate degradation
On Tue, Jan 22, 2019 at 1:18 PM Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
> Percpu storage is expensive and cache line sharing tends to be less of
> a problem (cuz they're per-cpu), so it is useful to support custom
> alignments for tighter packing.
>
We have BPF percpu maps whose values are two 8-byte counters (packets
and bytes), with millions of slots.
We update the pair for every packet sent on the hosts.
BPF uses an alignment of 8 that cannot be changed or tuned (at least,
all call sites in kernel/bpf/hashtab.c use 8).
If we are lucky, every pair sits within a single cache line.
But when we are not lucky, 25% of the pairs cross a cache line,
reducing performance under DDoS.
Using a nicer alignment in our case does not consume more RAM, and we
did not notice any extra cost from per-cpu allocations because we keep
them in the slow path (control path).