Date: Fri, 16 Feb 2024 12:17:28 +0000
From: Asbjørn Sloth Tønnesen <ast@...erby.net>
To: Marcelo Ricardo Leitner <mleitner@...hat.com>
Cc: Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang
 <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
 Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, llu@...erby.dk
Subject: Re: [PATCH net-next 0/3] make skip_sw actually skip software

Hi Marcelo,

On 2/15/24 18:00, Marcelo Ricardo Leitner wrote:
> On Thu, Feb 15, 2024 at 04:04:41PM +0000, Asbjørn Sloth Tønnesen wrote:
> ...
>> Since we use TC flower offload for the hottest
>> prefixes, and leave the long tail to Linux / the CPU,
>> we need both the hardware and software
>> datapaths to perform well.
>>
>> I found that skip_sw rules are quite expensive
>> in the kernel datapath, since they must be evaluated
>> and matched before the kernel checks the
>> skip_sw flag.
>>
>> This patchset optimizes the case where all rules
>> are skip_sw.
> 
> The talk is interesting. Yet, I don't get how it is set up.
> How do you use a dedicated block for skip_sw, and then have a
> catch-all on sw again please?

Bird installs the full DFZ (default-free zone) Internet routing table into
the main kernel table for the software datapath.

Bird also installs a subset of the routing table into an auxiliary kernel
table.

flower-route then picks up the routes from the auxiliary kernel table and
installs them as TC skip_sw filters.
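To make the split concrete, here is a toy Python model of the setup above.
Route, to_skip_sw_filters() and forward() are hypothetical names, not
flower-route's code, and the prefixes are documentation ranges. It shows
why no catch-all software TC filter is needed: a packet that misses every
skip_sw filter in hardware simply continues to the kernel, which routes it
via the main table.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    nexthop: str


def lpm(routes, addr):
    """Longest-prefix match over a list of routes; None if nothing matches."""
    best = None
    for r in routes:
        if addr in r.prefix and (best is None or r.prefix.prefixlen > best.prefix.prefixlen):
            best = r
    return best


def to_skip_sw_filters(aux_table):
    """Hypothetical: turn each aux-table route into a hardware-only filter entry."""
    # Most specific prefixes first, so the first matching filter wins correctly.
    ordered = sorted(aux_table, key=lambda r: r.prefix.prefixlen, reverse=True)
    return [{"dst_ip": r.prefix, "nexthop": r.nexthop, "skip_sw": True} for r in ordered]


def forward(addr, hw_filters, main_table):
    """Try the offloaded filters first; a miss falls through to the kernel FIB."""
    for f in hw_filters:
        if addr in f["dst_ip"]:
            return ("hardware", f["nexthop"])
    r = lpm(main_table, addr)
    return ("software", r.nexthop if r else None)


net = ipaddress.ip_network
# Main table holds everything (including the offloaded subset); aux holds the subset.
main = [Route(net("203.0.113.0/24"), "upstream"), Route(net("198.51.100.0/24"), "switch")]
aux = [Route(net("198.51.100.0/24"), "switch")]
hw = to_skip_sw_filters(aux)
print(forward(ipaddress.ip_address("198.51.100.7"), hw, main))  # ('hardware', 'switch')
print(forward(ipaddress.ip_address("203.0.113.9"), hw, main))   # ('software', 'upstream')
```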

On these machines we don't have any non-skip_sw TC filters.

Since 2021, we have statically offloaded all inbound traffic, because the
nexthop for our IP space is always the adjacent switch, which does the
interior L3 routing. This allowed us to offload ~50% of the packets.
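Because all of our own prefixes share one next hop, the static inbound
rule set does not depend on BGP at all. A minimal sketch, assuming a
hypothetical list of own prefixes (documentation ranges) and a single
next-hop label; merging adjacent prefixes with
ipaddress.collapse_addresses() is an illustrative choice here to reduce
the number of hardware entries, and the linked script may not do this.

```python
import ipaddress

# Hypothetical own address space; documentation prefixes stand in for real ones.
OWN_PREFIXES = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24"]
SWITCH = "switch"  # assumed single next hop for all of our own space


def static_inbound_filters(own_prefixes):
    """Build one (prefix, nexthop) entry per merged own prefix."""
    nets = [ipaddress.ip_network(p) for p in own_prefixes]
    # Every own prefix shares the same next hop, so adjacent prefixes can be
    # merged into fewer, larger match entries without changing forwarding.
    merged = ipaddress.collapse_addresses(nets)
    return [(str(n), SWITCH) for n in merged]


print(static_inbound_filters(OWN_PREFIXES))
# [('192.0.2.0/24', 'switch'), ('198.51.100.0/24', 'switch')]
```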

I have put an example of the static script here:
https://files.fiberby.net/ast/2024/tc_skip_sw/mlx5_static_offload.sh

And `tc filter show dev enp5s0f0np0 ingress` after running the script:
https://files.fiberby.net/ast/2024/tc_skip_sw/mlx_offload_demo_tc_dump.txt


> I'm missing which traffic is being matched against the sw datapath. In
> theory, you have all the heavy duty filters offloaded, so the sw
> datapath should be seeing only a few packets, right?

We are a residential ISP, so our traffic is residential Internet traffic.
We run the BGP routers as routers on a stick, so the filters see both
inbound and outbound traffic.

~50% of packets are inbound traffic, so our own prefixes are the hottest
prefixes. Most streaming traffic is handled internally and is therefore
not seen on our core routers. We regularly see 5%-10% of all outbound
traffic going to a single prefix, and 50% of outbound traffic is
distributed across just a few prefixes.
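This concentration is what makes partial offload worthwhile. A sketch of
how one might measure it from per-prefix byte counters:
prefixes_for_share() is a hypothetical helper, and the counter values
below are toy numbers, not Fiberby measurements.

```python
def prefixes_for_share(counters, target):
    """Fewest hottest prefixes whose combined traffic reaches `target` (0..1).

    Returns (chosen_prefixes, share_actually_covered).
    """
    total = sum(counters.values())
    if total == 0:
        return [], 0.0
    chosen, acc = [], 0
    # Greedy by descending traffic gives the minimum number of prefixes.
    for prefix, n in sorted(counters.items(), key=lambda kv: kv[1], reverse=True):
        if acc >= target * total:
            break
        chosen.append(prefix)
        acc += n
    return chosen, acc / total


# Toy outbound byte counts per (hypothetical) prefix label.
counters = {"P1": 40, "P2": 25, "P3": 15, "P4": 10, "P5": 5, "P6": 5}
print(prefixes_for_share(counters, 0.5))  # (['P1', 'P2'], 0.65)
```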

We currently only offload our own prefixes, and a select few other known
high-traffic prefixes.

The goal is to offload the majority of the traffic, but it is still early
days for flower-route, and I first need to implement a smarter chain
layout and dynamic filter placement based on hardware counters.
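Dynamic placement based on counters could look roughly like the sketch
below: keep the hottest prefixes in hardware up to an assumed capacity,
and compute which filters to add and which to remove. plan_placement(),
the capacity limit and the rates are hypothetical, and the sketch assumes
per-prefix rates are available for prefixes both in and out of hardware,
which the text above does not describe.

```python
def plan_placement(rates, installed, capacity):
    """Return (to_add, to_remove) so hardware holds the `capacity` hottest prefixes.

    rates:     dict prefix -> observed traffic rate (hypothetical counters)
    installed: set of prefixes currently offloaded as skip_sw filters
    """
    hottest = sorted(rates, key=lambda p: rates[p], reverse=True)[:capacity]
    desired = set(hottest)
    installed = set(installed)
    # Sorted lists keep the plan deterministic; applying removals before
    # additions avoids exceeding the hardware capacity mid-update.
    return sorted(desired - installed), sorted(installed - desired)


rates = {"P1": 900, "P2": 50, "P3": 700, "P4": 10}
print(plan_placement(rates, {"P1", "P2"}, capacity=2))  # (['P3'], ['P2'])
```

A real controller would likely also want some hysteresis, so that two
prefixes with similar rates do not keep swapping places in hardware.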

Even when I get flower-route to offload almost all traffic, there will
still be a long tail of prefixes not in hardware, so the kernel datapath
must not be slowed down by the offloaded filters.
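The cost described in this thread can be illustrated with a toy Python
model (not kernel code, and not necessarily how the patchset implements
its fix). The naive walk pays a match attempt for every skip_sw filter
before the flag matters. A hypothetical Block that counts its
software-eligible filters can skip the walk entirely when that count is
zero, which is the all-skip_sw case the patchset targets; checking the
flag before matching is an extra illustrative choice.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    dst: ipaddress.IPv4Network  # stand-in for a flower dst_ip match
    skip_sw: bool               # True: hardware only


def classify_naive(filters, addr, stats):
    """Match first, check skip_sw afterwards (the expensive order)."""
    for f in filters:
        stats["match_attempts"] += 1
        if addr in f.dst:
            if f.skip_sw:
                continue  # matched, but not allowed in software
            return f
    return None


class Block:
    """Hypothetical filter block tracking how many filters software may use."""

    def __init__(self):
        self.filters = []
        self.sw_count = 0

    def add(self, f):
        self.filters.append(f)
        if not f.skip_sw:
            self.sw_count += 1

    def classify(self, addr, stats):
        if self.sw_count == 0:
            return None  # every filter is skip_sw: no software walk at all
        for f in self.filters:
            if f.skip_sw:
                continue  # cheap flag check before paying for the match
            stats["match_attempts"] += 1
            if addr in f.dst:
                return f
        return None


nets = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
filters = [Filter(ipaddress.ip_network(n), skip_sw=True) for n in nets]
block = Block()
for f in filters:
    block.add(f)
addr = ipaddress.ip_address("192.0.2.10")
naive, opt = {"match_attempts": 0}, {"match_attempts": 0}
print(classify_naive(filters, addr, naive), naive)  # None {'match_attempts': 3}
print(block.classify(addr, opt), opt)               # None {'match_attempts': 0}
```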

-- 
Best regards
Asbjørn Sloth Tønnesen
Network Engineer
Fiberby - AS42541
