Message-ID: <28bf1467-b7ce-4e36-a4ef-5445f65edd97@fiberby.net>
Date: Fri, 16 Feb 2024 14:01:06 +0000
From: Asbjørn Sloth Tønnesen <ast@...erby.net>
To: Vlad Buslov <vladbu@...dia.com>
Cc: Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang
 <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
 Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, llu@...erby.dk
Subject: Re: [PATCH net-next 3/3] net: sched: make skip_sw actually skip
 software

Hi Vlad,

On 2/16/24 08:47, Vlad Buslov wrote:
> On Thu 15 Feb 2024 at 16:04, Asbjørn Sloth Tønnesen <ast@...erby.net> wrote:
>> TC filters come in 3 variants:
>> - no flag (no opinion, process wherever possible)
>> - skip_hw (do not process filter by hardware)
>> - skip_sw (do not process filter by software)
>>
>> However, skip_sw is implemented so that the skip_sw
>> flag can only be checked after the filter has been matched.
>>
>> IMHO it's common, when using skip_sw, to use it on all rules.
>>
>> So if all filters in a block are skip_sw filters, then
>> we can bail out early and avoid having to match
>> the filters just to check for the skip_sw flag.
>>
>>   +----------------------------+--------+--------+--------+
>>   | Test description           | Pre    | Post   | Rel.   |
>>   |                            | kpps   | kpps   | chg.   |
>>   +----------------------------+--------+--------+--------+
>>   | basic forwarding + notrack | 1264.9 | 1277.7 |  1.01x |
>>   | switch to eswitch mode     | 1067.1 | 1071.0 |  1.00x |
>>   | add ingress qdisc          | 1056.0 | 1059.1 |  1.00x |
>>   +----------------------------+--------+--------+--------+
>>   | 1 non-matching rule        |  927.9 | 1057.1 |  1.14x |
>>   | 10 non-matching rules      |  495.8 | 1055.6 |  2.13x |
>>   | 25 non-matching rules      |  280.6 | 1053.5 |  3.75x |
>>   | 50 non-matching rules      |  162.0 | 1055.7 |  6.52x |
>>   | 100 non-matching rules     |   87.7 | 1019.0 | 11.62x |
>>   +----------------------------+--------+--------+--------+
>>
>> perf top (100 n-m skip_sw rules - pre patch):
>>    25.57%  [kernel]  [k] __skb_flow_dissect
>>    20.77%  [kernel]  [k] rhashtable_jhash2
>>    14.26%  [kernel]  [k] fl_classify
>>    13.28%  [kernel]  [k] fl_mask_lookup
>>     6.38%  [kernel]  [k] memset_orig
>>     3.22%  [kernel]  [k] tcf_classify
>>
>> perf top (100 n-m skip_sw rules - post patch):
>>     4.28%  [kernel]  [k] __dev_queue_xmit
>>     3.80%  [kernel]  [k] check_preemption_disabled
>>     3.68%  [kernel]  [k] nft_do_chain
>>     3.08%  [kernel]  [k] __netif_receive_skb_core.constprop.0
>>     2.59%  [kernel]  [k] mlx5e_xmit
>>     2.48%  [kernel]  [k] mlx5e_skb_from_cqe_mpwrq_nonlinear
>>
>> Test setup:
>>   DUT: Intel Xeon D-1518 (2.20GHz) w/ Nvidia/Mellanox ConnectX-6 Dx 2x100G
>>   Data rate measured on switch (Extreme X690), and DUT connected as
>>   a router on a stick, with pktgen and pktsink as VLANs.
>>   Pktgen was in range 12.79 - 12.95 Mpps across all tests.
>>
>> Signed-off-by: Asbjørn Sloth Tønnesen <ast@...erby.net>
>> ---
>>   include/net/pkt_cls.h | 5 +++++
>>   net/core/dev.c        | 3 +++
>>   2 files changed, 8 insertions(+)
>>
>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> index a4ee43f493bb..a065da4df7ff 100644
>> --- a/include/net/pkt_cls.h
>> +++ b/include/net/pkt_cls.h
>> @@ -74,6 +74,11 @@ static inline bool tcf_block_non_null_shared(struct tcf_block *block)
>>   	return block && block->index;
>>   }
>>   
>> +static inline bool tcf_block_has_skip_sw_only(struct tcf_block *block)
>> +{
>> +	return block && atomic_read(&block->filtercnt) == atomic_read(&block->skipswcnt);
>> +}
> 
> Note that this introduces a read from a heavily contended cache line on
> the data path for all classifiers, including the ones that don't support
> offloads. I wonder if this is a concern for users running purely software tc.

Unfortunately, I don't have access to any multi-CPU machines, so I haven't been
able to test the impact of that.

Alternatively, I guess I could maintain a static key in the counter update logic.
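
A rough userspace sketch of that idea, for discussion only. It is not kernel
code and has not been measured. All names (model_block, model_filter_add,
model_filter_del, model_should_bypass) are hypothetical, and a plain atomic
flag stands in for the static key. A real static key is patched into the code
and is not per-block, which this model does not capture. The point is only
that the slow path (filter add/delete) does the counter comparison, so the
data path reads a single flag that is written only when its value changes.
Updates are assumed to be serialized by the caller, as they would be on the
control path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tcf_block: only the two counters
 * from the patch plus a cached flag are modelled. */
struct model_block {
	atomic_int filtercnt;      /* all filters in the block */
	atomic_int skipswcnt;      /* filters flagged skip_sw */
	atomic_bool skip_sw_only;  /* cached result, read on the data path */
};

static void model_block_init(struct model_block *b)
{
	atomic_init(&b->filtercnt, 0);
	atomic_init(&b->skipswcnt, 0);
	/* Same semantics as the patch: 0 == 0, so an empty block bypasses. */
	atomic_init(&b->skip_sw_only, true);
}

/* Slow path: recompute the flag, but only write it on a transition
 * so that the flag is not dirtied on every filter update. */
static void model_update_flag(struct model_block *b)
{
	bool now = atomic_load(&b->filtercnt) == atomic_load(&b->skipswcnt);

	if (atomic_load(&b->skip_sw_only) != now)
		atomic_store(&b->skip_sw_only, now);
}

static void model_filter_add(struct model_block *b, bool skip_sw)
{
	atomic_fetch_add(&b->filtercnt, 1);
	if (skip_sw)
		atomic_fetch_add(&b->skipswcnt, 1);
	model_update_flag(b);
}

static void model_filter_del(struct model_block *b, bool skip_sw)
{
	assert(atomic_load(&b->filtercnt) > 0);
	atomic_fetch_sub(&b->filtercnt, 1);
	if (skip_sw) {
		assert(atomic_load(&b->skipswcnt) > 0);
		atomic_fetch_sub(&b->skipswcnt, 1);
	}
	model_update_flag(b);
}

/* Data path: a single read of the cached flag, no counter reads. */
static bool model_should_bypass(struct model_block *b)
{
	return atomic_load(&b->skip_sw_only);
}
```

In this model the flag sits next to the counters, so it would likely share
their cache line. Whether a separate placement or a real static key actually
helps with the contention concern is exactly what I can't test here.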


>> +
>>   static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
>>   {
>>   	WARN_ON(tcf_block_shared(block));
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index d8dd293a7a27..7cd014e5066e 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3910,6 +3910,9 @@ static int tc_run(struct tcx_entry *entry, struct sk_buff *skb,
>>   	if (!miniq)
>>   		return ret;
>>   
>> +	if (tcf_block_has_skip_sw_only(miniq->block))
>> +		return ret;
>> +
>>   	tc_skb_cb(skb)->mru = 0;
>>   	tc_skb_cb(skb)->post_ct = false;
>>   	tcf_set_drop_reason(skb, *drop_reason);
> 

-- 
Best regards
Asbjørn Sloth Tønnesen
Network Engineer
Fiberby - AS42541