netdev - Re: [PATCH net-next 1/1] net/sched: Disambiguate verdict from return code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1df2e804-5d58-026c-5daa-413a3605c129@iogearbox.net>
Date: Fri, 29 Sep 2023 17:48:19 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Victor Nogueira <victor@...atatu.com>, xiyou.wangcong@...il.com,
 jiri@...nulli.us, davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 pabeni@...hat.com, paulb@...dia.com, netdev@...r.kernel.org,
 kernel@...atatu.com, martin.lau@...ux.dev, bpf@...r.kernel.org
Subject: Re: [PATCH net-next 1/1] net/sched: Disambiguate verdict from return
 code

On 9/26/23 1:01 AM, Jamal Hadi Salim wrote:
> On Fri, Sep 22, 2023 at 4:12 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>> On 9/20/23 1:20 AM, Jamal Hadi Salim wrote:
>>> On Tue, Sep 19, 2023 at 6:15 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>>>> On 9/19/23 4:59 PM, Victor Nogueira wrote:
[...]
>>
>> In the above case we don't have 'internal' errors which you want to trace, so I would
>> also love to avoid the cost of zeroing struct tcf_result res which should be 3x 8b for
>> every packet.
> 
> We can move the zeroing inside tc_run() but we declare it in the same
> spot as we do right now. You will still need to set res.verdict as
> above.
> Would that work for you?

What I'm not following is that with the below you can avoid the unnecessary
fast path cost (which is only for corner case which is almost never hit) and
get even better visibility. Are you saying it doesn't work?

>> I was more thinking like something below could be a better choice. I presume your main
>> goal is to trace where these errors originated in the first place, so it might even be
>> useful to capture the actual return code as well.
> 
> The main motivation is a few syzkaller bugs which resulted in not
> disambiguating between errors being returned and sometimes
> TC_ACT_SHOT.
> 
>> Then you can use perf script, bpf and whatnot to gather further insights into what
>> happened while being less invasive and avoiding the need to extend struct tcf_result.
> 
> We could use trace instead - the reason we have the skb reason is
> being used in the other spots (does this trace require ebpf to be
> usable?).

No you can just use regular perf by attaching to the tracepoint, no need for using
bpf at all here.

>> This would be quite similar to trace_xdp_exception() as well, and I think you can guarantee
>> that in fast path all errors are < TC_ACT_UNSPEC anyway.
> 
> I am not sure i followed. 0 means success, result codes are returned in res now.

What I was saying is that you don't need the struct change from the patch, but only
the changes where you rework TC_ACT_SHOT into one of the -E<errors>, and then with
the below you can pass this through an exception tracepoint.

>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 85df22f05c38..4089d195144d 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3925,6 +3925,10 @@ static int tc_run(struct tcx_entry *entry, struct sk_buff *skb)
>>
>>          mini_qdisc_bstats_cpu_update(miniq, skb);
>>          ret = tcf_classify(skb, miniq->block, miniq->filter_list, &res, false);
>> +       if (unlikely(ret < TC_ACT_UNSPEC)) {
>> +               trace_tc_exception(skb->dev, skb->tc_at_ingress, ret);
>> +               ret = TC_ACT_SHOT;
>> +       }
>>          /* Only tcf related quirks below. */
>>          switch (ret) {
>>          case TC_ACT_SHOT:

Thanks,
Daniel