netdev - Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc BPF programs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <768c85a3-d307-28bb-761f-9cda9b414aae@iogearbox.net>
Date:   Wed, 5 Oct 2022 20:21:20 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     sdf@...gle.com
Cc:     bpf@...r.kernel.org, razor@...ckwall.org, ast@...nel.org,
        andrii@...nel.org, martin.lau@...ux.dev, john.fastabend@...il.com,
        joannelkoong@...il.com, memxor@...il.com, toke@...hat.com,
        joe@...ium.io, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc
 BPF programs

On 10/5/22 7:56 PM, sdf@...gle.com wrote:
> On 10/05, Daniel Borkmann wrote:
>> On 10/5/22 2:55 AM, sdf@...gle.com wrote:
>> > On 10/05, Daniel Borkmann wrote:
>> [...]
>> >
>> > The series looks exciting, haven't had a chance to look deeply, will try
>> > to find some time this week.
> 
>> Great, thanks!
> 
>> > We've chatted briefly about priority during the talk, let's maybe discuss
>> > it here more?
>> >
>> > I, as a user, still really have no clue about what priority to use.
>> > We have this problem at tc, and we'll seemingly have the same problem
>> > here? I guess it's even more relevant in k8s because internally at G we
>> > can control the users.
>> >
>> > Is it worth at least trying to provide some default bands / guidance?
>> >
>> > For example, having SEC('tc/ingress') receive attach_priority=124 by
>> > default? Maybe we can even have something like 'tc/ingress_first' get
>> > attach_priority=1 and 'tc/ingress_last' with attach_priority=254?
>> > (the names are arbitrary, we can do something better)
>> >
>> > ingress_first/ingress_last can be used by some monitoring jobs. The rest
>> > can use default 124. If somebody really needs a custom priority, then they
>> > can manually use something around 124/2 if they need to trigger before the
>> > 'default' priority or 124+124/2 if they want to trigger after?
>> >
>> > Thoughts? Is it worth it? Do we care?
> 
>> I think guidance is needed, yes, I can add a few paragraphs to the libbpf
>> header file where we also have the tc BPF link API. I had a brief discussion
>> around this also with developers from datadog as they also use the infra
>> via tc BPF. Overall, its a hard problem, and I don't think there's a good
>> generic solution. The '*_last' is implied by prio=0, so that kernel auto-
>> allocates it, and for libbpf we could add an API for it where the user
>> does not need to specify prio specifically. The 'appending' is reasonable
>> to me given if an application explicitly requests to be added as first
>> (and e.g. enforces policy through tc BPF), but some other 3rd party application
>> prepends itself as first, it can bypass the former, which would be too easy
>> to shoot yourself in the foot. Overall the issue in tc land is that ordering
>> matters, skb packet data could be mangled (e.g. IPs NATed), skb fields can
>> be mangled, and we can have redirect actions (dev A vs. B); the only way I'd
>> see were this is possible if somewhat verifier can annotate the prog when
>> it didn't observe any writes to skb, and no redirect was in play. Then you've
>> kind of replicated the constraints similar to tracing where the attachment
>> can say that ordering doesn't matter if all the progs are in same style.
>> Otherwise, explicit corporation is needed as is today with rest of tc (or
>> as Toke did in libxdp) with multi-attach. In the specific case I mentioned
>> at LPC, it can be made to work given one of the two is only observing traffic
>> at the layer, e.g. it could get prepended if there is guarantee that all
>> return codes are tc_act_unspec so that there is no bypass and then you'll
>> see all traffic or appended to see only traffic which made it past the
>> policy. So it all depends on the applications installing programs, but to
>> solve it generically is not possible given ordering and conflicting actions.
>> So, imho, an _append() API for libbpf can be added along with guidance for
>> developers when to use _append() vs explicit prio.
> 
> Agreed, it's a hard problem to solve, especially from the kernel side.
> Ideally, as Toke mentions on the side thread, there should be some kind
> of system daemon or some other place where the ordering is described.
> But let's start with at least some guidance on the current prio.
> 
> Might be also a good idea to narrow down the prio range to 0-65k for
> now? Maybe in the future we'll have some special PRIO_MONITORING_BEFORE_ALL
> and PRIO_MONITORING_AFTER_ALL that trigger regardless of TC_ACT_UNSPEC?
> I agree with Toke that it's another problem with the current action based
> chains that's worth solving somehow (compared to, say, cgroup programs).

Makes sense, I'll restrict the range so there's headroom for future
extensions, the mentioned 0-65k looks very reasonable to me.

Thanks,
Daniel