netdev - Re: [PATCH bpf-next 2/7] bpf, xdp: add bpf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzbgPqN8xKX5xpHBRMJSZkhz_BBzBg7r_FPRo=j3ZmLNUQ@mail.gmail.com>
Date:   Wed, 15 Jul 2020 13:54:31 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Toke Høiland-Jørgensen <toke@...hat.com>
Cc:     Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>,
        David Ahern <dsahern@...il.com>,
        Jakub Kicinski <kicinski@...com>, Andrey Ignatov <rdna@...com>,
        Takshak Chahande <ctakshak@...com>
Subject: Re: [PATCH bpf-next 2/7] bpf, xdp: add bpf_link-based XDP attachment API

On Wed, Jul 15, 2020 at 8:48 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@...il.com> writes:
>
> >> Yup, that was helpful, thanks! I think our difference of opinion on this
> >> stems from the same place as our disagreement about point 2.: You are
> >> assuming that an application that uses XDP sticks around and holds on to
> >> its bpf_link, while I'm assuming it doesn't (and so has to rely on
> >> pinning for any persistence). In the latter case you don't really gain
> >> much from the bpf_link auto-cleanup, and whether it's a prog fd or a
> >> link fd you go find in your bpffs doesn't make that much difference...
> >
> > Right. But if I had to pick just one implementation (prog fd-based vs
> > bpf_link), I'd stick with bpf_link because it is flexible enough to
> > "emulate" prog fd attachment (through BPF FS), but the same isn't true
> > about prog fd attachment emulating bpf_link. That's it. I really don't
> > enjoy harping on that point, but it feels to be silently dismissed all
> > the time based on one particular arrangement for particular existing
> > XDP flow.
>
> It can; kinda. But you introduce a dependency on bpffs that wasn't there
> before, and you end up with resources that are kept around in the kernel
> if the interface disappears (because they are still pinned). So I
> consider it a poor emulation.

Yes, it's not exactly 100% the same semantics.
It is possible with slight additions to API to support essentially
exactly the same semantics you want with prog attachment. E.g., we can
either have a flag at LINK_CREATE time, or a separate command (e.g.,
LINK_PIN or something), that would mark bpf_link as "sticky", bump
it's refcnt. What happens then is that even if last FD is closed,
there is still refcnt 1 there, and then there are two ways to detach
that link:

1) interface/cgroup/whatever is destroyed and bpf_link is
auto-detached. At that point auto-detach handler will see that it's a
"sticky" bpf_link, will decrement refcnt and subsequently free
bpf_link kernel object (unless some application still has FD open, of
course).

2) a new LINK_DESTROY BPF command will be introduced, which will only
work with "sticky" bpf_links. It will decrement refcnt and do the same
stuff as the auto-detach handler does today (so bpf_link->dev = NULL,
for XDP link).

I don't mind this, as long as this is not a default semantics and
require conscious opt-in from whoever creates the link.

>
> >> >> >> I was under the impression that forcible attachment of bpf_links was
> >> >> >> already possible, but looking at the code now it doesn't appear to be?
> >> >> >> Wasn't that the whole point of BPF_LINK_GET_FD_BY_ID? I.e., that a
> >> >> >> sysadmin with CAP_SYS_ADMIN privs could grab the offending bpf_link FD
> >> >> >> and force-remove it? I certainly think this should be added before we
> >> >> >> expand bpf_link usage any more...
> >> >> >
> >> >> > I still maintain that killing processes that installed the bpf_link is
> >> >> > the better approach. Instead of letting the process believe and act as
> >> >> > if it has an active XDP program, while it doesn't, it's better to
> >> >> > altogether kill/restart the process.
> >> >>
> >> >> Killing the process seems like a very blunt tool, though. Say it's a
> >> >> daemon that attaches XDP programs to all available interfaces, but you
> >> >> want to bring down an interface for some one-off maintenance task, but
> >> >> the daemon authors neglected to provide an interface to tell the daemon
> >> >> to detach from specific interfaces. If your only option is to kill the
> >> >> daemon, you can't bring down that interface without disrupting whatever
> >> >> that daemon is doing with XDP on all the other interfaces.
> >> >>
> >> >
> >> > I'd rather avoid addressing made up hypothetical cases, really. Get
> >> > better and more flexible daemon?
> >>
> >> I know you guys don't consider an issue to be real until it has already
> >> crashed something in production. But some of us don't have the luxury of
> >> your fast issue-discovered-to-fix-shipped turnaround times; so instead
> >> we have to go with the old-fashioned way of thinking about how things
> >> can go wrong ahead of time, and making sure we're prepared to handle
> >> issues if (or, all too often, when) they occur. And it's frankly
> >> tiresome to have all such efforts be dismissed as "made up
> >> hypotheticals". Please consider that the whole world does not operate
> >> like your org, and that there may be legitimate reasons to do things
> >> differently.
> >>
> >
> > Having something that breaks production is a very hard evidence of a
> > problem and makes it easier to better understand a **real** issue well
> > and argue why something has to be solved or prevented. But it's not a
> > necessary condition, of course. It's just that made up hypotheticals
> > aren't necessarily good ways to motivate a feature, because all too
> > often it turns out to be just that, hypothetical issue, while the real
> > issue is different enough to warrant a different and better solution.
> > By being conservative with adding features, we are trying to not make
> > unnecessary design and API mistakes, because in the kernel they are
> > here to stay forever.
>
> I do appreciate the need to be conservative with the interfaces we add,
> and I am aware of the maintenance burden. And it's not like I want
> contingencies for any hypothetical I can think of put into the kernel
> ahead of time (talk about a never-ending process :)). What I'm asking
> for is just that something be argued on its merits, instead of
> *automatically* being dismissed as "hypothetical". I.e., the difference
> between "that can be handled by..." or "that is not likely to occur
> because...", as opposed to "that has not happened yet, so come back when
> it does".
>
> > In your example, I'd argue that the design of that daemon is bad if it
> > doesn't allow you to control what's attached where, and how to detach
> > it without breaking completely independent network interfaces. That
> > should be the problem that has to be solved first, IMO. And just
> > because in some scenarios it might be **convenient** to force-detach
> > bpf_link, is in no way a good justification (in my book) to add that
> > feature, especially if there are other (and arguably safer) ways to
> > achieve the same **troubleshooting** effect.
>
> See this is actually what was I asking for - considering a question on
> its merits; so thank you!

Honestly, I was hoping this is more or less obvious, which is why I
didn't go into a lengthy explanation originally. There is only so much
time in a day, unfortunately. So "hypothetical" example was more of
"it doesn't happen today, but when it happens, it should be solved
differently", in my opinion, of course. But anyways, as you say, we
can leave this aside, no need to waste more time on this.

> And yeah, it would be a poorly designed
> daemon, but I'm not quite convinced that such daemons won't show up and
> be put into production by, say, someone running an enterprise operating
> system :)
>
> Anyway, let's leave that particular issue aside for now, and I'll circle
> back to adding the force-remove if needed once I've thought this over a
> bit more.
>
> >> That being said...
> >>
> >> > This force-detach functionality isn't hard to add, but so far we had
> >> > no real reason to do that. Once we do have such use cases, we can add
> >> > it, if we agree it's still a good idea.
> >>
> >> ...this is fair enough, and I guess I should just put this on my list of
> >> things to add. I was just somehow under the impression that this had
> >> already been added.
> >>
> >
> > So, overall, do I understand correctly that you are in principle not
> > opposed to adding BPF XDP link, or you still have some concerns that
> > were not addressed?
>
> Broadly speaking yeah, I am OK with adding this as a second interface. I
> am still concerned about the "locking in place" issued we discussed
> above, but that can be addressed later. I am also a little concerned

Right, see above the "extensions" I outlined to make those bpf_links
have auto-cleanup semantics, similar to prog attachment today.

> about adding a second interface for configuring an interface that has to
> take the RTNL lock etc; but those are mostly details, I suppose.
>
> Unfortunately I won't have to review the latest version of your patch
> set to look at those details before I go on vacation (sorry about that :/).
> I guess I'll just have to leave that to you guys to take care of...
>

Sure, thanks, enjoy your vacation! I'll post v3 then with build fixes
I have so far.

> -Toke
>