[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87369rla1y.fsf@toke.dk>
Date: Sun, 29 Mar 2020 12:39:21 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
Andrii Nakryiko <andriin@...com>,
"David S. Miller" <davem@...emloft.net>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Lorenz Bauer <lmb@...udflare.com>,
Andrey Ignatov <rdna@...com>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
Alexei Starovoitov <alexei.starovoitov@...il.com> writes:
> On Sat, Mar 28, 2020 at 08:34:12PM +0100, Toke Høiland-Jørgensen wrote:
>> Alexei Starovoitov <alexei.starovoitov@...il.com> writes:
>>
>> > On Sat, Mar 28, 2020 at 02:43:18AM +0100, Toke Høiland-Jørgensen wrote:
>> >>
>> >> No, I was certainly not planning to use that to teach libxdp to just
>> >> nuke any bpf_link it finds attached to an interface. Quite the contrary,
>> >> the point of this series is to allow libxdp to *avoid* replacing
>> >> something on the interface that it didn't put there itself.
>> >
>> > Exactly! "that it didn't put there itself".
>> > How are you going to do that?
>> > I really hope you thought it through and came up with magic.
>> > Because I tried and couldn't figure out how to do that with IFLA_XDP*
>> > Please walk me step by step how do you think it's possible.
>>
>> I'm inspecting the BPF program itself to make sure it's compatible.
>> Specifically, I'm embedding a piece of metadata into the program BTF,
>> using Andrii's encoding trick that we also use for defining maps. So
>> xdp-dispatcher.c contains this[0]:
>>
>> __uint(dispatcher_version, XDP_DISPATCHER_VERSION) SEC(XDP_METADATA_SECTION);
>>
>> and libxdp will refuse to touch any program that it finds loaded on an
>> iface which doesn't have this, or which has a version number that is
>> higher than what the library understands.
>
> so libxdp will do:
> ifindex -> id of currently attached prog -> fd -> prog_info -> btf -> read map
> -> find "dispatcher_version"
> and then it will do replace_fd with new version of the dispatcher ?
> I see how this approach helps the second set of races (from fd into "dispatcher_version")
> when another libxdp is doing the same.
> But there is still a race in query->id->fd. Much smaller though.
You mean the program can disappear before the ID can be turned into an
fd? Yeah, I guess that can happen, but that can just be treated as a
failure that triggers the retry logic.
> In that sense replace_fd is a better behaved prog replacement than
> just calling bpf_set_link_xdp_fd() without XDP_FLAGS_UPDATE_IF_NOEXIST.
> But not much. The libxdp doesn't own the attachment.
> If replace_fd fails what libxdp is going to do?
> Try the whole thing from the beginning?
> ifindex -> id2 -> fd2 ...
Yes, this is predicated on a "retry on failure" logic.
> Say it succeeded.
> But the libxdp1 that won the first race has no clue that libxdp2
> retried and there is a different dispatcher prog there.
> So you'll add netlink notifiers for libxdp to watch ?
No, the idea is that the dispatchers are compatible. So app1 installs
dispatcher1 with sequence (prog1), then app2 installs dispatcher2 with
sequence (prog1,prog2) - or (prog2,prog1) depending on ordering.
> That would mean that some user space process has to be always running
> while typical firewall doesn't need any user space. The firewall.rpm can
> install its prog with all firewall rules, permanently link it to
> the interface and exit.
> But let's continue. So single libxdp daemon is now waiting for notifications
> or both libxdp1 and libxdp2 that are part of two firewalls that are
> being 'yum installed' are waiting for notifications?
> How fight between libxdp1 and libxdp2 to install what they want going
> to be resolved?
> If their versions are the same I think they will settle quickly
> since both libraries will see dispatcher prog with expected version number, right?
> What if versions are different? Older libxdp or newer libxdp suppose to give up?
> If libxdp2 is newer it will still be able to use older dispatcher prog
> that was installed by libxdp1, but it would need to disable all new
> user facing library features?
It will depend on what changes between versions, I guess. But yeah, I
don't think we can completely rule out that a "compatibility mode" may
be necessary at some point. This is orthogonal to how the programs are
being attached, though.
> I guess all that is acceptable behavior to some libxdp users.
I believe so.
>> > I'm saying that without bpf_link for xdp libxdp has no ability to
>> > identify an attachment that is theirs.
>>
>> Ah, so *that* was what you meant with "unique attachment". It never
>> occurred to me that answering this question ("is it my program?") was to
>> be a feature of bpf_link; I always assumed that would be a property of
>> the bpf_prog itself.
>>
>> Any reason what I'm describing above wouldn't work for you?
>
> I don't see how this is even apples to apples comparison.
> Racy query via id with sort-of "atomic" replacement and no ownership
> vs guaranteed attachment with exact ownership and no races.
No, I guess in your "management daemon" case the kernel-enforced
exclusivity does come in handy. And as I said, I can live with there
being two APIs as long as there's a reasonable way to override the
bpf_link "lock" :)
>> > I see two ways out of this stalemate:
>> > 1. assume that replace_fd extension landed and develop libxdp further
>> > into fully fledged library. May be not a complete library, but at least
>> > for few more weeks. If then you still think replace_fd is enough
>> > I'll land it.
>> > 2. I can land replace_fd now, but please don't be surprised that
>> > I will revert it several weeks from now when it's clear that
>> > it's not enough.
>> >
>> > Which one do you prefer?
>>
>> I prefer 2. Reverting if it does turn out that I'm wrong is fine. Heck,
>> in that case I'll even send the revert myself :)
>
> Ok. Applied.
Great, thanks!
-Toke
Powered by blists - more mailing lists