netdev - Re: Redux: Backwards compatibility for XDP multi-buff

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87fstx37bn.fsf@toke.dk>
Date:   Wed, 22 Sep 2021 00:14:04 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Zvi Effron <zeffron@...tgames.com>
Cc:     Lorenz Bauer <lmb@...udflare.com>,
        Lorenzo Bianconi <lbianconi@...hat.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        John Fastabend <john.fastabend@...il.com>,
        netdev@...r.kernel.org, bpf <bpf@...r.kernel.org>
Subject: Re: Redux: Backwards compatibility for XDP multi-buff

Zvi Effron <zeffron@...tgames.com> writes:

> On Tue, Sep 21, 2021 at 11:23 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>>
>> Zvi Effron <zeffron@...tgames.com> writes:
>>
>> > On Tue, Sep 21, 2021 at 9:06 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>> >>
>> >> Hi Lorenz (Cc. the other people who participated in today's discussion)
>> >>
>> >> Following our discussion at the LPC session today, I dug up my previous
>> >> summary of the issue and some possible solutions[0]. Seems no on
>> >> actually replied last time, which is why we went with the "do nothing"
>> >> approach, I suppose. I'm including the full text of the original email
>> >> below; please take a look, and let's see if we can converge on a
>> >> consensus here.
>> >>
>> >> First off, a problem description: If an existing XDP program is exposed
>> >> to an xdp_buff that is really a multi-buffer, while it will continue to
>> >> run, it may end up with subtle and hard-to-debug bugs: If it's parsing
>> >> the packet it'll only see part of the payload and not be aware of that
>> >> fact, and if it's calculating the packet length, that will also only be
>> >> wrong (only counting the first fragment).
>> >>
>> >> So what to do about this? First of all, to do anything about it, XDP
>> >> programs need to be able to declare themselves "multi-buffer aware" (but
>> >> see point 1 below). We could try to auto-detect it in the verifier by
>> >> which helpers the program is using, but since existing programs could be
>> >> perfectly happy to just keep running, it probably needs to be something
>> >> the program communicates explicitly. One option is to use the
>> >> expected_attach_type to encode this; programs can then declare it in the
>> >> source by section name, or the userspace loader can set the type for
>> >> existing programs if needed.
>> >>
>> >> With this, the kernel will know if a given XDP program is multi-buff
>> >> aware and can decide what to do with that information. For this we came
>> >> up with basically three options:
>> >>
>> >> 1. Do nothing. This would make it up to users / sysadmins to avoid
>> >>    anything breaking by manually making sure to not enable multi-buffer
>> >>    support while loading any XDP programs that will malfunction if
>> >>    presented with an mb frame. This will probably break in interesting
>> >>    ways, but it's nice and simple from an implementation PoV. With this
>> >>    we don't need the declaration discussed above either.
>> >>
>> >> 2. Add a check at runtime and drop the frames if they are mb-enabled and
>> >>    the program doesn't understand it. This is relatively simple to
>> >>    implement, but it also makes for difficult-to-understand issues (why
>> >>    are my packets suddenly being dropped?), and it will incur runtime
>> >>    overhead.
>> >>
>> >> 3. Reject loading of programs that are not MB-aware when running in an
>> >>    MB-enabled mode. This would make things break in more obvious ways,
>> >>    and still allow a userspace loader to declare a program "MB-aware" to
>> >>    force it to run if necessary. The problem then becomes at what level
>> >>    to block this?
>> >>
>> >
>> > I think there's another potential problem with this as well: what happens to
>> > already loaded programs that are not MB-aware? Are they forcibly unloaded?
>>
>> I'd say probably the opposite: You can't toggle whatever switch we end
>> up with if there are any non-MB-aware programs (you'd have to unload
>> them first)...
>>
>
> How would we communicate that issue? dmesg? I'm not very familiar with
> how sysctl change failure causes are communicated to users, so this
> might be a solved problem, but if I run `sysctl -w net.xdp.multibuffer
> 1` (or whatever ends up actually being the toggle) to active
> multi-buffer, and it fails because there's a loaded non-aware program,
> that seems like a potential for a lot of administrator pain.

Hmm, good question. Document that this only fails if there's a
non-mb-aware XDP program loaded? Or use some other mechanism with better
feedback?

>> >>    Doing this at the driver level is not enough: while a particular
>> >>    driver knows if it's running in multi-buff mode, we can't know for
>> >>    sure if a particular XDP program is multi-buff aware at attach time:
>> >>    it could be tail-calling other programs, or redirecting packets to
>> >>    another interface where it will be processed by a non-MB aware
>> >>    program.
>> >>
>> >>    So another option is to make it a global toggle: e.g., create a new
>> >>    sysctl to enable multi-buffer. If this is set, reject loading any XDP
>> >>    program that doesn't support multi-buffer mode, and if it's unset,
>> >>    disable multi-buffer mode in all drivers. This will make it explicit
>> >>    when the multi-buffer mode is used, and prevent any accidental subtle
>> >>    malfunction of existing XDP programs. The drawback is that it's a
>> >>    mode switch, so more configuration complexity.
>> >>
>> >
>> > Could we combine the last two bits here into a global toggle that doesn't
>> > require a sysctl? If any driver is put into multi-buffer mode, then the system
>> > switches to requiring all programs be multi-buffer? When the last multi-buffer
>> > enabled driver switches out of multi-buffer, remove the system-wide
>> > restriction?
>>
>> Well, the trouble here is that we don't necessarily have an explicit
>> "multi-buf mode" for devices. For instance, you could raise the MTU of a
>> device without it necessarily involving any XDP multi-buffer stuff (if
>> you're not running XDP on that device). So if we did turn "raising the
>> MTU" into such a mode switch, we would end up blocking any MTU changes
>> if any XDP programs are loaded. Or having an MTU change cause a
>> force-unload of all XDP programs.
>
> Maybe I missed something then, but you had stated that "while a
> particular driver knows if it's running in multi-buff mode" so I
> assumed that the driver would be able to tell when to toggle the mode
> on.

Well, a driver knows when it is attaching an XDP program whether it (the
driver) is configured in a way such that this XDP program could
encounter a multi-buf.

> I had been thinking that when a driver turned multi-buffer off, it
> could trigger a check of all drivers, but that also seems like it
> could just be a global refcount of all the drivers that have requested
> multi-buffer mode. When a driver enables multi-buffer for itself, it
> increments the refcount, and when it disables, it decrements. A
> non-zero count means the system is in multi-buffer mode.

I guess we could do a refcount-type thing when an multi-buf XDP program
is first attached (as per above). But I think it may be easier to just
do it at load-time, then, so it doesn't have to be in the driver, but
the BPF core could just enforce it.

This would basically amount to a rule saying "you can't mix mb-aware and
non-mb-aware programs", and the first type to be loaded determines which
mode the system is in. This would be fairly simple to implement and
enforce, I suppose. The drawback is that it's potentially racy in the
order programs are loaded...

-Toke