[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKH8qBsJSJoVJGg3j_JxeM_10BRyYTt6kQvbSMWT016jyUOu6w@mail.gmail.com>
Date: Wed, 30 Nov 2022 16:32:44 -0800
From: Stanislav Fomichev <sdf@...gle.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: bpf@...r.kernel.org, ast@...nel.org, daniel@...earbox.net,
andrii@...nel.org, martin.lau@...ux.dev, song@...nel.org,
yhs@...com, john.fastabend@...il.com, kpsingh@...nel.org,
haoluo@...gle.com, jolsa@...nel.org,
David Ahern <dsahern@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
Willem de Bruijn <willemb@...gle.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Anatoly Burakov <anatoly.burakov@...el.com>,
Alexander Lobakin <alexandr.lobakin@...el.com>,
Magnus Karlsson <magnus.karlsson@...il.com>,
Maryam Tahhan <mtahhan@...hat.com>, xdp-hints@...-project.net,
netdev@...r.kernel.org
Subject: Re: [xdp-hints] Re: [PATCH bpf-next v3 00/11] xdp: hints via kfuncs
On Wed, Nov 30, 2022 at 3:01 PM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> Stanislav Fomichev <sdf@...gle.com> writes:
>
> > On Tue, Nov 29, 2022 at 12:50 PM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
> >>
> >> Stanislav Fomichev <sdf@...gle.com> writes:
> >>
> >> > Please see the first patch in the series for the overall
> >> > design and use-cases.
> >> >
> >> > Changes since v2:
> >> >
> >> > - Rework bpf_prog_aux->xdp_netdev refcnt (Martin)
> >> >
> >> > Switched to dropping the count early, after loading / verification is
> >> > done. At attach time, the pointer value is used only for comparing
> >> > the actual netdev at attach vs netdev at load.
> >>
> >> So if we're not holding the netdev reference, we'll end up with a BPF
> >> program with hard-coded CALL instructions calling into a module that
> >> could potentially be unloaded while that BPF program is still alive,
> >> right?
> >>
> >> I suppose that since we're checking that the attach iface is the same
> >> that the program should not be able to run after the module is unloaded,
> >> but it still seems a bit iffy. And we should definitely block
> >> BPF_PROG_RUN invocations of programs with a netdev set (but we should do
> >> that anyway).
> >
> > Ugh, good point about BPF_PROG_RUN, seems like it should be blocked
> > regardless of the locking scheme though, right?
> > Since our mlx4/mlx5 changes expect something after the xdp_buff, we
> > can't use those per-netdev programs with our generic
> > bpf_prog_test_run_xdp...
>
> Yup, I think we should just block it for now; maybe it can be enabled
> later if it turns out to be useful (and we find a way to resolve the
> kfuncs for this case).
>
> Also, speaking of things we need to disable, tail calls is another one.
> And for freplace program attachment we need to add a check that the
> target interfaces match as well.
Agreed, thanks!
> >> > (potentially can be a problem if the same slub slot is reused
> >> > for another netdev later on?)
> >>
> >> Yeah, this would be bad as well, obviously. I guess this could happen?
> >
> > Not sure, that's why I'm raising it here to see what others think :-)
> > Seems like this has to be actively exploited to happen? (and it's a
> > privileged operation)
> >
> > Alternatively, we can go back to the original version where the prog
> > holds the device.
> > Matin mentioned in the previous version that if we were to hold a
> > netdev refcnt, we'd have to drop it also from unregister_netdevice.
>
> Yeah; I guess we could keep a list of "bound" XDP programs in struct
> net_device and clear each one on unregister? Also, bear in mind that the
> "unregister" callback is also called when a netdev moves between
> namespaces; which is probably not what we want in this case?
>
> > It feels like beyond that extra dev_put, we'd need to reset our
> > aux->xdp_netdev and/or add some flag or something else to indicate
> > that this bpf program is "orphaned" and can't be attached anywhere
> > anymore (since the device is gone; netdev_run_todo should free the
> > netdev it seems).
>
> You could add a flag, and change the check to:
>
> + if (new_prog->aux->xdp_has_netdev &&
> + new_prog->aux->xdp_netdev != dev) {
> + NL_SET_ERR_MSG(extack, "Cannot attach to a different target device");
> + return -EINVAL;
> + }
>
> That way the check will always fail if xdp_netdev is reset to NULL
> (while keeping the flag) on dereg?
Something like that, yeah. I'll also take a closer look at offload.c
as Martin points out. I should probably leverage it instead of trying
to add more custom handling here..
> > That should address this potential issue with reusing the same addr
> > for another netdev, but is a bit more complicated code-wise.
> > Thoughts?
>
> I'd be in favour of adding this tracking; I worry that we'll end up with
> some very subtle and hard-to-debug bugs if we somehow do end up
> executing the wrong kfuncs...
SG, will try to address soon!
Powered by blists - more mailing lists