[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZmB9ctqbqSMdl5Qu@localhost.localdomain>
Date: Wed, 5 Jun 2024 17:00:02 +0200
From: Michal Kubiak <michal.kubiak@...el.com>
To: Jacob Keller <jacob.e.keller@...el.com>
CC: Jakub Kicinski <kuba@...nel.org>, David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>, Wojciech Drewek <wojciech.drewek@...el.com>,
George Kuruvinakunnel <george.kuruvinakunnel@...el.com>, Maciej Fijalkowski
<maciej.fijalkowski@...el.com>
Subject: Re: [PATCH net 4/8] i40e: Fix XDP program unloading while removing
the driver
On Thu, May 30, 2024 at 09:38:04AM -0700, Jacob Keller wrote:
>
>
> On 5/29/2024 6:54 PM, Jakub Kicinski wrote:
> > On Tue, 28 May 2024 15:06:07 -0700 Jacob Keller wrote:
> >> + /* Called from netdev unregister context. Unload the XDP program. */
> >> + if (vsi->netdev->reg_state == NETREG_UNREGISTERING) {
> >> + xdp_features_clear_redirect_target(vsi->netdev);
> >> + old_prog = xchg(&vsi->xdp_prog, NULL);
> >> + if (old_prog)
> >> + bpf_prog_put(old_prog);
> >> +
> >> + return 0;
> >> + }
> >
> > This is not great. The netdevice is closed at this stage, why is the xdp
> > setup try to do work if the device is closed even when not
> > unregistering?
>
> The comment makes this seem like its happening during unregistration. It
> looks like i40e_xdp_setup() is only called from i40e_xdp(), which is if
> xdp->command is XDP_SETUP_PROG
>
> From the looks of things, ndo_bpf is called both for setup and teardown?
Exactly, ndo_bpf with the command XDP_SETUP_PROG can be called for both
loading and unloading the XDP program. When the XDP program is unloaded,
the callback is simply called with NULL as the pointer to the program.
Also, unloading the XDP program can be initiated not only from the user
space directly, but also from the kernel core.
In this specific case we are handling that last case. Calling ndo_bpf
when reg_state == NETREG_UNREGISTERING is the case when unloading the
XDP program is an immanent part of the netdevice unregistering process.
In such a case we have to unconditionally decrease the reference counter for
the XDP program using bpf_prog_put() call and exit with no error to
assure the consistency between BPF core code and our driver.
>
> > 7 >-------/* Set or clear a bpf program used in the earliest stages of packet
> > 6 >------- * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
> > 5 >------- * is responsible for calling bpf_prog_put on any old progs that are
> > 4 >------- * stored. In case of error, the callee need not release the new prog
> > 3 >------- * reference, but on success it takes ownership and must bpf_prog_put
> > 2 >------- * when it is no longer used.
> > 1 >------- */
>
> Indeed, dev_xdp_uninstall calls dev_xdp_install in a loop to remove
> programs.
>
> As far as I can tell, it looks like the .ndo_bpf call is made with a
> program set to NULL during uninstall:
>
> > 30 static void dev_xdp_uninstall(struct net_device *dev)
> > 29 {
> > 28 >-------struct bpf_xdp_link *link;
> > 27 >-------struct bpf_prog *prog;
> > 26 >-------enum bpf_xdp_mode mode;
> > 25 >-------bpf_op_t bpf_op;
> > 24
> > 23 >-------ASSERT_RTNL();
> > 22
> > 21 >-------for (mode = XDP_MODE_SKB; mode < __MAX_XDP_MODE; mode++) {
> > 20 >------->-------prog = dev_xdp_prog(dev, mode);
> > 19 >------->-------if (!prog)
> > 18 >------->------->-------continue;
> > 17
> > 16 >------->-------bpf_op = dev_xdp_bpf_op(dev, mode);
> > 15 >------->-------if (!bpf_op)
> > 14 >------->------->-------continue;
> > 13
> > 12 >------->-------WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL));
> > 11
>
> Here, dev_xdp_install is called with a prog of NULL
>
> > 10 >------->-------/* auto-detach link from net device */
> > 9 >------->-------link = dev_xdp_link(dev, mode);
> > 8 >------->-------if (link)
> > 7 >------->------->-------link->dev = NULL;
> > 6 >------->-------else
> > 5 >------->------->-------bpf_prog_put(prog);
> > 4
> > 3 >------->-------dev_xdp_set_link(dev, mode, NULL);
> > 2 >-------}
> > 1 }
>
I confirm. The current design of netdevice unregistering algorithm
includes checking (in a loop) if the netdevice has any XDP program
attached and forces unloading that program because it won't be used
anymore on that device.
> I think the semantics are confusing here.
>
> Basically, the issue is this function needs to remove the XDP properly
> when called by the netdev unregister flow but has a check against adding
> a new program if its called during remove...
The check for __I40E_IN_REMOVE has been introduced by the commit
6533e558c650 ("i40e: Fix reset path while removing the driver").
Similar checks have been added in other callbacks. I believe the
intention was to fix some synchronization issues by exiting from callbacks
or reset immediately if the driver is being removed.
Unfortunately, although it could work for other callbacks, we cannot do that
in ndo_bpf because we need to leave kernel structures and ref counters
consistent.
I decided to keep the check for IN_REMOVE because I believe it covers
the case when NETREG_UNREGISTERING flag is not set yet but we started to
destroy our internal data structures.
>
> I think this is confusing and could be improved by refactoring how the
> i40e flow works. If the passed-in prog is NULL, its a request to remove.
> If its otherwise, its a request to add a new program (possibly replacing
> an existing one?).
>
> I think we ought to just be checking NULL and not needing to bother with
> the netdev_unregister state at all here?
I am afraid checking for NULL won't be enough here.
Normally, when ndo_bpf is called from the user space application, that
callback can be called with NULL too (when the user just wants to unload
the XDP program). In such a case, apart from calling bpf_prog_put(), we
have to rebuild our internal data structures (queues, pointers, counters
etc.) to restore the i40e driver working back in normal mode (with no
XDP program).
My intention of adding the check for NETREG_REGISTERING was to implement
a quick handler for unloading the XDP program from the netdev
unregistering context only, when our internal data structures are
already destroyed but we need to leave kernel's ref counters in a
consistent state.
>
> Hopefully Michal can chime in with a better understanding.
Thanks,
Michal
Powered by blists - more mailing lists