lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 17 Nov 2016 11:48:55 +0200 From: Saeed Mahameed <saeedm@....mellanox.co.il> To: Daniel Borkmann <daniel@...earbox.net> Cc: "David S. Miller" <davem@...emloft.net>, Alexei Starovoitov <alexei.starovoitov@...il.com>, Brenden Blanco <bblanco@...mgrid.com>, Zhiyi Sun <zhiyisun@...il.com>, Rana Shahout <ranas@...lanox.com>, Saeed Mahameed <saeedm@...lanox.com>, Linux Netdev List <netdev@...r.kernel.org> Subject: Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set On Wed, Nov 16, 2016 at 5:26 PM, Daniel Borkmann <daniel@...earbox.net> wrote: > On 11/16/2016 03:30 PM, Saeed Mahameed wrote: >> >> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@...earbox.net> >> wrote: >>> >>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote: >>>> >>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@...earbox.net> >>>> wrote: >>>>> >>>>> >>>>> There are multiple issues in mlx5e_xdp_set(): >>>>> >>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When >>>>> doing so, it should be done at an earlier point in time to makes >>>>> sure >>>>> that we cannot fail anymore at the time we want to set the program >>>>> for >>>>> each channel. This only means that we have to undo the >>>>> bpf_prog_add() >>>>> in case we return due to required reset. >>>>> >>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must >>>>> be >>>>> taken since we got that from call path via dev_change_xdp_fd() >>>>> already. >>>>> Otherwise, we'd never be able to free the program. Also, >>>>> bpf_prog_add() >>>>> without checking the return code could fail. >>>>> >>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs >>>>> support") >>>>> Signed-off-by: Daniel Borkmann <daniel@...earbox.net> >>>>> --- >>>>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25 >>>>> ++++++++++++++++++----- >>>>> 1 file changed, 20 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c >>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c >>>>> index ab0c336..cf26672 100644 >>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c >>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c >>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device >>>>> *netdev, struct bpf_prog *prog) >>>>> goto unlock; >>>>> } >>>>> >>>>> + if (prog) { >>>>> + /* num_channels is invariant here, so we can take the >>>>> + * batched reference right upfront. >>>>> + */ >>>>> + prog = bpf_prog_add(prog, priv->params.num_channels); >>>>> + if (IS_ERR(prog)) { >>>>> + err = PTR_ERR(prog); >>>>> + goto unlock; >>>>> + } >>>>> + } >>>>> + >>>> >>>> >>>> With this approach you will end up taking a ref count twice per RQ! on >>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL). >>>> One ref will be taken per RQ/Channel from the above code, and since >>>> reset will be TRUE mlx5e_open_locked will be called and another ref >>>> count will be taken on mlx5e_create_rq. >>>> >>>> The issue here is that we have two places for ref count accounting, >>>> xdp_set and mlx5e_create_rq. Having ref-count updates in >>>> mlx5e_create_rq is essential for num_channels configuration changes >>>> (mlx5e_set_ringparam). >>>> >>>> Our previous approach made sure that only one path will do the ref >>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when >>>> reset == false). >>> >>> >>> That is correct, for a short time bpf_prog_add() was charged also when >>> we reset. When we reset, we will then jump to unlock_put and safely undo >>> this since we took ref from mlx5e_create_rq() already in that case, and >>> return successfully. That was intentional, so that eventually we end up >>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more >>> below ... >>> >>> [...] >>>> >>>> >>>> 2. Keep the current approach and make sure to not update the ref count >>>> twice, you can batch update only if (!reset && was_open) otherwise you >>>> can rely on mlx5e_open_locked to take the num_channels ref count for >>>> you. >>>> >>>> Personally I prefer option 2, since we will keep the current logic >>>> which will allow configuration updates such as (mlx5e_set_ringparam) >>>> to not worry about ref counting since it will be done in the reset >>>> flow. >>> >>> >>> ... agree on keeping current approach. I actually like the idea, so we'd >>> end up with this simpler version for the batched ref then. >> >> >> Great :). >> >> So let's do the batched update only if we are not going to reset (we >> already know that in advance), instead of the current patch where you >> batch update unconditionally and then >> unlock_put in case reset was performed (which is just redundant and >> confusing). >> >> Please note that if open_locked fails you need to goto unlock_put. > > > Sorry, I'm not quite sure I follow you here; are you /now/ commenting on > the original patch or on the already updated diff I did from my very last > email, that is, http://patchwork.ozlabs.org/patch/695564/ ? > I am commenting on this patch "V2 2/4". this patch will take batched ref count unconditionally and release the extra refs "unlock_put" if reset was performed. I am asking to take the batched ref only if we are not going to reset in advance to save the extra "unlock_put" >>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not >>> needed >>> since this we already got that one through dev_change_xdp_fd() as >>> mentioned. >> >> >> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup >> in your next patch? this doesn't look symmetric (right) ! >> you shouldn't release a reference you didn't take. >> Overall with this series the driver can take num_channels refs and >> will release num_channels refs on mlx5e_close. we shouldn't release >> one extra ref on NIC cleanup. > > > I already explained this in the commit description; when dev_change_xdp_fd() > is called and sees a valid fd, it does bpf_prog_get_type(), which calls the > __bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then > passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp() > callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side. > > For drivers that implement against this ndo, it means that you need N-1 refs > in addition. Have a look at the other two in-tree users, which do it > correctly, > that is mlx4_xdp_set() and nfp_net_xdp_setup(). > > It's documented here (include/linux/netdevice.h) with ndo_xdp referring to > it: > > /* These structures hold the attributes of xdp state that are being passed > * to the netdevice through the xdp op. > */ > enum xdp_netdev_command { > /* Set or clear a bpf program used in the earliest stages of packet > * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The > callee > * is responsible for calling bpf_prog_put on any old progs that are > * stored. In case of error, the callee need not release the new > prog > * reference, but on success it takes ownership and must > bpf_prog_put > * when it is no longer used. > */ > XDP_SETUP_PROG, > [...] > }; > > I think reason why this is rather confusing would be that initially, it was > just a single prog for all queues, but it was requested along the way to > move > prog pointer down to queues instead and let them have their own ref, so that > some time in future individual progs from a subset of the queues can be > exchanged. > > Since the way xdp in mlx5 is implemented, you currently have the > priv->xdp_prog > as the control prog. That's okay right now, but requires to drop the last > ref > on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and > priv->xdp_prog > still present. Got it, but i would rather make the stack cleanup those refs once it receives an unregester_netdev event instead of counting on netdev to do it, as i said before, like any other resource added to the netdev by the stack (vlan/mac/etc..). but this is a general discussion of xdp API, it won't block this series :). Saeed.
Powered by blists - more mailing lists