lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izPZ9Jiu9Gj+Kk3cQ_+t22M4n4-mbPLhx+fti_HiWzL57Q@mail.gmail.com>
Date: Wed, 21 Aug 2024 11:36:31 -0400
From: Mina Almasry <almasrymina@...gle.com>
To: Taehee Yoo <ap420073@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-doc@...r.kernel.org, linux-alpha@...r.kernel.org, 
	linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org, 
	sparclinux@...r.kernel.org, linux-trace-kernel@...r.kernel.org, 
	linux-arch@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	bpf@...r.kernel.org, linux-media@...r.kernel.org, 
	dri-devel@...ts.freedesktop.org, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, 
	Donald Hunter <donald.hunter@...il.com>, Jonathan Corbet <corbet@....net>, 
	Richard Henderson <richard.henderson@...aro.org>, Ivan Kokshaysky <ink@...assic.park.msu.ru>, 
	Matt Turner <mattst88@...il.com>, Thomas Bogendoerfer <tsbogend@...ha.franken.de>, 
	"James E.J. Bottomley" <James.Bottomley@...senpartnership.com>, Helge Deller <deller@....de>, 
	Andreas Larsson <andreas@...sler.com>, Jesper Dangaard Brouer <hawk@...nel.org>, 
	Ilias Apalodimas <ilias.apalodimas@...aro.org>, Steven Rostedt <rostedt@...dmis.org>, 
	Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, 
	Arnd Bergmann <arnd@...db.de>, Steffen Klassert <steffen.klassert@...unet.com>, 
	Herbert Xu <herbert@...dor.apana.org.au>, David Ahern <dsahern@...nel.org>, 
	Willem de Bruijn <willemdebruijn.kernel@...il.com>, Shuah Khan <shuah@...nel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	John Fastabend <john.fastabend@...il.com>, Sumit Semwal <sumit.semwal@...aro.org>, 
	Christian König <christian.koenig@....com>, 
	Bagas Sanjaya <bagasdotme@...il.com>, Christoph Hellwig <hch@...radead.org>, 
	Nikolay Aleksandrov <razor@...ckwall.org>, Pavel Begunkov <asml.silence@...il.com>, David Wei <dw@...idwei.uk>, 
	Jason Gunthorpe <jgg@...pe.ca>, Yunsheng Lin <linyunsheng@...wei.com>, 
	Shailend Chand <shailend@...gle.com>, Harshitha Ramamurthy <hramamurthy@...gle.com>, 
	Shakeel Butt <shakeel.butt@...ux.dev>, Jeroen de Borst <jeroendb@...gle.com>, 
	Praveen Kaligineedi <pkaligineedi@...gle.com>, Willem de Bruijn <willemb@...gle.com>, 
	Kaiyuan Zhang <kaiyuanz@...gle.com>, Daniel Vetter <daniel.vetter@...ll.ch>
Subject: Re: [PATCH net-next v19 03/13] netdev: support binding dma-buf to netdevice

On Wed, Aug 21, 2024 at 5:15 AM Taehee Yoo <ap420073@...il.com> wrote:
>
> On Tue, Aug 20, 2024 at 1:01 PM Mina Almasry <almasrymina@...gle.com> wrote:
> >
> > On Mon, Aug 19, 2024 at 6:53 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > >
> > > On Mon, 19 Aug 2024 00:44:27 +0900 Taehee Yoo wrote:
> > > > > @@ -9537,6 +9540,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
> > > > >                         NL_SET_ERR_MSG(extack, "Native and generic XDP can't be active at the same time");
> > > > >                         return -EEXIST;
> > > > >                 }
> > > > > +               if (dev_get_max_mp_channel(dev) != -1) {
> > > > > +                       NL_SET_ERR_MSG(extack, "XDP can't be installed on a netdev using memory providers");
> > > > > +                       return -EINVAL;
> > > > > +               }
> > > >
> > > > Should we consider virtual interfaces like bonding, bridge, etc?
> > > > Virtual interfaces as an upper interface of physical interfaces can
> > > > still install XDP prog.
> > > >
> > > > # ip link add bond0 type bond
> > > > # ip link set eth0 master bond0
> > > > # ip link set bond0 xdp pin /sys/fs/bpf/x/y
> > > > and
> > > > # ip link set bond0 xdpgeneric pin /sys/fs/bpf/x/y
> > > >
> > > > All virtual interfaces can install generic XDP prog.
> > > > The bonding interface can install native XDP prog.
> > >
> > > Good point. We may need some common helpers to place the checks for XDP.
> > > They are spread all over the place now.
> >
> > Took a bit of a look here. Forgive me, I'm not that familiar with XDP
> > and virtual interfaces, so I'm a bit unsure what to do here.
> >
> > For veth, it seems, the device behind the veth is stored in
> > veth_priv->peer, so it seems maybe a dev_get_max_mp_channel() check on
> > veth_priv->peer is the way to go to disable this for veth? I think we
> > need to do this check on creation of the veth and on the ndo_bpf of
> > veth.
> >
> > For bonding, it seems we need to add mp channel check in bond_xdp_set,
> > and bond_enslave?
> >
> > There are a few other drivers that define ndo_add_slave, seems a check
> > in br_add_slave is needed as well.
> >
> > This seems like a potentially deep rabbit hole with a few checks to
> > add all of the place. Is this blocking the series? AFAICT if XDP fails
> > with mp-bound queues with a benign error, that seems fine to me; I
> > don't have a use case for memory providers + xdp yet. This should only
> > be blocking if someone can repro a very serious error (kernel crash)
> > or something with this combination.
> >
> > I can try to add these checks locally and propose as a follow up
> > series. Let me know if I'm on the right track with figuring out how to
> > implement this, and, if you feel like it's blocking.
> >
> > --
> > Thanks,
> > Mina
>
> I agree with the current approach, which uses the
> dev_get_min_mp_channel_count() in the dev_xdp_attach().
> The only problem that I am concerned about is the
> dev_get_min_mp_channel_count() can't check lower interfaces.
> So, how about just making the current code to be able to check lower
> interfaces?

Thank you for the code snippet! It's very useful! I have been
wondering how to walk lower/upper devices!

To be honest, I think maybe Jakub's suggestion to refactor all the
->ndo_bpf calls needs to happen anyway. The reason is that there are
->ndo_bpf calls in the core net stack, like net/xdp/xsk_buff_pool.c
and kernel/bpf/offload.c. AFAICT we need to add checks in these places
as well, so refactoring them into one place is nice?

Note I sent the refactor for review. Sorry, I forgot to CC Taehee:
https://patchwork.kernel.org/project/netdevbpf/patch/20240821045629.2856641-1-almasrymina@google.com/

Additionally I'm wondering if we should disable adding mp-bound
devices as slaves completely, regardless of xdp. My concern is that if
the lower device is using unreadable memory, then the upper device may
see unreadable memory in its code paths, and will not be expecting
that, so it may break. From the look at the code, it looks like
net/batman-adv calls ndo_add_slave, and a bunch of code that touches
skb_frags:

$ ackc -i ndo_add_slave
soft-interface.c
889:    .ndo_add_slave = batadv_softif_slave_add,

$ ackc -i skb_frag
fragmentation.c
403:    struct sk_buff *skb_fragment;
407:    skb_fragment = dev_alloc_skb(ll_reserved + mtu + tailroom);
408:    if (!skb_fragment)
411:    skb_fragment->priority = skb->priority;
414:    skb_reserve(skb_fragment, ll_reserved + header_size);
415:    skb_split(skb, skb_fragment, skb->len - fragment_size);
418:    skb_push(skb_fragment, header_size);
419:    memcpy(skb_fragment->data, frag_head, header_size);
422:    return skb_fragment;
441:    struct sk_buff *skb_fragment;
513:            skb_fragment = batadv_frag_create(net_dev, skb, &frag_header,
515:            if (!skb_fragment) {
522:                               skb_fragment->len + ETH_HLEN);
523:            ret = batadv_send_unicast_skb(skb_fragment, neigh_node);

If we disable ndo_add_slave on mp devices, then we don't need to walk
lower or upper devices. What do you think? If we don't disable mp
lower devices entirely, then yes, we can make
dev_get_min_mp_channel_count() do a recursive check.

Note that we can add support for mp bound devices as slaves in the
future if we have a use case for it, and it's well tested to be safe
with selftests added.

If we disable adding mp devices as lower devices, then during the mp
binding we should also check if the device has upper devices.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ