Message-ID: <CAHS8izNWt2-1bC2f0jv4Qpk_A9VpEXNvVRoXUtL43_16d-Ui-A@mail.gmail.com>
Date: Tue, 4 Mar 2025 17:39:37 -0800
From: Mina Almasry <almasrymina@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, kvm@...r.kernel.org,
virtualization@...ts.linux.dev, linux-kselftest@...r.kernel.org,
Donald Hunter <donald.hunter@...il.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Jonathan Corbet <corbet@....net>, Andrew Lunn <andrew+netdev@...n.ch>,
Jeroen de Borst <jeroendb@...gle.com>, Harshitha Ramamurthy <hramamurthy@...gle.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, Willem de Bruijn <willemb@...gle.com>, David Ahern <dsahern@...nel.org>,
Neal Cardwell <ncardwell@...gle.com>, "Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Eugenio Pérez <eperezma@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>, Stefano Garzarella <sgarzare@...hat.com>, Shuah Khan <shuah@...nel.org>,
sdf@...ichev.me, asml.silence@...il.com, dw@...idwei.uk,
Jamal Hadi Salim <jhs@...atatu.com>, Victor Nogueira <victor@...atatu.com>,
Pedro Tammela <pctammela@...atatu.com>, Samiullah Khawaja <skhawaja@...gle.com>
Subject: Re: [PATCH net-next v6 1/8] net: add get_netmem/put_netmem support

On Mon, Mar 3, 2025 at 4:20 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Fri, 28 Feb 2025 17:29:13 -0800 Mina Almasry wrote:
> > On Fri, Feb 28, 2025 at 4:38 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > On Thu, 27 Feb 2025 04:12:02 +0000 Mina Almasry wrote:
> > > > static inline void __skb_frag_ref(skb_frag_t *frag)
> > > > {
> > > > - get_page(skb_frag_page(frag));
> > > > + get_netmem(skb_frag_netmem(frag));
> > > > }
> > >
> > > Silently handling types of memory the caller may not be expecting
> > > always worries me.
> >
> > Sorry, I'm not following. What caller is not expecting netmem?
> > Here we're making sure __skb_frag_ref() handles netmem correctly,
> > i.e. we were not expecting netmem here before, and after this patch
> > we'll handle it correctly.
> >
> > > Why do we need this?
> > >
> >
> > The MSG_ZEROCOPY TX path takes a page reference on the passed memory
> > in zerocopy_fill_skb_from_iter() that kfree_skb() later drops when the
> > skb is sent. We need an equivalent for netmem, which only supports pp
> > refs today. This is my attempt at implementing a page_ref equivalent
> > for net_iov and generic netmem.
> >
> > I think __skb_frag_[un]ref is used elsewhere in the TX path too,
> > tcp_mtu_probe for example calls skb_frag_ref eventually.
>
> Any such caller must be inspected to make sure it generates
> / anticipates skbs with appropriate pp_recycle and readable settings.
> It's possible that adding a set of _netmem APIs would be too much
> churn, but if it's not - it'd make it clear which parts of the kernel
> we have inspected.
>

I see, you're concerned about interactions between skb->pp_recycle and
skb->unreadable. My strategy to handle this is to take what works for
pages and extend it as carefully as possible to unreadable
net_iovs/skbs. Abstractly speaking, I think skb_frag_ref/unref should
be able to obtain/drop a ref on a frag regardless of what the
underlying memory type is.

Currently __skb_frag_ref() obtains a page ref regardless of the
skb->pp_recycle setting, and skb_page_unref() drops a page ref if
!skb->pp_recycle and a pp ref if skb->pp_recycle. I extend the logic
so that it applies as-is to net_iovs and unreadable skbs.

After this patch, __skb_frag_ref() obtains a 'page ref equivalent' on
the net_iov regardless of the pp_recycle setting. My conjecture is
that since that works for pages, it should also work for net_iovs.
Also after this patch, skb_page_unref will drop a pp ref on the
net_iov if skb->pp_recycle is set, and drop a 'page ref equivalent' on
the net_iov if !skb->pp_recycle. The conjecture again is that since
that works for pages, it should extend to net_iovs.
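
Roughly, the semantics I'm describing look like this. This is a
simplified sketch of the two helpers after this patch, not the exact
code in the tree, and the helper signatures are approximated:

static inline void __skb_frag_ref(skb_frag_t *frag)
{
        /* 'page ref equivalent' regardless of skb->pp_recycle:
         * get_netmem() takes a page ref for pages and the net_iov
         * equivalent for net_iovs.
         */
        get_netmem(skb_frag_netmem(frag));
}

static inline void skb_page_unref(netmem_ref netmem, bool recycle)
{
        /* pp ref if the skb is pp_recycle... */
        if (recycle && napi_pp_put_page(netmem))
                return;
        /* ...otherwise the 'page ref equivalent' */
        put_netmem(netmem);
}
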
With regard to the callers, my thinking is that since we preserved
the semantics of __skb_frag_ref and skb_page_unref (and so
skb_frag_ref/skb_frag_unref by extension), all existing callers of
skb_frag_[un]ref should work as is. Here is some code analysis of
individual callers:

1. skb_release_data
     __skb_frag_unref
       skb_page_unref

This code path expects to drop a pp ref if skb->pp_recycle, and a page
ref if !skb->pp_recycle. We make sure to do this regardless of whether
the skb is actually filled with net_iovs or pages, and the conjecture
is that since the logic to acquire/drop a pp/page ref works for pages,
it should work for net_iovs as well (see the rough sketch after this
list).

2. __skb_zcopy_downgrade_managed
     skb_frag_ref

Same thing here: since this code expects to get a page ref on the
frag, we obtain the net_iov equivalent of a page ref, and that should
work as well for net_iovs as it does for pages.

I could look at more callers, but the general thinking is the same. Am
I off the rails here?
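
To spell out what the skb_release_data() path in (1) expects, here is
a rough illustration. example_drop_frag_refs() is a made-up name, and
this is a heavy simplification of what that path effectively does for
the frags, not the actual kernel function:

/* Illustration only: the frag-drop semantics the skb_release_data()
 * path expects; not the actual kernel code.
 */
static inline void example_drop_frag_refs(struct sk_buff *skb)
{
        struct skb_shared_info *shinfo = skb_shinfo(skb);
        int i;

        for (i = 0; i < shinfo->nr_frags; i++)
                /* Drops a pp ref if skb->pp_recycle, otherwise a
                 * 'page ref equivalent', for pages and net_iovs alike.
                 */
                __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
}
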
Of course, we could leave skb_frag_[un]ref as-is and create an
skb_frag_[un]ref_netmem which supports net_iovs, but my ambition here
was to make skb_frag_[un]ref itself support netmem rather than fork
the frag refcounting. Does that seem like a good idea?

> > > In general, I'm surprised by the lack of bug reports for devmem.
> >
> > I guess we did a good job making sure we don't regress the page paths.
>
> :)
>
> > The lack of support in any driver that qemu will run is an issue. I
> > wonder if the fact that devmem needs some setup is also an issue. We
> > need headersplit enabled, udmabuf created, the netlink API bound, and
> > then a connection referring to the created binding, and we don't
> > support loopback. I think all of that may make it difficult for
> > syzbot to repro. I've had it on my todo list to investigate this more.
> >
> > > Can you think of any way we could expose this more to syzbot?
> > > First thing that comes to mind is a simple hack in netdevsim,
> > > to make it insert a netmem handle (allocated locally, not a real
> > > memory provider), every N packets (controllable via debugfs).
> > > Would that work?
> >
> > Yes, great idea. I don't see why it wouldn't work.
> >
> > We don't expect mixing of net_iovs and pages in the same skb, but
> > netdevsim could create one net_iov skb every N skbs.
> >
> > I guess I'm not totally sure what makes something discoverable to
> > syzbot. Is a netdevsim hack toggleable via debugfs sufficient for
> > syzbot? I'll investigate and ask.
>
> Yeah, my unreliable memory is that syzbot has a mixed record discovering
> problems with debugfs. If you could ask Dmitry for advice that'd be
> ideal.

Yes, I took a look here and discussed with Willem. Long story short,
syzbot support is possible but requires a handful of changes. We'll
look into that.

Long story long, for syzbot support I don't think netdevsim itself
will be useful. It's our understanding so far that syzbot doesn't do
anything special with netdevsim. We'll need to add queue
API/page_pool/unreadable netmem support to one of the drivers qemu
(syzbot) uses, and that should get syzbot fuzzing the control plane.

To get syzbot to fuzz the data plane, I think we need to set up a
special syzbot instance which configures udmabuf/rss/flow
steering/netlink binding and starts injecting packets through the data
path. Syzbot would not discover a working config on its own. I'm told
it's rare to set up specialized syzbot instances, but we could make
the case that this coverage is important enough.

Hacking netdevsim like you suggested would be useful as well, but
outside of syzbot, AFAICT so far. We can run the existing netdevsim
data path tests with netdevsim in 'unreadable netmem mode' and see if
that reproduces issues, although I'm not sure how to integrate that
with continuous testing yet.

--
Thanks,
Mina