[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izPGuF1PxfdmXUC1XJHpmRqotXh=vUY_a-AEHdAgPmLQ1g@mail.gmail.com>
Date: Tue, 22 Apr 2025 14:10:07 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, io-uring@...r.kernel.org,
virtualization@...ts.linux.dev, kvm@...r.kernel.org,
linux-kselftest@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Donald Hunter <donald.hunter@...il.com>,
Jonathan Corbet <corbet@....net>, Andrew Lunn <andrew+netdev@...n.ch>,
Jeroen de Borst <jeroendb@...gle.com>, Harshitha Ramamurthy <hramamurthy@...gle.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, Willem de Bruijn <willemb@...gle.com>, Jens Axboe <axboe@...nel.dk>,
David Ahern <dsahern@...nel.org>, Neal Cardwell <ncardwell@...gle.com>,
"Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Eugenio Pérez <eperezma@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>, Stefano Garzarella <sgarzare@...hat.com>, Shuah Khan <shuah@...nel.org>,
sdf@...ichev.me, dw@...idwei.uk, Jamal Hadi Salim <jhs@...atatu.com>,
Victor Nogueira <victor@...atatu.com>, Pedro Tammela <pctammela@...atatu.com>,
Samiullah Khawaja <skhawaja@...gle.com>
Subject: Re: [PATCH net-next v9 2/9] net: add get_netmem/put_netmem support
On Tue, Apr 22, 2025 at 1:03 PM Pavel Begunkov <asml.silence@...il.com> wrote:
>
> On 4/22/25 20:47, Pavel Begunkov wrote:
> > On 4/22/25 19:30, Mina Almasry wrote:
> >> On Tue, Apr 22, 2025 at 11:19 AM Pavel Begunkov <asml.silence@...il.com> wrote:
> >>>
> >>> On 4/22/25 14:56, Mina Almasry wrote:
> >>>> On Tue, Apr 22, 2025 at 1:43 AM Pavel Begunkov <asml.silence@...il.com> wrote:
> >>>>>
> >>>>> On 4/18/25 00:15, Mina Almasry wrote:
> >>>>>> Currently net_iovs support only pp ref counts, and do not support a
> >>>>>> page ref equivalent.
> >>>>>
> >>>>> Makes me wonder why it's needed. In theory, nobody should ever be
> >>>>> taking page references without going through struct ubuf_info
> >>>>> handling first, all in kernel users of these pages should always
> >>>>> be paired with ubuf_info, as it's user memory, it's not stable,
> >>>>> and without ubuf_info the user is allowed to overwrite it.
> >>>>>
> >>>>
> >>>> The concern about the stability of the from-userspace data is already
> >>>> called out in the MSG_ZEROCOPY documentation that we're piggybacking
> >>>> devmem TX onto:
> >>>
> >>> Sure, I didn't object that. There is no problem as long as the
> >>> ubuf_info semantics is followed, which by extension mean that
> >>> any ref manipulation should already be gated on ubuf_info, and
> >>> there should be no need in changing generic paths.
> >>>
> >>
> >> I'm sorry I'm not following. skb_frag_ref is how the net stack obtains
> >> references on an skb_frag, regardless on whether the frag is a
> >> MSG_ZEROCOPY one with ubuf info, or a regular tx frag without a
> >> ubuf_info, or even an io_uring frag which I think have the
> >
> > Yep
> >
> >> msg->ubuf_info like we discussed previously. I don't see the net stack
> >> in the current code special casing how it obtains refs on frags, and I
> >> don't see the need to add special casing. Can you elaborate in more
> >
> > You'll be special casing it either way, it's probably unavoidable,
> > just here it is in put/get_netmem.
> >
> >> detail what is the gating you expect, and why? Are you asking that I
> >> check the skb has a ubuf_info before allowing to grab the reference on
> >> the dmabuf binding? Or something else?
> >
> > get_page() already shouldn't be a valid operation for ubuf backed frags
> > apart from few cases where frags are copied/moved together with ubuf.
This is where I'm not following. Per the v5 changelog of this commit,
all these skb_helpers hit skb_frag_ref (which is just get_page
underneath):
tcp_grow_skb, __skb_zcopy_downgrade_managed, __pskb_copy_fclone,
pskb_expand_head, skb_zerocopy, skb_split, pksb_carve_inside_header,
pskb_care_inside_nonlinear, tcp_clone_payload, skb_segment, skb_shift,
skb_try_coalesce.
I don't see many of them opt-out of skb_frag_ref if the skb is
unreadable or has ubuf_info. Are you saying all/most/some of these
callers are invalid? I tend to assume merged code is the correct one
unless I have ample expertise to say otherwise.
> > The frags are essentially bundled with ubuf and shouldn't exist without
> > it, because otherwise user can overwrite memory with all the following
> > nastiness. If there are some spots violating that, I'd rather say they
> > should be addressed.
> >
> > Instead of adding net_iov / devmem handling in generic paths affecting
> > everyone, you could change those functions where it's get_page() are
> > called legitimately. The niov/devmem part of get/put_netmem doesn't
> > even have the same semantics as the page counterparts as it cannot
> > prevent from reallocation. That might be fine, but it's not clear
>
> Actually, maybe it's not that exclusive to netiov, same reallocation
> argument is true for user pages, even though they're reffed
> separately.
>
> It might be fine to leave this approach, while suboptimal it should
> be easier for you. Depends on how folks feel about the extra
> overhead in the normal tx path.
>
Right, I think there is only 2 ways to handle all the code paths in
the tcp stack that hit skb_frag_ref:
1. We go over all of them and make sure they're unreachable for unreadable skbs:
if (!skb_frags_readable()) return; // or something
2. or, we just add net_iov support in skb_frag_ref.
This patch series does the latter, which IMO is much preferred.
FWIW I'm surprised that adding net_iov support to skb_frag_ref/unref
is facing uncertainty. I've added net_iov support for many skb helpers
in commit 65249feb6b3df ("net: add support for skbs with unreadable
frags") and commit 9f6b619edf2e8 ("net: support non paged skb frags").
skb_frag_ref/unref is just 1 helper I "missed" because it's mostly
(but not entirely) used by the TX path.
--
Thanks,
Mina
Powered by blists - more mailing lists