linux-kernel - Re: Device mem changes vs pinning/zerocopy changes

Message-ID: <CAHS8izP-6mKM1vEELjRXRj09qwSh_tCDdwA3TWxVuSOYNBGYeA@mail.gmail.com>
Date: Thu, 5 Jun 2025 11:59:24 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: David Howells <dhowells@...hat.com>
Cc: Stanislav Fomichev <stfomichev@...il.com>, willy@...radead.org, hch@...radead.org, 
	Jakub Kicinski <kuba@...nel.org>, Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: Device mem changes vs pinning/zerocopy changes

On Wed, Jun 4, 2025 at 7:56 AM David Howells <dhowells@...hat.com> wrote:
>
> Stanislav Fomichev <stfomichev@...il.com> wrote:
>
> > >  (1) Separate fragment lifetime management from sk_buff.  No more wangling
> > >      of refcounts in the skbuff code.  If you clone an skb, you stick an
> > >      extra ref on the lifetime management struct, not the page.
> >
> > For device memory TCP we already have this: net_devmem_dmabuf_binding
> > is the owner of the frags. And when we reference skb frag we reference
> > only this owner, not individual chunks: __skb_frag_ref -> get_netmem ->
> > net_devmem_get_net_iov (ref on the binding).
> >
> > Will it be possible to generalize this to cover MSG_ZEROCOPY and splice
> > cases? From what I can tell, this is somewhat equivalent of your net_txbuf.
>
> Yes and no.  The net_devmem stuff that's now upstream still manages refs on a
> per-skb-frag basis.

Actually Stan may be right here, something similar to the net_devmem
model may be what you want here.

The net_devmem stuff actually never grabs references on the frags
themselves, as Stan explained (which is what you want). We have an
object 'net_devmem_dmabuf_binding', which represents a chunk of pinned
devmem passed from userspace. When the net stack asks for a ref on a
frag, we grab a ref on the binding the frag belongs to in this call
path that Stan pointed to:

__skb_frag_ref -> get_netmem -> net_devmem_get_net_iov (ref on the binding).

This sounds eerily similar to what you want to do. You could have a
new struct (net_zcopy_mem) which represents a chunk of zerocopy memory
that you've pinned using GUP or whatever the correct API is. Then
when the net stack wants a ref on a frag, you (somehow) figure out
which net_zcopy_mem it belongs to, and you grab a ref on the struct
rather than the frag.

Then when the refcount of net_zcopy_mem hits 0, you know you can
un-GUP the zcopy memory. I think that model in general may work. But
also it may be a case of everything looking like a nail to someone
with a hammer.
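
A minimal userspace sketch of that model, to make the ownership
explicit. Everything in it is an assumption for illustration: the
net_zcopy_mem layout, struct zc_frag, and all helper names are
hypothetical, a plain int stands in for the kernel's refcount_t, and a
bool stands in for the actual page pin. It is not existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical owner of one pinned zerocopy region. */
struct net_zcopy_mem {
	int refcount;	/* kernel code would use refcount_t */
	void *base;	/* start of the pinned user region */
	size_t len;	/* length of the region in bytes */
	bool pinned;	/* stands in for the GUP pin */
};

/* Hypothetical frag: records its owner, never refs the pages directly. */
struct zc_frag {
	struct net_zcopy_mem *owner;
	size_t off;
	size_t len;
};

/* "Pin" a region; the caller holds the initial reference. */
static struct net_zcopy_mem *zcopy_mem_pin(void *base, size_t len)
{
	struct net_zcopy_mem *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->refcount = 1;
	m->base = base;
	m->len = len;
	m->pinned = true;	/* the real GUP call would go here */
	return m;
}

/* Drop one ref; returns true if this was the last one (region unpinned). */
static bool zcopy_mem_put(struct net_zcopy_mem *m)
{
	assert(m->refcount > 0);
	if (--m->refcount > 0)
		return false;
	m->pinned = false;	/* the real un-GUP would go here */
	free(m);
	return true;
}

/* A ref on a frag is a ref on the owning region, not on the frag. */
static void zc_frag_ref(struct zc_frag *f)
{
	f->owner->refcount++;
}

static bool zc_frag_unref(struct zc_frag *f)
{
	return zcopy_mem_put(f->owner);
}
```

Cloning an skb would then call zc_frag_ref() once per frag, and the
region is only unpinned when the final zc_frag_unref() or
zcopy_mem_put() returns true. How a frag finds its owner is the
"(somehow)" part above; the sketch sidesteps it by storing a pointer in
the frag.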

Better yet, we already have in the code a struct that represents
zerocopy memory, struct ubuf_info_msgzc. Instead of inventing a new
struct, you can reuse this one to do the memory pinning and
refcounting on behalf of the memory underneath?

-- 
Thanks,
Mina