Message-ID: <CAHS8izNfNJLrMtdR0je3DsXDAvP2Hs8HfKf5Jq7_kQJsVUbrzg@mail.gmail.com>
Date: Tue, 25 Feb 2025 09:41:41 -0800
From: Mina Almasry <almasrymina@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, virtualization@...ts.linux.dev,
kvm@...r.kernel.org, linux-kselftest@...r.kernel.org,
Donald Hunter <donald.hunter@...il.com>, Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Simon Horman <horms@...nel.org>, Jonathan Corbet <corbet@....net>, Andrew Lunn <andrew+netdev@...n.ch>,
Jeroen de Borst <jeroendb@...gle.com>, Harshitha Ramamurthy <hramamurthy@...gle.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, Willem de Bruijn <willemb@...gle.com>, David Ahern <dsahern@...nel.org>,
Neal Cardwell <ncardwell@...gle.com>, Stefan Hajnoczi <stefanha@...hat.com>,
Stefano Garzarella <sgarzare@...hat.com>, "Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Eugenio Pérez <eperezma@...hat.com>,
Shuah Khan <shuah@...nel.org>, sdf@...ichev.me, asml.silence@...il.com, dw@...idwei.uk,
Jamal Hadi Salim <jhs@...atatu.com>, Victor Nogueira <victor@...atatu.com>,
Pedro Tammela <pctammela@...atatu.com>, Samiullah Khawaja <skhawaja@...gle.com>,
Kaiyuan Zhang <kaiyuanz@...gle.com>
Subject: Re: [PATCH net-next v5 3/9] net: devmem: Implement TX path
On Tue, Feb 25, 2025 at 5:04 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 2/22/25 8:15 PM, Mina Almasry wrote:
> [...]
> > @@ -119,6 +122,13 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
> > unsigned long xa_idx;
> > unsigned int rxq_idx;
> >
> > + xa_erase(&net_devmem_dmabuf_bindings, binding->id);
> > +
> > + /* Ensure no tx net_devmem_lookup_dmabuf() are in flight after the
> > + * erase.
> > + */
> > + synchronize_net();
>
> Is the above statement always true? Can the dmabuf be stuck in some
> qdisc? Or even in some local socket due to a redirect?
>
Yes, we could have netmems in the TX path in the qdisc or waiting
for retransmits that still have references to the dmabuf, and that's
fine here I think.
What is happening here is that tcp_sendmsg_locked() will look for a
binding in net_devmem_dmabuf_bindings with binding->id ==
sockc.dmabuf_id, and then grab a reference on it.
In parallel, net_devmem_unbind_dmabuf will remove the binding from
net_devmem_dmabuf_bindings, and then drop its refcount (which may be
the last one if the 'get' in tcp_sendmsg_locked has not yet executed,
triggering the binding to be freed).
I need to make sure the lookup + get of the dmabuf in
net_devmem_lookup_dmabuf() doesn't race with the erase + put in
net_devmem_unbind_dmabuf() in a way that causes a use-after-free, and
I use rcu_read_lock() coupled with synchronize_net() for that.
Note that net_devmem_dmabuf_binding_put() won't free the dmabuf unless
we put the last ref, so any netmems stuck in retransmit paths or qdisc
will still pin the underlying dmabuf.
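For concreteness, here is a rough sketch of the two sides of the
pattern (hand-written for this mail, so the exact helper names and
field names may differ slightly from the patch):

	/* Lookup side (tcp_sendmsg_locked -> net_devmem_lookup_dmabuf):
	 * rcu_read_lock() keeps the binding memory alive long enough to
	 * attempt taking a reference. If the refcount already hit zero,
	 * the unbind side won the race and we act as if no binding with
	 * that id exists.
	 */
	rcu_read_lock();
	binding = xa_load(&net_devmem_dmabuf_bindings, id);
	if (binding && !refcount_inc_not_zero(&binding->ref))
		binding = NULL;
	rcu_read_unlock();

	/* Unbind side (net_devmem_unbind_dmabuf): erase first so no new
	 * lookup can find the binding, then synchronize_net() waits out
	 * any lookup already inside its RCU read-side critical section
	 * before we drop what may be the last reference.
	 */
	xa_erase(&net_devmem_dmabuf_bindings, binding->id);
	synchronize_net();
	...
	net_devmem_dmabuf_binding_put(binding);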
Let me know if you still see an issue here.
> > @@ -252,13 +261,23 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > * binding can be much more flexible than that. We may be able to
> > * allocate MTU sized chunks here. Leave that for future work...
> > */
> > - binding->chunk_pool =
> > - gen_pool_create(PAGE_SHIFT, dev_to_node(&dev->dev));
> > + binding->chunk_pool = gen_pool_create(PAGE_SHIFT,
> > + dev_to_node(&dev->dev));
> > if (!binding->chunk_pool) {
> > err = -ENOMEM;
> > goto err_unmap;
> > }
> >
> > + if (direction == DMA_TO_DEVICE) {
> > + binding->tx_vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
> > + sizeof(struct net_iov *),
> > + GFP_KERNEL);
> > + if (!binding->tx_vec) {
> > + err = -ENOMEM;
> > + goto err_free_chunks;
>
> Possibly my comment on v3 has been lost:
>
> """
> It looks like the later error paths (in the for_each_sgtable_dma_sg()
> loop) could happen even for 'direction == DMA_TO_DEVICE', so I guess an
> additional error label is needed to clean tx_vec on such paths.
> """
>
I think I fixed the issue pointed out in v3, but please correct me if
I missed anything. I added a kvfree() call, but we don't really need
an additional error label.
If we hit any of the error conditions in the
for_each_sgtable_dma_sg() loop, we `goto err_free_chunks;`, which
frees all the genpool chunks, destroys the gen pool, and then does
`kvfree(binding->tx_vec);` to free the memory allocated for tx_vec.
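In other words, the unwind looks roughly like this (sketch only; it
assumes the binding is zero-allocated, so on the RX path tx_vec is
NULL and the kvfree() there is a harmless no-op, which is why no
second label is needed):

	err_free_chunks:
		gen_pool_for_each_chunk(binding->chunk_pool,
					net_devmem_dmabuf_free_chunk_owner,
					NULL);
		gen_pool_destroy(binding->chunk_pool);
		kvfree(binding->tx_vec);	/* kvfree(NULL) is a no-op */
	err_unmap:
		...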
> [...]
> > @@ -1071,6 +1072,16 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >
> > flags = msg->msg_flags;
> >
> > + sockc = (struct sockcm_cookie){ .tsflags = READ_ONCE(sk->sk_tsflags),
> > + .dmabuf_id = 0 };
> > + if (msg->msg_controllen) {
> > + err = sock_cmsg_send(sk, msg, &sockc);
> > + if (unlikely(err)) {
> > + err = -EINVAL;
> > + goto out_err;
> > + }
> > + }
>
> I'm unsure how much of a problem that would be, but it looks like a
> non-blocking sendmsg(MSG_FASTOPEN) with a bad msg argument will start
> to fail on top of this patch, while it would succeed (EINPROGRESS)
> before.
>
Thanks for catching this, indeed. I guess what I can do here is record
that the cmsg is invalid, keep processing up to (and including) the
MSG_FASTOPEN handling, and only return EINVAL after that.
That does complicate this already complicated function a bit, though.
Let me know if you have a better/different solution in mind.
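Roughly something like this (sketch only; `sockc_err` is a
hypothetical local I'm introducing here for illustration, and the
fastopen condition is simplified):

	/* Parse cmsgs up front (we need dmabuf_id early), but defer
	 * reporting a parse failure until after the fastopen path has
	 * run, so a bad msg still gets the old EINPROGRESS semantics.
	 */
	sockc = (struct sockcm_cookie){ .tsflags = READ_ONCE(sk->sk_tsflags),
					.dmabuf_id = 0 };
	if (msg->msg_controllen)
		sockc_err = sock_cmsg_send(sk, msg, &sockc);

	if (flags & MSG_FASTOPEN) {
		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size, uarg);
		if (err == -EINPROGRESS && copied_syn > 0)
			goto out;
		else if (err)
			goto out_err;
	}

	if (unlikely(sockc_err)) {
		err = -EINVAL;
		goto out_err;
	}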
--
Thanks,
Mina