linux-kernel - Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y3toiPtBgOcrb8TL@bullseye>
Date:   Mon, 21 Nov 2022 12:01:12 +0000
From:   Bobby Eshleman <bobbyeshleman@...il.com>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Bobby Eshleman <bobby.eshleman@...edance.com>,
        Cong Wang <cong.wang@...edance.com>,
        Jiang Wang <jiang.wang@...edance.com>,
        Krasnov Arseniy <oxffffaa@...il.com>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        Stefano Garzarella <sgarzare@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, kvm@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff

On Tue, Dec 06, 2022 at 11:20:21AM +0100, Paolo Abeni wrote:
> Hello,
> 
> On Fri, 2022-12-02 at 09:35 -0800, Bobby Eshleman wrote:
> [...]
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 35d7eedb5e8e..6c0b2d4da3fe 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -3,10 +3,129 @@
> >  #define _LINUX_VIRTIO_VSOCK_H
> >  
> >  #include <uapi/linux/virtio_vsock.h>
> > +#include <linux/bits.h>
> >  #include <linux/socket.h>
> >  #include <net/sock.h>
> >  #include <net/af_vsock.h>
> >  
> > +#define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))
> > +
> > +enum virtio_vsock_skb_flags {
> > +	VIRTIO_VSOCK_SKB_FLAGS_REPLY		= BIT(0),
> > +	VIRTIO_VSOCK_SKB_FLAGS_TAP_DELIVERED	= BIT(1),
> > +};
> > +
> > +static inline struct virtio_vsock_hdr *virtio_vsock_hdr(struct sk_buff *skb)
> > +{
> > +	return (struct virtio_vsock_hdr *)skb->head;
> > +}
> > +
> > +static inline bool virtio_vsock_skb_reply(struct sk_buff *skb)
> > +{
> > +	return skb->_skb_refdst & VIRTIO_VSOCK_SKB_FLAGS_REPLY;
> > +}
> 
> I'm sorry for the late feedback. The above is extremelly risky: if the
> skb will land later into the networking stack, we could experience the
> most difficult to track bugs.
> 
> You should use the skb control buffer instead (skb->cb), with the
> additional benefit you could use e.g. bool - the compiler could emit
> better code to manipulate such fields - and you will not need to clear
> the field before release nor enqueue.
> 
> [...]
> 

Hey Paolo, thank you for the review. For my own learning, this would
happen presumably when the skb is dropped? And I assume we don't see
this in sockmap because it is always cleared before leaving sockmap's
hands? I sanity checked this patch with an out-of-tree patch I have that
uses the networking stack, but I suspect I didn't see issues because my
test harness didn't induce dropping...

I originally avoided skb->cb because the reply flag is set at allocation
and would potentially be clobbered by a pass through the networking
stack. The reply flag would be used after a pass through the networking
stack (e.g., during transmission at the device level and when sockets
close while skbs are still queued for xmit).

I suppose using skb->cb would look like something like this:
- use skb_clone() for reply skbs
- set reply on the cloned sk_buff skb->cb
- keep a hashmap mapping original skb to cloned skb (is there a better
  way?)
- when choosing to apply reply logic, if skb_cloned() refer to the
  hashmap

Is there a better/simpler way to maintain skb->cb?

> > @@ -352,37 +360,38 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> >  				   size_t len)
> >  {
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > -	struct virtio_vsock_pkt *pkt;
> >  	size_t bytes, total = 0;
> > -	u32 free_space;
> > +	struct sk_buff *skb;
> >  	int err = -EFAULT;
> > +	u32 free_space;
> >  
> >  	spin_lock_bh(&vvs->rx_lock);
> > -	while (total < len && !list_empty(&vvs->rx_queue)) {
> > -		pkt = list_first_entry(&vvs->rx_queue,
> > -				       struct virtio_vsock_pkt, list);
> > +	while (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
> > +		skb = __skb_dequeue(&vvs->rx_queue);
> 
> Here the locking schema is confusing. It looks like vvs->rx_queue is
> under vvs->rx_lock protection, so the above should be skb_queue_empty()
> instead of the lockless variant.
> 
> [...]
> 
> > @@ -858,16 +873,11 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> >  static void virtio_transport_remove_sock(struct vsock_sock *vsk)
> >  {
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > -	struct virtio_vsock_pkt *pkt, *tmp;
> >  
> >  	/* We don't need to take rx_lock, as the socket is closing and we are
> >  	 * removing it.
> >  	 */
> > -	list_for_each_entry_safe(pkt, tmp, &vvs->rx_queue, list) {
> > -		list_del(&pkt->list);
> > -		virtio_transport_free_pkt(pkt);
> > -	}
> > -
> > +	virtio_vsock_skb_queue_purge(&vvs->rx_queue);
> 
> Still assuming rx_queue is under the rx_lock, given you don't need the
> locking here as per the above comment, you should use the lockless
> purge variant.
> 

Good catch, thanks!

Thanks again,
Bobby