lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6423160f46e56_1bf1c92089e@willemb.c.googlers.com.notmuch>
Date:   Tue, 28 Mar 2023 12:30:07 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     沈安琪(凛玥) <amy.saq@...group.com>,
        netdev@...r.kernel.org
Cc:     willemdebruijn.kernel@...il.com, mst@...hat.com,
        davem@...emloft.net, jasowang@...hat.com,
        谈鉴锋 <henry.tjf@...group.com>,
        沈安琪(凛玥) <amy.saq@...group.com>
Subject: RE: [PATCH v6] net/packet: support mergeable feature of virtio

沈安琪(凛玥) wrote:
> From: Jianfeng Tan <henry.tjf@...group.com>
> 
> Packet sockets, like tap, can be used as the backend for kernel vhost.
> In packet sockets, virtio net header size is currently hardcoded to be
> the size of struct virtio_net_hdr, which is 10 bytes; however, it is not
> always the case: some virtio features, such as mrg_rxbuf, need virtio
> net header to be 12-byte long.
> 
> Mergeable buffers, as a virtio feature, is worthy of supporting: packets
> that are larger than one-mbuf size will be dropped in vhost worker's
> handle_rx if mrg_rxbuf feature is not used, but large packets
> cannot be avoided and increasing mbuf's size is not economical.
> 
> With this virtio feature enabled by virtio-user, packet sockets with
> hardcoded 10-byte virtio net header will parse mac head incorrectly in
> packet_snd by taking the last two bytes of virtio net header as part of
> mac header.
> This incorrect mac header parsing will cause packet to be dropped due to
> invalid ether head checking in later under-layer device packet receiving.
> 
> By adding extra field vnet_hdr_sz with utilizing holes in struct
> packet_sock to record currently used virtio net header size and supporting
> extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet
> sockets can know the exact length of virtio net header that virtio user
> gives.
> In packet_snd, tpacket_snd and packet_recvmsg, instead of using
> hardcoded virtio net header size, it can get the exact vnet_hdr_sz from
> corresponding packet_sock, and parse mac header correctly based on this
> information to avoid the packets being mistakenly dropped.
> 
> Signed-off-by: Jianfeng Tan <henry.tjf@...group.com>
> Co-developed-by: Anqi Shen <amy.saq@...group.com>
> Signed-off-by: Anqi Shen <amy.saq@...group.com>
> ---
> 
> V5 -> V6:
> * rebase patch based on 6.3-rc2
> 
>  include/uapi/linux/if_packet.h |  1 +
>  net/packet/af_packet.c         | 88 +++++++++++++++++++++-------------
>  net/packet/diag.c              |  2 +-
>  net/packet/internal.h          |  2 +-
>  4 files changed, 59 insertions(+), 34 deletions(-)
> 
> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
> index 78c981d6a9d4..9efc42382fdb 100644
> --- a/include/uapi/linux/if_packet.h
> +++ b/include/uapi/linux/if_packet.h
> @@ -59,6 +59,7 @@ struct sockaddr_ll {
>  #define PACKET_ROLLOVER_STATS		21
>  #define PACKET_FANOUT_DATA		22
>  #define PACKET_IGNORE_OUTGOING		23
> +#define PACKET_VNET_HDR_SZ		24
>  
>  #define PACKET_FANOUT_HASH		0
>  #define PACKET_FANOUT_LB		1
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 497193f73030..b13536767cbe 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2090,18 +2090,18 @@ static unsigned int run_filter(struct sk_buff *skb,
>  }
>  

> @@ -3925,11 +3938,19 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval,
>  		if (copy_from_sockptr(&val, optval, sizeof(val)))
>  			return -EFAULT;
>  
> +		hdr_len = val ? sizeof(struct virtio_net_hdr) : 0;
> +		if (optname == PACKET_VNET_HDR_SZ) {
> +			if (val && val != sizeof(struct virtio_net_hdr) &&
> +			    val != sizeof(struct virtio_net_hdr_mrg_rxbuf))
> +				return -EINVAL;
> +			hdr_len = val;
> +		}
> +

Since the below requires a change, I'd prefer

    if (optname == PACKET_VNET_HDR_SZ) {
            ...
    } else {
            hdr_len = val ? sizeof(struct virtio_net_hdr) : 0;
    }

Rather than first doing that and then modifying val in the case of
PACKET_VNET_HDR_SZ.

>  		lock_sock(sk);
>  		if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) {
>  			ret = -EBUSY;
>  		} else {
> -			packet_sock_flag_set(po, PACKET_SOCK_HAS_VNET_HDR, val);
> +			po->vnet_hdr_sz = hdr_len;

Needs to use WRITE_ONCE to match the READ_ONCE elsewhere

>  			ret = 0;
>  		}
>  		release_sock(sk);
> @@ -4062,7 +4083,10 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
>  		val = packet_sock_flag(po, PACKET_SOCK_ORIGDEV);
>  		break;
>  	case PACKET_VNET_HDR:
> -		val = packet_sock_flag(po, PACKET_SOCK_HAS_VNET_HDR);
> +		val = !!po->vnet_hdr_sz;
> +		break;
> +	case PACKET_VNET_HDR_SZ:
> +		val = po->vnet_hdr_sz;
>  		break;
>  	case PACKET_VERSION:
>  		val = po->tp_version;
> diff --git a/net/packet/diag.c b/net/packet/diag.c
> index de4ced5cf3e8..5cf13cf0b862 100644
> --- a/net/packet/diag.c
> +++ b/net/packet/diag.c
> @@ -27,7 +27,7 @@ static int pdiag_put_info(const struct packet_sock *po, struct sk_buff *nlskb)
>  		pinfo.pdi_flags |= PDI_AUXDATA;
>  	if (packet_sock_flag(po, PACKET_SOCK_ORIGDEV))
>  		pinfo.pdi_flags |= PDI_ORIGDEV;
> -	if (packet_sock_flag(po, PACKET_SOCK_HAS_VNET_HDR))
> +	if (po->vnet_hdr_sz)
>  		pinfo.pdi_flags |= PDI_VNETHDR;
>  	if (packet_sock_flag(po, PACKET_SOCK_TP_LOSS))
>  		pinfo.pdi_flags |= PDI_LOSS;
> diff --git a/net/packet/internal.h b/net/packet/internal.h
> index 27930f69f368..63f4865202c1 100644
> --- a/net/packet/internal.h
> +++ b/net/packet/internal.h
> @@ -118,6 +118,7 @@ struct packet_sock {
>  	struct mutex		pg_vec_lock;
>  	unsigned long		flags;
>  	int			ifindex;	/* bound device		*/
> +	u8			vnet_hdr_sz;

Did you use pahole to find a spot that is currently padding?

This looks good in principle, an int followed by __be16.

>  	__be16			num;
>  	struct packet_rollover	*rollover;
>  	struct packet_mclist	*mclist;
> @@ -139,7 +140,6 @@ enum packet_sock_flags {
>  	PACKET_SOCK_AUXDATA,
>  	PACKET_SOCK_TX_HAS_OFF,
>  	PACKET_SOCK_TP_LOSS,
> -	PACKET_SOCK_HAS_VNET_HDR,
>  	PACKET_SOCK_RUNNING,
>  	PACKET_SOCK_PRESSURE,
>  	PACKET_SOCK_QDISC_BYPASS,
> -- 
> 2.19.1.6.gb485710b
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ