[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6423160f46e56_1bf1c92089e@willemb.c.googlers.com.notmuch>
Date: Tue, 28 Mar 2023 12:30:07 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: 沈安琪(凛玥) <amy.saq@...group.com>,
netdev@...r.kernel.org
Cc: willemdebruijn.kernel@...il.com, mst@...hat.com,
davem@...emloft.net, jasowang@...hat.com,
谈鉴锋 <henry.tjf@...group.com>,
沈安琪(凛玥) <amy.saq@...group.com>
Subject: RE: [PATCH v6] net/packet: support mergeable feature of virtio
沈安琪(凛玥) wrote:
> From: Jianfeng Tan <henry.tjf@...group.com>
>
> Packet sockets, like tap, can be used as the backend for kernel vhost.
> In packet sockets, virtio net header size is currently hardcoded to be
> the size of struct virtio_net_hdr, which is 10 bytes; however, it is not
> always the case: some virtio features, such as mrg_rxbuf, need virtio
> net header to be 12-byte long.
>
> Mergeable buffers, as a virtio feature, is worthy of supporting: packets
> that are larger than one-mbuf size will be dropped in vhost worker's
> handle_rx if mrg_rxbuf feature is not used, but large packets
> cannot be avoided and increasing mbuf's size is not economical.
>
> With this virtio feature enabled by virtio-user, packet sockets with
> hardcoded 10-byte virtio net header will parse mac head incorrectly in
> packet_snd by taking the last two bytes of virtio net header as part of
> mac header.
> This incorrect mac header parsing will cause packet to be dropped due to
> invalid ether head checking in later under-layer device packet receiving.
>
> By adding extra field vnet_hdr_sz with utilizing holes in struct
> packet_sock to record currently used virtio net header size and supporting
> extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet
> sockets can know the exact length of virtio net header that virtio user
> gives.
> In packet_snd, tpacket_snd and packet_recvmsg, instead of using
> hardcoded virtio net header size, it can get the exact vnet_hdr_sz from
> corresponding packet_sock, and parse mac header correctly based on this
> information to avoid the packets being mistakenly dropped.
>
> Signed-off-by: Jianfeng Tan <henry.tjf@...group.com>
> Co-developed-by: Anqi Shen <amy.saq@...group.com>
> Signed-off-by: Anqi Shen <amy.saq@...group.com>
> ---
>
> V5 -> V6:
> * rebase patch based on 6.3-rc2
>
> include/uapi/linux/if_packet.h | 1 +
> net/packet/af_packet.c | 88 +++++++++++++++++++++-------------
> net/packet/diag.c | 2 +-
> net/packet/internal.h | 2 +-
> 4 files changed, 59 insertions(+), 34 deletions(-)
>
> diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
> index 78c981d6a9d4..9efc42382fdb 100644
> --- a/include/uapi/linux/if_packet.h
> +++ b/include/uapi/linux/if_packet.h
> @@ -59,6 +59,7 @@ struct sockaddr_ll {
> #define PACKET_ROLLOVER_STATS 21
> #define PACKET_FANOUT_DATA 22
> #define PACKET_IGNORE_OUTGOING 23
> +#define PACKET_VNET_HDR_SZ 24
>
> #define PACKET_FANOUT_HASH 0
> #define PACKET_FANOUT_LB 1
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 497193f73030..b13536767cbe 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2090,18 +2090,18 @@ static unsigned int run_filter(struct sk_buff *skb,
> }
>
> @@ -3925,11 +3938,19 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval,
> if (copy_from_sockptr(&val, optval, sizeof(val)))
> return -EFAULT;
>
> + hdr_len = val ? sizeof(struct virtio_net_hdr) : 0;
> + if (optname == PACKET_VNET_HDR_SZ) {
> + if (val && val != sizeof(struct virtio_net_hdr) &&
> + val != sizeof(struct virtio_net_hdr_mrg_rxbuf))
> + return -EINVAL;
> + hdr_len = val;
> + }
> +
Since the below requires a change, I'd prefer
if (optname == PACKET_VNET_HDR_SZ) {
...
} else {
hdr_len = val ? sizeof(struct virtio_net_hdr) : 0;
}
Rather than first doing that and then modifying val in the case of
PACKET_VNET_HDR_SZ.
> lock_sock(sk);
> if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) {
> ret = -EBUSY;
> } else {
> - packet_sock_flag_set(po, PACKET_SOCK_HAS_VNET_HDR, val);
> + po->vnet_hdr_sz = hdr_len;
Needs to use WRITE_ONCE to match the READ_ONCE elsewhere
> ret = 0;
> }
> release_sock(sk);
> @@ -4062,7 +4083,10 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
> val = packet_sock_flag(po, PACKET_SOCK_ORIGDEV);
> break;
> case PACKET_VNET_HDR:
> - val = packet_sock_flag(po, PACKET_SOCK_HAS_VNET_HDR);
> + val = !!po->vnet_hdr_sz;
> + break;
> + case PACKET_VNET_HDR_SZ:
> + val = po->vnet_hdr_sz;
> break;
> case PACKET_VERSION:
> val = po->tp_version;
> diff --git a/net/packet/diag.c b/net/packet/diag.c
> index de4ced5cf3e8..5cf13cf0b862 100644
> --- a/net/packet/diag.c
> +++ b/net/packet/diag.c
> @@ -27,7 +27,7 @@ static int pdiag_put_info(const struct packet_sock *po, struct sk_buff *nlskb)
> pinfo.pdi_flags |= PDI_AUXDATA;
> if (packet_sock_flag(po, PACKET_SOCK_ORIGDEV))
> pinfo.pdi_flags |= PDI_ORIGDEV;
> - if (packet_sock_flag(po, PACKET_SOCK_HAS_VNET_HDR))
> + if (po->vnet_hdr_sz)
> pinfo.pdi_flags |= PDI_VNETHDR;
> if (packet_sock_flag(po, PACKET_SOCK_TP_LOSS))
> pinfo.pdi_flags |= PDI_LOSS;
> diff --git a/net/packet/internal.h b/net/packet/internal.h
> index 27930f69f368..63f4865202c1 100644
> --- a/net/packet/internal.h
> +++ b/net/packet/internal.h
> @@ -118,6 +118,7 @@ struct packet_sock {
> struct mutex pg_vec_lock;
> unsigned long flags;
> int ifindex; /* bound device */
> + u8 vnet_hdr_sz;
Did you use pahole to find a spot that is currently padding?
This looks good in principle, an int followed by __be16.
> __be16 num;
> struct packet_rollover *rollover;
> struct packet_mclist *mclist;
> @@ -139,7 +140,6 @@ enum packet_sock_flags {
> PACKET_SOCK_AUXDATA,
> PACKET_SOCK_TX_HAS_OFF,
> PACKET_SOCK_TP_LOSS,
> - PACKET_SOCK_HAS_VNET_HDR,
> PACKET_SOCK_RUNNING,
> PACKET_SOCK_PRESSURE,
> PACKET_SOCK_QDISC_BYPASS,
> --
> 2.19.1.6.gb485710b
>
Powered by blists - more mailing lists