lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <6423160f46e56_1bf1c92089e@willemb.c.googlers.com.notmuch> Date: Tue, 28 Mar 2023 12:30:07 -0400 From: Willem de Bruijn <willemdebruijn.kernel@...il.com> To: 沈安琪(凛玥) <amy.saq@...group.com>, netdev@...r.kernel.org Cc: willemdebruijn.kernel@...il.com, mst@...hat.com, davem@...emloft.net, jasowang@...hat.com, 谈鉴锋 <henry.tjf@...group.com>, 沈安琪(凛玥) <amy.saq@...group.com> Subject: RE: [PATCH v6] net/packet: support mergeable feature of virtio 沈安琪(凛玥) wrote: > From: Jianfeng Tan <henry.tjf@...group.com> > > Packet sockets, like tap, can be used as the backend for kernel vhost. > In packet sockets, virtio net header size is currently hardcoded to be > the size of struct virtio_net_hdr, which is 10 bytes; however, it is not > always the case: some virtio features, such as mrg_rxbuf, need virtio > net header to be 12-byte long. > > Mergeable buffers, as a virtio feature, is worthy of supporting: packets > that are larger than one-mbuf size will be dropped in vhost worker's > handle_rx if mrg_rxbuf feature is not used, but large packets > cannot be avoided and increasing mbuf's size is not economical. > > With this virtio feature enabled by virtio-user, packet sockets with > hardcoded 10-byte virtio net header will parse mac head incorrectly in > packet_snd by taking the last two bytes of virtio net header as part of > mac header. > This incorrect mac header parsing will cause packet to be dropped due to > invalid ether head checking in later under-layer device packet receiving. > > By adding extra field vnet_hdr_sz with utilizing holes in struct > packet_sock to record currently used virtio net header size and supporting > extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet > sockets can know the exact length of virtio net header that virtio user > gives. > In packet_snd, tpacket_snd and packet_recvmsg, instead of using > hardcoded virtio net header size, it can get the exact vnet_hdr_sz from > corresponding packet_sock, and parse mac header correctly based on this > information to avoid the packets being mistakenly dropped. > > Signed-off-by: Jianfeng Tan <henry.tjf@...group.com> > Co-developed-by: Anqi Shen <amy.saq@...group.com> > Signed-off-by: Anqi Shen <amy.saq@...group.com> > --- > > V5 -> V6: > * rebase patch based on 6.3-rc2 > > include/uapi/linux/if_packet.h | 1 + > net/packet/af_packet.c | 88 +++++++++++++++++++++------------- > net/packet/diag.c | 2 +- > net/packet/internal.h | 2 +- > 4 files changed, 59 insertions(+), 34 deletions(-) > > diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h > index 78c981d6a9d4..9efc42382fdb 100644 > --- a/include/uapi/linux/if_packet.h > +++ b/include/uapi/linux/if_packet.h > @@ -59,6 +59,7 @@ struct sockaddr_ll { > #define PACKET_ROLLOVER_STATS 21 > #define PACKET_FANOUT_DATA 22 > #define PACKET_IGNORE_OUTGOING 23 > +#define PACKET_VNET_HDR_SZ 24 > > #define PACKET_FANOUT_HASH 0 > #define PACKET_FANOUT_LB 1 > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index 497193f73030..b13536767cbe 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -2090,18 +2090,18 @@ static unsigned int run_filter(struct sk_buff *skb, > } > > @@ -3925,11 +3938,19 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, > if (copy_from_sockptr(&val, optval, sizeof(val))) > return -EFAULT; > > + hdr_len = val ? sizeof(struct virtio_net_hdr) : 0; > + if (optname == PACKET_VNET_HDR_SZ) { > + if (val && val != sizeof(struct virtio_net_hdr) && > + val != sizeof(struct virtio_net_hdr_mrg_rxbuf)) > + return -EINVAL; > + hdr_len = val; > + } > + Since the below requires a change, I'd prefer if (optname == PACKET_VNET_HDR_SZ) { ... } else { hdr_len = val ? sizeof(struct virtio_net_hdr) : 0; } Rather than first doing that and then modifying val in the case of PACKET_VNET_HDR_SZ. > lock_sock(sk); > if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) { > ret = -EBUSY; > } else { > - packet_sock_flag_set(po, PACKET_SOCK_HAS_VNET_HDR, val); > + po->vnet_hdr_sz = hdr_len; Needs to use WRITE_ONCE to match the READ_ONCE elsewhere > ret = 0; > } > release_sock(sk); > @@ -4062,7 +4083,10 @@ static int packet_getsockopt(struct socket *sock, int level, int optname, > val = packet_sock_flag(po, PACKET_SOCK_ORIGDEV); > break; > case PACKET_VNET_HDR: > - val = packet_sock_flag(po, PACKET_SOCK_HAS_VNET_HDR); > + val = !!po->vnet_hdr_sz; > + break; > + case PACKET_VNET_HDR_SZ: > + val = po->vnet_hdr_sz; > break; > case PACKET_VERSION: > val = po->tp_version; > diff --git a/net/packet/diag.c b/net/packet/diag.c > index de4ced5cf3e8..5cf13cf0b862 100644 > --- a/net/packet/diag.c > +++ b/net/packet/diag.c > @@ -27,7 +27,7 @@ static int pdiag_put_info(const struct packet_sock *po, struct sk_buff *nlskb) > pinfo.pdi_flags |= PDI_AUXDATA; > if (packet_sock_flag(po, PACKET_SOCK_ORIGDEV)) > pinfo.pdi_flags |= PDI_ORIGDEV; > - if (packet_sock_flag(po, PACKET_SOCK_HAS_VNET_HDR)) > + if (po->vnet_hdr_sz) > pinfo.pdi_flags |= PDI_VNETHDR; > if (packet_sock_flag(po, PACKET_SOCK_TP_LOSS)) > pinfo.pdi_flags |= PDI_LOSS; > diff --git a/net/packet/internal.h b/net/packet/internal.h > index 27930f69f368..63f4865202c1 100644 > --- a/net/packet/internal.h > +++ b/net/packet/internal.h > @@ -118,6 +118,7 @@ struct packet_sock { > struct mutex pg_vec_lock; > unsigned long flags; > int ifindex; /* bound device */ > + u8 vnet_hdr_sz; Did you use pahole to find a spot that is currently padding? This looks good in principle, an int followed by __be16. > __be16 num; > struct packet_rollover *rollover; > struct packet_mclist *mclist; > @@ -139,7 +140,6 @@ enum packet_sock_flags { > PACKET_SOCK_AUXDATA, > PACKET_SOCK_TX_HAS_OFF, > PACKET_SOCK_TP_LOSS, > - PACKET_SOCK_HAS_VNET_HDR, > PACKET_SOCK_RUNNING, > PACKET_SOCK_PRESSURE, > PACKET_SOCK_QDISC_BYPASS, > -- > 2.19.1.6.gb485710b >
Powered by blists - more mailing lists