Message-ID: <640f2cae6b7de_28b1eb208ed@willemb.c.googlers.com.notmuch>
Date: Mon, 13 Mar 2023 10:01:18 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: "Michael S. Tsirkin" <mst@...hat.com>,
沈安琪(凛玥) <amy.saq@...group.com>
Cc: netdev@...r.kernel.org, willemdebruijn.kernel@...il.com,
davem@...emloft.net, jasowang@...hat.com,
谈鉴锋 <henry.tjf@...group.com>
Subject: Re: [PATCH v4] net/packet: support mergeable feature of virtio
Michael S. Tsirkin wrote:
> On Mon, Mar 13, 2023 at 07:58:25PM +0800, 沈安琪(凛玥) wrote:
> >
> > On 2023/3/13 6:27 PM, Michael S. Tsirkin wrote:
> > > On Mon, Mar 13, 2023 at 04:00:06PM +0800, 沈安琪(凛玥) wrote:
> > > > On 2023/3/13 2:51 PM, Michael S. Tsirkin wrote:
> > > > > On Mon, Mar 13, 2023 at 02:31:13PM +0800, 沈安琪(凛玥) wrote:
> > > > > > From: Jianfeng Tan <henry.tjf@...group.com>
> > > > > >
> > > > > > Packet sockets, like tap, can be used as the backend for kernel vhost.
> > > > > > > In packet sockets, the virtio net header size is currently hardcoded to
> > > > > > > the size of struct virtio_net_hdr, which is 10 bytes; however, this is not
> > > > > > > always the case: some virtio features, such as mrg_rxbuf, need virtio
> > > > > > > net headers to be 12 bytes long.
> > > > > >
> > > > > > > Mergeable buffers, as a virtio feature, are worth supporting: packets
> > > > > > > that are larger than one-mbuf size will be dropped in the vhost worker's
> > > > > > > handle_rx if the mrg_rxbuf feature is not used, but large packets
> > > > > > > cannot be avoided and increasing the mbuf size is not economical.
> > > > > >
> > > > > > > With this virtio feature enabled by virtio-user, packet sockets with a
> > > > > > > hardcoded 10-byte virtio net header will parse the mac header incorrectly
> > > > > > > in packet_snd by taking the last two bytes of the virtio net header as
> > > > > > > part of the mac header.
> > > > > > > This incorrect mac header parsing will cause the packet to be dropped by
> > > > > > > the ethernet header validity check when the underlying device receives it.
> > > > > >
> > > > > > > By adding an extra field vnet_hdr_sz, which uses holes in struct
> > > > > > > packet_sock, to record the virtio net header size currently in use, and by
> > > > > > > supporting an extra sockopt PACKET_VNET_HDR_SZ to set vnet_hdr_sz, packet
> > > > > > > sockets can know the exact length of the virtio net header that virtio-user
> > > > > > > gives.
> > > > > > > In packet_snd, tpacket_snd and packet_recvmsg, instead of using the
> > > > > > > hardcoded virtio net header size, the code can get the exact vnet_hdr_sz
> > > > > > > from the corresponding packet_sock and parse the mac header correctly based
> > > > > > > on this information, so that packets are not mistakenly dropped.
> > > > > >
> > > > > > Signed-off-by: Jianfeng Tan <henry.tjf@...group.com>
> > > > > > Co-developed-by: Anqi Shen <amy.saq@...group.com>
> > > > > > Signed-off-by: Anqi Shen <amy.saq@...group.com>
> > > > > > ---
> > > > > >
> > > > > > V3 -> V4:
> > > > > > * read po->vnet_hdr_sz once during vnet_hdr_sz and use vnet_hdr_sz locally
> > > > > > to avoid race condition;
> > > > > Wait a second. What kind of race condition? And what happens if
> > > > > it does trigger? By once do you mean this:
> > > > > int vnet_hdr_sz = po->vnet_hdr_sz;
> > > > > ? This is not guaranteed to read the value once, compiler is free
> > > > > to read as many times as it likes.
> > > > >
> > > > > See e.g. memory barriers doc:
> > > > >
> > > > > (*) It _must_not_ be assumed that the compiler will do what you want
> > > > > with memory references that are not protected by READ_ONCE() and
> > > > > WRITE_ONCE(). Without them, the compiler is within its rights to
> > > > > do all sorts of "creative" transformations, which are covered in
> > > > > the COMPILER BARRIER section.
> > > > >
> > > > The expression "read once" may be a little confusing here. The race condition
> > > > we want to avoid is:
> > > >
> > > >     if (po->vnet_hdr_sz != 0) {
> > > >         vnet_hdr_sz = po->vnet_hdr_sz;
> > > >         ...
> > > >     }
> > > >
> > > > Here we read po->vnet_hdr_sz first for the if condition and then read it
> > > > again to assign vnet_hdr_sz. According to Willem's comment, this can race
> > > > with an update to the virtio net header length, so the value checked in
> > > > the if condition may not be the value we later use as vnet_hdr_sz.
> > > >
> > > Above comment seems to apply.
Good point. This should use READ_ONCE to be sure.
The suggestion was based on a similar need for has_vnet_hdr itself; see commit
da7c9561015e ("packet: only test po->has_vnet_hdr once in packet_snd").
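
Roughly, the pattern could look like the userspace sketch below. This is
not the actual patch: struct packet_sock_sketch and vnet_hdr_len() are
hypothetical names, and READ_ONCE() is defined locally as a simplified
volatile-cast stand-in for the kernel macro (it relies on the GCC/Clang
__typeof__ extension). The point is only that the field is loaded a single
time and that both the check and the later use work on the local copy.

```c
#include <stddef.h>

/*
 * Simplified userspace stand-in for the kernel's READ_ONCE() (assumption:
 * GCC/Clang __typeof__). The volatile access keeps the compiler from
 * re-reading the field or fusing the load with other accesses.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, cut-down stand-in for struct packet_sock. */
struct packet_sock_sketch {
	int vnet_hdr_sz;	/* 0 means no virtio net header in use */
};

/*
 * Return the number of header bytes that precede the mac header.
 * po->vnet_hdr_sz is loaded once; the check and the returned value both
 * use the local copy, so a concurrent update cannot make them disagree.
 */
static size_t vnet_hdr_len(struct packet_sock_sketch *po)
{
	int vnet_hdr_sz = READ_ONCE(po->vnet_hdr_sz);

	if (vnet_hdr_sz != 0)
		return (size_t)vnet_hdr_sz;

	return 0;
}
```

A caller would use the returned length to skip the virtio net header
before parsing the mac header. A writer updating the field concurrently
would want the matching WRITE_ONCE() on its side.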