Message-ID: <CAHS8izMrnVUfbbS=OcJ6JT9SZRRfZ2MC7UnggthpZT=zf2BGLA@mail.gmail.com>
Date: Mon, 6 Nov 2023 12:31:44 -0800
From: Mina Almasry <almasrymina@...gle.com>
To: David Ahern <dsahern@...nel.org>
Cc: Stanislav Fomichev <sdf@...gle.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-media@...r.kernel.org,
dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Arnd Bergmann <arnd@...db.de>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Shuah Khan <shuah@...nel.org>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
Shakeel Butt <shakeelb@...gle.com>,
Jeroen de Borst <jeroendb@...gle.com>,
Praveen Kaligineedi <pkaligineedi@...gle.com>,
Willem de Bruijn <willemb@...gle.com>,
Kaiyuan Zhang <kaiyuanz@...gle.com>
Subject: Re: [RFC PATCH v3 09/12] net: add support for skbs with unreadable frags
On Mon, Nov 6, 2023 at 11:34 AM David Ahern <dsahern@...nel.org> wrote:
>
> On 11/6/23 11:47 AM, Stanislav Fomichev wrote:
> > On 11/05, Mina Almasry wrote:
> >> For device memory TCP, we expect the skb headers to be available in host
> >> memory for access, and we expect the skb frags to be in device memory
> >> and unaccessible to the host. We expect there to be no mixing and
> >> matching of device memory frags (unaccessible) with host memory frags
> >> (accessible) in the same skb.
> >>
> >> Add a skb->devmem flag which indicates whether the frags in this skb
> >> are device memory frags or not.
> >>
> >> __skb_fill_page_desc() now checks frags added to skbs for page_pool_iovs,
> >> and marks the skb as skb->devmem accordingly.
> >>
> >> Add checks through the network stack to avoid accessing the frags of
> >> devmem skbs and avoid coalescing devmem skbs with non devmem skbs.
> >>
> >> Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> >> Signed-off-by: Kaiyuan Zhang <kaiyuanz@...gle.com>
> >> Signed-off-by: Mina Almasry <almasrymina@...gle.com>
> >>
> >> ---
> >> include/linux/skbuff.h | 14 +++++++-
> >> include/net/tcp.h | 5 +--
> >> net/core/datagram.c | 6 ++++
> >> net/core/gro.c | 5 ++-
> >> net/core/skbuff.c | 77 ++++++++++++++++++++++++++++++++++++------
> >> net/ipv4/tcp.c | 6 ++++
> >> net/ipv4/tcp_input.c | 13 +++++--
> >> net/ipv4/tcp_output.c | 5 ++-
> >> net/packet/af_packet.c | 4 +--
> >> 9 files changed, 115 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 1fae276c1353..8fb468ff8115 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -805,6 +805,8 @@ typedef unsigned char *sk_buff_data_t;
> >> * @csum_level: indicates the number of consecutive checksums found in
> >> * the packet minus one that have been verified as
> >> * CHECKSUM_UNNECESSARY (max 3)
> >> + * @devmem: indicates that all the fragments in this skb are backed by
> >> + * device memory.
> >> * @dst_pending_confirm: need to confirm neighbour
> >> * @decrypted: Decrypted SKB
> >> * @slow_gro: state present at GRO time, slower prepare step required
> >> @@ -991,7 +993,7 @@ struct sk_buff {
> >> #if IS_ENABLED(CONFIG_IP_SCTP)
> >> __u8 csum_not_inet:1;
> >> #endif
> >> -
> >> + __u8 devmem:1;
> >> #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
> >> __u16 tc_index; /* traffic control index */
> >> #endif
> >> @@ -1766,6 +1768,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
> >> __skb_zcopy_downgrade_managed(skb);
> >> }
> >>
> >> +/* Return true if frags in this skb are not readable by the host. */
> >> +static inline bool skb_frags_not_readable(const struct sk_buff *skb)
> >> +{
> >> + return skb->devmem;
> >
> > bikeshedding: should we also rename 'devmem' sk_buff flag to 'not_readable'?
> > It better communicates the fact that the stack shouldn't dereference the
> > frags (because it has 'devmem' fragments or for some other potential
> > future reason).
>
> +1.
>
> Also, the flag on the skb is an optimization - a high level signal that
> one or more frags is in unreadable memory. There is no requirement that
> all of the frags are in the same memory type.

The flag specifically indicates that all of the frags in the skb are
devmem dma-buf memory, not generic 'not_readable' frags, as the comment
says:

+ * @devmem: indicates that all the fragments in this skb are backed by
+ * device memory.

The reason it's not a generic 'not_readable' flag is that handing off
a generic not_readable skb to userspace is semantically not what we're
doing. recvmsg() is augmented in this patch series to return a devmem
skb to the user via a cmsg_devmem struct, which refers specifically to
the memory in the dma-buf. recvmsg() in this patch series is not
augmented to give an arbitrary 'not_readable' skb to userspace.

IMHO skb->devmem plus skb_frags_not_readable() as implemented is
correct. If a new type of unreadable skb is introduced to the stack,
I imagine the stack would implement:

1. A new header flag: skb->newmem

2. An updated helper:

static inline bool skb_frags_not_readable(const struct sk_buff *skb)
{
        return skb->devmem || skb->newmem;
}

3. tcp_recvmsg_devmem() would handle skb->devmem skbs as in this patch
series, while tcp_recvmsg_newmem() would handle skb->newmem skbs.
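
To make the split between "which memory type" and "can the host read it"
concrete, here is a minimal user-space sketch, not kernel code. It
assumes a stand-in struct (mock_skb) carrying only the two flag bits;
mock_skb, the recv_path enum, and mock_pick_recv_path() are hypothetical
names for illustration, and newmem is the same made-up future memory
type as above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: only the flag bits matter. */
struct mock_skb {
	unsigned char devmem:1;	/* all frags are devmem dma-buf memory */
	unsigned char newmem:1;	/* made-up future unreadable memory type */
};

/* Generic question the core stack asks: may the host touch the frags? */
static inline bool mock_skb_frags_not_readable(const struct mock_skb *skb)
{
	return skb->devmem || skb->newmem;
}

/* Which receive path would hand this skb to userspace (illustrative). */
enum recv_path {
	RECV_COPY,	/* ordinary readable skb, copied to the user */
	RECV_DEVMEM,	/* stands in for tcp_recvmsg_devmem() */
	RECV_NEWMEM,	/* stands in for a hypothetical tcp_recvmsg_newmem() */
};

/* The receive side needs the specific type, not just "not readable". */
static inline enum recv_path mock_pick_recv_path(const struct mock_skb *skb)
{
	if (skb->devmem)
		return RECV_DEVMEM;
	if (skb->newmem)
		return RECV_NEWMEM;
	return RECV_COPY;
}
```

Code that only needs to avoid dereferencing frags would call the generic
helper, while the receive path keys off the specific flag, which is why
a single 'not_readable' bit would not carry enough information there.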
--
Thanks,
Mina