lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZUlhu4hlTaqR3CTh@google.com>
Date:   Mon, 6 Nov 2023 13:59:23 -0800
From:   Stanislav Fomichev <sdf@...gle.com>
To:     Mina Almasry <almasrymina@...gle.com>
Cc:     David Ahern <dsahern@...nel.org>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
        linux-kselftest@...r.kernel.org, linux-media@...r.kernel.org,
        dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Arnd Bergmann <arnd@...db.de>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Shuah Khan <shuah@...nel.org>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        "Christian König" <christian.koenig@....com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Jeroen de Borst <jeroendb@...gle.com>,
        Praveen Kaligineedi <pkaligineedi@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Kaiyuan Zhang <kaiyuanz@...gle.com>
Subject: Re: [RFC PATCH v3 09/12] net: add support for skbs with unreadable frags

On 11/06, Mina Almasry wrote:
> On Mon, Nov 6, 2023 at 11:34 AM David Ahern <dsahern@...nel.org> wrote:
> >
> > On 11/6/23 11:47 AM, Stanislav Fomichev wrote:
> > > On 11/05, Mina Almasry wrote:
> > >> For device memory TCP, we expect the skb headers to be available in host
> > >> memory for access, and we expect the skb frags to be in device memory
> > >> and unaccessible to the host. We expect there to be no mixing and
> > >> matching of device memory frags (unaccessible) with host memory frags
> > >> (accessible) in the same skb.
> > >>
> > >> Add a skb->devmem flag which indicates whether the frags in this skb
> > >> are device memory frags or not.
> > >>
> > >> __skb_fill_page_desc() now checks frags added to skbs for page_pool_iovs,
> > >> and marks the skb as skb->devmem accordingly.
> > >>
> > >> Add checks through the network stack to avoid accessing the frags of
> > >> devmem skbs and avoid coalescing devmem skbs with non devmem skbs.
> > >>
> > >> Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> > >> Signed-off-by: Kaiyuan Zhang <kaiyuanz@...gle.com>
> > >> Signed-off-by: Mina Almasry <almasrymina@...gle.com>
> > >>
> > >> ---
> > >>  include/linux/skbuff.h | 14 +++++++-
> > >>  include/net/tcp.h      |  5 +--
> > >>  net/core/datagram.c    |  6 ++++
> > >>  net/core/gro.c         |  5 ++-
> > >>  net/core/skbuff.c      | 77 ++++++++++++++++++++++++++++++++++++------
> > >>  net/ipv4/tcp.c         |  6 ++++
> > >>  net/ipv4/tcp_input.c   | 13 +++++--
> > >>  net/ipv4/tcp_output.c  |  5 ++-
> > >>  net/packet/af_packet.c |  4 +--
> > >>  9 files changed, 115 insertions(+), 20 deletions(-)
> > >>
> > >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > >> index 1fae276c1353..8fb468ff8115 100644
> > >> --- a/include/linux/skbuff.h
> > >> +++ b/include/linux/skbuff.h
> > >> @@ -805,6 +805,8 @@ typedef unsigned char *sk_buff_data_t;
> > >>   *  @csum_level: indicates the number of consecutive checksums found in
> > >>   *          the packet minus one that have been verified as
> > >>   *          CHECKSUM_UNNECESSARY (max 3)
> > >> + *  @devmem: indicates that all the fragments in this skb are backed by
> > >> + *          device memory.
> > >>   *  @dst_pending_confirm: need to confirm neighbour
> > >>   *  @decrypted: Decrypted SKB
> > >>   *  @slow_gro: state present at GRO time, slower prepare step required
> > >> @@ -991,7 +993,7 @@ struct sk_buff {
> > >>  #if IS_ENABLED(CONFIG_IP_SCTP)
> > >>      __u8                    csum_not_inet:1;
> > >>  #endif
> > >> -
> > >> +    __u8                    devmem:1;
> > >>  #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
> > >>      __u16                   tc_index;       /* traffic control index */
> > >>  #endif
> > >> @@ -1766,6 +1768,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
> > >>              __skb_zcopy_downgrade_managed(skb);
> > >>  }
> > >>
> > >> +/* Return true if frags in this skb are not readable by the host. */
> > >> +static inline bool skb_frags_not_readable(const struct sk_buff *skb)
> > >> +{
> > >> +    return skb->devmem;
> > >
> > > bikeshedding: should we also rename 'devmem' sk_buff flag to 'not_readable'?
> > > It better communicates the fact that the stack shouldn't dereference the
> > > frags (because it has 'devmem' fragments or for some other potential
> > > future reason).
> >
> > +1.
> >
> > Also, the flag on the skb is an optimization - a high level signal that
> > one or more frags is in unreadable memory. There is no requirement that
> > all of the frags are in the same memory type.

David: maybe there should be such a requirement (that they all are
unreadable)? Might be easier to support initially; we can relax later
on.

> The flag indicates that the skb contains all devmem dma-buf memory
> specifically, not generic 'not_readable' frags as the comment says:
> 
> + *     @devmem: indicates that all the fragments in this skb are backed by
> + *             device memory.
> 
> The reason it's not a generic 'not_readable' flag is because handing
> off a generic not_readable skb to the userspace is semantically not
> what we're doing. recvmsg() is augmented in this patch series to
> return a devmem skb to the user via a cmsg_devmem struct which refers
> specifically to the memory in the dma-buf. recvmsg() in this patch
> series is not augmented to give any 'not_readable' skb to the
> userspace.
> 
> IMHO skb->devmem + an skb_frags_not_readable() as implemented is
> correct. If a new type of unreadable skbs are introduced to the stack,
> I imagine the stack would implement:
> 
> 1. new header flag: skb->newmem
> 2.
> 
> static inline bool skb_frags_not_readable(const struct skb_buff *skb)
> {
>     return skb->devmem || skb->newmem;
> }
> 
> 3. tcp_recvmsg_devmem() would handle skb->devmem skbs is in this patch
> series, but tcp_recvmsg_newmem() would handle skb->newmem skbs.

You copy it to the userspace in a special way because your frags
are page_is_page_pool_iov(). I agree with David, the skb bit is
just and optimization.

For most of the core stack, it doesn't matter why your skb is not
readable. For a few places where it matters (recvmsg?), you can
double-check your frags (all or some) with page_is_page_pool_iov.

Unrelated: we probably need socket to dmabuf association as well (via
netlink or something).
We are fundamentally receiving into and sending from a dmabuf (devmem ==
dmabuf).
And once you have this association, recvmsg shouldn't need any new
special flags.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ