lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 19 Jul 2022 11:35:54 +0200
From:   Willem de Bruijn <willemb@...gle.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Pavel Begunkov <asml.silence@...il.com>, io-uring@...r.kernel.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        "David S . Miller" <davem@...emloft.net>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Jens Axboe <axboe@...nel.dk>, David Ahern <dsahern@...nel.org>,
        kernel-team@...com
Subject: Re: [PATCH net-next v5 01/27] ipv4: avoid partial copy for zc

On Tue, Jul 19, 2022 at 3:54 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Tue, 12 Jul 2022 21:52:25 +0100 Pavel Begunkov wrote:
> > Even when zerocopy transmission is requested and possible,
> > __ip_append_data() will still copy a small chunk of data just because it
> > allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles
> > on copy and iter manipulations and also misalignes potentially aligned
> > data. Avoid such coies. And as a bonus we can allocate smaller skb.
>
> s/coies/copies/ can fix when applying
>
> >
> > Signed-off-by: Pavel Begunkov <asml.silence@...il.com>
> > ---
> >  net/ipv4/ip_output.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 00b4bf26fd93..581d1e233260 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -969,7 +969,6 @@ static int __ip_append_data(struct sock *sk,
> >       struct inet_sock *inet = inet_sk(sk);
> >       struct ubuf_info *uarg = NULL;
> >       struct sk_buff *skb;
> > -
> >       struct ip_options *opt = cork->opt;
> >       int hh_len;
> >       int exthdrlen;
> > @@ -977,6 +976,7 @@ static int __ip_append_data(struct sock *sk,
> >       int copy;
> >       int err;
> >       int offset = 0;
> > +     bool zc = false;
> >       unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
> >       int csummode = CHECKSUM_NONE;
> >       struct rtable *rt = (struct rtable *)cork->dst;
> > @@ -1025,6 +1025,7 @@ static int __ip_append_data(struct sock *sk,
> >               if (rt->dst.dev->features & NETIF_F_SG &&
> >                   csummode == CHECKSUM_PARTIAL) {
> >                       paged = true;
> > +                     zc = true;
> >               } else {
> >                       uarg->zerocopy = 0;
> >                       skb_zcopy_set(skb, uarg, &extra_uref);
> > @@ -1091,9 +1092,12 @@ static int __ip_append_data(struct sock *sk,
> >                                (fraglen + alloc_extra < SKB_MAX_ALLOC ||
> >                                 !(rt->dst.dev->features & NETIF_F_SG)))
> >                               alloclen = fraglen;
> > -                     else {
> > +                     else if (!zc) {
> >                               alloclen = min_t(int, fraglen, MAX_HEADER);
>
> Willem, I think this came in with your GSO work, is there a reason we
> use MAX_HEADER here? I thought MAX_HEADER is for headers (i.e. more or
> less to be reserved) not for the min amount of data to be included.
>
> I wanna make sure we're not missing something about GSO here.
>
> Otherwise I don't think we need the extra branch but that can
> be a follow up.

The change was introduced for UDP GSO, to avoid copying most payload
on software segmentation:

"
commit 15e36f5b8e982debe43e425d2e12d34e022d51e9
Author: Willem de Bruijn <willemb@...gle.com>
Date:   Thu Apr 26 13:42:19 2018 -0400

    udp: paged allocation with gso

    When sending large datagrams that are later segmented, store data in
    page frags to avoid copying from linear in skb_segment.
"

and in code

-                       else
-                               alloclen = datalen + fragheaderlen;
+                       else if (!paged)
+                               alloclen = fraglen;
+                       else {
+                               alloclen = min_t(int, fraglen, MAX_HEADER);
+                               pagedlen = fraglen - alloclen;
+                       }


MAX_HEADER was a short-hand for the exact header length. "alloclen =
fragheaderlen + transhdrlen;" is probably a better choice indeed.

Whether with branch or without, the same change needs to be made to
__ip6_append_data, just as in the referenced commit. Let's keep the
stacks in sync.

This is tricky code. If in doubt, run the msg_zerocopy and udp_gso
tests from tools/testing/selftests/net, ideally with KASAN.

Powered by blists - more mailing lists