lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89iLcJhA4XGWb-NE0CQg8emKTXVRxDySTNKxnw37eP9M6BQ@mail.gmail.com>
Date:   Thu, 27 Oct 2022 07:11:54 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     "Ziyang Xuan (William)" <william.xuanziyang@...wei.com>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        Petar Penkov <peterpenkov96@...il.com>,
        Mahesh Bandewar <maheshb@...gle.com>
Subject: Re: [PATCH net] net: tun: limit first seg size to avoid oversized linearization

On Thu, Sep 15, 2022 at 3:31 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Tue, 2022-09-13 at 20:07 +0800, Ziyang Xuan (William) wrote:
> > > On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan
> > > <william.xuanziyang@...wei.com> wrote:
> > > >
> > > > Recently, we found a syzkaller problem as following:
> > > >
> > > > ========================================================
> > > > WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295
> > > > __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
> > > > ...
> > > > Call trace:
> > > >  __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
> > > >  __alloc_pages_node include/linux/gfp.h:550 [inline]
> > > >  alloc_pages_node include/linux/gfp.h:564 [inline]
> > > >  kmalloc_large_node+0x94/0x350 mm/slub.c:4038
> > > >  __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
> > > >  __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
> > > >  pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
> > > >  __skb_grow include/linux/skbuff.h:2779 [inline]
> > > >  tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
> > > >  tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
> > > >  tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
> > > >
> > > > It is because the first seg size of the iov_iter from user space
> > > > is
> > > > very big, it is 2147479538 which is bigger than the threshold
> > > > value
> > > > for bail out early in __alloc_pages(). And skb->pfmemalloc is
> > > > true,
> > > > __kmalloc_reserve() would use pfmemalloc reserves without
> > > > __GFP_NOWARN
> > > > flag. Thus we got a warning.
> > > >
> > > > I noticed that non-first segs size are required less than
> > > > PAGE_SIZE in
> > > > tun_napi_alloc_frags(). The first seg should not be a special
> > > > case, and
> > > > oversized linearization is also unreasonable. Limit the first seg
> > > > size to
> > > > PAGE_SIZE to avoid oversized linearization.
> > > >
> > > > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP
> > > > driver")
> > > > Signed-off-by: Ziyang Xuan <william.xuanziyang@...wei.com>
> > > > ---
> > > >  drivers/net/tun.c | 5 ++---
> > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > index 259b2b84b2b3..7db515f94667 100644
> > > > --- a/drivers/net/tun.c
> > > > +++ b/drivers/net/tun.c
> > > > @@ -1454,12 +1454,12 @@ static struct sk_buff
> > > > *tun_napi_alloc_frags(struct tun_file *tfile,
> > > >                                             size_t len,
> > > >                                             const struct iov_iter
> > > > *it)
> > > >  {
> > > > +       size_t linear = iov_iter_single_seg_count(it);
> > > >         struct sk_buff *skb;
> > > > -       size_t linear;
> > > >         int err;
> > > >         int i;
> > > >
> > > > -       if (it->nr_segs > MAX_SKB_FRAGS + 1)
> > > > +       if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear >
> > > > PAGE_SIZE)
> > > >                 return ERR_PTR(-EMSGSIZE);
> > > >
> > >
> > > This does not look good to me.
> > >
> > > Some drivers allocate 9KB+ for 9000 MTU, in a single allocation,
> > > because the hardware is not SG capable in RX.
> >
> > So, do you mean that it does not matter and keep current status, or
> > give a bigger size but PAGE_SIZE (usually 4KB size)?
> >
> > Would like to hear your advice.
>
> I'm guessing that what Eric is suggesting here is to use a bigger limit
> for 'linear'. Possibly ETH_MAX_MTU could fit. @Eric, fell free to
> correct me :)
>

Something like that, yes. We need to be careful when approaching 64K limit,
because of possible u16 fields overflows.

We just got another patch in GRO layer, just because tun has not been fixed yet.



> Thanks!
>
> Paolo
>

Powered by blists - more mailing lists