Message-ID: <CA+FuTSebSMp76U83RpGZVxf=p_Zb9FNH2kScB521x1Z+1zEu9g@mail.gmail.com>
Date: Mon, 31 Oct 2022 17:16:42 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jiri Benc <jbenc@...hat.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Shmulik Ladkani <shmulik.ladkani@...il.com>,
netdev@...r.kernel.org, Eric Dumazet <eric.dumazet@...il.com>,
Tomas Hruby <tomas@...era.io>,
Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>,
alexanderduyck@...a.com, Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH net] net: gso: fix panic on frag_list with mixed head
alloc types
On Mon, Oct 31, 2022 at 12:53 PM Jiri Benc <jbenc@...hat.com> wrote:
>
> On Sat, 29 Oct 2022 10:10:03 -0400, Willem de Bruijn wrote:
> > If a device has different allocation strategies depending on packet
> > size, and GRO coalesces those using a list, then indeed this does not
> > have to hold. GRO requires the same packet size and thus allocation
> > strategy to coalesce -- except for the last segment.
>
> That's exactly what I saw: the last segment was different.
>
> However, I don't see anything in the GRO code that enforces that. It
> appears that currently, it just usually happens that way. When there's
> a burst of packets for the given flow on the wire, only the last
> segment is small (and thus malloced) and there's no immediate packet
> following for the same flow. What would happen if (for whatever reason)
> there was such a packet following?
This is enforced. In tcp_gro_receive:
	/* If skb is a GRO packet, make sure its gso_size matches
	 * prior packet mss.
	 *
	 * If it is a single frame, do not aggregate it if its length
	 * is bigger than our mss.
	 */
	if (unlikely(skb_is_gso(skb)))
		flush |= (mss != skb_shinfo(skb)->gso_size);
	else
		flush |= (len - 1) >= mss;

[..]

	/* Force a flush if last segment is smaller than mss. */
	if (unlikely(skb_is_gso(skb)))
		flush = len != NAPI_GRO_CB(skb)->count *
			skb_shinfo(skb)->gso_size;
	else
		flush = len < mss;
That branch and the comment are quite new, introduced together with GRO
support for HW-GRO packets. But the else clauses are not new.
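
To make the effect concrete, here is a tiny userspace model of just
those two non-GSO else clauses (function names and the mss value are
made up for illustration; this is not kernel code):

	#include <stdbool.h>
	#include <stdio.h>

	/* Applied to an incoming frame before it is merged into the
	 * held GRO packet: frames longer than mss are never coalesced.
	 */
	static bool flush_before_merge(unsigned int len, unsigned int mss)
	{
		return (len - 1) >= mss;
	}

	/* Applied after the merge: a segment shorter than mss marks the
	 * GRO packet for immediate delivery, so no later frame of the
	 * same flow can be appended behind the short segment.
	 */
	static bool flush_after_merge(unsigned int len, unsigned int mss)
	{
		return len < mss;
	}

	int main(void)
	{
		unsigned int mss = 1448;

		/* full-sized segment: merged, GRO packet stays open (0 0) */
		printf("%d %d\n", flush_before_merge(1448, mss),
		       flush_after_merge(1448, mss));
		/* short trailing segment: merged, then flushed (0 1) */
		printf("%d %d\n", flush_before_merge(100, mss),
		       flush_after_merge(100, mss));
		return 0;
	}

So a short segment is still merged, but it immediately closes the GRO
packet: a subsequent frame of the same flow starts a new GRO packet
rather than being appended behind the short one.
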
>
> > I don't see any allocation in vmxnet3 that uses a head frag, though.
> > There is a small packet path (rxDataRingUsed), but both small and
> > large allocate using a kmalloc-backed skb->data as far as I can tell.
>
> I believe the logic is that for rxDataRingUsed,
> netdev_alloc_skb_ip_align is called to alloc an skb to copy the data into,
> passing to it the actual packet length. If it's small enough,
> __netdev_alloc_skb will kmalloc the data. However, for !rxDataRingUsed,
> the skb for the DMA buffer is allocated with a larger length and
> __netdev_alloc_skb will use page_frag_alloc.
That explains perfectly.
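
For reference, the size check lives in __netdev_alloc_skb() itself.
Roughly (paraphrased from net/core/skbuff.c; exact thresholds and the
surrounding code vary by kernel version, so this will not compile
as-is):

	len += NET_SKB_PAD;

	/* too small or too big: kmalloc-backed head via __alloc_skb() */
	if (len <= SKB_WITH_OVERHEAD(1024) ||
	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
		...
	} else {
		/* everything in between: page frag backed head */
		data = page_frag_alloc(nc, len, gfp_mask);
		...
		skb = __build_skb(data, len);
		...
		skb->head_frag = 1;
	}

So a copy into an skb sized to the actual packet length flips between
the two paths purely on whether the frame fits under the kmalloc
cutoff, which matches what you describe.
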
> I admit I've not spent that much time understanding the logic in the
> driver. I was satisfied when the perceived logic matched what I saw in
> the kernel memory dump. I may have easily missed something, such as
> Jakub's point that it's not actually the driver deciding on the
> allocation strategy but rather __netdev_alloc_skb on its own. But the
> outcome still seems to be that small packets are kmalloced, while
> larger packets are page backed. Am I wrong?
>
> > In any case, iterating over all frags is more robust. This is an edge
> > case, fine to incur the cost there.
>
> Thanks! We might get a minor speedup if we check only the last segment;
> but first, I'd like to be proven wrong about GRO not enforcing this.
> Plus, I wonder whether the speedup would be measurable if we have to
> iterate through the list to find the last segment anyway.
>
> Unless there are objections or clarifications (and unless I'm wrong
> above), I'll send a v2 with the commit message corrected and with the
> same code.
Sounds good to me, thanks.
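
For anyone following along, the shape of the check being discussed is
roughly this (a sketch only, not the actual patch; the helper name is
made up, while skb_walk_frags(), skb_headlen() and head_frag are
existing kernel helpers/fields):

	/* Walk the whole frag_list and report whether every segment that
	 * has linear data also has a page-frag backed head.  A mix of
	 * head types means the fast path in skb_segment() cannot treat
	 * the heads uniformly.
	 */
	static bool frag_list_heads_all_frag(struct sk_buff *skb)
	{
		struct sk_buff *iter;

		skb_walk_frags(skb, iter)
			if (skb_headlen(iter) && !iter->head_frag)
				return false;

		return true;
	}

Checking every entry rather than only the last one is what makes it
robust against GRO ever producing a different layout.
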
>
> Thanks to all!
>
> Jiri
>