netdev - Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest

Message-ID: <CAK6E8=eA9KAKWVoiX3OgeNzmK1a0ma7ZjtWZVzcfCkLEBoisKA@mail.gmail.com>
Date:   Tue, 31 Oct 2017 22:50:05 -0700
From:   Yuchung Cheng <ycheng@...gle.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        David Miller <davem@...emloft.net>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Roman Gushchin <guro@...com>, netdev <netdev@...r.kernel.org>,
        Neal Cardwell <ncardwell@...gle.com>,
        Lawrence Brakmo <brakmo@...com>
Subject: Re: [PATCH net] tcp: fix tcp_mtu_probe() vs highest_sack

On Mon, Oct 30, 2017 at 11:17 PM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@...gle.com>
> >
> > Based on SNMP values provided by Roman, Yuchung made the observation
> > that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> >
> > Looking at tcp_mtu_probe(), I found that when a new skb was placed
> > in front of the write queue, we were not updating tcp highest sack.
> >
> > If one skb is freed because all its content was copied to the new skb
> > (for MTU probing), then tp->highest_sack could point to a now freed skb.
> >
> > Bad things would then happen, including infinite loops.
> >
> > This patch renames tcp_highest_sack_combine() and uses it
> > from tcp_mtu_probe() to fix the bug.
> >
> > Note that I also removed one test against tp->sacked_out,
> > since we want to replace tp->highest_sack regardless of whatever
> > condition, since keeping a stale pointer to freed skb is a recipe
> > for disaster.
> >
> > Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > Reported-by: Alexei Starovoitov <alexei.starovoitov@...il.com>
> > Reported-by: Roman Gushchin <guro@...com>
> > Reported-by: Oleksandr Natalenko <oleksandr@...alenko.name>
>
> Thanks!
>
> Acked-by: Alexei Starovoitov <ast@...nel.org>
>
> wow. a bug from 2007.
> Any idea why it only started to bite us in 4.11 ?
FWIW, a random guess:
Since RACK was confirmed to trigger the issue, and RACK enables
detecting lost retransmissions w/o limited-transmit in CA_Loss state, I
guess RACK creates a new type of "fast retransmit" that caused some
previously impossible SACKs during MTU probing.

Acked-by: Yuchung Cheng <ycheng@...gle.com>
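
For anyone following the thread, the stale-pointer problem described in the
quoted commit message can be sketched in userspace. This is only a toy model
under stated assumptions, not the kernel code: toy_skb, toy_sock,
toy_mtu_probe() and toy_highest_sack_replace() are hypothetical names, and the
queue is a plain singly linked list rather than the real write queue. It shows
a probe buffer absorbing data from the front of the queue and the cached
highest_sack pointer being redirected before a fully copied buffer is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for sk_buff and tcp_sock; every name here is hypothetical. */
struct toy_skb {
    int len;                 /* payload bytes still held by this buffer */
    struct toy_skb *next;    /* next buffer in the write queue */
};

struct toy_sock {
    struct toy_skb *head;          /* front of the write queue */
    struct toy_skb *highest_sack;  /* cached pointer into the queue */
};

/* If the cache points at 'old', redirect it to 'repl' before 'old' is freed. */
static void toy_highest_sack_replace(struct toy_sock *sk,
                                     struct toy_skb *old,
                                     struct toy_skb *repl)
{
    if (sk->highest_sack == old)
        sk->highest_sack = repl;
}

/*
 * Build a probe of up to 'probe_len' bytes at the front of the queue by
 * absorbing data from the existing buffers. Fully absorbed buffers are
 * freed. Returns the new probe buffer, or NULL on allocation failure.
 */
static struct toy_skb *toy_mtu_probe(struct toy_sock *sk, int probe_len)
{
    struct toy_skb *probe = calloc(1, sizeof(*probe));
    struct toy_skb *skb = sk->head;

    if (!probe)
        return NULL;

    while (skb && probe->len < probe_len) {
        int copy = probe_len - probe->len;

        if (copy > skb->len)
            copy = skb->len;
        probe->len += copy;
        skb->len -= copy;

        if (skb->len == 0) {
            struct toy_skb *next = skb->next;

            /* Without this call, highest_sack could dangle after free(). */
            toy_highest_sack_replace(sk, skb, probe);
            free(skb);
            skb = next;
        }
    }

    probe->next = skb;
    sk->head = probe;
    return probe;
}

/* Returns 1 if highest_sack points at a buffer that is still on the queue. */
static int toy_highest_sack_is_queued(const struct toy_sock *sk)
{
    const struct toy_skb *skb;

    for (skb = sk->head; skb; skb = skb->next)
        if (skb == sk->highest_sack)
            return 1;
    return 0;
}

/* Two 100-byte buffers, highest_sack on the first, then a 150-byte probe. */
static int toy_demo(void)
{
    struct toy_sock sk = { NULL, NULL };
    struct toy_skb *a = calloc(1, sizeof(*a));
    struct toy_skb *b = calloc(1, sizeof(*b));

    assert(a && b);
    a->len = 100;
    a->next = b;
    b->len = 100;
    sk.head = a;
    sk.highest_sack = a;

    if (!toy_mtu_probe(&sk, 150))
        return 0;
    return toy_highest_sack_is_queued(&sk);
}
```

In this model the redirect is done whenever the pointer matches, with no extra
condition, which mirrors the commit message's point that the pointer should be
replaced regardless of tp->sacked_out. Dropping the
toy_highest_sack_replace() call leaves highest_sack pointing at freed memory
as soon as the first buffer is fully copied.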


>
> It's not trivial for us to reproduce it, but we will definitely
> test the patch as soon as we can.
> Do you have a packetdrill test or something for easy repro?
>