netdev - Re: traceroute failure in kernel 6.1 and 6.2

Message-ID: <CANn89iLV3NDiEA4tPWUxjqoHNx1pv=SEpXd1b38NXU=TK13=tg@mail.gmail.com>
Date:   Tue, 24 Jan 2023 07:03:09 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Mantas Mikulėnas <grawity@...il.com>
Cc:     netdev@...r.kernel.org
Subject: Re: traceroute failure in kernel 6.1 and 6.2

On Tue, Jan 24, 2023 at 6:34 AM Mantas Mikulėnas <grawity@...il.com> wrote:
>
>
>
> On 24/01/2023 00.26, Eric Dumazet wrote:
> > On Mon, Jan 23, 2023 at 10:45 PM Mantas Mikulėnas <grawity@...il.com> wrote:
> >>
> >> On 23/01/2023 22.56, Eric Dumazet wrote:
> >>> On Mon, Jan 23, 2023 at 8:25 PM Mantas Mikulėnas <grawity@...il.com> wrote:
> >>>>
> >>>> On 2023-01-23 17:21, Eric Dumazet wrote:
> >>>>> On Sat, Jan 21, 2023 at 7:09 PM Mantas Mikulėnas <grawity@...il.com> wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> Not sure whether this has been reported, but:
> >>>>>>
> >>>>>> After upgrading from kernel 6.0.7 to 6.1.6 on Arch Linux, unprivileged
> >>>>>> ICMP traceroute using the `traceroute -I` tool stopped working – it very
> >>>>>> reliably fails with a "No route to host" at some point:
> >>>>>>
> >>>>>>            myth> traceroute -I 83.171.33.188
> >>>>>>            traceroute to 83.171.33.188 (83.171.33.188), 30 hops max, 60 byte packets
> >>>>>>             1  _gateway (192.168.1.1)  0.819 ms
> >>>>>>            send: No route to host
> >>>>>>            [exited with 1]
> >>>>>>
> >>>>>> while it still works for root:
> >>>>>>
> >>>>>>            myth> sudo traceroute -I 83.171.33.188
> >>>>>>            traceroute to 83.171.33.188 (83.171.33.188), 30 hops max, 60 byte packets
> >>>>>>             1  _gateway (192.168.1.1)  0.771 ms
> >>>>>>             2  * * *
> >>>>>>             3  10.69.21.145 (10.69.21.145)  47.194 ms
> >>>>>>             4  82-135-179-168.static.zebra.lt (82.135.179.168)  49.124 ms
> >>>>>>             5  213-190-41-3.static.telecom.lt (213.190.41.3)  44.211 ms
> >>>>>>             6  193.219.153.25 (193.219.153.25)  77.171 ms
> >>>>>>             7  83.171.33.188 (83.171.33.188)  78.198 ms
> >>>>>>
> >>>>>> According to `git bisect`, this started with:
> >>>>>>
> >>>>>>            commit 0d24148bd276ead5708ef56a4725580555bb48a3
> >>>>>>            Author: Eric Dumazet <edumazet@...gle.com>
> >>>>>>            Date:   Tue Oct 11 14:27:29 2022 -0700
> >>>>>>
> >>>>>>                inet: ping: fix recent breakage
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> It still happens with a fresh 6.2rc build, unless I revert that commit.
> >>>>>>
> >>>>>> The /bin/traceroute is the one that calls itself "Modern traceroute for
> >>>>>> Linux, version 2.1.1", on Arch Linux. It seems to use socket(AF_INET,
> >>>>>> SOCK_DGRAM, IPPROTO_ICMP), has neither setuid nor file capabilities.
> >>>>>> (The problem does not occur if I run it as root.)
> >>>>>>
> >>>>>> This version of `traceroute` sends multiple probes at once (with TTLs
> >>>>>> 1..16); according to strace, the first approx. 8-12 probes are sent
> >>>>>> successfully, but eventually sendto() fails with EHOSTUNREACH. (Though
> >>>>>> if I run it on local tty as opposed to SSH, it fails earlier.) If I use
> >>>>>> -N1 to have it only send one probe at a time, the problem doesn't seem
> >>>>>> to occur.
> >>>>>
> >>>>>
> >>>>>
> >>>>> I was not able to reproduce the issue (downloading
> >>>>> https://sourceforge.net/projects/traceroute/files/latest/download)
> >>>>>
> >>>>> I suspect some kind of bug in this traceroute, when/if some ICMP error
> >>>>> comes back.
> >>>>>
> >>>>> Double check by
> >>>>>
> >>>>> tcpdump -i ethXXXX icmp
> >>>>>
> >>>>> While you run traceroute -I ....
> >>>>
> >>>> Hmm, no, the only ICMP errors I see in tcpdump are "Time exceeded in
> >>>> transit", which is expected for traceroute. Nothing else shows up.
> >>>>
> >>>> (But when I test against an address that causes *real* ICMP "Host
> >>>> unreachable" errors, it seems to handle those correctly and prints "!H"
> >>>> as usual -- that is, if it reaches that point without dying.)
> >>>>
> >>>> I was able to reproduce this on a fresh Linode 1G instance (starting
> >>>> with their Arch image), where it also happens immediately:
> >>>>
> >>>>           # pacman -Sy archlinux-keyring
> >>>>           # pacman -Syu
> >>>>           # pacman -Sy traceroute strace
> >>>>           # reboot
> >>>>           # uname -r
> >>>>           6.1.7-arch1-1
> >>>>           # useradd foo
> >>>>           # su -c "traceroute -I 8.8.8.8" foo
> >>>>           traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
> >>>>            1  10.210.1.195 (10.210.1.195)  0.209 ms
> >>>>           send: No route to host
> >>>>
> >>>> So now I'm fairly sure it is not something caused by my own network, either.
> >>>>
> >>>> On one system, it seems to work properly about half the time, if I keep
> >>>> re-running the same command.
> >>>>
> >>>
> >>> Here, running the latest  upstream tree and latest traceroute, I have no issue.
> >>>
> >>> Send us :
> >>>
> >>> 1) strace output
> >>> 2) icmp packet capture.
> >>>
> >>> Thanks.
> >>
> >> Attached both.
> >
> > Thanks.
> >
> > I think it is a bug in this traceroute version, pushing too many
> > sendmsg() at once and hitting socket SNDBUF limit
> >
> > If the sendmsg() is blocked in sock_alloc_send_pskb, it might abort
> > because an incoming ICMP message sets sk->sk_err
> >
> > It might have worked in the past, by pure luck.
> >
> > Try to increase /proc/sys/net/core/wmem_default
> >
> > If this solves the issue, I would advise sending a patch to traceroute to :
> >
> > 1) attempt to increase SO_SNDBUF accordingly
> > 2) use non blocking sendmsg() api to sense how many packets can be
> > queued in qdisc/NIC queues
> > 3) reduce number of parallel messages (current traceroute behavior
> > looks like a flood to me)
>
> It doesn't solve the issue; I tried bumping it from the default of
> 212992 to 4096-times-that, with exactly the same results.
>
> The number of packets it's able to send varies. For example, right
> now, on my regular VM (which is smaller than the PC that yesterday's
> trace was done on), the program very consistently fails on the *second*
> sendto() call -- I don't think two packets is an unreasonable amount.
>
> The program has -q and -N options to reduce the number of simultaneous
> probes, but the only effect it has is if I reduce the packets all the
> way down to just one at a time.

The problem is that if we revert the patch, unprivileged users can trivially crash a host.

Also, the sent ICMP packets look fine to me, and the patch changes
the tx path.

The reported issue looks more rx-path related to me,
for example IP_RECVERR not being handled correctly.

I think more investigation is needed. Maybe contact Pavel Begunkov
<asml.silence@...il.com>, because the initial crash came with
47cf88993c91 ("net: unify alloclen calculation for paged requests")
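
For reference, the two sender-side suggestions quoted earlier in this
thread (raise SO_SNDBUF, use non-blocking sends to sense when the queue
is full) could be sketched roughly as below. This is a sketch under
assumptions, not a patch for traceroute. The function names are
hypothetical. Note that the reporter already found that raising
wmem_default did not change the behavior, so this illustrates the
advice and is not a known fix.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: request a larger send buffer and return the size
 * the kernel actually granted. The kernel may clamp the request to
 * net.core.wmem_max, and Linux reports a doubled value.
 * Returns -1 on failure. */
int raise_sndbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) < 0)
        return -1;
    return granted;
}

/* Hypothetical helper: send one probe without ever blocking.
 * Returns 1 if the probe was queued, 0 if the caller should back off
 * and retry later (send buffer or device queue full), and -1 on any
 * other error, with errno left set for the caller to inspect. */
int send_probe_nonblocking(int fd, const void *buf, size_t len,
                           const struct sockaddr *dst, socklen_t dstlen)
{
    for (;;) {
        ssize_t n = sendto(fd, buf, len, MSG_DONTWAIT, dst, dstlen);

        if (n >= 0)
            return 1;
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS)
            return 0;
        return -1;
    }
}
```

With this shape, a 0 return is the signal to stop queuing more parallel
probes for now. A -1 return with errno set to EHOSTUNREACH would be the
case reported in this thread, and would still need the rx-side
investigation.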