netdev - Re: traceroute failure in kernel 6.1 and 6.2

Message-ID: <CANn89iLKQB=9rYyKXVH=hd2aBUjzhhjXA0FOdSvN3reH+k9cMQ@mail.gmail.com>
Date:   Tue, 24 Jan 2023 09:57:04 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Mantas Mikulėnas <grawity@...il.com>
Cc:     netdev@...r.kernel.org
Subject: Re: traceroute failure in kernel 6.1 and 6.2

On Tue, Jan 24, 2023 at 7:03 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Jan 24, 2023 at 6:34 AM Mantas Mikulėnas <grawity@...il.com> wrote:
> >
> >
> >
> > On 24/01/2023 00.26, Eric Dumazet wrote:
> > > On Mon, Jan 23, 2023 at 10:45 PM Mantas Mikulėnas <grawity@...il.com> wrote:
> > >>
> > >> On 23/01/2023 22.56, Eric Dumazet wrote:
> > >>> On Mon, Jan 23, 2023 at 8:25 PM Mantas Mikulėnas <grawity@...il.com> wrote:
> > >>>>
> > >>>> On 2023-01-23 17:21, Eric Dumazet wrote:
> > >>>>> On Sat, Jan 21, 2023 at 7:09 PM Mantas Mikulėnas <grawity@...il.com> wrote:
> > >>>>>>
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> Not sure whether this has been reported, but:
> > >>>>>>
> > >>>>>> After upgrading from kernel 6.0.7 to 6.1.6 on Arch Linux, unprivileged
> > >>>>>> ICMP traceroute using the `traceroute -I` tool stopped working – it very
> > >>>>>> reliably fails with a "No route to host" at some point:
> > >>>>>>
> > >>>>>>            myth> traceroute -I 83.171.33.188
> > >>>>>>            traceroute to 83.171.33.188 (83.171.33.188), 30 hops max, 60
> > >>>>>>            byte packets
> > >>>>>>             1  _gateway (192.168.1.1)  0.819 ms
> > >>>>>>            send: No route to host
> > >>>>>>            [exited with 1]
> > >>>>>>
> > >>>>>> while it still works for root:
> > >>>>>>
> > >>>>>>            myth> sudo traceroute -I 83.171.33.188
> > >>>>>>            traceroute to 83.171.33.188 (83.171.33.188), 30 hops max, 60
> > >>>>>>            byte packets
> > >>>>>>             1  _gateway (192.168.1.1)  0.771 ms
> > >>>>>>             2  * * *
> > >>>>>>             3  10.69.21.145 (10.69.21.145)  47.194 ms
> > >>>>>>             4  82-135-179-168.static.zebra.lt (82.135.179.168)  49.124 ms
> > >>>>>>             5  213-190-41-3.static.telecom.lt (213.190.41.3)  44.211 ms
> > >>>>>>             6  193.219.153.25 (193.219.153.25)  77.171 ms
> > >>>>>>             7  83.171.33.188 (83.171.33.188)  78.198 ms
> > >>>>>>
> > >>>>>> According to `git bisect`, this started with:
> > >>>>>>
> > >>>>>>            commit 0d24148bd276ead5708ef56a4725580555bb48a3
> > >>>>>>            Author: Eric Dumazet <edumazet@...gle.com>
> > >>>>>>            Date:   Tue Oct 11 14:27:29 2022 -0700
> > >>>>>>
> > >>>>>>                inet: ping: fix recent breakage
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> It still happens with a fresh 6.2rc build, unless I revert that commit.
> > >>>>>>
> > >>>>>> The /bin/traceroute is the one that calls itself "Modern traceroute for
> > >>>>>> Linux, version 2.1.1", on Arch Linux. It seems to use socket(AF_INET,
> > >>>>>> SOCK_DGRAM, IPPROTO_ICMP), has neither setuid nor file capabilities.
> > >>>>>> (The problem does not occur if I run it as root.)
> > >>>>>>
> > >>>>>> This version of `traceroute` sends multiple probes at once (with TTLs
> > >>>>>> 1..16); according to strace, the first approx. 8-12 probes are sent
> > >>>>>> successfully, but eventually sendto() fails with EHOSTUNREACH. (Though
> > >>>>>> if I run it on local tty as opposed to SSH, it fails earlier.) If I use
> > >>>>>> -N1 to have it only send one probe at a time, the problem doesn't seem
> > >>>>>> to occur.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> I was not able to reproduce the issue (downloading
> > >>>>> https://sourceforge.net/projects/traceroute/files/latest/download)
> > >>>>>
> > >>>>> I suspect some kind of bug in this traceroute, when/if some ICMP error
> > >>>>> comes back.
> > >>>>>
> > >>>>> Double check by
> > >>>>>
> > >>>>> tcpdump -i ethXXXX icmp
> > >>>>>
> > >>>>> While you run traceroute -I ....
> > >>>>
> > >>>> Hmm, no, the only ICMP errors I see in tcpdump are "Time exceeded in
> > >>>> transit", which is expected for traceroute. Nothing else shows up.
> > >>>>
> > >>>> (But when I test against an address that causes *real* ICMP "Host
> > >>>> unreachable" errors, it seems to handle those correctly and prints "!H"
> > >>>> as usual -- that is, if it reaches that point without dying.)
> > >>>>
> > >>>> I was able to reproduce this on a fresh Linode 1G instance (starting
> > >>>> with their Arch image), where it also happens immediately:
> > >>>>
> > >>>>           # pacman -Sy archlinux-keyring
> > >>>>           # pacman -Syu
> > >>>>           # pacman -Sy traceroute strace
> > >>>>           # reboot
> > >>>>           # uname -r
> > >>>>           6.1.7-arch1-1
> > >>>>           # useradd foo
> > >>>>           # su -c "traceroute -I 8.8.8.8" foo
> > >>>>           traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
> > >>>>            1  10.210.1.195 (10.210.1.195)  0.209 ms
> > >>>>           send: No route to host
> > >>>>
> > >>>> So now I'm fairly sure it is not something caused by my own network, either.
> > >>>>
> > >>>> On one system, it seems to work properly about half the time, if I keep
> > >>>> re-running the same command.
> > >>>>
> > >>>
> > >>> Here, running the latest upstream tree and latest traceroute, I have no issue.
> > >>>
> > >>> Send us :
> > >>>
> > >>> 1) strace output
> > >>> 2) icmp packet capture.
> > >>>
> > >>> Thanks.
> > >>
> > >> Attached both.
> > >
> > > Thanks.
> > >
> > > I think it is a bug in this traceroute version, pushing too many
> > > sendmsg() at once and hitting socket SNDBUF limit
> > >
> > > If the sendmsg() is blocked in sock_alloc_send_pskb, it might abort
> > > because an incoming ICMP message sets sk->sk_err
> > >
> > > It might have worked in the past, by pure luck.
> > >
> > > Try to increase /proc/sys/net/core/wmem_default
> > >
> > > If this solves the issue, I would advise sending a patch to traceroute to :
> > >
> > > 1) attempt to increase SO_SNDBUF accordingly
> > > 2) use non blocking sendmsg() api to sense how many packets can be
> > > queued in qdisc/NIC queues
> > > 3) reduce number of parallel messages (current traceroute behavior
> > > looks like a flood to me)
> >
> > It doesn't solve the issue; I tried bumping it from the default of
> > 212992 to 4096-times-that, with exactly the same results.
> >
> > The number of packets it's able to send is variable. For example, right
> > now, on my regular VM (which is smaller than the PC that yesterday's
> > trace was done on), the program very consistently fails on the *second*
> > sendto() call -- I don't think two packets is an unreasonable amount.
> >
> > The program has -q and -N options to reduce the number of simultaneous
> > probes, but they only have an effect if I reduce the packets all the
> > way down to just one at a time.
>
> Problem is : if we revert the patch, unpriv users can trivially crash a host.
>
> Also, sent ICMP packets look just fine to me, and the patch is
> changing tx path.
>
> The reported issue seems more like rx path related to me.
> Like IP_RECVERR being not handled correctly.
>
> I think more investigations are needed. Maybe contact Pavel Begunkov
> <asml.silence@...il.com>
> because the initial crash issue came with
> 47cf88993c91 ("net: unify alloclen calculation for paged requests")

I am reasonably confident this is a bug in this traceroute binary.

It sets
 setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0

So a sendto() can absolutely return the error set by the last ICMP
error received on the socket (cf ping_err()),
as per RFC1122 4.1.3.3

 4.1.3.3  ICMP Messages

            UDP MUST pass to the application layer all ICMP error
            messages that it receives from the IP layer.  Conceptually
            at least, this may be accomplished with an upcall to the
            ERROR_REPORT routine (see Section 4.2.4.1).

            DISCUSSION:
                 Note that ICMP error messages resulting from sending a
                 UDP datagram are received asynchronously.  A UDP-based
                 application that wants to receive ICMP error messages
                 is responsible for maintaining the state necessary to
                 demultiplex these messages when they arrive; for
                 example, the application may keep a pending receive
                 operation for this purpose.  The application is also
                 responsible to avoid confusion from a delayed ICMP
                 error message resulting from an earlier use of the same


Fix would be

diff traceroute/traceroute.c.orig traceroute/traceroute.c
1657c1657
<     if (errno == EMSGSIZE)
---
>     if (errno == EMSGSIZE || errno == EHOSTUNREACH)

or to collect the pending socket error with getsockopt(SO_ERROR) before
sending (but that would be racy)
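
For illustration only, the two options above could be combined roughly as follows. This is a minimal sketch, not the actual traceroute code: the helper names drain_pending_error() and send_probe() and the retry limit of 3 are assumptions made up for this example. It relies on getsockopt(SO_ERROR) returning and clearing the pending socket error, and on a failed sendto() having consumed that error, so that a retry can go through. It stays racy, as noted: another ICMP error can arrive between the getsockopt() and the sendto().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Fetch and clear the pending asynchronous error on a socket.
 * Returns the errno value that was pending (0 if none),
 * or -1 if getsockopt() itself failed.
 */
int drain_pending_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}

/*
 * Hypothetical helper: send one probe, treating EHOSTUNREACH from
 * sendto() as a stale error left by an earlier ICMP reply rather
 * than as a fatal condition. Any other error is returned as-is.
 */
ssize_t send_probe(int fd, const void *buf, size_t len,
                   const struct sockaddr *dst, socklen_t dstlen)
{
    int attempts;

    for (attempts = 0; attempts < 3; attempts++) {   /* arbitrary limit */
        ssize_t n;

        /* Racy: a new ICMP error may still arrive before sendto(). */
        drain_pending_error(fd);

        n = sendto(fd, buf, len, 0, dst, dstlen);
        if (n >= 0)
            return n;
        if (errno != EHOSTUNREACH)
            return -1;            /* real failure, errno is set */
        /* The failed sendto() consumed the pending error; retry. */
    }
    return -1;                    /* still failing, errno is EHOSTUNREACH */
}
```

A caller would replace its direct sendto() with send_probe() and keep its existing handling for the remaining errno values.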