Message-ID: <CANn89iK7nn6tdQg9QZO_Gudx1BvLxhoLaNYmnOLb6ccYQnLGwg@mail.gmail.com>
Date: Mon, 23 Jan 2023 23:26:00 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Mantas Mikulėnas <grawity@...il.com>
Cc: netdev@...r.kernel.org
Subject: Re: traceroute failure in kernel 6.1 and 6.2
On Mon, Jan 23, 2023 at 10:45 PM Mantas Mikulėnas <grawity@...il.com> wrote:
>
> On 23/01/2023 22.56, Eric Dumazet wrote:
> > On Mon, Jan 23, 2023 at 8:25 PM Mantas Mikulėnas <grawity@...il.com> wrote:
> >>
> >> On 2023-01-23 17:21, Eric Dumazet wrote:
> >>> On Sat, Jan 21, 2023 at 7:09 PM Mantas Mikulėnas <grawity@...il.com> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> Not sure whether this has been reported, but:
> >>>>
> >>>> After upgrading from kernel 6.0.7 to 6.1.6 on Arch Linux, unprivileged
> >>>> ICMP traceroute using the `traceroute -I` tool stopped working – it very
> >>>> reliably fails with a "No route to host" at some point:
> >>>>
> >>>> myth> traceroute -I 83.171.33.188
> >>>> traceroute to 83.171.33.188 (83.171.33.188), 30 hops max, 60 byte packets
> >>>> 1 _gateway (192.168.1.1) 0.819 ms
> >>>> send: No route to host
> >>>> [exited with 1]
> >>>>
> >>>> while it still works for root:
> >>>>
> >>>> myth> sudo traceroute -I 83.171.33.188
> >>>> traceroute to 83.171.33.188 (83.171.33.188), 30 hops max, 60 byte packets
> >>>> 1 _gateway (192.168.1.1) 0.771 ms
> >>>> 2 * * *
> >>>> 3 10.69.21.145 (10.69.21.145) 47.194 ms
> >>>> 4 82-135-179-168.static.zebra.lt (82.135.179.168) 49.124 ms
> >>>> 5 213-190-41-3.static.telecom.lt (213.190.41.3) 44.211 ms
> >>>> 6 193.219.153.25 (193.219.153.25) 77.171 ms
> >>>> 7 83.171.33.188 (83.171.33.188) 78.198 ms
> >>>>
> >>>> According to `git bisect`, this started with:
> >>>>
> >>>> commit 0d24148bd276ead5708ef56a4725580555bb48a3
> >>>> Author: Eric Dumazet <edumazet@...gle.com>
> >>>> Date: Tue Oct 11 14:27:29 2022 -0700
> >>>>
> >>>> inet: ping: fix recent breakage
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> It still happens with a fresh 6.2rc build, unless I revert that commit.
> >>>>
> >>>> The /bin/traceroute is the one that calls itself "Modern traceroute for
> >>>> Linux, version 2.1.1", on Arch Linux. It seems to use socket(AF_INET,
> >>>> SOCK_DGRAM, IPPROTO_ICMP), has neither setuid nor file capabilities.
> >>>> (The problem does not occur if I run it as root.)
> >>>>
> >>>> This version of `traceroute` sends multiple probes at once (with TTLs
> >>>> 1..16); according to strace, the first approx. 8-12 probes are sent
> >>>> successfully, but eventually sendto() fails with EHOSTUNREACH. (Though
> >>>> if I run it on local tty as opposed to SSH, it fails earlier.) If I use
> >>>> -N1 to have it only send one probe at a time, the problem doesn't seem
> >>>> to occur.
> >>>
> >>>
> >>>
> >>> I was not able to reproduce the issue (downloading
> >>> https://sourceforge.net/projects/traceroute/files/latest/download)
> >>>
> >>> I suspect some kind of bug in this traceroute, when/if some ICMP error
> >>> comes back.
> >>>
> >>> Double check by
> >>>
> >>> tcpdump -i ethXXXX icmp
> >>>
> >>> While you run traceroute -I ....
> >>
> >> Hmm, no, the only ICMP errors I see in tcpdump are "Time exceeded in
> >> transit", which is expected for traceroute. Nothing else shows up.
> >>
> >> (But when I test against an address that causes *real* ICMP "Host
> >> unreachable" errors, it seems to handle those correctly and prints "!H"
> >> as usual -- that is, if it reaches that point without dying.)
> >>
> >> I was able to reproduce this on a fresh Linode 1G instance (starting
> >> with their Arch image), where it also happens immediately:
> >>
> >> # pacman -Sy archlinux-keyring
> >> # pacman -Syu
> >> # pacman -Sy traceroute strace
> >> # reboot
> >> # uname -r
> >> 6.1.7-arch1-1
> >> # useradd foo
> >> # su -c "traceroute -I 8.8.8.8" foo
> >> traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
> >> 1 10.210.1.195 (10.210.1.195) 0.209 ms
> >> send: No route to host
> >>
> >> So now I'm fairly sure it is not something caused by my own network, either.
> >>
> >> On one system, it seems to work properly about half the time, if I keep
> >> re-running the same command.
> >>
> >
> > Here, running the latest upstream tree and latest traceroute, I have no issue.
> >
> > Send us :
> >
> > 1) strace output
> > 2) icmp packet capture.
> >
> > Thanks.
>
> Attached both.
Thanks.

I think it is a bug in this traceroute version: it pushes too many
sendmsg() calls at once and hits the socket SNDBUF limit.

If sendmsg() is blocked in sock_alloc_send_pskb(), it might abort
because an incoming ICMP message sets sk->sk_err.

It might have worked in the past, by pure luck.

Try to increase /proc/sys/net/core/wmem_default.

If this solves the issue, I would advise sending a patch to traceroute to:

1) attempt to increase SO_SNDBUF accordingly
2) use the non-blocking sendmsg() API to sense how many packets can be
queued in the qdisc/NIC queues
3) reduce the number of parallel messages (current traceroute behavior
looks like a flood to me)
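
The three suggestions above could be sketched roughly as follows. This is
only an illustration in Python, not the traceroute code: the helper names
bump_sndbuf() and send_probes() and the max_in_flight default are made up
for the example, and the usage shown below uses a plain UDP socket on
loopback as a stand-in for the socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP)
socket mentioned earlier in the thread. It assumes Linux, where
MSG_DONTWAIT is available.

```python
import errno
import socket


def bump_sndbuf(sock, wanted_bytes):
    """Ask for a larger send buffer and return what the kernel granted.

    On Linux the kernel doubles the requested value and caps the request
    at net.core.wmem_max, so the result can differ from wanted_bytes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, wanted_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def send_probes(sock, dest, payloads, max_in_flight=4):
    """Send at most max_in_flight probes without ever blocking.

    Returns how many probes the kernel accepted. When the send buffer
    is full, the loop stops instead of sleeping inside the kernel.
    """
    sent = 0
    for payload in payloads[:max_in_flight]:
        try:
            # MSG_DONTWAIT makes this single call non-blocking.
            sock.sendto(payload, socket.MSG_DONTWAIT, dest)
        except BlockingIOError:
            break  # send buffer full: retry the rest later
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                break  # device queue full: same treatment
            raise
        sent += 1
    return sent
```

A caller would typically run bump_sndbuf() once after creating the socket,
then call send_probes() again with the remaining payloads whenever replies
arrive or poll() reports the socket as writable. Any other error, such as
the EHOSTUNREACH seen in this thread, is deliberately re-raised so that
the caller can decide how to handle it.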