netdev - Re: traceroute failure in kernel 6.1 and 6.2

Message-ID: <b2ecff1c-91ad-4217-7fd5-d7bbd5704abe@gmail.com>
Date:   Tue, 24 Jan 2023 07:34:20 +0200
From:   Mantas Mikulėnas <grawity@...il.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     netdev@...r.kernel.org
Subject: Re: traceroute failure in kernel 6.1 and 6.2



On 24/01/2023 00.26, Eric Dumazet wrote:
> On Mon, Jan 23, 2023 at 10:45 PM Mantas Mikulėnas <grawity@...il.com> wrote:
>>
>> On 23/01/2023 22.56, Eric Dumazet wrote:
>>> On Mon, Jan 23, 2023 at 8:25 PM Mantas Mikulėnas <grawity@...il.com> wrote:
>>>>
>>>> On 2023-01-23 17:21, Eric Dumazet wrote:
>>>>> On Sat, Jan 21, 2023 at 7:09 PM Mantas Mikulėnas <grawity@...il.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Not sure whether this has been reported, but:
>>>>>>
>>>>>> After upgrading from kernel 6.0.7 to 6.1.6 on Arch Linux, unprivileged
>>>>>> ICMP traceroute using the `traceroute -I` tool stopped working – it very
>>>>>> reliably fails with a "No route to host" at some point:
>>>>>>
>>>>>>            myth> traceroute -I 83.171.33.188
>>>>>>            traceroute to 83.171.33.188 (83.171.33.188), 30 hops max, 60
>>>>>>            byte packets
>>>>>>             1  _gateway (192.168.1.1)  0.819 ms
>>>>>>            send: No route to host
>>>>>>            [exited with 1]
>>>>>>
>>>>>> while it still works for root:
>>>>>>
>>>>>>            myth> sudo traceroute -I 83.171.33.188
>>>>>>            traceroute to 83.171.33.188 (83.171.33.188), 30 hops max, 60
>>>>>>            byte packets
>>>>>>             1  _gateway (192.168.1.1)  0.771 ms
>>>>>>             2  * * *
>>>>>>             3  10.69.21.145 (10.69.21.145)  47.194 ms
>>>>>>             4  82-135-179-168.static.zebra.lt (82.135.179.168)  49.124 ms
>>>>>>             5  213-190-41-3.static.telecom.lt (213.190.41.3)  44.211 ms
>>>>>>             6  193.219.153.25 (193.219.153.25)  77.171 ms
>>>>>>             7  83.171.33.188 (83.171.33.188)  78.198 ms
>>>>>>
>>>>>> According to `git bisect`, this started with:
>>>>>>
>>>>>>            commit 0d24148bd276ead5708ef56a4725580555bb48a3
>>>>>>            Author: Eric Dumazet <edumazet@...gle.com>
>>>>>>            Date:   Tue Oct 11 14:27:29 2022 -0700
>>>>>>
>>>>>>                inet: ping: fix recent breakage
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> It still happens with a fresh 6.2rc build, unless I revert that commit.
>>>>>>
>>>>>> The /bin/traceroute is the one that calls itself "Modern traceroute for
>>>>>> Linux, version 2.1.1", on Arch Linux. It seems to use socket(AF_INET,
>>>>>> SOCK_DGRAM, IPPROTO_ICMP), has neither setuid nor file capabilities.
>>>>>> (The problem does not occur if I run it as root.)
>>>>>>
>>>>>> This version of `traceroute` sends multiple probes at once (with TTLs
>>>>>> 1..16); according to strace, the first approx. 8-12 probes are sent
>>>>>> successfully, but eventually sendto() fails with EHOSTUNREACH. (Though
>>>>>> if I run it on local tty as opposed to SSH, it fails earlier.) If I use
>>>>>> -N1 to have it only send one probe at a time, the problem doesn't seem
>>>>>> to occur.
>>>>>
>>>>>
>>>>>
>>>>> I was not able to reproduce the issue (downloading
>>>>> https://sourceforge.net/projects/traceroute/files/latest/download)
>>>>>
>>>>> I suspect some kind of bug in this traceroute, when/if some ICMP error
>>>>> comes back.
>>>>>
>>>>> Double check by
>>>>>
>>>>> tcpdump -i ethXXXX icmp
>>>>>
>>>>> While you run traceroute -I ....
>>>>
>>>> Hmm, no, the only ICMP errors I see in tcpdump are "Time exceeded in
>>>> transit", which is expected for traceroute. Nothing else shows up.
>>>>
>>>> (But when I test against an address that causes *real* ICMP "Host
>>>> unreachable" errors, it seems to handle those correctly and prints "!H"
>>>> as usual -- that is, if it reaches that point without dying.)
>>>>
>>>> I was able to reproduce this on a fresh Linode 1G instance (starting
>>>> with their Arch image), where it also happens immediately:
>>>>
>>>>           # pacman -Sy archlinux-keyring
>>>>           # pacman -Syu
>>>>           # pacman -Sy traceroute strace
>>>>           # reboot
>>>>           # uname -r
>>>>           6.1.7-arch1-1
>>>>           # useradd foo
>>>>           # su -c "traceroute -I 8.8.8.8" foo
>>>>           traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
>>>>            1  10.210.1.195 (10.210.1.195)  0.209 ms
>>>>           send: No route to host
>>>>
>>>> So now I'm fairly sure it is not something caused by my own network, either.
>>>>
>>>> On one system, it seems to work properly about half the time, if I keep
>>>> re-running the same command.
>>>>
>>>
>>> Here, running the latest  upstream tree and latest traceroute, I have no issue.
>>>
>>> Send us :
>>>
>>> 1) strace output
>>> 2) icmp packet capture.
>>>
>>> Thanks.
>>
>> Attached both.
> 
> Thanks.
> 
> I think it is a bug in this traceroute version, pushing too many
> sendmsg() at once and hitting socket SNDBUF limit
> 
> If the sendmsg() is blocked in sock_alloc_send_pskb, it might abort
> because an incoming ICMP message sets sk->sk_err
> 
> It might have worked in the past, by pure luck.
> 
> Try to increase /proc/sys/net/core/wmem_default
> 
> If this solves the issue, I would advise sending a patch to traceroute to :
> 
> 1) attempt to increase SO_SNDBUF accordingly
> 2) use non blocking sendmsg() api to sense how many packets can be
> queued in qdisc/NIC queues
> 3) reduce number of parallel messages (current traceroute behavior
> looks like a flood to me)

It doesn't solve the issue; I tried bumping it from the default of 
212992 to 4096-times-that, with exactly the same results.
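
One way to confirm that a changed wmem_default actually reaches new sockets is to compare the sysctl value with what getsockopt(SO_SNDBUF) reports on a freshly created socket. The following is a minimal sketch, not part of the original test: the helper names are hypothetical, and it uses a plain UDP datagram socket on the assumption that it picks up the same default as the ICMP datagram socket traceroute opens.

```python
import socket


def effective_sndbuf() -> int:
    """Return SO_SNDBUF as reported for a freshly created datagram socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def read_wmem_default(path="/proc/sys/net/core/wmem_default"):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def report() -> str:
    """Format both values side by side for a quick manual comparison."""
    return "wmem_default=%s SO_SNDBUF=%d" % (read_wmem_default(), effective_sndbuf())
```

Calling report() before and after writing to /proc/sys/net/core/wmem_default should show both numbers moving together if the new default is in effect for new sockets.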

The number of packets it's able to send varies. For example, right
now, on my regular VM (which is smaller than the PC that yesterday's
trace was done on), the program very consistently fails on the *second*
sendto() call -- I don't think two packets is an unreasonable number.

The program has -q and -N options to reduce the number of simultaneous
probes, but they only make a difference if I reduce it all the way down
to just one probe at a time.
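
For anyone who wants to try this without the traceroute binary, the syscall pattern described in this thread (an unprivileged socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP), then back-to-back sendto() calls with increasing TTLs) can be approximated with the sketch below. It is an illustration under assumptions, not a confirmed reproducer: the function names are hypothetical, it uses one shared socket for all probes (not verified against the traceroute source), and the calling user's group must be allowed by net.ipv4.ping_group_range.

```python
import errno
import socket
import struct

ICMP_ECHO_REQUEST = 8


def icmp_checksum(data: bytes) -> int:
    """Standard Internet checksum (one's complement sum of 16-bit words)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(seq: int, payload_len: int = 32) -> bytes:
    """Build an ICMP echo request with a zero identifier and a zeroed payload."""
    payload = bytes(payload_len)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, 0, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, 0, seq) + payload


def send_probes(dest: str, max_ttl: int = 16):
    """Send one probe per TTL without waiting for replies.

    Returns a list of (ttl, error_name) pairs, where error_name is None on
    success, and stops at the first sendto() failure.
    """
    results = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                       socket.IPPROTO_ICMP) as s:
        for ttl in range(1, max_ttl + 1):
            s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            try:
                s.sendto(build_echo_request(ttl), (dest, 0))
                results.append((ttl, None))
            except OSError as e:
                results.append((ttl, errno.errorcode.get(e.errno, str(e.errno))))
                break
    return results
```

Running send_probes("8.8.8.8") as a non-root user and looking for an EHOSTUNREACH entry in the result would show whether the failure appears outside traceroute as well; a clean run proves nothing either way, since the timing differs from the real tool.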