linux-kernel - Re: Unexpected timestamps in tcpdump with veth + tc qdisc netem delay

Message-ID: <8e0aa5a6-0457-ccd0-8984-9c9aaeab2228@gmail.com>
Date:   Mon, 26 Apr 2021 19:07:27 +0200
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Henrique de Moraes Holschuh <hmh@....eng.br>,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: Unexpected timestamps in tcpdump with veth + tc qdisc netem delay



On 4/26/21 4:35 PM, Henrique de Moraes Holschuh wrote:
> (please CC me in any replies, thank you!)
> 
> Hello,
> 
> While trying to simulate large delay links using veth and netns, I came across what looks like unexpected / incorrect behavior.
> 
> I have reproduced it in Debian 4.19 and 5.10 kernels, and a quick look at mainline doesn't show any relevant deviation from Debian kernels to mainline in my limited understanding of this area of the kernel.
> 
> I have attached a simple script to reproduce the scenario.  If my explanation below is not clear, please just look at the script to see what it does: it should be trivial to understand.  It needs tcpdump, and CAP_NET_ADMIN (or root, etc).
> 
> Topology
> 
> root netns:
>    veth vec0 (192.168.233.1)   paired to ves0 (192.168.233.2)
>    tc qdisc dev vec0 root netem delay 250ms
> 
> lab500ms netns:
>    veth ves0 (192.168.233.2), paired to vec0 (192.168.233.1)
>    tc qdisc dev ves0 root netem delay 250ms
> 
> So:
> [root netns  -- veth (tc qdisc netem delay 250ms) ] <> [ veth (tc qdisc netem delay 250ms) -- lab500ms netns ]
> 
> Expected RTT from a packet roundtrip (root netns -> lab500ms netns -> root netns) is 500ms.
> 
> 
> The problem:
> 
> [root netns]:  ping 192.168.233.2
> PING 192.168.233.2 (192.168.233.2) 56(84) bytes of data.
> 64 bytes from 192.168.233.2: icmp_seq=1 ttl=64 time=500 ms
> 64 bytes from 192.168.233.2: icmp_seq=2 ttl=64 time=500 ms
> 
> (the RTT reported by ping is 500ms as expected: there is a 250ms transmit delay attached to each member of the veth pair)
> 
> However:
> 
> [root netns]: tcpdump -i vec0 -s0 -n -p net 192.168.233.0/30
> listening on vec0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 17:09:09.740681 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 1, length 64

Here you see the packet _after_ the 250ms netem delay.

> 17:09:09.990891 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 1, length 64
Same here.

I do not see any problem.

> 17:09:10.741903 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 2, length 64
> 17:09:10.992031 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 2, length 64
> 17:09:11.742813 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 3, length 64
> 17:09:11.993009 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 3, length 64
> 
> [lab500ms netns]: ip netns exec lab500ms tcpdump -i ves0 -s0 -n -p net 192.168.233.0/30
> listening on ves0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 17:09:09.740724 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 1, length 64
> 17:09:09.990867 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 1, length 64
> 17:09:10.741942 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 2, length 64
> 17:09:10.992012 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 2, length 64
> 17:09:11.742851 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 3, length 64
> 17:09:11.992985 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 3, length 64
> 
> One can see that the timestamps shown by tcpdump (also reproduced using wireshark) are *not* what one would expect: the 250ms delays are missing in incoming packets (i.e. there's 250ms missing from timestamps in packets "echo reply" in vec0, and "echo request" in ves0).
> 
> The 250ms vec0->ves0 delay AND 250ms ves0 -> vec0 delay *are* there, as shown by "ping", but you'd not know it if you look at the tcpdump.  The timing shown in tcpdump looks more like packet injection time at the first interface, than the time the packet was "seen" at the other end (capture interface).
> 
> Adding more namespaces and VETH pairs + routing "in a row" so that the packet "exits" one veth tunnel and enters another one (after trivial routing) doesn't fix the tcpdump timestamps in the capture at the other end of the veth-veth->routing->veth-veth->routing->... chain.
> 
> It looks like some sort of bug to me, but maybe I am missing something, in which case I would greatly appreciate an explanation of where I went wrong... 
> 

That is only because you expected to see something else: you forgot that tcpdump
captures TX packets _after_ netem, i.e. once the qdisc has already applied the delay.
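
To make the timeline concrete, here is a minimal Python sketch of where the capture timestamps land. It is only a model, not kernel code. It assumes that the TX capture point sits after netem (as stated above), that veth hands a packet to its peer with negligible latency, and that the echo reply is generated immediately. The names simulate_ping, gap_on and Capture are made up for this illustration.

```python
from dataclasses import dataclass
from typing import List

NETEM_DELAY_MS = 250  # per-direction netem delay from the topology in this thread


@dataclass
class Capture:
    interface: str  # interface tcpdump is listening on
    direction: str  # "TX" or "RX" relative to that interface
    packet: str     # "echo request" or "echo reply"
    time_ms: int    # modelled capture timestamp, relative to ping's send time


def simulate_ping(delay_ms: int = NETEM_DELAY_MS) -> List[Capture]:
    """Model one ping sent at t=0 from the root netns (hypothetical helper)."""
    # Assumption: the TX capture point fires only after netem releases the packet.
    request_tx = 0 + delay_ms
    # Assumption: veth delivers to its peer with negligible latency.
    request_rx = request_tx
    # Assumption: the reply is generated at once, then delayed by ves0's netem.
    reply_tx = request_rx + delay_ms
    reply_rx = reply_tx
    return [
        Capture("vec0", "TX", "echo request", request_tx),
        Capture("ves0", "RX", "echo request", request_rx),
        Capture("ves0", "TX", "echo reply", reply_tx),
        Capture("vec0", "RX", "echo reply", reply_rx),
    ]


def gap_on(interface: str, captures: List[Capture]) -> int:
    """Request-to-reply gap, in ms, as seen by a capture on one interface."""
    times = [c.time_ms for c in captures if c.interface == interface]
    return max(times) - min(times)


if __name__ == "__main__":
    caps = simulate_ping()
    for c in caps:
        print(f"{c.time_ms:4d} ms  {c.interface} {c.direction}  {c.packet}")
    print("request->reply gap on vec0:", gap_on("vec0", caps), "ms")
    print("RTT seen by ping:", caps[-1].time_ms, "ms")
```

Under these assumptions both captures show the request and the reply about 250ms apart, which is consistent with the quoted traces (17:09:09.740681 and 17:09:09.990891 on vec0), while ping still measures the full 500ms because it starts its clock before the first netem delay.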