Message-ID: <CABWYdi1QmrHxNZT_DK4A2WUoj=r1+wxSngzaaTuGCatHisaTRw@mail.gmail.com>
Date: Tue, 29 Oct 2019 11:19:53 -0700
From: Ivan Babrou <ivan@...udflare.com>
To: Linux Kernel Network Developers <netdev@...r.kernel.org>
Cc: kernel-team <kernel-team@...udflare.com>,
Eric Dumazet <edumazet@...gle.com>
Subject: fq dropping packets between vlan and ethernet interfaces
Hello,
We're testing Linux 5.4 early and have hit an issue with fq.
The relevant part of our network setup involves four interfaces:
* ext0 (ethernet, internet facing)
* vlan101@...0 (vlan)
* int0 (ethernet, lan facing)
* vlan11@...0 (vlan)
Both int0 and ext0 have fq on them:
qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p
buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140
low_rate_threshold 550Kbit refill_delay 40.0ms
qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p
buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140
low_rate_threshold 550Kbit refill_delay 40.0ms
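As far as I can tell those quantum / initial_quantum values are just the
fq defaults derived from the interface MTUs, i.e. the setup is
equivalent to plain (a reconstruction on my side, not the exact commands
we ran):

$ sudo tc qdisc replace dev ext0 root fq
$ sudo tc qdisc replace dev int0 root fq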
The issue itself is that after some time ext0 stops transmitting the
packets handed down from vlan101: tcpdump stops seeing them on ext0,
while they still flow over vlan101.
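In other words, something like

$ sudo tcpdump -ni vlan101

keeps showing the outgoing packets, while the same tcpdump on ext0 goes
silent.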
I can see that fq_dequeue does not report any packets:
$ sudo perf record -e qdisc:qdisc_dequeue -aR sleep 1
hping3 40335 [006] 63920.881016: qdisc:qdisc_dequeue: dequeue
ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
packets=0 skbaddr=(nil)
hping3 40335 [006] 63920.881030: qdisc:qdisc_dequeue: dequeue
ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
packets=0 skbaddr=(nil)
hping3 40335 [006] 63920.881041: qdisc:qdisc_dequeue: dequeue
ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
packets=0 skbaddr=(nil)
hping3 40335 [006] 63920.881070: qdisc:qdisc_dequeue: dequeue
ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
packets=0 skbaddr=(nil)
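The same thing can be counted with bpftrace, using the qdisc_dequeue
tracepoint fields shown above (ifindex=4 is ext0 here, adjust as needed;
this one-liner is a sketch, not part of the original capture):

$ sudo bpftrace -e 'tracepoint:qdisc:qdisc_dequeue
    /args->ifindex == 4 && args->packets == 0/ { @empty_dequeues = count(); }'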
Inside fq_dequeue I can see that we throw away packets here:
* https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L510
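(For the curious: one way to see where the dropped skbs end up being
freed is the skb:kfree_skb tracepoint with call graphs, e.g.

$ sudo perf record -e skb:kfree_skb -g -a sleep 1
$ sudo perf script

This is only an illustration of the kind of tracing I mean by "perf and
bcc tools" below, not a capture from the affected machine.)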
The output of tc -s qdisc shows the following:
qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p
buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140
low_rate_threshold 550Kbit refill_delay 40.0ms
Sent 4872143400 bytes 8448638 pkt (dropped 201276670, overlimits 0
requeues 103)
backlog 779376b 10000p requeues 103
2806 flows (2688 inactive, 118 throttled), next packet delay
1572240566653952889 ns
354201 gc, 0 highprio, 804560 throttled, 3919 ns latency, 19492 flows_plimit
qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p
buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140
low_rate_threshold 550Kbit refill_delay 40.0ms
Sent 15869093876 bytes 17387110 pkt (dropped 0, overlimits 0 requeues 2817)
backlog 0b 0p requeues 2817
2047 flows (2035 inactive, 0 throttled)
225074 gc, 10 highprio, 102308 throttled, 7525 ns latency
The key part here is probably that next packet delay for ext0 is the
current unix timestamp in nanoseconds. Naturally, we see this code
path being executed:
* https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L462
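As a quick sanity check, truncating 1572240566653952889 ns to seconds
and decoding it:

$ date -u -d @1572240566
Mon Oct 28 05:29:26 UTC 2019

which is roughly when these stats were collected.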
Unfortunately, I don't have a reliable reproduction for this issue. It
appears naturally with some traffic and I can do limited tracing with
perf and bcc tools while running hping3 to generate packets.
The issue goes away if I replace fq with pfifo_fast on ext0.
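i.e. something like

$ sudo tc qdisc replace dev ext0 root pfifo_fast

and packets start leaving ext0 again.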