Message-ID: <CABWYdi1QmrHxNZT_DK4A2WUoj=r1+wxSngzaaTuGCatHisaTRw@mail.gmail.com>
Date:   Tue, 29 Oct 2019 11:19:53 -0700
From:   Ivan Babrou <ivan@...udflare.com>
To:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Cc:     kernel-team <kernel-team@...udflare.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: fq dropping packets between vlan and ethernet interfaces

Hello,

We're trying to test Linux 5.4 early and hit an issue with FQ.

The relevant part of our network setup involves four interfaces:

* ext0 (ethernet, internet facing)
* vlan101@ext0 (vlan)
* int0 (ethernet, lan facing)
* vlan11@int0 (vlan)

Both int0 and ext0 have fq on them:

qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p
buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140
low_rate_threshold 550Kbit refill_delay 40.0ms
qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p
buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140
low_rate_threshold 550Kbit refill_delay 40.0ms
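
The quantum/initial_quantum values are just the defaults fq derives
from each device's MTU, so an equivalent minimal setup is simply
(limit 10000 and flow_limit 100 are also the defaults, spelled out
here for clarity; our actual provisioning differs):

$ sudo tc qdisc replace dev ext0 root fq limit 10000 flow_limit 100
$ sudo tc qdisc replace dev int0 root fq limit 10000 flow_limit 100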

The issue itself is that after some time ext0 stops picking up traffic
from vlan101: tcpdump no longer sees packets on ext0, while they still
flow over vlan101.
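
That is (illustrative invocations, any filter works):

$ sudo tcpdump -ni vlan101    # packets visible here
$ sudo tcpdump -ni ext0       # silent once the issue triggers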

I can see that fq_dequeue does not report any packets (handle=0x10000
in the trace below is qdisc 1:, i.e. the fq instance on ext0):

$ sudo perf record -e qdisc:qdisc_dequeue -aR sleep 1
hping3 40335 [006] 63920.881016: qdisc:qdisc_dequeue: dequeue
ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
packets=0 skbaddr=(nil)
hping3 40335 [006] 63920.881030: qdisc:qdisc_dequeue: dequeue
ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
packets=0 skbaddr=(nil)
hping3 40335 [006] 63920.881041: qdisc:qdisc_dequeue: dequeue
ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
packets=0 skbaddr=(nil)
hping3 40335 [006] 63920.881070: qdisc:qdisc_dequeue: dequeue
ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
packets=0 skbaddr=(nil)
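
(The events above are decoded from the recording with:)

$ sudo perf script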

Inside fq_dequeue I can see that we throw away packets here:

* https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L510
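
(One way to watch that spot is a line probe; this needs a kernel with
debuginfo, 510 is the line as of v5.4-rc2, and the exact perf probe
syntax may need adjusting for your build:)

$ sudo perf probe -a 'net/sched/sch_fq.c:510'
$ sudo perf record -e 'probe:*' -aR sleep 1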

The output of tc -s qdisc shows the following:

qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p
buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140
low_rate_threshold 550Kbit refill_delay 40.0ms
 Sent 4872143400 bytes 8448638 pkt (dropped 201276670, overlimits 0
requeues 103)
 backlog 779376b 10000p requeues 103
  2806 flows (2688 inactive, 118 throttled), next packet delay
1572240566653952889 ns
  354201 gc, 0 highprio, 804560 throttled, 3919 ns latency, 19492 flows_plimit
qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p
buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140
low_rate_threshold 550Kbit refill_delay 40.0ms
 Sent 15869093876 bytes 17387110 pkt (dropped 0, overlimits 0 requeues 2817)
 backlog 0b 0p requeues 2817
  2047 flows (2035 inactive, 0 throttled)
  225074 gc, 10 highprio, 102308 throttled, 7525 ns latency
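
Note that backlog on ext0 is pinned at the configured 10000p limit
while the dropped counter keeps growing (presumably new packets get
dropped at enqueue because the queue never drains), which is easy to
watch live:

$ watch -d 'tc -s qdisc show dev ext0'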

The key part here is probably that next packet delay for ext0 is the
current unix timestamp in nanoseconds. Naturally, we see this code
path being executed:

* https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L462
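
For the record, dropping the nanoseconds and feeding the delay to date
shows it is (roughly) the wall clock at the time the stats were taken:

$ date -u -d @1572240566
Mon Oct 28 05:29:26 UTC 2019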

Unfortunately, I don't have a reliable reproduction for this issue. It
appears naturally with some traffic, and I can do limited tracing with
perf and bcc tools while running hping3 to generate packets.
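
The generator side is nothing special, something like (illustrative
flags and target):

$ sudo hping3 -S -p 80 -i u100 <some host routed via ext0>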

The issue goes away if I replace fq with pfifo_fast on ext0.
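
i.e. packets start flowing again after:

$ sudo tc qdisc replace dev ext0 root pfifo_fast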
