Message-ID: <CANn89iLE-3zxROxGOusPBRmQL4oN2Nqtg3rqXnpO8bkiFAw8EQ@mail.gmail.com>
Date:   Tue, 29 Oct 2019 11:27:13 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Ivan Babrou <ivan@...udflare.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        kernel-team <kernel-team@...udflare.com>
Subject: Re: fq dropping packets between vlan and ethernet interfaces

On Tue, Oct 29, 2019 at 11:20 AM Ivan Babrou <ivan@...udflare.com> wrote:
>
> Hello,
>
> We're trying to test Linux 5.4 early and hit an issue with FQ.
>
> The relevant part of our network setup involves four interfaces:
>
> * ext0 (ethernet, internet facing)
> * vlan101@...0 (vlan)
> * int0 (ethernet, lan facing)
> * vlan11@...0 (vlan)
>
> Both int0 and ext0 have fq on them:
>
> qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p
> buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140
> low_rate_threshold 550Kbit refill_delay 40.0ms
> qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p
> buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140
> low_rate_threshold 550Kbit refill_delay 40.0ms
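>
> For reference, a roughly equivalent configuration (assuming the
> values not listed here are fq defaults; the kernel derives quantum
> and initial_quantum from the device MTU when they are not set
> explicitly) would be:
>
>   tc qdisc replace dev ext0 root fq limit 10000 flow_limit 100 \
>       buckets 1024 low_rate_threshold 550kbit refill_delay 40ms
>   tc qdisc replace dev int0 root fq limit 10000 flow_limit 100 \
>       buckets 1024 low_rate_threshold 550kbit refill_delay 40ms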
>
> The issue itself is that after some time ext0 stops draining packets
> from vlan101: tcpdump sees no packets on ext0 while they keep flowing
> over vlan101.
>
> I can see that fq_dequeue does not report any packets:
>
> $ sudo perf record -e qdisc:qdisc_dequeue -aR sleep 1
> hping3 40335 [006] 63920.881016: qdisc:qdisc_dequeue: dequeue
> ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
> packets=0 skbaddr=(nil)
> hping3 40335 [006] 63920.881030: qdisc:qdisc_dequeue: dequeue
> ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
> packets=0 skbaddr=(nil)
> hping3 40335 [006] 63920.881041: qdisc:qdisc_dequeue: dequeue
> ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
> packets=0 skbaddr=(nil)
> hping3 40335 [006] 63920.881070: qdisc:qdisc_dequeue: dequeue
> ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
> packets=0 skbaddr=(nil)
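>
> To isolate these empty dequeues (assuming ifindex 4 here is ext0),
> the tracepoint can be filtered at record time:
>
>   sudo perf record -e qdisc:qdisc_dequeue \
>       --filter 'packets == 0 && ifindex == 4' -aR sleep 1
>   sudo perf script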
>
> Inside fq_dequeue I can see that we throw packets away here:
>
> * https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L510
>
> The output of tc -s qdisc shows the following:
>
> qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p
> buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140
> low_rate_threshold 550Kbit refill_delay 40.0ms
>  Sent 4872143400 bytes 8448638 pkt (dropped 201276670, overlimits 0
> requeues 103)
>  backlog 779376b 10000p requeues 103
>   2806 flows (2688 inactive, 118 throttled), next packet delay
> 1572240566653952889 ns
>   354201 gc, 0 highprio, 804560 throttled, 3919 ns latency, 19492 flows_plimit
> qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p
> buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140
> low_rate_threshold 550Kbit refill_delay 40.0ms
>  Sent 15869093876 bytes 17387110 pkt (dropped 0, overlimits 0 requeues 2817)
>  backlog 0b 0p requeues 2817
>   2047 flows (2035 inactive, 0 throttled)
>   225074 gc, 10 highprio, 102308 throttled, 7525 ns latency
>
> The key part here is probably that the next packet delay for ext0 is
> the current Unix timestamp in nanoseconds. The backlog is also pinned
> at the 10000p limit, so new packets are dropped at enqueue, which
> matches the huge drop counter. Naturally, we see this code path being
> executed:
>
> * https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L462
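>
> Decoding the reported delay backs this up: 1572240566653952889 ns is
> 1572240566 seconds past the Unix epoch, i.e. the day this was
> captured, while fq schedules against ktime_get_ns(), which is
> CLOCK_MONOTONIC. A packet stamped with wall-clock time therefore
> looks like it should be sent decades from now, so its flow stays
> throttled:
>
>   $ date -u -d @1572240566
>   Mon Oct 28 05:29:26 UTC 2019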
>
> Unfortunately, I don't have a reliable way to reproduce this issue.
> It appears naturally under certain traffic, and I can do limited
> tracing with perf and bcc tools while running hping3 to generate
> packets.
>
> The issue goes away if I replace fq with pfifo_fast on ext0.
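>
> For completeness, that swap is just:
>
>   sudo tc qdisc replace dev ext0 root pfifo_fast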

At which commit is your tree, precisely?

This sounds like the recent fix we had for fragmented packets.

e7a409c3f46cb0dbc7bfd4f6f9421d53e92614a5 ipv4: fix IPSKB_FRAG_PMTU
handling with fragmentation
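
A quick way to check whether that fix is already in your tree,
assuming a git checkout:

    git merge-base --is-ancestor \
        e7a409c3f46cb0dbc7bfd4f6f9421d53e92614a5 HEAD \
        && echo "fix present" || echo "fix missing"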
