Message-ID: <CABWYdi2Eq30vEKKYxr-diofpeATNXiB3ZYKL6Q15y10w+vsCLg@mail.gmail.com>
Date:   Tue, 29 Oct 2019 11:31:02 -0700
From:   Ivan Babrou <ivan@...udflare.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        kernel-team <kernel-team@...udflare.com>
Subject: Re: fq dropping packets between vlan and ethernet interfaces

I'm on 5.4-rc5. Let me apply e7a409c3f46cb0dbc7bfd4f6f9421d53e92614a5
on top and report back to you.

On Tue, Oct 29, 2019 at 11:27 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Oct 29, 2019 at 11:20 AM Ivan Babrou <ivan@...udflare.com> wrote:
> >
> > Hello,
> >
> > We're trying to test Linux 5.4 early and hit an issue with FQ.
> >
> > The relevant part of our network setup involves four interfaces:
> >
> > * ext0 (ethernet, internet facing)
> > * vlan101@...0 (vlan)
> > * int0 (ethernet, lan facing)
> > * vlan11@...0 (vlan)
> >
> > Both int0 and ext0 have fq on them:
> >
> > qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p
> > buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140
> > low_rate_threshold 550Kbit refill_delay 40.0ms
> > qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p
> > buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140
> > low_rate_threshold 550Kbit refill_delay 40.0ms
> >
> > The issue itself is that after some time ext0 stops feeding off
> > vlan101, which is visible as tcpdump not seeing packets on ext0, while
> > they flow over vlan101.
> >
> > I can see that fq_dequeue does not report any packets:
> >
> > $ sudo perf record -e qdisc:qdisc_dequeue -aR sleep 1
> > hping3 40335 [006] 63920.881016: qdisc:qdisc_dequeue: dequeue
> > ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
> > packets=0 skbaddr=(nil)
> > hping3 40335 [006] 63920.881030: qdisc:qdisc_dequeue: dequeue
> > ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
> > packets=0 skbaddr=(nil)
> > hping3 40335 [006] 63920.881041: qdisc:qdisc_dequeue: dequeue
> > ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
> > packets=0 skbaddr=(nil)
> > hping3 40335 [006] 63920.881070: qdisc:qdisc_dequeue: dequeue
> > ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0
> > packets=0 skbaddr=(nil)
> >
> > Inside of fq_dequeue I'm able to see that we throw away packets in here:
> >
> > * https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L510
> >
> > The output of tc -s qdisc shows the following:
> >
> > qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p
> > buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140
> > low_rate_threshold 550Kbit refill_delay 40.0ms
> >  Sent 4872143400 bytes 8448638 pkt (dropped 201276670, overlimits 0
> > requeues 103)
> >  backlog 779376b 10000p requeues 103
> >   2806 flows (2688 inactive, 118 throttled), next packet delay
> > 1572240566653952889 ns
> >   354201 gc, 0 highprio, 804560 throttled, 3919 ns latency, 19492 flows_plimit
> > qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p
> > buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140
> > low_rate_threshold 550Kbit refill_delay 40.0ms
> >  Sent 15869093876 bytes 17387110 pkt (dropped 0, overlimits 0 requeues 2817)
> >  backlog 0b 0p requeues 2817
> >   2047 flows (2035 inactive, 0 throttled)
> >   225074 gc, 10 highprio, 102308 throttled, 7525 ns latency
> >
> > The key part here is probably that next packet delay for ext0 is the
> > current unix timestamp in nanoseconds. Naturally, we see this code
> > path being executed:
> >
> > * https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L462
> >
> > Unfortunately, I don't have a reliable reproduction for this issue. It
> > appears naturally with some traffic and I can do limited tracing with
> > perf and bcc tools while running hping3 to generate packets.
> >
> > The issue goes away if I replace fq with pfifo_fast on ext0.
>
> At which commit is your tree, precisely?
>
> This sounds like the recent fix we had for fragmented packets.
>
> e7a409c3f46cb0dbc7bfd4f6f9421d53e92614a5 ipv4: fix IPSKB_FRAG_PMTU
> handling with fragmentation
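
A quick way to sanity-check the observation quoted above, that the "next packet delay" reported for ext0 looks like a wall-clock timestamp rather than a real delay, is to decode the value as nanoseconds since the Unix epoch. This is only a rough sketch: the constant is copied from the tc -s qdisc output in the report, while the helper names and the one-hour threshold are assumptions made for illustration and are not part of sch_fq or tc.

```python
from datetime import datetime, timezone

NS_PER_SEC = 1_000_000_000

# Value copied from the "next packet delay" field in the tc -s qdisc
# output for ext0 quoted in this thread.
NEXT_PACKET_DELAY_NS = 1572240566653952889

# Hypothetical threshold: any "delay" longer than an hour is treated as
# suspicious. This is an arbitrary choice for the sketch, not a kernel value.
SUSPICIOUS_DELAY_NS = 3600 * NS_PER_SEC


def ns_to_utc(ns):
    """Interpret ns as nanoseconds since the Unix epoch.

    Returns the UTC datetime (second resolution) and the leftover
    sub-second nanoseconds.
    """
    seconds, remainder_ns = divmod(ns, NS_PER_SEC)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), remainder_ns


def looks_like_timestamp(delay_ns):
    """Heuristic only: a pacing delay this large is more plausibly an
    absolute timestamp than a relative delay."""
    return delay_ns > SUSPICIOUS_DELAY_NS


when, frac_ns = ns_to_utc(NEXT_PACKET_DELAY_NS)
print(when.isoformat(), frac_ns)
print("suspicious:", looks_like_timestamp(NEXT_PACKET_DELAY_NS))
```

Decoded this way, the value falls in late October 2019, which is consistent with the report. By comparison, the other fq timing fields in the same output (refill_delay 40.0ms, latency in the low microseconds) are many orders of magnitude smaller.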
