Message-ID: <CANn89i+uxbxB8vTWXhOuW4-weP-NO2yFbbs15cJh7+BJtjSSkA@mail.gmail.com>
Date: Tue, 29 Oct 2019 11:54:31 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Ivan Babrou <ivan@...udflare.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
kernel-team <kernel-team@...udflare.com>
Subject: Re: fq dropping packets between vlan and ethernet interfaces
On Tue, Oct 29, 2019 at 11:41 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Oct 29, 2019 at 11:35 AM Ivan Babrou <ivan@...udflare.com> wrote:
> >
> > 5.4-rc5 has it, but we still experience the issue.
>
> Please refrain from top-posting on netdev@
>
> You could try the debug patch I posted earlier.
>
> Something like:
>
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index 98dd87ce15108cfe1c011da44ba32f97763776c8..2b9697e05115d334fd6d3a2909d5112d04032420 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -380,9 +380,14 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
>  {
>  	struct rb_node **p, *parent;
>  	struct sk_buff *head, *aux;
> +	u64 now = ktime_get_ns();
>
> -	fq_skb_cb(skb)->time_to_send = skb->tstamp ?: ktime_get_ns();
> -
> +	if (skb->tstamp) {
> +		WARN_ON_ONCE(skb->tstamp - now > 30LLU * NSEC_PER_SEC);

Probably needs to use s64, as in:

	WARN_ON_ONCE((s64)(skb->tstamp - now) > (s64)(30LLU * NSEC_PER_SEC));

> +		fq_skb_cb(skb)->time_to_send = skb->tstamp;
> +	} else {
> +		fq_skb_cb(skb)->time_to_send = now;
> +	}
>  	head = flow->head;
>  	if (!head ||
>  	    fq_skb_cb(skb)->time_to_send >= fq_skb_cb(flow->tail)->time_to_send) {
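
For anyone following along, a minimal userspace sketch (illustrative only, not
kernel code) of why the cast matters: with u64 arithmetic, a tstamp even 1 ns
behind the clock wraps to a huge positive value and would trip the warning on
perfectly normal packets.

	#include <stdio.h>
	#include <stdint.h>

	#define NSEC_PER_SEC 1000000000ULL

	int main(void)
	{
		uint64_t now = 5000 * NSEC_PER_SEC;	/* stand-in for ktime_get_ns() */
		uint64_t tstamp = now - 1;		/* timestamp 1 ns in the past */

		/* u64: tstamp - now wraps to 2^64 - 1, so the check fires. */
		printf("u64 fires: %d\n", (tstamp - now) > 30 * NSEC_PER_SEC);

		/* s64: the difference is -1, so only genuinely far-future
		 * timestamps (> 30 s ahead) would trip the warning. */
		printf("s64 fires: %d\n",
		       (int64_t)(tstamp - now) > (int64_t)(30 * NSEC_PER_SEC));
		return 0;
	}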
>
>
> >
> > On Tue, Oct 29, 2019 at 11:33 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Tue, Oct 29, 2019 at 11:31 AM Ivan Babrou <ivan@...udflare.com> wrote:
> > > >
> > > > I'm on 5.4-rc5. Let me apply e7a409c3f46cb0dbc7bfd4f6f9421d53e92614a5
> > > > on top and report back to you.
> > >
> > >
> > > Oops, wrong copy/paste. I really meant this one:
> > >
> > > 9669fffc1415bb0c30e5d2ec98a8e1c3a418cb9c net: ensure correct skb->tstamp in various fragmenters
> > >
> > >
> > > >
> > > > On Tue, Oct 29, 2019 at 11:27 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > >
> > > > > On Tue, Oct 29, 2019 at 11:20 AM Ivan Babrou <ivan@...udflare.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > We're trying to test Linux 5.4 early and hit an issue with FQ.
> > > > > >
> > > > > > The relevant part of our network setup involves four interfaces:
> > > > > >
> > > > > > * ext0 (ethernet, internet facing)
> > > > > > * vlan101@ext0 (vlan)
> > > > > > * int0 (ethernet, lan facing)
> > > > > > * vlan11@int0 (vlan)
> > > > > >
> > > > > > Both int0 and ext0 have fq on them:
> > > > > >
> > > > > > qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140 low_rate_threshold 550Kbit refill_delay 40.0ms
> > > > > > qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 low_rate_threshold 550Kbit refill_delay 40.0ms
> > > > > >
> > > > > > The issue itself is that after some time ext0 stops draining
> > > > > > packets coming from vlan101: tcpdump sees no packets on ext0,
> > > > > > while they keep flowing over vlan101.
> > > > > >
> > > > > > I can see that fq_dequeue does not report any packets:
> > > > > >
> > > > > > $ sudo perf record -e qdisc:qdisc_dequeue -aR sleep 1
> > > > > > hping3 40335 [006] 63920.881016: qdisc:qdisc_dequeue: dequeue ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0 packets=0 skbaddr=(nil)
> > > > > > hping3 40335 [006] 63920.881030: qdisc:qdisc_dequeue: dequeue ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0 packets=0 skbaddr=(nil)
> > > > > > hping3 40335 [006] 63920.881041: qdisc:qdisc_dequeue: dequeue ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0 packets=0 skbaddr=(nil)
> > > > > > hping3 40335 [006] 63920.881070: qdisc:qdisc_dequeue: dequeue ifindex=4 qdisc handle=0x10000 parent=0xFFFFFFFF txq_state=0x0 packets=0 skbaddr=(nil)
> > > > > >
> > > > > > Inside fq_dequeue, I can see that we throw away packets here:
> > > > > >
> > > > > > * https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L510
> > > > > >
> > > > > > The output of tc -s qdisc shows the following:
> > > > > >
> > > > > > qdisc fq 1: dev ext0 root refcnt 65 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3228 initial_quantum 16140 low_rate_threshold 550Kbit refill_delay 40.0ms
> > > > > >  Sent 4872143400 bytes 8448638 pkt (dropped 201276670, overlimits 0 requeues 103)
> > > > > >  backlog 779376b 10000p requeues 103
> > > > > >   2806 flows (2688 inactive, 118 throttled), next packet delay 1572240566653952889 ns
> > > > > >   354201 gc, 0 highprio, 804560 throttled, 3919 ns latency, 19492 flows_plimit
> > > > > > qdisc fq 8003: dev int0 root refcnt 65 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 low_rate_threshold 550Kbit refill_delay 40.0ms
> > > > > >  Sent 15869093876 bytes 17387110 pkt (dropped 0, overlimits 0 requeues 2817)
> > > > > >  backlog 0b 0p requeues 2817
> > > > > >   2047 flows (2035 inactive, 0 throttled)
> > > > > >   225074 gc, 10 highprio, 102308 throttled, 7525 ns latency
> > > > > >
> > > > > > The key part here is probably that the next packet delay for
> > > > > > ext0 is the current unix timestamp in nanoseconds. Naturally,
> > > > > > we see this code path being executed:
> > > > > >
> > > > > > * https://elixir.bootlin.com/linux/v5.4-rc2/source/net/sched/sch_fq.c#L462
> > > > > >
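Back-of-the-envelope arithmetic (illustrative, using the numbers reported
above) shows how bad this is: fq compares time_to_send against
ktime_get_ns(), which is monotonic time since boot (~63920 s here, per the
perf timestamps), so a wall-clock tstamp parks the flow roughly half a
century in the future:

	#include <stdio.h>
	#include <stdint.h>

	#define NSEC_PER_SEC 1000000000ULL

	int main(void)
	{
		uint64_t time_to_send = 1572240566653952889ULL;	/* "next packet delay" above */
		uint64_t now = 63920ULL * NSEC_PER_SEC;		/* uptime per the perf trace */

		/* Prints ~49: the flow stays throttled for decades. */
		printf("throttled for ~%llu years\n",
		       (unsigned long long)((time_to_send - now) /
					    NSEC_PER_SEC / 3600 / 24 / 365));
		return 0;
	}
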
> > > > > > Unfortunately, I don't have a reliable reproduction for this issue. It
> > > > > > appears naturally with some traffic and I can do limited tracing with
> > > > > > perf and bcc tools while running hping3 to generate packets.
> > > > > >
> > > > > > The issue goes away if I replace fq with pfifo_fast on ext0.
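
One way to narrow this down without a reproducer (an untested sketch; it
assumes fq_enqueue is not inlined and that kernel debuginfo is available for
perf probe) would be to dump skb->tstamp as packets enter fq:

	$ sudo perf probe -a 'fq_enqueue skb->tstamp'
	$ sudo perf record -e probe:fq_enqueue -aR sleep 1
	$ sudo perf script

A tstamp that looks like a unix epoch value in nanoseconds, rather than a
small time-since-boot value, would confirm that packets reach fq with a
wall-clock timestamp already set.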
> > > > >
> > > > > At which commit is your tree, precisely?
> > > > >
> > > > > This sounds like the recent fix we had for fragmented packets.
> > > > >
> > > > > e7a409c3f46cb0dbc7bfd4f6f9421d53e92614a5 ipv4: fix IPSKB_FRAG_PMTU handling with fragmentation