Message-ID: <CAM0EoM=sHOh+aXg9abq6_7QLCaqH28Ve1rjSjnHNkZTsE7CuMQ@mail.gmail.com>
Date: Mon, 9 Dec 2024 16:13:47 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Martin Ottens <martin.ottens@....de>
Cc: Stephen Hemminger <stephen@...workplumber.org>, Cong Wang <xiyou.wangcong@...il.com>, 
	Jiri Pirko <jiri@...nulli.us>, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Simon Horman <horms@...nel.org>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] net/sched: netem: account for backlog updates from child qdisc

On Sat, Dec 7, 2024 at 11:37 AM Martin Ottens <martin.ottens@....de> wrote:
>
> On 05.12.24 13:40, Jamal Hadi Salim wrote:
> > Would be nice to see the before and after (your change) output of the
> > stats to illustrate
>
> Setup is as described in my patch. I used a larger limit of
> 1000 for netem so that the overshoot of the qlen becomes more
> visible. The kernel is from the current net-next tree; the sch_tbf
> patch referenced in my patch (1596a135e318) is already applied.
>
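For reference, a minimal tc sketch matching the parameters visible in the
stats below might look like this (the interface name eth0 is an assumption;
the netem limit/delay and tbf rate/burst/latency values are taken from the
quoted output):

  # root netem qdisc: 100ms delay, 1000-packet limit
  tc qdisc add dev eth0 root handle 1: netem limit 1000 delay 100ms
  # tbf child attached under netem (shows up as "parent 1:1" in the stats)
  tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 50mbit burst 1537 latency 50ms
  # per-qdisc statistics as quoted below
  tc -s qdisc show dev eth0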

Ok, wasn't aware of this one...

>
> TCP before the fix (qlen is 1150p, exceeding the maximum of 1000p; the
> netem qdisc becomes "locked" and stops accepting packets):
>
> qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
>  Sent 2760196 bytes 1843 pkt (dropped 389, overlimits 0 requeues 0)
>  backlog 4294560030b 1150p requeues 0
> qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
>  Sent 2760196 bytes 1843 pkt (dropped 327, overlimits 7356 requeues 0)
>  backlog 0b 0p requeues 0
>
> UDP (iperf3 sending 50Mbit/s) before the fix; no issues here (rough
> iperf3 commands are sketched after these stats):
>
> qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
>  Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 0 requeues 0)
>  backlog 643680b 432p requeues 0
> qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
>  Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 341057 requeues 0)
>  backlog 311410b 209p requeues 0
>
> TCP after the fix (UDP is not affected by the fix):
>
> qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
>  Sent 94859934 bytes 62676 pkt (dropped 15, overlimits 0 requeues 0)
>  backlog 573806b 130p requeues 0
> qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
>  Sent 94859934 bytes 62676 pkt (dropped 324, overlimits 248442 requeues 0)
>  backlog 4542b 3p requeues 0
>
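For completeness, the traffic in runs like these is typically generated with
iperf3; a rough sketch (the server address 192.0.2.1 and test durations are
assumptions, only the 50Mbit/s UDP rate comes from the quoted text):

  # on the receiver
  iperf3 -s
  # TCP run from the sender
  iperf3 -c 192.0.2.1 -t 30
  # UDP run at 50Mbit/s, matching the quoted UDP case
  iperf3 -c 192.0.2.1 -u -b 50M -t 30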

A backlog > 0 is a problem, unless your results were captured mid-test
(rather than at the end of the test).
I will validate on net-next with your patch applied.

> > Your fix seems reasonable, but I am curious: does this only happen with
> > TCP? If yes, perhaps the GSO handling may be contributing?
> > Can you run iperf with UDP and see if the issue shows up again? Or
> > ping -f with size 1024.
>
> I was only able to reproduce this behavior with tbf and it happens
> only when GSO packets are segmented inside the tbf child qdisc. As
> shown above, UDP is therefore not affected. The behavior also occurs
> if this configuration is used on the "outgoing" interface of a system
> that just forwards packets between two networks and GRO is enabled on
> the "incoming" interface.

Ok, I will do a quick check since I have cycles.

cheers,
jamal
