Message-ID: <CAM0EoMm1Qv3_0ak2vtRjSmuW4+zZ7izzBVjDMawfnKm3dLcjyA@mail.gmail.com>
Date: Mon, 9 Dec 2024 17:44:24 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Martin Ottens <martin.ottens@....de>
Cc: Stephen Hemminger <stephen@...workplumber.org>, Cong Wang <xiyou.wangcong@...il.com>, 
	Jiri Pirko <jiri@...nulli.us>, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Simon Horman <horms@...nel.org>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] net/sched: netem: account for backlog updates from
 child qdisc

On Mon, Dec 9, 2024 at 4:13 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> On Sat, Dec 7, 2024 at 11:37 AM Martin Ottens <martin.ottens@....de> wrote:
> >
> > On 05.12.24 13:40, Jamal Hadi Salim wrote:
> > > Would be nice to see the before and after (your change) output of the
> > > stats to illustrate
> >
> > Setup is as described in my patch. I used a larger limit of
> > 1000 for netem so that the overshoot of the qlen becomes more
> > visible. Kernel is from the current net-next tree (the patch to
> > sch_tbf referenced in my patch is already applied (1596a135e318)).
> >
>
> Ok, I wasn't aware of this one.
>
> >
> > TCP before the fix (qlen is 1150p, exceeding the maximum of 1000p,
> > netem qdisc becomes "locked" and stops accepting packets):
> >
> > qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
> >  Sent 2760196 bytes 1843 pkt (dropped 389, overlimits 0 requeues 0)
> >  backlog 4294560030b 1150p requeues 0
> > qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
> >  Sent 2760196 bytes 1843 pkt (dropped 327, overlimits 7356 requeues 0)
> >  backlog 0b 0p requeues 0
> >
> > UDP (iperf3 sends 50Mbit/s) before the fix, no issues here:
> >
> > qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
> >  Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 0 requeues 0)
> >  backlog 643680b 432p requeues 0
> > qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
> >  Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 341057 requeues 0)
> >  backlog 311410b 209p requeues 0
> >
> > TCP after the fix (UDP is not affected by the fix):
> >
> > qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
> >  Sent 94859934 bytes 62676 pkt (dropped 15, overlimits 0 requeues 0)
> >  backlog 573806b 130p requeues 0
> > qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
> >  Sent 94859934 bytes 62676 pkt (dropped 324, overlimits 248442 requeues 0)
> >  backlog 4542b 3p requeues 0
> >
>
> A backlog > 0 is a problem, unless your results were captured mid
> test (instead of at the end of the test).
> I will validate on net-next and with your patch.
>

Ok, so this seems sane to me, but can you please put output in the
commit message that reflects the state after the test has completed?
Something like this, before the patch (highlighting the stuck backlog
on netem):

qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 17105543349430145291
 Sent 35220 bytes 43 pkt (dropped 7, overlimits 0 requeues 0)
 backlog 4294958212b 0p requeues 0
qdisc tbf 8003: parent 1: rate 50Mbit burst 1600b lat 224us
 Sent 35220 bytes 43 pkt (dropped 17, overlimits 1 requeues 0)
 backlog 0b 0p requeues 0

And after your patch:
qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 11503045766577034723
 Sent 42864 bytes 49 pkt (dropped 5, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc tbf 8001: parent 1: rate 50Mbit burst 1600b lat 224us
 Sent 42864 bytes 49 pkt (dropped 16, overlimits 5 requeues 0)
 backlog 0b 0p requeues 0

backlog is now shown as cleared.
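
For anyone who wants to check this kind of output mechanically, here is
a minimal sketch in Python. It assumes the stats text is formatted
exactly like the dumps pasted above; the names parse_backlogs and
stuck_qdiscs are made up for illustration and are not part of any tool.
It flags every qdisc whose backlog is non-zero once the test has ended:

```python
import re

# Header line of a qdisc stanza, e.g. "qdisc netem 1: root refcnt 2 ..."
QDISC_RE = re.compile(r"^qdisc (\S+) (\S+)")
# Backlog line of a stanza, e.g. " backlog 4294958212b 0p requeues 0"
BACKLOG_RE = re.compile(r"^\s*backlog (\d+)b (\d+)p")

# Samples copied from the before/after dumps in this mail.
BEFORE = """\
qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 17105543349430145291
 Sent 35220 bytes 43 pkt (dropped 7, overlimits 0 requeues 0)
 backlog 4294958212b 0p requeues 0
qdisc tbf 8003: parent 1: rate 50Mbit burst 1600b lat 224us
 Sent 35220 bytes 43 pkt (dropped 17, overlimits 1 requeues 0)
 backlog 0b 0p requeues 0
"""

AFTER = """\
qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 11503045766577034723
 Sent 42864 bytes 49 pkt (dropped 5, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc tbf 8001: parent 1: rate 50Mbit burst 1600b lat 224us
 Sent 42864 bytes 49 pkt (dropped 16, overlimits 5 requeues 0)
 backlog 0b 0p requeues 0
"""


def parse_backlogs(text):
    """Return a list of (qdisc, backlog_bytes, backlog_packets) tuples."""
    results = []
    current = None
    for line in text.splitlines():
        header = QDISC_RE.match(line)
        if header:
            # Remember which qdisc the following stats lines belong to.
            current = f"{header.group(1)} {header.group(2)}"
            continue
        backlog = BACKLOG_RE.match(line)
        if backlog and current is not None:
            results.append(
                (current, int(backlog.group(1)), int(backlog.group(2)))
            )
    return results


def stuck_qdiscs(text):
    """Names of qdiscs that still report a byte or packet backlog."""
    return [q for q, nbytes, npkts in parse_backlogs(text) if nbytes or npkts]


if __name__ == "__main__":
    print("before:", stuck_qdiscs(BEFORE))
    print("after: ", stuck_qdiscs(AFTER))
```

This is only meaningful on stats captured after the test has finished;
mid-test a non-zero backlog is expected.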

Coincidentally, after removing your tbf patch (which is already in
net-next) and rerunning the test, it didn't seem to matter whether GSO
was on or off (as you can see below, the backlog is stuck on tbf):


GSO off:
qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 12925321237200695918
 Sent 26284 bytes 39 pkt (dropped 7, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc tbf 8001: parent 1: rate 50Mbit burst 1600b lat 224us
 Sent 26284 bytes 39 pkt (dropped 17, overlimits 1 requeues 0)
 backlog 4294959726b 0p requeues 0

GSO on:
qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 18236003995023052493
 Sent 35224 bytes 43 pkt (dropped 7, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc tbf 8002: parent 1: rate 50Mbit burst 1600b lat 224us
 Sent 35224 bytes 43 pkt (dropped 17, overlimits 1 requeues 0)
 backlog 4294958212b 0p requeues 0
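
A note on the huge byte counts: assuming the byte backlog is reported
as an unsigned 32-bit counter, values just below 2^32 are what you get
when more bytes are subtracted than were ever added. A small Python
sketch of that arithmetic (the helper name as_signed_32 is made up for
illustration):

```python
U32_MOD = 1 << 32  # number of distinct values in an unsigned 32-bit counter


def as_signed_32(value):
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    if not 0 <= value < U32_MOD:
        raise ValueError("value does not fit in 32 bits")
    return value - U32_MOD if value >= (1 << 31) else value


# Byte backlog values taken from the stats in this thread.
SAMPLES = {
    "netem, before patch (end of test)": 4294958212,
    "tbf, GSO off, tbf fix removed": 4294959726,
    "netem, TCP before the fix (mid test)": 4294560030,
    "netem, after patch": 0,
}

if __name__ == "__main__":
    for label, value in SAMPLES.items():
        print(f"{label}: {value}b -> {as_signed_32(value)} bytes")
```

Read this way, the stuck values correspond to -9084, -7570 and -407266
bytes, i.e. the counter was decremented past zero rather than holding a
genuinely large queue.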

Please resubmit the patch, add my Acked-by, and include the proper
before/after stats.
The Fixes: tag is likely: Linux-2.6.12-rc2

cheers,
jamal
