Date: Mon, 16 May 2016 01:34:36 +0300
From: Roman Yeryomin <leroi.lists@...il.com>
To: Dave Taht <dave.taht@...il.com>
Cc: Jesper Dangaard Brouer <brouer@...hat.com>, Felix Fietkau <nbd@....name>,
	Jonathan Morton <chromatix99@...il.com>,
	"codel@...ts.bufferbloat.net" <codel@...ts.bufferbloat.net>,
	ath10k <ath10k@...ts.infradead.org>,
	make-wifi-fast@...ts.bufferbloat.net,
	Rafał Miłecki <zajec5@...il.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	OpenWrt Development List <openwrt-devel@...ts.openwrt.org>,
	Michal Kazior <michal.kazior@...to.com>
Subject: Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)

On 6 May 2016 at 22:43, Dave Taht <dave.taht@...il.com> wrote:
> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists@...il.com> wrote:
>> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists@...il.com> wrote:
>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer@...hat.com> wrote:
>>>>
>>>> I've created an OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>> closed Felix's OpenWRT email account (bad choice! emails bouncing).
>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>> are in some kind of conflict.
>>>>
>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>
>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>
>>> OK, so, after porting the patch to the 4.1 OpenWRT kernel and playing a
>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>
>> Forgot to mention, I've reduced drop_batch_size down to 32
>
> 0) Not clear to me if that's the right line; there are 4 wifi queues,
> and the third one is the BE queue.

That was an example, sorry, I should have stated that. I've applied the same
settings to all 4 queues.
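In shell terms, that amounts to something like this (the mq leaves are :1
through :4, as the statistics below show; the parameter values are just the
ones quoted above, they varied between runs):

  # same fq_codel settings on every wifi hardware queue (VO/VI/BE/BK)
  for q in 1 2 3 4; do
      tc qdisc replace dev wlan0 parent :$q fq_codel flows 16 limit 256
  done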
> That is too low a limit, also, for normal use. And:
> for the purpose of this particular UDP test, flows 16 is ok, but not
> ideal.

I played with different combinations, it doesn't make any (significant)
difference: 20-30Mbps, not more. What numbers would you propose?

> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
> (I care about tcp performance a lot more than udp floods - surviving a
> udp flood yes, performance, no)

During the test (both TCP and UDP) it's roughly 5ms on average; when not
running tests, ~2ms. Actually I'm now wondering if target is working at all,
because I had the same result with target 80ms.
So, yes, latency is good, but performance is poor.

> before/after?
>
> tc -s qdisc show dev wlan0 during/after results?

during the test:

qdisc mq 0: root
 Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 17)
 backlog 1545794b 1021p requeues 17
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 17)
 backlog 1541252b 1018p requeues 17
  maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

after the test (60sec):

qdisc mq 0: root
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 28)
 backlog 0b 0p requeues 28
  maxpacket 1514 drop_overlimit 2770176 new_flow_count 64 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514 target 80.0ms ce_threshold 32us interval 100.0ms ecn
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

> IF you are doing builds for the archer c7v2, I can join in on this... (?)

I'm not, but I have a c7 somewhere, so I can do a build for it and also test,
so we are on the same page.
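One detail worth noting in the dump above: on :3 the drop_overlimit counter
(1422304 during the run, 2770176 after) equals the total dropped count and
ecn_mark stays at 0, so every single drop comes from the hard queue limit
(the fq_codel_drop path), not from CoDel's target/interval logic, which fits
the suspicion that target isn't doing anything here. A quick way to keep an
eye on that during a run (assuming watch(1) is available, as it usually is on
a busybox-based image):

  # compare total drops vs. drops caused purely by hitting the queue limit
  watch -n 1 'tc -s qdisc show dev wlan0 | grep -E "dropped|drop_overlimit"'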
> I did do a test of the ath10k "before": fq_codel *never engaged*, and
> tcp-induced latencies under load, at 100mbit, cracked 600ms, while
> staying flat (20ms) at 100mbit. (not the same patches you are testing)
> on x86. I have got tcp 300Mbit out of an osx box, similar latency,
> have yet to get anything more on anything I currently have
> before/after patchsets.
>
> I'll go add flooding to the tests, I just finished a series comparing
> two different speed stations and life was good on that.
>
> "before" - fq_codel never engages, we see seconds of latency under load.
>
> root@...2:~# tc -s qdisc show dev wlp4s0
> qdisc mq 0: root
>  Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>   new_flows_len 0 old_flows_len 1
> qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
>   new_flows_len 1 old_flows_len 3
> qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>  Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>   new_flows_len 1 old_flows_len 0

>>> This is certainly better than 30Mbps but still more than two times
>>> less than before (900).

> The number that I still am not sure we got is that you were sending
> 900mbit udp and receiving 900mbit on the prior tests?

900 was sending, AP POV (wifi client is downloading)

>>> TCP also improved a little (550 to ~590).

> The limit is probably a bit low, also. You might want to try target
> 20ms as well.

I've tried limit up to 1024 and target up to 80ms

>>>
>>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>>> Doesn't look like it will save ath10k from performance regression.

> what was tcp "before"? (I'm sorry, such a long thread)

750Mbps

>>>
>>>>
>>>> On Fri, 6 May 2016 11:42:43 +0200
>>>> Jesper Dangaard Brouer <brouer@...hat.com> wrote:
>>>>
>>>>> Hi Felix,
>>>>>
>>>>> This is an important fix for OpenWRT, please read!
>>>>>
>>>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>>>> without also adjusting q->flows_cnt. Eric explains below that you must
>>>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>>>> adjust it to 128.)
>>>>>
>>>>> Problematic OpenWRT commit in question:
>>>>> http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>>>> 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it
>>>>> from causing too much cpu load with higher speed (#21326)")
>>>>>
>>>>> I also highly recommend you cherry-pick this very recent commit:
>>>>> net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>>>> https://git.kernel.org/davem/net-next/c/9d18562a227
>>>>>
>>>>> This should fix very high CPU usage in case fq_codel goes into drop mode.
>>>>> The problem is that drop mode was considered rare, and implementation-wise
>>>>> it was chosen to be more expensive (to save cycles on normal mode).
>>>>> Unfortunately it is easy to trigger with a UDP flood. Drop mode is
>>>>> especially expensive for smaller devices, as it scans a 4KB array,
>>>>> thus 64 cache misses for small devices!
>>>>>
>>>>> The fix is to allow drop mode to bulk-drop more packets when entering
>>>>> drop mode (default 64 bulk drop).
>>>>> That way we don't suddenly experience a significantly higher processing
>>>>> cost per packet, but instead can amortize this.
>>>>>
>>>>> To Eric, should we recommend OpenWRT to adjust the default (max) 64 bulk
>>>>> drop, given we also recommend the bucket size to be 128? (Thus the amount
>>>>> of memory to scan is less, but their CPU is also much smaller.)
>>>>>
>>>>> --Jesper
>>>>>
>>>>>
>>>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet@...il.com> wrote:
>>>>>
>>>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet@...il.com> wrote:
>>>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>>> > > >
>>>>> > > >>
>>>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>>> > > >> Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>>> > > >> backlog 0b 0p requeues 0
>>>>> > > >> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>>> > > >> new_flows_len 0 old_flows_len 0
>>>>> > > >
>>>>> > > >
>>>>> > > > A limit of 1024 packets and 1024 flows is not wise, I think.
>>>>> > > >
>>>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>>>> > > > which is almost the same as having no queue at all.)
>>>>> > > >
>>>>> > > > I suggest having at least 8 packets per bucket, to let Codel have a
>>>>> > > > chance to trigger.
>>>>> > > >
>>>>> > > > So you could either reduce the number of buckets to 128 (if memory is
>>>>> > > > tight), or increase the limit to 8192.
>>>>> > >
>>>>> > > Will try, but what I've posted is the default, I didn't change/configure that.
>>>>> >
>>>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>>>> >
>>>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>>>> >
>>>>> > If someone changed that in the linux variant you use, he probably should
>>>>> > explain the rationale.
>>>>
>>>> --
>>>> Best regards,
>>>>   Jesper Dangaard Brouer
>>>>   MSc.CS, Principal Kernel Engineer at Red Hat
>>>>   Author of http://www.iptv-analyzer.org
>>>>   LinkedIn: http://www.linkedin.com/in/brouer
>
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org
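Putting Eric's and Jesper's numbers together, the configuration they converge
on for an OpenWRT-class device would look roughly like this per wifi queue (a
sketch, not a tested recommendation; the drop_batch option is only available
on a kernel carrying 9d18562a2278 together with a matching iproute2/tc):

  # keep the 1024-packet limit but use 128 buckets, i.e. ~8 packets per
  # bucket so CoDel can actually engage; bulk-drop up to 64 packets per
  # overlimit event so a UDP flood doesn't burn the CPU in fq_codel_drop()
  tc qdisc replace dev wlan0 parent :3 fq_codel limit 1024 flows 128 drop_batch 64

(and the same for parents :1, :2 and :4).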