Message-ID: <vbfmunui7dm.fsf@mellanox.com>
Date: Mon, 21 Jan 2019 11:24:44 +0000
From: Vlad Buslov <vladbu@...lanox.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: Linux Kernel Network Developers <netdev@...r.kernel.org>,
Yevgeny Kliteynik <kliteyn@...lanox.com>,
Yossef Efraim <yossefe@...lanox.com>,
Maor Gottlieb <maorg@...lanox.com>
Subject: tc filter insertion rate degradation
Hi Eric,
I've been investigating a significant tc filter insertion rate degradation,
and it seems to be caused by your commit 001c96db0181 ("net: align
gnet_stats_basic_cpu struct"). With this commit, the insertion rate drops
from ~65k rules/sec to ~43k rules/sec when inserting 1M rules from a file
in tc batch mode on my machine.
The perf profile of tc shows that the pcpu allocator now takes roughly
twice the share of CPU cycles (pcpu_alloc: 21.19% -> 44.67% children):
1) Before:

    Samples: 63K of event 'cycles:ppp', Event count (approx.): 48796480071
      Children      Self  Co  Shared Object      Symbol
    +   21.19%     3.38%  tc  [kernel.vmlinux]   [k] pcpu_alloc
    +    3.45%     0.25%  tc  [kernel.vmlinux]   [k] pcpu_alloc_area

2) After:

    Samples: 92K of event 'cycles:ppp', Event count (approx.): 71446806550
      Children      Self  Co  Shared Object      Symbol
    +   44.67%     3.99%  tc  [kernel.vmlinux]   [k] pcpu_alloc
    +   19.25%     0.22%  tc  [kernel.vmlinux]   [k] pcpu_alloc_area

It seems that the pcpu allocator has to do much more work to satisfy
allocations under the new, stricter alignment requirement. I am not sure
whether this is expected behavior in this case.
Regards,
Vlad