Message-ID: <151e4778-f4b2-d5ea-eb8e-cbb8d7b26f9b@intel.com>
Date: Thu, 16 Feb 2023 18:59:29 +0100
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: Saeed Mahameed <saeed@...nel.org>,
    "David S. Miller" <davem@...emloft.net>,
    Paolo Abeni <pabeni@...hat.com>,
    Eric Dumazet <edumazet@...gle.com>,
    Saeed Mahameed <saeedm@...dia.com>,
    <netdev@...r.kernel.org>,
    Tariq Toukan <tariqt@...dia.com>,
    Gal Pressman <gal@...dia.com>
Subject: Re: [net-next 1/9] net/mlx5e: Switch to using napi_build_skb()

From: Jakub Kicinski <kuba@...nel.org>
Date: Thu, 16 Feb 2023 09:53:24 -0800

> On Thu, 16 Feb 2023 18:26:19 +0100 Alexander Lobakin wrote:
>>> Before: 26.5 Gbits/sec
>>> After: 30.1 Gbits/sec (+13.6%)
>>
>> +14%, gosh! Happy to see more and more vendors switching to it, someone
>> told me back then we have so fast RAM nowadays that it won't make any
>> sense to directly recycle kmem-cached objects. Maybe it's fast, but
>> seems like not *so* fast :D
>
> Interestingly I had a similar patch in my tree when testing the skb_ext
> cache and enabling slow_gro kills this gain.
>
> IOW without adding an skb_ext using napi_build_skb() gives me ~12%
> boost. If I start adding skb_ext (with the cache and perfect reuse)
> I'm back to the baseline (26.5Gbps in this case).
>
> But without using napi_build_skb() adding skb_ext (with the cache)
> doesn't change anything, skb_ext or not, I'll get 26.5Gbps.
>
> Very finicky. Not sure why this happens. Perhaps napi_build_skb()
> let's us fit under some CPU resource constraint and additional
> functionality knocks us back over the line?

Both skb and skb ext use kmem cache, maybe calling kmem cache related
functions like alloc/free touches some global objects (or even locks)
we'd like to avoid accessing on hotpath?
I'm not deep into the kmem cache, so might be saying something perfectly
stupid here :D
Nevertheless, it's always fun to see how performance does some weird and
counter-intuitive moves sometimes (not speaking of why
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B exists).

Thanks,
Olek
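
For readers following the thread, the "directly recycle objects instead of going back to the allocator" idea can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel's code: `struct local_cache`, `cache_get()`, `cache_put()`, `shared_alloc()`, `shared_free()` and the `CACHE_SIZE`/`CACHE_BULK` values are all hypothetical names and numbers, and `malloc()`/`free()` merely stand in for a shared allocator whose global state one might want to keep off the hot path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SIZE 64	/* hypothetical capacity of the local cache */
#define CACHE_BULK 16	/* hypothetical refill batch size */

struct obj {
	char payload[256];
};

/* A small array of recycled objects, meant to be owned by one context. */
struct local_cache {
	struct obj *slot[CACHE_SIZE];
	size_t count;			/* objects currently cached */
	unsigned long shared_allocs;	/* trips to the shared allocator */
	unsigned long shared_frees;
};

/* Stand-ins for the shared allocator (malloc/free here). */
static struct obj *shared_alloc(struct local_cache *c)
{
	c->shared_allocs++;
	return malloc(sizeof(struct obj));
}

static void shared_free(struct local_cache *c, struct obj *o)
{
	c->shared_frees++;
	free(o);
}

/* Take an object from the local cache, refilling in bulk when empty. */
static struct obj *cache_get(struct local_cache *c)
{
	if (c->count == 0) {
		for (size_t i = 0; i < CACHE_BULK; i++) {
			struct obj *o = shared_alloc(c);

			if (!o)
				break;
			c->slot[c->count++] = o;
		}
		if (c->count == 0)
			return NULL;	/* shared allocator is out of memory */
	}
	return c->slot[--c->count];
}

/* Return an object to the local cache, flushing half when it is full. */
static void cache_put(struct local_cache *c, struct obj *o)
{
	assert(o != NULL);
	if (c->count == CACHE_SIZE) {
		while (c->count > CACHE_SIZE / 2)
			shared_free(c, c->slot[--c->count]);
	}
	c->slot[c->count++] = o;
}
```

In a steady get/put loop the shared allocator is touched only on the initial bulk refill; every later iteration reuses a locally cached object. Whether that is what explains the numbers quoted above is not something this sketch can show.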