linux-kernel - Re: [PATCH v3] mm/vmalloc: lock contention optimization under multi-threading

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZddDdxcdD5hNpyUX@pc636>
Date: Thu, 22 Feb 2024 13:52:07 +0100
From: Uladzislau Rezki <urezki@...il.com>
To: rulinhuang <rulin.huang@...el.com>
Cc: akpm@...ux-foundation.org, urezki@...il.com, colin.king@...el.com,
	hch@...radead.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	lstoakes@...il.com, tianyou.li@...el.com, tim.c.chen@...el.com,
	wangyang.guo@...el.com, zhiguo.zhou@...el.com
Subject: Re: [PATCH v3] mm/vmalloc: lock contention optimization under
 multi-threading

Hello, Rulinhuang!

> Hi Uladzislau and Andrew, we have rebased it(Patch v4) on branch 
> mm-unstable and remeasured it. Could you kindly help confirm if 
> this is the right base to work on?
> Compared to the previous result at kernel v6.7 with a 5% performance 
> gain on intel icelake(160 vcpu), we only had a 0.6% with this commit 
> base. But we think our modification still has some significance. On 
> the one hand, this does reduce a critical section. On the other hand, 
> we have a 4% performance gain on intel sapphire rapids(224 vcpu), 
> which suggests more performance improvement would likely be achieved 
> when the core count of processors increases to hundreds or 
> even thousands.
> Thank you again for your comments.
>
According to the patch that was a correct rebase. Right a small delta
on your 160 CPUs is because of removing a contention. As for bigger
systems it is bigger impact, like you point here on your 224 vcpu
results where you see %4 perf improvement.

So we should fix it. But the way how it is fixed is not optimal from
my point of view, because the patch that is in question spreads the
internals from alloc_vmap_area(), like inserting busy area, across
many parts now.

--
Uladzislau Rezki