Message-ID: <ZddDdxcdD5hNpyUX@pc636>
Date: Thu, 22 Feb 2024 13:52:07 +0100
From: Uladzislau Rezki <urezki@...il.com>
To: rulinhuang <rulin.huang@...el.com>
Cc: akpm@...ux-foundation.org, urezki@...il.com, colin.king@...el.com,
	hch@...radead.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	lstoakes@...il.com, tianyou.li@...el.com, tim.c.chen@...el.com,
	wangyang.guo@...el.com, zhiguo.zhou@...el.com
Subject: Re: [PATCH v3] mm/vmalloc: lock contention optimization under
 multi-threading

Hello, Rulinhuang!

> Hi Uladzislau and Andrew, we have rebased it (Patch v4) on branch
> mm-unstable and remeasured it. Could you kindly help confirm if 
> this is the right base to work on?
> Compared to the previous result at kernel v6.7 with a 5% performance
> gain on Intel Ice Lake (160 vCPUs), we only see a 0.6% gain on this commit
> base. But we think our modification still has some significance. On
> the one hand, this does reduce a critical section. On the other hand, 
> we have a 4% performance gain on intel sapphire rapids(224 vcpu), 
> which suggests more performance improvement would likely be achieved 
> when the core count of processors increases to hundreds or 
> even thousands.
> Thank you again for your comments.
>
Judging by the patch, that was a correct rebase. Right, the small delta
on your 160 CPUs is due to contention having been removed. On bigger
systems the impact is bigger, as you point out with your 224-vCPU
results, where you see a 4% performance improvement.

So we should fix it. But the way it is fixed is not optimal from my
point of view, because the patch in question now spreads the internals
of alloc_vmap_area(), such as inserting the busy area, across many
places.
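
To illustrate the concern, here is a minimal userspace sketch, not the
kernel code and not the patch under review. All names in it (struct area,
struct owner, insert_busy(), alloc_area(), busy_lock) are hypothetical
stand-ins. It assumes one possible shape in which the allocation routine
remains the only place that publishes a busy area, while the lock is still
taken just once per allocation:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct owner { int id; };

struct area {
	struct area *next;
	struct owner *owner;
};

static pthread_mutex_t busy_lock = PTHREAD_MUTEX_INITIALIZER;
static struct area *busy_list;
static unsigned long lock_acquisitions;	/* counted for illustration only */

/* The only helper that knows how a busy area gets published. */
static void insert_busy(struct area *a)
{
	pthread_mutex_lock(&busy_lock);
	lock_acquisitions++;
	a->next = busy_list;
	busy_list = a;
	pthread_mutex_unlock(&busy_lock);
}

/*
 * Allocation entry point (assumed shape): the owner is attached while
 * the area is still private, so no lock is needed for that step, and
 * the area is then published under a single short critical section.
 * Callers never touch busy_list or busy_lock themselves.
 */
static void alloc_area(struct area *a, struct owner *o)
{
	a->owner = o;
	insert_busy(a);
}
```

With this shape a caller only invokes alloc_area(); the insertion detail
stays in one function instead of being repeated at each call site.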

--
Uladzislau Rezki
