Date: Fri, 23 Feb 2024 08:09:25 -0500
From: rulinhuang <rulin.huang@...el.com>
To: urezki@...il.com,
	bhe@...hat.com
Cc: colin.king@...el.com,
	hch@...radead.org,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	lstoakes@...il.com,
	rulin.huang@...el.com,
	tianyou.li@...el.com,
	tim.c.chen@...el.com,
	wangyang.guo@...el.com,
	zhiguo.zhou@...el.com
Subject: Re: [PATCH v3] mm/vmalloc: lock contention optimization under multi-threading

> 
> Hello, Rulinhuang!
> 
> > Hi Uladzislau and Andrew, we have rebased it(Patch v4) on branch
> > mm-unstable and remeasured it. Could you kindly help confirm if this
> > is the right base to work on?
> > Compared to the previous result at kernel v6.7 with a 5% performance
> > gain on intel icelake(160 vcpu), we only had a 0.6% with this commit
> > base. But we think our modification still has some significance. On
> > the one hand, this does reduce a critical section. On the other hand,
> > we have a 4% performance gain on intel sapphire rapids(224 vcpu),
> > which suggests more performance improvement would likely be achieved
> > when the core count of processors increases to hundreds or even
> > thousands.
> > Thank you again for your comments.
> >
> According to the patch that was a correct rebase. Right a small delta on your
> 160 CPUs is because of removing a contention. As for bigger systems it is
> bigger impact, like you point here on your 224 vcpu results where you see %4
> perf improvement.
> 
> So we should fix it. But the way how it is fixed is not optimal from my point of
> view, because the patch that is in question spreads the internals from
> alloc_vmap_area(), like inserting busy area, across many parts now.
> 
> --
> Uladzislau Rezki

Our modifications in patch 5 not only achieve the original effect but
also undo the split of alloc_vmap_area(). setup_vmalloc_vm() is called
without holding the lock, so the critical section is not lengthened.
Without splitting alloc_vmap_area(), putting setup_vmalloc_vm()
directly into it is the only approach we can think of.
Regarding Baoquan's changes, we think that:
We prefer to keep setup_vmalloc_vm() outside the critical section, as
there is no need to lengthen the critical section.
We prefer to test (vm_data) rather than
(!(va_flags & VMAP_RAM) && vm); that check alone is enough to determine
whether the assignment should happen. The change seems to take a detour
through the va_flags check.
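
The two points above can be illustrated with a simplified userspace
sketch. This is not the actual patch: the struct and function names
(vmap_area_sketch, vm_struct_sketch, alloc_area_sketch), the singly
linked busy list, and the pthread mutex standing in for the busy lock
are all hypothetical. It assumes the area is not visible to other
threads until it is inserted, so its fields can be filled in before the
lock is taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct vm_struct_sketch {
	void *addr;
	unsigned long size;
};

struct vmap_area_sketch {
	unsigned long va_start;
	unsigned long va_end;
	struct vm_struct_sketch *vm;
	struct vmap_area_sketch *next;	/* stand-in for the busy tree linkage */
};

/* A mutex and a list head stand in for the busy lock and busy tree. */
static pthread_mutex_t busy_lock = PTHREAD_MUTEX_INITIALIZER;
static struct vmap_area_sketch *busy_head;

/*
 * Initialize the area and, if the caller passed vm_data, bind it before
 * taking the lock. Only the insertion itself runs under the lock.
 */
static void alloc_area_sketch(struct vmap_area_sketch *va, unsigned long addr,
			      unsigned long size,
			      struct vm_struct_sketch *vm_data)
{
	assert(va != NULL);

	va->va_start = addr;
	va->va_end = addr + size;
	va->vm = NULL;

	/* A plain NULL check decides the assignment; no flag test is used. */
	if (vm_data) {
		vm_data->addr = (void *)(uintptr_t)va->va_start;
		vm_data->size = va->va_end - va->va_start;
		va->vm = vm_data;
	}

	/* Critical section: publish the already initialized area. */
	pthread_mutex_lock(&busy_lock);
	va->next = busy_head;
	busy_head = va;
	pthread_mutex_unlock(&busy_lock);
}
```

In this sketch a caller that has no vm to bind simply passes NULL for
vm_data, and the critical section covers only the list insertion.
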
Hi Uladzislau, could you please let us know if you have any better
suggestions for the modification scheme?
Thank you for your advice!
