Message-ID: <20240223130946.112890-1-rulin.huang@intel.com>
Date: Fri, 23 Feb 2024 08:09:25 -0500
From: rulinhuang <rulin.huang@...el.com>
To: urezki@...il.com,
bhe@...hat.com
Cc: colin.king@...el.com,
hch@...radead.org,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
lstoakes@...il.com,
rulin.huang@...el.com,
tianyou.li@...el.com,
tim.c.chen@...el.com,
wangyang.guo@...el.com,
zhiguo.zhou@...el.com
Subject: Re: [PATCH v3] mm/vmalloc: lock contention optimization under multi-threading
>
> Hello, Rulinhuang!
>
> > Hi Uladzislau and Andrew, we have rebased it(Patch v4) on branch
> > mm-unstable and remeasured it. Could you kindly help confirm if this
> > is the right base to work on?
> > Compared to the previous result at kernel v6.7 with a 5% performance
> > gain on intel icelake(160 vcpu), we only had a 0.6% with this commit
> > base. But we think our modification still has some significance. On
> > the one hand, this does reduce a critical section. On the other hand,
> > we have a 4% performance gain on intel sapphire rapids(224 vcpu),
> > which suggests more performance improvement would likely be achieved
> > when the core count of processors increases to hundreds or even
> > thousands.
> > Thank you again for your comments.
> >
> According to the patch that was a correct rebase. Right a small delta on your
> 160 CPUs is because of removing a contention. As for bigger systems it is
> bigger impact, like you point here on your 224 vcpu results where you see %4
> perf improvement.
>
> So we should fix it. But the way how it is fixed is not optimal from my point of
> view, because the patch that is in question spreads the internals from
> alloc_vmap_area(), like inserting busy area, across many parts now.
>
> --
> Uladzislau Rezki
Our modifications in patch 5 not only achieve the original effect but
also undo the split of alloc_vmap_area(). setup_vmalloc_vm() is called
without the lock held, so it does not lengthen the critical section.
Without splitting alloc_vmap_area(), calling setup_vmalloc_vm()
directly inside it is the only approach we can think of.
Regarding Baoquan’s changes, we think that:
We prefer that setup_vmalloc_vm() not be placed inside the critical
section; there is no need to lengthen the critical section.
We prefer checking (vm_data) rather than
(!(va_flags & VMAP_RAM) && vm); it is enough to determine whether the
assignment is needed. The change seems to hesitate over the va_flags
check.
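
To illustrate the preference stated above, here is a minimal userspace
sketch, not the kernel code. All names in it (struct area, struct
vm_data, setup_vm(), alloc_area(), busy_lock, busy_count) are
hypothetical stand-ins, and a pthread mutex stands in for the kernel
spinlock. It assumes the new area is not visible to other threads until
it is inserted under the lock, so the setup step can run unlocked and a
plain NULL check on the descriptor pointer decides whether to run it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct vm_data {
	unsigned long addr;
	unsigned long size;
};

struct area {
	unsigned long start;
	unsigned long end;
	struct vm_data *vm;
	int inserted;
};

static pthread_mutex_t busy_lock = PTHREAD_MUTEX_INITIALIZER;
static int busy_count;	/* shared state, protected by busy_lock */

/* Touches only the two objects passed in, so it needs no lock as long
 * as the area has not been published yet. */
static void setup_vm(struct vm_data *vm, struct area *va)
{
	assert(va->end >= va->start);
	vm->addr = va->start;
	vm->size = va->end - va->start;
	va->vm = vm;
}

static void alloc_area(struct area *va, unsigned long start,
		       unsigned long size, struct vm_data *vm)
{
	va->start = start;
	va->end = start + size;
	va->vm = NULL;
	va->inserted = 0;

	/* A single NULL check on the descriptor decides the assignment;
	 * it runs before the lock is taken. */
	if (vm)
		setup_vm(vm, va);

	/* The critical section covers only the shared bookkeeping. */
	pthread_mutex_lock(&busy_lock);
	va->inserted = 1;
	busy_count++;
	pthread_mutex_unlock(&busy_lock);
}
```

Calling alloc_area(&va, 4096, 8192, &vm) fills in vm before the lock is
taken; passing NULL for the descriptor skips the setup entirely.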
Hi Uladzislau, could you please let us know if you have any better
suggestions on the modification scheme?
Thank you for your advice!