Message-ID: <YMBa8Ms9rL795OdS@yekko>
Date: Wed, 9 Jun 2021 16:08:48 +1000
From: David Gibson <david@...son.dropbear.id.au>
To: Leonardo Brás <leobras.c@...il.com>
Cc: Michael Ellerman <mpe@...erman.id.au>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Sandipan Das <sandipan@...ux.ibm.com>,
Mike Rapoport <rppt@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
Nicholas Piggin <npiggin@...il.com>,
Nathan Lynch <nathanl@...ux.ibm.com>,
David Hildenbrand <david@...hat.com>,
Scott Cheloha <cheloha@...ux.ibm.com>,
Laurent Dufour <ldufour@...ux.ibm.com>,
linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs
on memory hotunplug
On Wed, Jun 09, 2021 at 02:30:36AM -0300, Leonardo Brás wrote:
> On Mon, 2021-06-07 at 15:20 +1000, David Gibson wrote:
> > On Fri, Apr 30, 2021 at 11:36:10AM -0300, Leonardo Bras wrote:
> > > During memory hotunplug, after each LMB is removed, the HPT may be
> > > resized-down if it would map a max of 4 times the current amount of
> > > memory.
> > > (2 shifts, due to introduced hysteresis)
> > >
> > > It usually is not an issue, but it can take a lot of time if HPT
> > > resizing-down fails. This happens because resize-down failures
> > > usually repeat at each LMB removal, until there are no more bolted
> > > entries
> > > conflict, which can take a while to happen.
> > >
> > > This can be solved by doing a single HPT resize at the end of
> > > memory
> > > hotunplug, after all requested entries are removed.
> > >
> > > To make this happen, it's necessary to temporarily disable all HPT
> > > resize-downs before hotunplug, re-enable them after hotunplug ends,
> > > and then resize-down HPT to the current memory size.
> > >
> > > As an example, hotunplugging 256GB from a 385GB guest took 621s
> > > without
> > > this patch, and 100s after applied.
> > >
> > > Signed-off-by: Leonardo Bras <leobras.c@...il.com>
> >
> > Hrm. This looks correct, but it seems overly complicated.
> >
> > AFAICT, the resize calls that this adds should in practice be the
> > *only* times we call resize, all the calls from the lower level code
> > should be suppressed.
>
> That's correct.
>
> > In which case can't we just remove those calls
> > entirely, and not deal with the clunky locking and exclusion here.
> > That should also remove the need for the 'shrinking' parameter in
> > 1/3.
>
>
> If I get your suggestion correctly, you suggest something like:
> 1 - Never calling resize_hpt_for_hotplug() in
> hash__remove_section_mapping(), thus not needing the shrinking
> parameter.
> 2 - Functions in hotplug-memory.c that call dlpar_remove_lmb() would in
> fact call another function to do the batch resize_hpt_for_hotplug() for
> them
Basically, yes.
> If so, that assumes that no other function that currently calls
> resize_hpt_for_hotplug() under another path, or if they do, it does not
> need to actually resize the HPT.
>
> Is the above correct?
>
> There are some examples of functions that currently call
> resize_hpt_for_hotplug() by another path:
>
> add_memory_driver_managed
> virtio_mem_add_memory
> dev_dax_kmem_probe
Oh... virtio-mem. I didn't think of that.
> reserve_additional_memory
> balloon_process
> add_ballooned_pages
AFAICT this comes from drivers/xen, and Xen has never been a thing on
POWER.
> __add_memory
> probe_store
So this is a sysfs triggered memory add. If the user is doing this
manually, then I think it's reasonable for them to manually manage the
HPT size as well, which they can do through debugfs. I think it might
also be used by drmgr under pHyp, but pHyp doesn't support HPT
resizing.
> __remove_memory
> pseries_remove_memblock
Huh, this one comes through OF_RECONFIG_DETACH_NODE. I don't really
know when those happen, but I strongly suspect it's only under pHyp
again.
> remove_memory
> dev_dax_kmem_remove
> virtio_mem_remove_memory
virtio-mem again.
> memunmap_pages
> pci_p2pdma_add_resource
> virtio_fs_setup_dax
And virtio-fs in dax mode. Didn't think of that either.

Ugh, yeah, I'm used to the world where the platform provides the only
way of hotplugging memory, but virtio-mem does indeed provide another
one, and we could indeed need to manage the HPT size based on that.

Drat, so moving all the HPT resizing handling up into
pseries/hotplug-memory.c won't work.

I still think we can simplify the communication between the stuff in
the pseries hotplug code and the actual hash resizing. In your draft
there are kind of 3 ways the information is conveyed: the mutex
suppresses HPT shrinks, pre-growing past what we need prevents HPT
grows, and the 'shrinking' flag handles some edge cases.

I suggest instead a single flag that will suppress all the current
resizes. Not sure it technically has to be an atomic mutex, but
that's probably the obvious safe choice. Then have a "resize up to
target" and "resize down to target" that ignore that suppression and
are no-ops if the target is in the other direction.

Then you should be able to make the path for pseries hotplugs be:

    suppress other resizes
    resize up to target
    do the actual adds or removes
    resize down to target
    unsuppress other resizes
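
To make the shape of that concrete, here is a rough userspace sketch of
the flow, not kernel code. Every name in it (hpt_shift,
resize_suppressed, hpt_resize_auto(), hpt_resize_up_to(),
hpt_resize_down_to(), batch_hotplug()) is made up for illustration, the
HPT is modelled as a bare shift value, and a plain bool stands in for
whatever lock or atomic the real thing would need:

```c
#include <stdbool.h>

/* Hypothetical model: the HPT size is tracked only as a shift value. */
static unsigned int hpt_shift = 24;
static bool resize_suppressed;     /* the single "suppress resizes" flag */
static unsigned int resize_calls;  /* counts resizes actually performed  */

/* Stand-in for the real (expensive, possibly failing) resize. */
static void do_resize(unsigned int target_shift)
{
	hpt_shift = target_shift;
	resize_calls++;
}

/* Ordinary path, e.g. from section add/remove: honours the flag. */
static void hpt_resize_auto(unsigned int target_shift)
{
	if (resize_suppressed || target_shift == hpt_shift)
		return;
	do_resize(target_shift);
}

/* Ignores suppression; no-op unless the target is larger. */
static void hpt_resize_up_to(unsigned int target_shift)
{
	if (target_shift > hpt_shift)
		do_resize(target_shift);
}

/* Ignores suppression; no-op unless the target is smaller. */
static void hpt_resize_down_to(unsigned int target_shift)
{
	if (target_shift < hpt_shift)
		do_resize(target_shift);
}

/*
 * Batch path for a hotplug request. per_lmb_shift[i] is the size the
 * ordinary path would ask for after LMB i is added or removed; those
 * requests are all suppressed while the batch is in flight.
 */
static void batch_hotplug(unsigned int target_shift,
			  const unsigned int *per_lmb_shift, int n)
{
	int i;

	resize_suppressed = true;
	hpt_resize_up_to(target_shift);
	for (i = 0; i < n; i++)
		hpt_resize_auto(per_lmb_shift[i]);  /* the adds or removes */
	hpt_resize_down_to(target_shift);
	resize_suppressed = false;
}
```

With this shape the same batch function serves both hot-add and
hot-remove, since whichever of the two targeted resizes points the
wrong way simply does nothing, and the other paths (virtio-mem and
friends) keep using the ordinary resize untouched.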
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson