Message-ID:
<SN6PR02MB415796E6C12A3FF5C85AA3EED4182@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 2 May 2024 22:40:44 +0000
From: Michael Kelley <mhklinux@...look.com>
To: David Hildenbrand <david@...hat.com>, "haiyangz@...rosoft.com"
<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
"decui@...rosoft.com" <decui@...rosoft.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-hyperv@...r.kernel.org"
<linux-hyperv@...r.kernel.org>
Subject: RE: [PATCH v2 2/2] hv_balloon: Enable hot-add for memblock sizes >
128 MiB
From: David Hildenbrand <david@...hat.com> Sent: Thursday, May 2, 2024 12:17 AM
>
> On 01.05.24 17:14, mhkelley58@...il.com wrote:
> > From: Michael Kelley <mhklinux@...look.com>
> >
> > The Hyper-V balloon driver supports hot-add of memory in addition
> > to ballooning. Current code hot-adds in fixed size chunks of
> > 128 MiB (fixed constant HA_CHUNK in the code). While this works
> > in Hyper-V VMs with 64 GiB or less of memory where the Linux
> > memblock size is 128 MiB, the hot-add fails for larger memblock
> > sizes because add_memory() expects memory to be added in chunks
> > that match the memblock size. Messages like the following are
> > reported when Linux has a 256 MiB memblock size:
> >
> > [ 312.668859] Block size [0x10000000] unaligned hotplug range:
> > start 0x310000000, size 0x8000000
> > [ 312.668880] hv_balloon: hot_add memory failed error is -22
> > [ 312.668984] hv_balloon: Memory hot add failed
> >
> > Larger memblock sizes are usually used in VMs with more than
> > 64 GiB of memory, depending on the alignment of the VM's
> > physical address space.
> >
> > Fix this problem by having the Hyper-V balloon driver determine
> > the Linux memblock size, and process hot-add requests in that
> > chunk size instead of a fixed 128 MiB. Also update the hot-add
> > alignment requested of the Hyper-V host to match the memblock
> > size.
> >
> > The code changes look significant, but in fact are just a
> > simple text substitution of a new global variable for the
> > previous HA_CHUNK constant. No algorithms are changed except
> > to initialize the new global variable and to calculate the
> > alignment value to pass to Hyper-V. Testing with memblock
> > sizes of 256 MiB and 2 GiB shows correct operation.
> >
> > Signed-off-by: Michael Kelley <mhklinux@...look.com>
> > ---
> > Changes in v2:
> > * Change new global variable name from ha_chunk_pgs to
> > ha_pages_in_chunk [David Hildenbrand]
> > * Use kernel macros ALIGN(), ALIGN_DOWN(), and umin()
> > to simplify code and reduce references to HA_CHUNK. For
> > ease of review, this is done in a new patch preceding
> > this one. [David Hildenbrand]
> >
> > drivers/hv/hv_balloon.c | 55 +++++++++++++++++++++++++----------------
> > 1 file changed, 34 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> > index 9f45b8a6762c..e0a1a18041ca 100644
> > --- a/drivers/hv/hv_balloon.c
> > +++ b/drivers/hv/hv_balloon.c
> > @@ -425,11 +425,11 @@ struct dm_info_msg {
> > * The range start_pfn : end_pfn specifies the range
> > * that the host has asked us to hot add. The range
> > * start_pfn : ha_end_pfn specifies the range that we have
> > - * currently hot added. We hot add in multiples of 128M
> > - * chunks; it is possible that we may not be able to bring
> > - * online all the pages in the region. The range
> > + * currently hot added. We hot add in chunks equal to the
> > + * memory block size; it is possible that we may not be able
> > + * to bring online all the pages in the region. The range
> > * covered_start_pfn:covered_end_pfn defines the pages that can
> > - * be brough online.
> > + * be brought online.
> > */
> >
> > struct hv_hotadd_state {
> > @@ -505,8 +505,9 @@ enum hv_dm_state {
> >
> > static __u8 recv_buffer[HV_HYP_PAGE_SIZE];
> > static __u8 balloon_up_send_buffer[HV_HYP_PAGE_SIZE];
> > +static unsigned long ha_pages_in_chunk;
> > +
> > #define PAGES_IN_2M (2 * 1024 * 1024 / PAGE_SIZE)
> > -#define HA_CHUNK (128 * 1024 * 1024 / PAGE_SIZE)
> >
> > struct hv_dynmem_device {
> > struct hv_device *dev;
> > @@ -724,21 +725,21 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
> > unsigned long processed_pfn;
> > unsigned long total_pfn = pfn_count;
> >
> > - for (i = 0; i < (size/HA_CHUNK); i++) {
> > - start_pfn = start + (i * HA_CHUNK);
> > + for (i = 0; i < (size/ha_pages_in_chunk); i++) {
> > + start_pfn = start + (i * ha_pages_in_chunk);
> >
> > scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> > - has->ha_end_pfn += HA_CHUNK;
> > - processed_pfn = umin(total_pfn, HA_CHUNK);
> > + has->ha_end_pfn += ha_pages_in_chunk;
> > + processed_pfn = umin(total_pfn, ha_pages_in_chunk);
> > total_pfn -= processed_pfn;
> > - has->covered_end_pfn += processed_pfn;
> > + has->covered_end_pfn += processed_pfn;
> > }
> >
> > reinit_completion(&dm_device.ol_waitevent);
> >
> > nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
> > ret = add_memory(nid, PFN_PHYS((start_pfn)),
> > - (HA_CHUNK << PAGE_SHIFT), MHP_MERGE_RESOURCE);
> > + (ha_pages_in_chunk << PAGE_SHIFT), MHP_MERGE_RESOURCE);
> >
>
> HA_BYTES_IN_CHUNK might be reasonable to have (see below)
>
> > if (do_hot_add)
> > @@ -1807,10 +1808,13 @@ static int balloon_connect_vsp(struct hv_device *dev)
> > cap_msg.caps.cap_bits.hot_add = hot_add_enabled();
> >
> > /*
> > - * Specify our alignment requirements as it relates
> > - * memory hot-add. Specify 128MB alignment.
> > + * Specify our alignment requirements for memory hot-add. The value is
> > + * the log base 2 of the number of megabytes in a chunk. For example,
> > + * with 256 MiB chunks, the value is 8. The number of MiB in a chunk
> > + * must be a power of 2.
> > */
> > - cap_msg.caps.cap_bits.hot_add_alignment = 7;
> > + cap_msg.caps.cap_bits.hot_add_alignment =
> > + ilog2(ha_pages_in_chunk >> (20 - PAGE_SHIFT));
>
> I was wondering if we can remove some of the magic here. Something along
> the lines of:
>
> ilog2(ha_pages_in_chunk / (SZ_1M >> PAGE_SHIFT))
>
> or simply
>
> #define HA_BYTES_IN_CHUNK (ha_pages_in_chunk << PAGE_SHIFT)
>
> ilog2(HA_BYTES_IN_CHUNK / SZ_1M)
>
>
> Apart from that nothing jumped out at me; looks much cleaner.
>
> Reviewed-by: David Hildenbrand <david@...hat.com>
>
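As a rough userspace sketch of the three spellings discussed above, the following compares the shift form from the patch with the two suggested forms. PAGE_SHIFT of 12 (4 KiB pages), the local SZ_1M define, and ilog2_u64() are stand-ins assumed here for illustration, not the kernel definitions, and the three function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for kernel definitions; 4 KiB pages are an assumption. */
#define PAGE_SHIFT 12
#define SZ_1M 0x00100000UL

/* Simplified stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int log = 0;

	assert(v > 0);
	while (v >>= 1)
		log++;
	return log;
}

/* Form used in the patch: shift pages down to MiB, then take log2. */
static unsigned int align_shift_form(uint64_t pages_in_chunk)
{
	return ilog2_u64(pages_in_chunk >> (20 - PAGE_SHIFT));
}

/* First suggested form: divide by the number of pages in 1 MiB. */
static unsigned int align_pages_form(uint64_t pages_in_chunk)
{
	return ilog2_u64(pages_in_chunk / (SZ_1M >> PAGE_SHIFT));
}

/* Second suggested form: convert to bytes, then divide by 1 MiB. */
static unsigned int align_bytes_form(uint64_t pages_in_chunk)
{
	uint64_t bytes_in_chunk = pages_in_chunk << PAGE_SHIFT;

	return ilog2_u64(bytes_in_chunk / SZ_1M);
}
```

Under these assumptions, a 128 MiB chunk (32768 pages) gives 7 in all three forms, matching the previously hard-coded value, and a 256 MiB chunk (65536 pages) gives 8, matching the example in the new comment.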
David -- I need to respin anyway because I missed a dependency on
CONFIG_MEMORY_HOTPLUG as pointed out by the kernel test robot.
I'll add your suggestion to that respin.
Michael
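
For reference, the failure quoted in the commit message can be reproduced arithmetically. This is only a sketch of the kind of alignment check that rejects the request, assuming both the start and the size must be multiples of the memory block size; range_is_block_aligned() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the range has a non-zero size and
 * both its start and its size are multiples of block_size.
 * block_size must be a power of 2.
 */
static bool range_is_block_aligned(uint64_t start, uint64_t size,
				   uint64_t block_size)
{
	uint64_t mask = block_size - 1;

	assert(block_size != 0 && (block_size & mask) == 0);
	return size != 0 && (start & mask) == 0 && (size & mask) == 0;
}
```

With the values from the log (start 0x310000000, size 0x8000000, block size 0x10000000), the start is aligned but the 128 MiB size is only half a 256 MiB block, so the check fails. The same start with a size of 0x10000000 passes.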