[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d320cf89-768f-4ae8-8a32-dff4e5699f0a@redhat.com>
Date: Thu, 2 May 2024 09:17:05 +0200
From: David Hildenbrand <david@...hat.com>
To: mhklinux@...look.com, haiyangz@...rosoft.com, wei.liu@...nel.org,
decui@...rosoft.com, linux-kernel@...r.kernel.org,
linux-hyperv@...r.kernel.org
Subject: Re: [PATCH v2 2/2] hv_balloon: Enable hot-add for memblock sizes >
128 MiB
On 01.05.24 17:14, mhkelley58@...il.com wrote:
> From: Michael Kelley <mhklinux@...look.com>
>
> The Hyper-V balloon driver supports hot-add of memory in addition
> to ballooning. Current code hot-adds in fixed size chunks of
> 128 MiB (fixed constant HA_CHUNK in the code). While this works
> in Hyper-V VMs with 64 GiB or less or memory where the Linux
> memblock size is 128 MiB, the hot-add fails for larger memblock
> sizes because add_memory() expects memory to be added in chunks
> that match the memblock size. Messages like the following are
> reported when Linux has a 256 MiB memblock size:
>
> [ 312.668859] Block size [0x10000000] unaligned hotplug range:
> start 0x310000000, size 0x8000000
> [ 312.668880] hv_balloon: hot_add memory failed error is -22
> [ 312.668984] hv_balloon: Memory hot add failed
>
> Larger memblock sizes are usually used in VMs with more than
> 64 GiB of memory, depending on the alignment of the VM's
> physical address space.
>
> Fix this problem by having the Hyper-V balloon driver determine
> the Linux memblock size, and process hot-add requests in that
> chunk size instead of a fixed 128 MiB. Also update the hot-add
> alignment requested of the Hyper-V host to match the memblock
> size.
>
> The code changes look significant, but in fact are just a
> simple text substitution of a new global variable for the
> previous HA_CHUNK constant. No algorithms are changed except
> to initialize the new global variable and to calculate the
> alignment value to pass to Hyper-V. Testing with memblock
> sizes of 256 MiB and 2 GiB shows correct operation.
>
> Signed-off-by: Michael Kelley <mhklinux@...look.com>
> ---
> Changes in v2:
> * Change new global variable name from ha_chunk_pgs to
> ha_pages_in_chunk [David Hildenbrand]
> * Use kernel macros ALIGN(), ALIGN_DOWN(), and umin()
> to simplify code and reduce references to HA_CHUNK. For
> ease of review, this is done in a new patch preceeding
> this one. [David Hildenbrand]
>
> drivers/hv/hv_balloon.c | 55 +++++++++++++++++++++++++----------------
> 1 file changed, 34 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index 9f45b8a6762c..e0a1a18041ca 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -425,11 +425,11 @@ struct dm_info_msg {
> * The range start_pfn : end_pfn specifies the range
> * that the host has asked us to hot add. The range
> * start_pfn : ha_end_pfn specifies the range that we have
> - * currently hot added. We hot add in multiples of 128M
> - * chunks; it is possible that we may not be able to bring
> - * online all the pages in the region. The range
> + * currently hot added. We hot add in chunks equal to the
> + * memory block size; it is possible that we may not be able
> + * to bring online all the pages in the region. The range
> * covered_start_pfn:covered_end_pfn defines the pages that can
> - * be brough online.
> + * be brought online.
> */
>
> struct hv_hotadd_state {
> @@ -505,8 +505,9 @@ enum hv_dm_state {
>
> static __u8 recv_buffer[HV_HYP_PAGE_SIZE];
> static __u8 balloon_up_send_buffer[HV_HYP_PAGE_SIZE];
> +static unsigned long ha_pages_in_chunk;
> +
> #define PAGES_IN_2M (2 * 1024 * 1024 / PAGE_SIZE)
> -#define HA_CHUNK (128 * 1024 * 1024 / PAGE_SIZE)
>
> struct hv_dynmem_device {
> struct hv_device *dev;
> @@ -724,21 +725,21 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
> unsigned long processed_pfn;
> unsigned long total_pfn = pfn_count;
>
> - for (i = 0; i < (size/HA_CHUNK); i++) {
> - start_pfn = start + (i * HA_CHUNK);
> + for (i = 0; i < (size/ha_pages_in_chunk); i++) {
> + start_pfn = start + (i * ha_pages_in_chunk);
>
> scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> - has->ha_end_pfn += HA_CHUNK;
> - processed_pfn = umin(total_pfn, HA_CHUNK);
> + has->ha_end_pfn += ha_pages_in_chunk;
> + processed_pfn = umin(total_pfn, ha_pages_in_chunk);
> total_pfn -= processed_pfn;
> - has->covered_end_pfn += processed_pfn;
> + has->covered_end_pfn += processed_pfn;
> }
>
> reinit_completion(&dm_device.ol_waitevent);
>
> nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
> ret = add_memory(nid, PFN_PHYS((start_pfn)),
> - (HA_CHUNK << PAGE_SHIFT), MHP_MERGE_RESOURCE);
> + (ha_pages_in_chunk << PAGE_SHIFT), MHP_MERGE_RESOURCE);
>
HA_BYTES_IN_CHUNK might be reasonable to have (see below)
> if (do_hot_add)
> @@ -1807,10 +1808,13 @@ static int balloon_connect_vsp(struct hv_device *dev)
> cap_msg.caps.cap_bits.hot_add = hot_add_enabled();
>
> /*
> - * Specify our alignment requirements as it relates
> - * memory hot-add. Specify 128MB alignment.
> + * Specify our alignment requirements for memory hot-add. The value is
> + * the log base 2 of the number of megabytes in a chunk. For example,
> + * with 256 MiB chunks, the value is 8. The number of MiB in a chunk
> + * must be a power of 2.
> */
> - cap_msg.caps.cap_bits.hot_add_alignment = 7;
> + cap_msg.caps.cap_bits.hot_add_alignment =
> + ilog2(ha_pages_in_chunk >> (20 - PAGE_SHIFT));
I was wondering if we can remove some of the magic here. Something along
the lines of:
ilog2(ha_pages_in_chunk / (SZ_1M >> PAGE_SHIFT))
or simply
#define HA_BYTES_IN_CHUNK (ha_pages_in_chunk << PAGE_SHIFT)
ilog2(HA_BYTES_IN_CHUNK / SZ_1M)
Apart from that nothing jumped at me; looks much cleaner.
Reviewed-by: David Hildenbrand <david@...hat.com>
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists