Date: Thu, 18 Apr 2024 13:51:04 +0200
From: David Hildenbrand <david@...hat.com>
To: zhenwei pi <pizhenwei@...edance.com>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, virtualization@...ts.linux.dev
Cc: mst@...hat.com, jasowang@...hat.com, xuanzhuo@...ux.alibaba.com,
 akpm@...ux-foundation.org
Subject: Re: [PATCH 3/3] virtio_balloon: introduce memory scan/reclaim info

On 18.04.24 08:26, zhenwei pi wrote:
> Expose memory scan/reclaim information to the host side via virtio
> balloon device.
> 
> Now we have a metric to analyze the memory performance:
> 
> y: counter increases
> n: counter does not change
> h: the rate of counter change is high
> l: the rate of counter change is low
> 
> OOM: VIRTIO_BALLOON_S_OOM_KILL
> STALL: VIRTIO_BALLOON_S_ALLOC_STALL
> ASCAN: VIRTIO_BALLOON_S_ASYNC_SCAN
> DSCAN: VIRTIO_BALLOON_S_DIRECT_SCAN
> ARCLM: VIRTIO_BALLOON_S_ASYNC_RECLAIM
> DRCLM: VIRTIO_BALLOON_S_DIRECT_RECLAIM
> 
> - OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
>    the guest runs under really critical memory pressure
> 
> - OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
>    the memory allocation stalls because of a cgroup limit, not
>    global memory pressure.
> 
> - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
>    the memory allocation stalls due to global memory pressure. The
>    performance gets hurt a lot. A high DRCLM/DSCAN ratio shows that
>    memory reclaim is quite effective.
> 
> - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
>    the memory allocation stalls due to global memory pressure.
>    The DRCLM/DSCAN ratio is low, so the guest OS is thrashing
>    heavily. In serious cases this leads to poor performance and
>    difficult troubleshooting. For example, sshd may block on memory
>    allocation when accepting new connections, so a user can't log in
>    to the VM over ssh.
> 
> - OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
>    the low ARCLM/ASCAN ratio shows that the guest tries to reclaim
>    more memory, but it can't. Once more memory is required in the
>    future, it will struggle to reclaim memory.
> 
> Signed-off-by: zhenwei pi <pizhenwei@...edance.com>
> ---
>   drivers/virtio/virtio_balloon.c     |  9 +++++++++
>   include/uapi/linux/virtio_balloon.h | 12 ++++++++++--
>   2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index e88e6573afa5..bc9332c1ae85 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -356,6 +356,15 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
>   	stall += events[ALLOCSTALL_MOVABLE];
>   	update_stat(vb, idx++, VIRTIO_BALLOON_S_ALLOC_STALL, stall);
>   
> +	update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_SCAN,
> +			pages_to_bytes(events[PGSCAN_KSWAPD]));
> +	update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_SCAN,
> +			pages_to_bytes(events[PGSCAN_DIRECT]));
> +	update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_RECLAIM,
> +			pages_to_bytes(events[PGSTEAL_KSWAPD]));
> +	update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_RECLAIM,
> +			pages_to_bytes(events[PGSTEAL_DIRECT]));
> +
>   #ifdef CONFIG_HUGETLB_PAGE
>   	update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
>   		    events[HTLB_BUDDY_PGALLOC]);
> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> index 487b893a160e..ee35a372805d 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -73,7 +73,11 @@ struct virtio_balloon_config {
>   #define VIRTIO_BALLOON_S_HTLB_PGFAIL   9  /* Hugetlb page allocation failures */
>   #define VIRTIO_BALLOON_S_OOM_KILL      10 /* OOM killer invocations */
>   #define VIRTIO_BALLOON_S_ALLOC_STALL   11 /* Stall count of memory allocatoin */
> -#define VIRTIO_BALLOON_S_NR       12
> +#define VIRTIO_BALLOON_S_ASYNC_SCAN    12 /* Amount of memory scanned asynchronously */
> +#define VIRTIO_BALLOON_S_DIRECT_SCAN   13 /* Amount of memory scanned directly */
> +#define VIRTIO_BALLOON_S_ASYNC_RECLAIM 14 /* Amount of memory reclaimed asynchronously */
> +#define VIRTIO_BALLOON_S_DIRECT_RECLAIM 15 /* Amount of memory reclaimed directly */
> +#define VIRTIO_BALLOON_S_NR       16
>   
>   #define VIRTIO_BALLOON_S_NAMES_WITH_PREFIX(VIRTIO_BALLOON_S_NAMES_prefix) { \
>   	VIRTIO_BALLOON_S_NAMES_prefix "swap-in", \
> @@ -87,7 +91,11 @@ struct virtio_balloon_config {
>   	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-allocations", \
>   	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-failures", \
>   	VIRTIO_BALLOON_S_NAMES_prefix "oom-kills", \
> -	VIRTIO_BALLOON_S_NAMES_prefix "alloc-stalls" \
> +	VIRTIO_BALLOON_S_NAMES_prefix "alloc-stalls", \
> +	VIRTIO_BALLOON_S_NAMES_prefix "async-scans", \
> +	VIRTIO_BALLOON_S_NAMES_prefix "direct-scans", \
> +	VIRTIO_BALLOON_S_NAMES_prefix "async-reclaims", \
> +	VIRTIO_BALLOON_S_NAMES_prefix "direct-reclaims" \
>   }
>   
>   #define VIRTIO_BALLOON_S_NAMES VIRTIO_BALLOON_S_NAMES_WITH_PREFIX("")
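
To make the decision table in the commit message concrete, here is a minimal host-side sketch of how the counter patterns could be classified. It is not part of the patch: struct balloon_delta, struct thresholds, enum mem_state and classify() are hypothetical names, and the cut-offs for "high" and "low" are assumptions, since the patch does not define them. The inputs are deltas of the six counters between two samples.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counter deltas between two samples of the balloon stats (hypothetical). */
struct balloon_delta {
	uint64_t oom_kill;       /* OOM */
	uint64_t alloc_stall;    /* STALL */
	uint64_t async_scan;     /* ASCAN, bytes */
	uint64_t direct_scan;    /* DSCAN, bytes */
	uint64_t async_reclaim;  /* ARCLM, bytes */
	uint64_t direct_reclaim; /* DRCLM, bytes */
};

/* Assumed cut-offs: the patch does not say what "high" or "low" means. */
struct thresholds {
	uint64_t stall_high; /* stalls per interval treated as "h" */
	uint64_t scan_high;  /* scanned bytes per interval treated as "h" */
	double ratio_low;    /* reclaim/scan ratio below this is "l" */
};

enum mem_state {
	MEM_OK,
	MEM_OOM,                       /* OOM[y] */
	MEM_CGROUP_STALL,              /* STALL[h], DSCAN[l], DRCLM[l] */
	MEM_GLOBAL_STALL_EFFECTIVE,    /* STALL[h], DSCAN[h], DRCLM[h] */
	MEM_GLOBAL_THRASHING,          /* STALL[h], DSCAN[h], DRCLM[l] */
	MEM_ASYNC_RECLAIM_INEFFECTIVE, /* STALL[n], ASCAN[h], ARCLM[l] */
};

/* True if something was scanned and only a small fraction was reclaimed. */
static bool ratio_is_low(uint64_t reclaim, uint64_t scan, double low)
{
	return scan > 0 && (double)reclaim / (double)scan < low;
}

static enum mem_state classify(const struct balloon_delta *d,
			       const struct thresholds *t)
{
	/* Any OOM kill wins, regardless of the other counters. */
	if (d->oom_kill > 0)
		return MEM_OOM;

	if (d->alloc_stall >= t->stall_high) {
		/* Little direct scanning: the stalls are not global pressure. */
		if (d->direct_scan < t->scan_high)
			return MEM_CGROUP_STALL;
		if (ratio_is_low(d->direct_reclaim, d->direct_scan, t->ratio_low))
			return MEM_GLOBAL_THRASHING;
		return MEM_GLOBAL_STALL_EFFECTIVE;
	}

	/* No stalls or direct reclaim, but kswapd scans a lot for little gain. */
	if (d->alloc_stall == 0 && d->direct_scan == 0 &&
	    d->async_scan >= t->scan_high &&
	    ratio_is_low(d->async_reclaim, d->async_scan, t->ratio_low))
		return MEM_ASYNC_RECLAIM_INEFFECTIVE;

	return MEM_OK;
}
```

A caller would sample the stats periodically, subtract the previous sample and pass the deltas to classify(). The thresholds would need tuning per workload and sampling interval.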

Not an expert on these counters/events, but LGTM

Acked-by: David Hildenbrand <david@...hat.com>

-- 
Cheers,

David / dhildenb

