Message-ID: <c88b999f-39f9-4771-bb28-b6b0cd5ba22c@bytedance.com>
Date: Tue, 16 Apr 2024 11:20:19 +0800
From: zhenwei pi <pizhenwei@...edance.com>
To: David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, virtualization@...ts.linux.dev
Cc: mst@...hat.com, jasowang@...hat.com, xuanzhuo@...ux.alibaba.com,
akpm@...ux-foundation.org
Subject: Re: Re: [RFC 0/3] Improve memory statistics for virtio balloon
On 4/15/24 23:01, David Hildenbrand wrote:
> On 15.04.24 10:41, zhenwei pi wrote:
>> Hi,
>>
>> When the guest runs under critical memory pressure, the guest becomes
>> too slow; even sshd enters D state (uninterruptible) on memory
>> allocation. We can't log in to this VM to do any troubleshooting.
>>
>> The guest kernel log via virtual TTY (on the host side) only provides
>> a few necessary log lines after OOM. More detailed memory statistics
>> are required so that we can see explicit memory events and estimate
>> the pressure.
>>
>> I'm going to introduce several VM counters for virtio balloon:
>> - oom-kill
>> - alloc-stall
>> - scan-async
>> - scan-direct
>> - reclaim-async
>> - reclaim-direct
>
> IIUC, we're only exposing events that are already getting provided via
> all_vm_events(), correct?
>
Yes, all of these counters come from all_vm_events(). The 'alloc-stall'
counter is the sum of several classes of alloc-stall events; please see
'[RFC 2/3] virtio_balloon: introduce memory allocation stall counter'.
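
To illustrate what "sum of several classes" means, here is a rough
userspace sketch. It is not code from the series: the enum names are
hypothetical stand-ins, and it assumes the classes are per-zone
alloc-stall events that are simply added together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for per-zone alloc-stall vm events. */
enum demo_vm_event {
	DEMO_ALLOCSTALL_DMA,
	DEMO_ALLOCSTALL_DMA32,
	DEMO_ALLOCSTALL_NORMAL,
	DEMO_ALLOCSTALL_MOVABLE,
	DEMO_OOM_KILL,
	DEMO_NR_EVENTS
};

/* Add up every alloc-stall class into a single counter value. */
static uint64_t demo_sum_alloc_stall(const uint64_t *events)
{
	uint64_t sum = 0;
	int i;

	assert(events != NULL);
	for (i = DEMO_ALLOCSTALL_DMA; i <= DEMO_ALLOCSTALL_MOVABLE; i++)
		sum += events[i];
	return sum;
}
```

The host then sees one 'alloc-stall' value instead of one value per
class.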
> In that case, I don't really see a major issue. Some considerations:
>
> (1) These new events are fairly Linux specific.
>
> PSWPIN and friends are fairly generic, but HTLB is already fairly
> Linux specific. OOM-kills don't really exist on Windows, for
> example. We'll have to be careful to properly describe what the
> semantics are.
>
I also notice that FreeBSD has supported virtio balloon for a long time,
and 'OOM kill' is used on FreeBSD too (link:
https://klarasystems.com/articles/exploring-swap-on-freebsd/).
> (2) How should we handle it if Linux ever stops supporting a certain
> event (e.g., after a major reclaim rework)? I assume we simply return
> nothing, like we currently would for VIRTIO_BALLOON_S_HTLB_PGALLOC
> without CONFIG_HUGETLB_PAGE.
>
Luckily, the virtio balloon stats schema is tag-value style, so simply
not reporting an unsupported tag should be safe enough.
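
A minimal userspace sketch of why omitting a tag is safe in a tag-value
layout. This is a simplified model, not the driver code: the struct
layout is simplified (no packing or endian handling), and the tag
numbers and helper names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified tag/value pair used only for this sketch. */
struct demo_stat {
	uint16_t tag;
	uint64_t val;
};

/* Tag numbers here are invented, not taken from the virtio spec. */
enum { DEMO_TAG_OOM_KILL = 100, DEMO_TAG_ALLOC_STALL = 101 };

/* Guest side: append a pair only if the event is still supported. */
static size_t demo_push(struct demo_stat *buf, size_t idx, size_t cap,
			uint16_t tag, uint64_t val, int supported)
{
	assert(buf != NULL);
	if (!supported || idx >= cap)
		return idx;	/* nothing reported for this tag */
	buf[idx].tag = tag;
	buf[idx].val = val;
	return idx + 1;
}

/* Host side: look up a tag; returns 0 if the guest did not report it. */
static int demo_lookup(const struct demo_stat *buf, size_t n,
		       uint16_t tag, uint64_t *out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (buf[i].tag == tag) {
			*out = buf[i].val;
			return 1;
		}
	}
	return 0;
}
```

Because the host finds values by tag rather than by position, a guest
that stops reporting one event does not shift or corrupt the others.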

The suggestions on patches [1-3] are good; I'll address them in the next
version if this series is acceptable.

--
zhenwei pi