Message-ID: <2dfad5c8-59d2-69a1-cc4c-d530c12ceea9@virtuozzo.com>
Date: Tue, 2 Aug 2022 11:53:42 +0300
From: Alexander Atanasov <alexander.atanasov@...tuozzo.com>
To: David Hildenbrand <david@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>
Cc: kernel@...nvz.org, virtualization@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org, stevensd@...omium.org,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Juergen Gross <jgross@...e.com>,
Stefano Stabellini <sstabellini@...nel.org>,
Wei Liu <wei.liu@...nel.org>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"K. Y. Srinivasan" <kys@...rosoft.com>,
Nadav Amit <namit@...are.com>, Arnd Bergmann <arnd@...db.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>
Subject: Re: [RFC] how the ballooned memory should be accounted by the drivers
inside the guests? (was:[PATCH v6 1/2] Create debugfs file with virtio
balloon usage information)
Hi,
I have put some more people on CC; questions for you are at the end, TIA.
On 01/08/2022 23:12, David Hildenbrand wrote:
>> / # cat /sys/kernel/debug/virtio-balloon
>> inflated: -2097152 kB
> What's the rationale of making it negative?
As suggested earlier, the sign indicates how the memory is accounted in
the two different cases. Negative means it is subtracted from MemTotal.
Positive means it is accounted as used.
>> To join the threads:
>>
>>>> Always account inflated memory as used for both cases - with and
>>>> without deflate on oom. Do not change total ram which can confuse
>>>> userspace and users.
>>> Sorry, but NAK.
>> Ok.
>>
>>> This would affect existing users / user space / balloon stats. For example
>>> HV just recently switched to properly using adjust_managed_page_count()
>>
>> I am wondering what's the rationale behind this; I have never seen users
>> that expect it to work like this. Do you have any pointers to such users,
>> so I can understand why they do so?
> We adjust total pages and managed pages to simulate what memory is
> actually available to the system (just like during memory hot(un)plug).
> Even though the pages are "allocated" by the driver, they are actually
> unusable for the system, just as if they would have been offlined.
> Strictly speaking, the guest OS can kill as many processes as it wants,
> it cannot reclaim that memory, as it's logically no longer available.
>
> There is nothing (valid, well, except driver unloading) the guest can do
> to reuse these pages. The hypervisor has to get involved first to grant
> access to some of these pages again (deflate the balloon).
>
> It's different with deflate-on-oom: the guest will *itself* decide to
> reuse inflated pages to deflate them. So the allocated pages can become
> back usable easily. There was a recent discussion for virtio-balloon to
> change that behavior when it's known that the hypervisor essentially
> implements "deflate-on-oom" by looking at guest memory stats and
> adjusting the balloon size accordingly; however, as long as we don't
> know what the hypervisor does or doesn't do, we have to keep the
> existing behavior.
>
> Note that most balloon drivers under Linux share that behavior.
>
> In case of Hyper-V I remember a customer BUG report that requested that
> exact behavior, however, I'm not able to locate the BZ quickly.
> [1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-November/057767.html
> (note that I can't easily find the original mail in the archives)
VMware does not, Xen does, Hyper-V does (though it didn't initially) - virtio does both.
For me the confusion comes from mixing ballooning and hot plug.
Ballooning is like a heap inside the guest from which the host can
allocate/deallocate pages. Whether there is a mechanism for the guest
to ask the host for more pages (or to free them), or the host has a
heuristic to monitor the guest and inflate/deflate the balloon, is a
matter of implementation.
Hot plug adds to MemTotal, and it is not a random event in either a
real or a virtual environment, so you can act upon it. MemTotal goes
down on hot unplug and when pages get marked as faulty RAM.
Historically MemTotal is a stable value (I agree with most of David
Stevens' points), and user space expects it to be stable: it is read
at startup and is not expected to change.
Used is what changes, and that is what user space expects to change.
Deflate-on-oom might have been a mistake, but it is there, and anything
that depends on MemTotal changing will be broken by that option. How
can that be fixed?
I agree that the host cannot reclaim what is marked as used, but should
it be able to? Maybe it would be good to teach the OOM killer that there
can be such RAM that cannot be reclaimed.
> Note: I suggested under [1] to expose inflated pages via /proc/meminfo
> directly. We could do that consistently over all balloon drivers ...
> doesn't sound too crazy.
Initially I wanted to do exactly this, BUT:
- some drivers prefer to expose more internal information in the file;
- a lot of user space parses meminfo, so it is better to keep it as is to avoid breaking something; ballooning is not very frequently used.
Please share your view on how ballooned memory should be accounted by the drivers inside the guests, so we can work towards consistent behaviour:
Should the inflated memory be accounted as Used, or should MemTotal be adjusted?
Should the inflated memory be added to /proc/meminfo?
--
Regards,
Alexander Atanasov