Message-ID: <2dfad5c8-59d2-69a1-cc4c-d530c12ceea9@virtuozzo.com>
Date:   Tue, 2 Aug 2022 11:53:42 +0300
From:   Alexander Atanasov <alexander.atanasov@...tuozzo.com>
To:     David Hildenbrand <david@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>
Cc:     kernel@...nvz.org, virtualization@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org, stevensd@...omium.org,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Juergen Gross <jgross@...e.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        Wei Liu <wei.liu@...nel.org>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        Nadav Amit <namit@...are.com>, Arnd Bergmann <arnd@...db.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>
Subject: Re: [RFC] how the ballooned memory should be accounted by the drivers
 inside the guests? (was:[PATCH v6 1/2] Create debugfs file with virtio
 balloon usage information)

Hi,

I have added some more people to the CC; there are questions for you at the end. TIA.

On 01/08/2022 23:12, David Hildenbrand wrote:
>> / # cat /sys/kernel/debug/virtio-balloon
>> inflated: -2097152 kB
> What's the rationale of making it negative?

As suggested earlier, the sign indicates how the memory is accounted in
the two different cases. Negative means it is subtracted from MemTotal.
Positive means it is accounted as used.
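
To make the sign convention concrete, here is a minimal Python sketch of how
a user-space tool could interpret the value. The function name
parse_inflated and the returned labels are illustrative only and not part of
the patch; the sketch assumes the file contains a line of the form
"inflated: <signed number> kB", as in the example above.

```python
import re


def parse_inflated(text):
    """Return (size_kb, accounting) parsed from the debugfs text.

    Assumes a line such as 'inflated: -2097152 kB'. The accounting
    labels are hypothetical names used only for this illustration.
    """
    m = re.search(r"^inflated:\s*(-?\d+)\s*kB\s*$", text, re.MULTILINE)
    if m is None:
        raise ValueError("no 'inflated:' line found")
    value = int(m.group(1))
    if value < 0:
        # Negative: the balloon size was subtracted from MemTotal.
        accounting = "subtracted from MemTotal"
    elif value > 0:
        # Positive: the balloon size is counted as used memory.
        accounting = "accounted as used"
    else:
        accounting = "not inflated"
    return abs(value), accounting
```

A tool would read /sys/kernel/debug/virtio-balloon and pass its contents to
this function to learn both the balloon size and which accounting is in use.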

>> To join the threads:
>>
>>>> Always account inflated memory as used for both cases - with and
>>>> without deflate on oom. Do not change total ram which can confuse
>>>> userspace and users.
>>> Sorry, but NAK.
>> Ok.
>>
>>> This would affect existing users / user space / balloon stats. For example
>>> HV just recently switched to properly using adjust_managed_page_count()
>>
>> I am wondering what the rationale behind this is; I have never seen users
>> that expect it to work like this. Do you have any pointers to such users, so
>> I can understand why they do so?
> We adjust total pages and managed pages to simulate what memory is
> actually available to the system (just like during memory hot(un)plug).
> Even though the pages are "allocated" by the driver, they are actually
> unusable for the system, just as if they would have been offlined.
> Strictly speaking, the guest OS can kill as many processes as it wants,
> it cannot reclaim that memory, as it's logically no longer available.
>
> There is nothing (valid, well, except driver unloading) the guest can do
> to reuse these pages. The hypervisor has to get involved first to grant
> access to some of these pages again (deflate the balloon).
>
> It's different with deflate-on-oom: the guest will *itself* decide to
> reuse inflated pages by deflating them, so the allocated pages can
> easily become usable again. There was a recent discussion for
> virtio-balloon to
> change that behavior when it's known that the hypervisor essentially
> implements "deflate-on-oom" by looking at guest memory stats and
> adjusting the balloon size accordingly; however, as long as we don't
> know what the hypervisor does or doesn't do, we have to keep the
> existing behavior.
>
> Note that most balloon drivers under Linux share that behavior.
>
> In case of Hyper-V I remember a customer BUG report that requested that
> exact behavior, however, I'm not able to locate the BZ quickly.
> [1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-November/057767.html
> (note that I can't easily find the original mail in the archives)

VMware does not adjust MemTotal, Xen does, HV does now (it did not before),
and virtio does both.

For me the confusion comes from mixing ballooning and hot plug.

Ballooning is like a heap inside the guest from which the host can
allocate and deallocate pages. Whether there is a mechanism for the guest
to ask the host for more pages or to free pages, or the host has a
heuristic that monitors the guest and inflates/deflates the balloon, is a
matter of implementation.

Hot plug adds to MemTotal, and it is not a random event in either a real
or a virtual environment, so you can act upon it. MemTotal goes down on
hot unplug and when pages get marked as faulty RAM.

Historically MemTotal is a stable value (I agree with most of David
Stevens' points): user space expects it to be initialized at startup and
does not expect it to change.

Used is what changes and that is what user space expects to change.
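
The two accounting models under discussion can be compared with a small
Python sketch. This is a simplified model for illustration only, not driver
code: the function guest_view and its parameters are hypothetical names, and
it ignores caches, reserved memory and everything else the kernel tracks.

```python
def guest_view(total_kb, used_kb, inflated_kb, adjust_total):
    """Return (MemTotal, Used) in kB as the guest would report them.

    adjust_total=True models subtracting the balloon from MemTotal;
    adjust_total=False models counting the balloon as used memory.
    """
    if adjust_total:
        # Inflated pages are removed from the total, as with hot unplug.
        return total_kb - inflated_kb, used_kb
    # The total stays stable; inflated pages show up as used memory.
    return total_kb, used_kb + inflated_kb


# Example numbers (assumed): 8 GiB guest, 1 GiB used, 2 GiB balloon.
total, used, inflated = 8 * 1024 * 1024, 1 * 1024 * 1024, 2 * 1024 * 1024
print(guest_view(total, used, inflated, adjust_total=True))
print(guest_view(total, used, inflated, adjust_total=False))
```

In this model MemTotal minus Used is the same in both cases; the only
difference is whether the balloon shows up as a smaller total or as more
used memory.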

Deflate on OOM might have been a mistake, but it is there, and anything
that depends on MemTotal changing will be broken by that option. How can
that be fixed?

I agree that the host cannot reclaim what is marked as used, but should
it be able to? Maybe it would be good to teach the OOM killer that there
can be RAM that cannot be reclaimed.

> Note: I suggested under [1] to expose inflated pages via /proc/meminfo
> directly. We could do that consistently over all balloon drivers ...
> doesn't sound too crazy.

Initially I wanted to do exactly this, BUT:
- some drivers prefer to expose more internal information in the file.
- a lot of user space uses meminfo, so it is better to keep it as is to avoid breaking something; ballooning is not very frequently used.


Please share your views on how ballooned memory should be accounted by the drivers inside the guests, so we can work towards consistent behaviour:

Should the inflated memory be accounted as Used, or should MemTotal be adjusted?

Should the inflated memory be added to /proc/meminfo?

-- 
Regards,
Alexander Atanasov
