linux-kernel - Re: On guest free page hinting and OOM

Message-ID: <CAKgT0UcJuD-t+MqeS9geiGE1zsUiYUgZzeRrOJOJbOzn2C-KOw@mail.gmail.com>
Date:   Mon, 1 Apr 2019 13:56:30 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     David Hildenbrand <david@...hat.com>,
        Nitesh Narayan Lal <nitesh@...hat.com>,
        kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Paolo Bonzini <pbonzini@...hat.com>, lcapitulino@...hat.com,
        pagupta@...hat.com, wei.w.wang@...el.com,
        Yang Zhang <yang.zhang.wz@...il.com>,
        Rik van Riel <riel@...riel.com>, dodgen@...gle.com,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        dhildenb@...hat.com, Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: On guest free page hinting and OOM

On Mon, Apr 1, 2019 at 7:47 AM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Mon, Apr 01, 2019 at 04:11:42PM +0200, David Hildenbrand wrote:
> > > The interesting thing is most probably: Will the hinting size usually be
> > > reasonable small? At least I guess a guest with 4TB of RAM will not
> > > suddenly get a hinting size of hundreds of GB. Most probably also only
> > > something in the range of 1GB. But this is an interesting question to
> > > look into.
> > >
> > > Also, if the admin does not care about performance implications when
> > > already close to hinting, no need to add the additional 1Gb to the ram size.
> >
> > "close to OOM" is what I meant.
>
> Problem is, host admin is the one adding memory. Guest admin is
> the one that knows about performance.

The thing to keep in mind is that this does not behave like the
balloon driver. We don't need to inflate a massive hint and hand that
off. Instead we can hint much smaller amounts and do it incrementally
over time, the idea being that as the system sits idle it frees up
more and more of its inactive memory.

With that said, I still don't like the idea of us even trying to
target 1GB of RAM for hinting. I think it would be much better if we
stuck to smaller sizes and kept things down to a single digit multiple
of THP or higher order pages. Maybe something like 64MB of total
memory out for hinting.

All we would really need to make this work is to see whether we can
combine PageType values. Specifically what I would be
looking at is a transition that looks something like Buddy -> Offline
-> (Buddy | Offline). We would have to hold the zone lock at each
transition, but that shouldn't be too big of an issue. If we are okay
with possibly combining the Offline and Buddy types we would have a
way of tracking which pages have been hinted and which have not. Then
we would just have to have a thread running in the background on the
guest that is looking at the higher order pages and pulling 64MB at a
time offline, and when the hinting is done put them back in the "Buddy
| Offline" state.
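
As a rough illustration of the transition above, here is a userspace
sketch, not kernel code. The flag values, the sim_* names, and the
boolean standing in for the zone lock are all assumptions made for the
example; the kernel's real PageType encoding is different and may not
allow two types to be set at once, which is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, chosen only so the two types can be OR'ed. */
enum { SIM_BUDDY = 1u << 0, SIM_OFFLINE = 1u << 1 };

struct sim_page {
    unsigned type;  /* 0 means "allocated / in use" in this sketch */
};

/* Stand-in for the zone lock: every transition must hold it. */
static bool sim_zone_locked;

static void sim_zone_lock(void)
{
    assert(!sim_zone_locked);
    sim_zone_locked = true;
}

static void sim_zone_unlock(void)
{
    assert(sim_zone_locked);
    sim_zone_locked = false;
}

/* Move a page from one exact state to another under the lock. */
static bool sim_transition(struct sim_page *p, unsigned from, unsigned to)
{
    bool ok;

    sim_zone_lock();
    ok = (p->type == from);
    if (ok)
        p->type = to;
    sim_zone_unlock();
    return ok;
}

/* Buddy -> Offline: pull a free, not yet hinted page out for hinting. */
static bool sim_isolate_for_hint(struct sim_page *p)
{
    return sim_transition(p, SIM_BUDDY, SIM_OFFLINE);
}

/* Offline -> (Buddy | Offline): hint done, page is free and marked. */
static bool sim_return_hinted(struct sim_page *p)
{
    return sim_transition(p, SIM_OFFLINE, SIM_BUDDY | SIM_OFFLINE);
}

/* (Buddy | Offline) -> in use: allocating the page drops the mark. */
static bool sim_alloc_hinted(struct sim_page *p)
{
    return sim_transition(p, SIM_BUDDY | SIM_OFFLINE, 0);
}
```

Because an already hinted page carries both bits, sim_isolate_for_hint()
refuses it, which is what lets the background thread tell hinted pages
from unhinted ones.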

I view this all as working not too dissimilar to how a standard Rx
ring in a network device works. Only we would want to allocate from
the pool of "Buddy" pages, flag the pages as "Offline", and then when
the hint has been processed we would place them back in the "Buddy"
list with the "Offline" value still set. The only real changes needed
to the buddy allocator would be to add some logic for clearing/merging
the "Offline" setting as necessary, and to provide an allocator that
only works with non-"Offline" pages.
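
To make the Rx ring analogy a bit more concrete, here is a userspace
sketch of one hinting pass. It is only an illustration under stated
assumptions: the hint_* names, the flat array standing in for the free
lists, the 2MB block size, and the report callback are all made up for
the example, and locking is left out. The merge rule (a merged block
stays "Offline" only if both halves were hinted) is likewise just one
possible reading of "clearing/merging".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HINT_BUDDY   0x1u
#define HINT_OFFLINE 0x2u

#define HINT_BLOCK_BYTES  (2ul << 20)   /* assumed 2MB higher-order block */
#define HINT_BATCH_BYTES  (64ul << 20)  /* 64MB out for hinting at a time */
#define HINT_BATCH_BLOCKS (HINT_BATCH_BYTES / HINT_BLOCK_BYTES)

struct hint_block {
    unsigned type;
};

/*
 * Allocator that only hands out non-"Offline" free blocks: take the
 * first block that is plain "Buddy" and flag it "Offline".
 * Returns its index, or -1 if every free block is already hinted.
 */
static long hint_take_unhinted(struct hint_block *pool, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pool[i].type == HINT_BUDDY) {
            pool[i].type = HINT_OFFLINE;
            return (long)i;
        }
    }
    return -1;
}

/*
 * One pass of the background thread: pull up to 64MB offline, report
 * it, then put the blocks back as "Buddy | Offline".
 * Returns the number of blocks hinted in this pass.
 */
static size_t hint_one_batch(struct hint_block *pool, size_t n,
                             void (*report)(size_t idx))
{
    long batch[HINT_BATCH_BLOCKS];
    size_t count = 0;

    assert(pool != NULL);

    while (count < HINT_BATCH_BLOCKS) {
        long idx = hint_take_unhinted(pool, n);
        if (idx < 0)
            break;
        batch[count++] = idx;
    }

    for (size_t i = 0; i < count; i++) {
        if (report)
            report((size_t)batch[i]);  /* hypothetical hint to the host */
        pool[batch[i]].type = HINT_BUDDY | HINT_OFFLINE;
    }
    return count;
}

/* Merging two free buddies: keep "Offline" only if both were hinted. */
static unsigned hint_merge_type(unsigned a, unsigned b)
{
    return HINT_BUDDY | (a & b & HINT_OFFLINE);
}
```

Calling hint_one_batch() repeatedly on an idle pool works through the
unhinted blocks 64MB at a time and then returns 0 once everything free
has been hinted.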