Message-ID: <29f43d5796feed0dec8e8bb98b187d9dac03b900.camel@linux.intel.com>
Date:   Wed, 23 Oct 2019 15:24:41 -0700
From:   Alexander Duyck <alexander.h.duyck@...ux.intel.com>
To:     Nitesh Narayan Lal <nitesh@...hat.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        kvm@...r.kernel.org, mst@...hat.com, linux-kernel@...r.kernel.org,
        willy@...radead.org, mhocko@...nel.org, linux-mm@...ck.org,
        akpm@...ux-foundation.org, mgorman@...hsingularity.net,
        vbabka@...e.cz
Cc:     yang.zhang.wz@...il.com, konrad.wilk@...cle.com, david@...hat.com,
        pagupta@...hat.com, riel@...riel.com, lcapitulino@...hat.com,
        dave.hansen@...el.com, wei.w.wang@...el.com, aarcange@...hat.com,
        pbonzini@...hat.com, dan.j.williams@...el.com, osalvador@...e.de
Subject: Re: [PATCH v12 0/6] mm / virtio: Provide support for unused page
 reporting

On Wed, 2019-10-23 at 07:35 -0400, Nitesh Narayan Lal wrote:
> On 10/22/19 6:27 PM, Alexander Duyck wrote:
> > This series provides an asynchronous means of reporting unused guest
> > pages to a hypervisor so that the memory associated with those pages can
> > be dropped and reused by other processes and/or guests.
> > 

<snip>

> > 
> I think Michal Hocko suggested that we include a brief description of the background
> explaining how we arrived at the current approach and what we have
> already tried.
> That would help someone reviewing the patch series for the first time to
> understand it better.

I'm not entirely sure it helps. The problem is that even the "brief"
version will probably be pretty long.

From what I know, the first real public discussion of guest memory
overcommit and free page hinting dates back to a presentation by Rik van
Riel at the 2011 KVM Forum[0].

Before I started on the code, virtio-balloon free page hinting[1] already
existed. However, it was meant to report all of the free pages in the
system at once, at a given point in time, and was used only for VM
migration. All it does is inflate a balloon until it encounters an OOM
condition and then free the memory back to the guest. One interesting
piece that came out of the work on that patch set was Linus's suggestion
to use an array-based incremental approach[2], which is what I based my
later implementation on.

I believe Nitesh had already been working on his own approach to unused
page hinting for some time at that point. Before I submitted my RFC,
Nitesh had already posted a v7 in mid-2018[3]. That solution was
array-based: it appeared to instrument arch_alloc_page() and
arch_free_page(), and it blocked allocations while hinting was in
progress.

The first RFC I wrote[4] took a synchronous approach: it used
arch_free_page() to make a hypercall that would immediately flag the page
as unused. However, a hypercall per page can be expensive, and ideally we
don't want a guest vCPU stalled while waiting on the host's mmap_sem.

Around this time, I believe Nitesh's solution[5] was still keeping an
array of unused pages and populating it via arch_free_page(). In the
synchronous case it could cause OOM errors. In the asynchronous case the
array could be overrun, and the solution could lose track of unused pages.

Later I switched to an asynchronous approach[6], originally calling it
"bubble hinting". With an asynchronous approach there has to be a way to
track which pages have been reported and which haven't. I originally
tracked this with the page type, using Buddy and TreatedBuddy types, but
that ultimately became a "Reported" page flag. In addition, I pulled the
counters and pointers out of free_area/free_list. There is now a
stand-alone set of pointers, and the reported statistics are kept in a
separate dynamic allocation.
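For readers new to the series, the bookkeeping problem above can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not the code in the patch set. Pages are plain integers, and a set stands in for the "Reported" page flag. The hypervisor is just a callback, and the names (FreePageTracker, free_page, alloc_page, report_batch) are hypothetical. Each report covers at most a fixed number of pages, loosely in the spirit of the array-based incremental approach mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

# Toy model of asynchronous free page reporting (hypothetical names, not
# kernel code). A page is a plain integer frame number, and the set
# `reported` stands in for a per-page "Reported" flag.


@dataclass
class FreePageTracker:
    capacity: int  # max pages handed to the hypervisor in one report
    free: Set[int] = field(default_factory=set)
    reported: Set[int] = field(default_factory=set)

    def free_page(self, pfn: int) -> None:
        # A newly freed page has not been reported yet.
        self.free.add(pfn)
        self.reported.discard(pfn)

    def alloc_page(self, pfn: int) -> None:
        # Allocating a page removes it from the free pool and clears
        # its reported state.
        self.free.remove(pfn)
        self.reported.discard(pfn)

    def report_batch(self, report: Callable[[List[int]], None]) -> List[int]:
        # Gather up to `capacity` free pages that have not been reported,
        # hand them to the (simulated) hypervisor, then mark them reported.
        batch = sorted(self.free - self.reported)[: self.capacity]
        if batch:
            report(batch)
            self.reported.update(batch)
        return batch


# Usage: three pages freed, reported in batches of at most two.
sent: List[List[int]] = []
tracker = FreePageTracker(capacity=2)
for pfn in (10, 11, 12):
    tracker.free_page(pfn)
tracker.report_batch(sent.append)  # reports [10, 11]
tracker.report_batch(sent.append)  # reports [12]
tracker.report_batch(sent.append)  # nothing unreported: returns []
tracker.alloc_page(11)
tracker.free_page(11)              # freed again, so it is unreported again
tracker.report_batch(sent.append)  # reports [11]
print(sent)  # [[10, 11], [12], [11]]
```

In this model, freeing a page clears its reported state. A page that is allocated and later freed is therefore reported again, while pages that are still free and already reported are skipped.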

Nitesh's solution then changed to a bitmap approach[7]. However, it has
been pointed out that this solution doesn't handle sparse memory,
hotplug, and various other issues.

Since then both my approach and Nitesh's approach have been iterating with
mostly minor changes.

[0]: https://www.linux-kvm.org/images/f/ff/2011-forum-memory-overcommit.pdf
[1]: https://lore.kernel.org/lkml/1535333539-32420-1-git-send-email-wei.w.wang@intel.com/
[2]: https://lore.kernel.org/lkml/CA+55aFzqj8wxXnHAdUTiOomipgFONVbqKMjL_tfk7e5ar1FziQ@mail.gmail.com/
[3]: https://www.spinics.net/lists/kvm/msg170113.html
[4]: https://lore.kernel.org/lkml/20190204181118.12095.38300.stgit@localhost.localdomain/
[5]: https://lore.kernel.org/lkml/20190204201854.2328-1-nitesh@redhat.com/
[6]: https://lore.kernel.org/lkml/20190530215223.13974.22445.stgit@localhost.localdomain/
[7]: https://lore.kernel.org/lkml/20190603170306.49099-1-nitesh@redhat.com/
