linux-kernel - Re: [PATCH v1 6/6] virtio-balloon: Add support for aerating memory via hinting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0UeRy2eHKnz4CorefBAG8ro+3h4oFX+z1JY2qRm17fcV8w@mail.gmail.com>
Date:   Thu, 18 Jul 2019 13:29:14 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Nitesh Narayan Lal <nitesh@...hat.com>,
        kvm list <kvm@...r.kernel.org>,
        David Hildenbrand <david@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Yang Zhang <yang.zhang.wz@...il.com>, pagupta@...hat.com,
        Rik van Riel <riel@...riel.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        lcapitulino@...hat.com, wei.w.wang@...el.com,
        Andrea Arcangeli <aarcange@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>, dan.j.williams@...el.com,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>
Subject: Re: [PATCH v1 6/6] virtio-balloon: Add support for aerating memory
 via hinting

On Thu, Jul 18, 2019 at 9:07 AM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Thu, Jul 18, 2019 at 08:34:37AM -0700, Alexander Duyck wrote:
> > On Wed, Jul 17, 2019 at 10:14 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > >
> > > On Wed, Jul 17, 2019 at 09:43:52AM -0700, Alexander Duyck wrote:
> > > > On Wed, Jul 17, 2019 at 3:28 AM Michael S. Tsirkin <mst@...hat.com> wrote:
> > > > >
> > > > > On Tue, Jul 16, 2019 at 02:06:59PM -0700, Alexander Duyck wrote:
> > > > > > On Tue, Jul 16, 2019 at 10:41 AM Michael S. Tsirkin <mst@...hat.com> wrote:
> > > > > >
> > > > > > <snip>
> > > > > >
> > > > > > > > > This is what I am saying. Having watched that patchset being developed,
> > > > > > > > > I think that's simply because processing blocks required mm core
> > > > > > > > > changes, which Wei was not up to pushing through.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > If we did
> > > > > > > > >
> > > > > > > > >         while (1) {
> > > > > > > > >                 alloc_pages
> > > > > > > > >                 add_buf
> > > > > > > > >                 get_buf
> > > > > > > > >                 free_pages
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > We'd end up passing the same page to balloon again and again.
> > > > > > > > >
> > > > > > > > > So we end up reserving lots of memory with alloc_pages instead.
> > > > > > > > >
> > > > > > > > > What I am saying is that now that you are developing
> > > > > > > > > infrastructure to iterate over free pages,
> > > > > > > > > FREE_PAGE_HINT should be able to use it too.
> > > > > > > > > Whether that's possible might be a good indication of
> > > > > > > > > whether the new mm APIs make sense.
> > > > > > > >
> > > > > > > > The problem is the infrastructure as implemented isn't designed to do
> > > > > > > > that. I am pretty certain this interface will have issues with being
> > > > > > > > given small blocks to process at a time.
> > > > > > > >
> > > > > > > > Basically the design for the FREE_PAGE_HINT feature doesn't really
> > > > > > > > have the concept of doing things a bit at a time. It is either
> > > > > > > > filling, stopped, or done. From what I can tell it requires a
> > > > > > > > configuration change for the virtio balloon interface to toggle
> > > > > > > > between those states.
> > > > > > >
> > > > > > > Maybe I misunderstand what you are saying.
> > > > > > >
> > > > > > > Filling state can definitely report things
> > > > > > > a bit at a time. It does not assume that
> > > > > > > all of guest free memory can fit in a VQ.
> > > > > >
> > > > > > I think where you and I may differ is that you are okay with just
> > > > > > pulling pages until you hit OOM, or allocation failures. Do I have
> > > > > > that right?
> > > > >
> > > > > This is exactly what the current code does. But that's an implementation
> > > > > detail which came about because we failed to find any other way to
> > > > > iterate over free blocks.
> > > >
> > > > I get that. However my concern is that permeated other areas of the
> > > > implementation that make taking another approach much more difficult
> > > > than it needs to be.
> > >
> > > Implementation would have to change to use an iterator obviously. But I don't see
> > > that it leaked out to a hypervisor interface.
> > >
> > > In fact take a look at virtio_balloon_shrinker_scan
> > > and you will see that it calls shrink_free_pages
> > > without waiting for the device at all.
> >
> > Yes, and in case you missed it earlier I am pretty sure that leads to
> > possible memory corruption. I don't think it was tested enough to be
> > able to say that is safe.
>
> More testing would be good, for sure.
>
> > Specifically we cannot be clearing the dirty flag on pages that are in
> > use. We should only be clearing that flag for pages that are
> > guaranteed to not be in use.
>
> I think that clearing the dirty flag is safe if the flag was originally
> set and the page has been
> write-protected before reporting was requested.
> In that case we know that page has not been changed.
> Right?

I am just going to drop the rest of this thread as I agree we have
been running ourselves around in circles. The part I had missed was
the part where there are 2 bitmaps and that you are are using
migration_bitmap_sync_precopy() to align the two.

This is just running at the same time as the precopy code and is only
really meant to try and clear the bit before the precopy gets to it
from what I can tell.

So one thing that is still an issue then is that my approach would
only work on the first migration. The problem is the logic I have
implemented assumes that once we have hinted on a page we don't need
to do it again. However in order to support migration you would need
to reset the hinting entirely and start over again after doing a
migration.