linux-kernel - Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKgT0UdjaPy9TSNVGxmkUoxEpbiB84=n+a4MwYGgi_Hps+wQ_w@mail.gmail.com>
Date:   Tue, 19 Feb 2019 08:23:24 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Andrea Arcangeli <aarcange@...hat.com>
Cc:     David Hildenbrand <david@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Nitesh Narayan Lal <nitesh@...hat.com>,
        kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>, lcapitulino@...hat.com,
        pagupta@...hat.com, wei.w.wang@...el.com,
        Yang Zhang <yang.zhang.wz@...il.com>,
        Rik van Riel <riel@...riel.com>, dodgen@...gle.com,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        dhildenb@...hat.com
Subject: Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting

On Mon, Feb 18, 2019 at 6:46 PM Andrea Arcangeli <aarcange@...hat.com> wrote:
>
> Hello,
>
> On Mon, Feb 18, 2019 at 03:47:22PM -0800, Alexander Duyck wrote:
> > essentially fragmented them. I guess hugepaged went through and
> > started trying to reassemble the huge pages and as a result there have
> > been apps that ended up consuming more memory than they would have
> > otherwise since they were using fragments of THP pages after doing an
> > MADV_DONTNEED on sections of the page.
>
> With relatively recent kernels MADV_DONTNEED doesn't necessarily free
> anything when it's applied to a THP subpage, it only splits the
> pagetables and queues the THP for deferred splitting. If there's
> memory pressure a shrinker will be invoked and the queue is scanned
> and the THPs are physically splitted, but to be reassembled/collapsed
> after a physical split it requires at least one young pte.
>
> If this is particularly problematic for page hinting, this behavior
> where the MADV_DONTNEED can be undoed by khugepaged (if some subpage is
> being frequently accessed), can be turned off by setting
> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none to
> 0. Then the THP will only be collapsed if all 512 subpages are mapped
> (i.e. they've all be re-allocated by the guest).
>
> Regardless of the max_ptes_none default, keeping the smaller guest
> buddy orders as the last target for page hinting should be good for
> performance.

Okay, this is good to know. Thanks.

> > Yeah, no problem. The only thing I don't like about MADV_FREE is that
> > you have to have memory pressure before the pages really start getting
> > scrubbed with is both a benefit and a drawback. Basically it defers
> > the freeing until you are under actual memory pressure so when you hit
> > that case things start feeling much slower, that and it limits your
> > allocations since the kernel doesn't recognize the pages as free until
> > it would have to start trying to push memory to swap.
>
> The guest allocation behavior should not be influenced by MADV_FREE vs
> MADV_DONTNEED, the guest can't see the difference anyway, so why
> should it limit the allocations?

Actually I was talking about the host. So if  have a guest that is
using MADV_FREE what I have to do is create an allocation that would
force us to have to access swap and that in turn ends up triggering
the freeing of the pages that were moved to the "Inactive(file)" list
by the MADV_FREE call.

The only reason it came up is that one of my test systems had a small
swap so I ended up having to do multiple allocations and frees in swap
sized increments to free up memory from a large guest that wasn't in
use.

> The benefit of MADV_FREE should be that when the same guest frees and
> reallocates an huge amount of RAM (i.e. guest app allocating and
> freeing lots of RAM in a loop, not so uncommon), there will be no KVM
> page fault during guest re-allocations. So in absence of memory
> pressure in the host it should be a major win. Overall it sounds like
> a good tradeoff compared to MADV_DONTNEED that forcefully invokes MMU
> notifiers and forces host allocations and KVM page faults in order to
> reallocate the same RAM in the same guest.

Right, and I do see that behavior.

> When there's memory pressure it's up to the host Linux VM to notice
> there's plenty of MADV_FREE material to free at zero I/O cost before
> starting swapping.

Right.