lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <962d007f-1bcc-2495-98ab-871160472f78@redhat.com>
Date:   Tue, 19 Feb 2019 07:52:45 -0500
From:   Nitesh Narayan Lal <nitesh@...hat.com>
To:     Andrea Arcangeli <aarcange@...hat.com>,
        Alexander Duyck <alexander.duyck@...il.com>
Cc:     David Hildenbrand <david@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>, lcapitulino@...hat.com,
        pagupta@...hat.com, wei.w.wang@...el.com,
        Yang Zhang <yang.zhang.wz@...il.com>,
        Rik van Riel <riel@...riel.com>, dodgen@...gle.com,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        dhildenb@...hat.com
Subject: Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting


On 2/18/19 9:46 PM, Andrea Arcangeli wrote:
> Hello,
>
> On Mon, Feb 18, 2019 at 03:47:22PM -0800, Alexander Duyck wrote:
>> essentially fragmented them. I guess hugepaged went through and
>> started trying to reassemble the huge pages and as a result there have
>> been apps that ended up consuming more memory than they would have
>> otherwise since they were using fragments of THP pages after doing an
>> MADV_DONTNEED on sections of the page.
> With relatively recent kernels MADV_DONTNEED doesn't necessarily free
> anything when it's applied to a THP subpage, it only splits the
> pagetables and queues the THP for deferred splitting. If there's
> memory pressure a shrinker will be invoked and the queue is scanned
> and the THPs are physically splitted, but to be reassembled/collapsed
> after a physical split it requires at least one young pte.
>
> If this is particularly problematic for page hinting, this behavior
> where the MADV_DONTNEED can be undoed by khugepaged (if some subpage is
> being frequently accessed), can be turned off by setting
> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none to
> 0. Then the THP will only be collapsed if all 512 subpages are mapped
> (i.e. they've all be re-allocated by the guest).
>
> Regardless of the max_ptes_none default, keeping the smaller guest
> buddy orders as the last target for page hinting should be good for
> performance.
>
>> Yeah, no problem. The only thing I don't like about MADV_FREE is that
>> you have to have memory pressure before the pages really start getting
>> scrubbed with is both a benefit and a drawback. Basically it defers
>> the freeing until you are under actual memory pressure so when you hit
>> that case things start feeling much slower, that and it limits your
>> allocations since the kernel doesn't recognize the pages as free until
>> it would have to start trying to push memory to swap.
> The guest allocation behavior should not be influenced by MADV_FREE vs
> MADV_DONTNEED, the guest can't see the difference anyway, so why
> should it limit the allocations?
>
> The benefit of MADV_FREE should be that when the same guest frees and
> reallocates an huge amount of RAM (i.e. guest app allocating and
> freeing lots of RAM in a loop, not so uncommon), there will be no KVM
> page fault during guest re-allocations. So in absence of memory
> pressure in the host it should be a major win. Overall it sounds like
> a good tradeoff compared to MADV_DONTNEED that forcefully invokes MMU
> notifiers and forces host allocations and KVM page faults in order to
> reallocate the same RAM in the same guest.
This does makes sense.
Thanks for explaining this.
>
> When there's memory pressure it's up to the host Linux VM to notice
> there's plenty of MADV_FREE material to free at zero I/O cost before
> starting swapping.
>
> Thanks,
> Andrea
-- 
Regards
Nitesh



Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ