[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fd866a71-1d1a-1481-ffee-aefe0313ef38@redhat.com>
Date: Wed, 27 Nov 2019 18:37:21 +0100
From: David Hildenbrand <david@...hat.com>
To: Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
Alexander Duyck <alexander.duyck@...il.com>,
kvm@...r.kernel.org, mst@...hat.com, linux-kernel@...r.kernel.org,
willy@...radead.org, mhocko@...nel.org, linux-mm@...ck.org,
akpm@...ux-foundation.org, mgorman@...hsingularity.net,
vbabka@...e.cz
Cc: yang.zhang.wz@...il.com, nitesh@...hat.com, konrad.wilk@...cle.com,
pagupta@...hat.com, riel@...riel.com, lcapitulino@...hat.com,
dave.hansen@...el.com, wei.w.wang@...el.com, aarcange@...hat.com,
pbonzini@...hat.com, dan.j.williams@...el.com, osalvador@...e.de
Subject: Re: [PATCH v14 0/6] mm / virtio: Provide support for unused page
reporting
On 27.11.19 18:36, Alexander Duyck wrote:
> On Wed, 2019-11-27 at 11:01 +0100, David Hildenbrand wrote:
>> On 26.11.19 17:45, Alexander Duyck wrote:
>>> On Tue, 2019-11-26 at 13:20 +0100, David Hildenbrand wrote:
>>>> On 19.11.19 22:46, Alexander Duyck wrote:
>
> <snip>
>
>>>>> Below are the results from various benchmarks. I primarily focused on two
>>>>> tests. The first is the will-it-scale/page_fault2 test, and the other is
>>>>> a modified version of will-it-scale/page_fault1 that was enabled to use
>>>>> THP. I did this as it allows for better visibility into different parts
>>>>> of the memory subsystem. The guest is running with 32G for RAM on one
>>>>> node of a E5-2630 v3. The host has had some power saving features disabled
>>>>> by setting the /dev/cpu_dma_latency value to 10ms.
>>>>>
>>>>> Test page_fault1 (THP) page_fault2
>>>>> Name tasks Process Iter STDEV Process Iter STDEV
>>>>> Baseline 1 1203934.75 0.04% 379940.75 0.11%
>>>>> 16 8828217.00 0.85% 3178653.00 1.28%
>>>>>
>>>>> Patches applied 1 1207961.25 0.10% 380852.25 0.25%
>>>>> 16 8862373.00 0.98% 3246397.25 0.68%
>>>>>
>>>>> Patches enabled 1 1207758.75 0.17% 373079.25 0.60%
>>>>> MADV disabled 16 8870373.75 0.29% 3204989.75 1.08%
>>>>>
>>>>> Patches enabled 1 1261183.75 0.39% 373201.50 0.50%
>>>>> 16 8371359.75 0.65% 3233665.50 0.84%
>>>>>
>>>>> Patches enabled 1 1090201.50 0.25% 376967.25 0.29%
>>>>> page shuffle 16 8108719.75 0.58% 3218450.25 1.07%
>>>>>
>>>>> The results above are for a baseline with a linux-next-20191115 kernel,
>>>>> that kernel with this patch set applied but page reporting disabled in
>>>>> virtio-balloon, patches applied but the madvise disabled by direct
>>>>> assigning a device, the patches applied and page reporting fully
>>>>> enabled, and the patches enabled with page shuffling enabled. These
>>>>> results include the deviation seen between the average value reported here
>>>>> versus the high and/or low value. I observed that during the test memory
>>>>> usage for the first three tests never dropped whereas with the patches
>>>>> fully enabled the VM would drop to using only a few GB of the host's
>>>>> memory when switching from memhog to page fault tests.
>>>>>
>>>>> Most of the overhead seen with this patch set enabled seems due to page
>>>>> faults caused by accessing the reported pages and the host zeroing the page
>>>>> before giving it back to the guest. This overhead is much more visible when
>>>>> using THP than with standard 4K pages. In addition page shuffling seemed to
>>>>> increase the amount of faults generated due to an increase in memory churn.
>>>>
>>>> MADV_FREE would be interesting.
>>>
>>> I can probably code something up. However that is going to push a bunch of
>>> complexity into the QEMU code and doesn't really mean much to the kernel
>>> code. I can probably add it as another QEMU patch to the set since it is
>>> just a matter of having a function similar to ram_block_discard_range that
>>> uses MADV_FREE instead of MADV_DONTNEED.
>>
>> Yes, addon patch makes perfect sense. The nice thing about MADV_FREE is
>> that you only take back pages from a process when really under memory
>> pressure (before going to SWAP). You will still get a pagefault on the
>> next access (to identify that the page is still in use after all), but
>> don't have to fault in a fresh page.
>
> So I got things running with a proof of concept using MADV_FREE.
> Apparently another roadblock I hadn't realized is that you have to have
> the right version of glibc for MADV_FREE to be present.
>
> Anyway with MADV_FREE the numbers actually look pretty close to the
> numbers with the madvise disabled. Apparently the page fault overhead
> isn't all that significant. When I push the next patch set I will include
> the actual numbers, but even with shuffling enabled the results were in
> the 8.7 to 8.8 million iteration range.
>
Cool, thanks for evaluating!
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists