[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150224154318.GA14939@dhcp22.suse.cz>
Date: Tue, 24 Feb 2015 16:43:18 +0100
From: Michal Hocko <mhocko@...e.cz>
To: Minchan Kim <minchan@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Rik van Riel <riel@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Mel Gorman <mgorman@...e.de>, Shaohua Li <shli@...nel.org>,
Yalin.Wang@...ymobile.com
Subject: Re: [PATCH RFC 1/4] mm: throttle MADV_FREE
On Tue 24-02-15 17:18:14, Minchan Kim wrote:
> Recently, Shaohua reported that MADV_FREE is much slower than
> MADV_DONTNEED in his MADV_FREE bomb test. The reason is many of
> applications went to stall with direct reclaim since kswapd's
> reclaim speed isn't fast than applications's allocation speed
> so that it causes lots of stall and lock contention.
I am not sure I understand this correctly. So the issue is that there is
huge number of MADV_FREE on the LRU and they are not close to the tail
of the list so the reclaim has to do a lot of work before it starts
dropping them?
> This patch throttles MADV_FREEing so it works only if there
> are enough pages in the system which will not trigger backgroud/
> direct reclaim. Otherwise, MADV_FREE falls back to MADV_DONTNEED
> because there is no point to delay freeing if we know system
> is under memory pressure.
Hmm, this is still conforming to the documentation because the kernel is
free to free pages at its convenience. I am not sure this is a good
idea, though. Why some MADV_FREE calls should be treated differently?
Wouldn't that lead to hard to predict behavior? E.g. LIFO reused blocks
would work without long stalls most of the time - except when there is a
memory pressure.
Comparison to MADV_DONTNEED is not very fair IMHO because the scope of the
two calls is different.
> When I test the patch on my 3G machine + 12 CPU + 8G swap,
> test: 12 processes
>
> loop = 5;
> mmap(512M);
Who is eating the rest of the memory?
> while (loop--) {
> memset(512M);
> madvise(MADV_FREE or MADV_DONTNEED);
> }
>
> 1) dontneed: 6.78user 234.09system 0:48.89elapsed
> 2) madvfree: 6.03user 401.17system 1:30.67elapsed
> 3) madvfree + this ptach: 5.68user 113.42system 0:36.52elapsed
>
> It's clearly win.
>
> Reported-by: Shaohua Li <shli@...nel.org>
> Signed-off-by: Minchan Kim <minchan@...nel.org>
I don't know. This looks like a hack with hard to predict consequences
which might trigger pathological corner cases.
> ---
> mm/madvise.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6d0fcb8921c2..81bb26ecf064 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -523,8 +523,17 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
> * XXX: In this implementation, MADV_FREE works like
> * MADV_DONTNEED on swapless system or full swap.
> */
> - if (get_nr_swap_pages() > 0)
> - return madvise_free(vma, prev, start, end);
> + if (get_nr_swap_pages() > 0) {
> + unsigned long threshold;
> + /*
> + * If we have trobule with memory pressure(ie,
> + * under high watermark), free pages instantly.
> + */
> + threshold = min_free_kbytes >> (PAGE_SHIFT - 10);
> + threshold = threshold + (threshold >> 1);
Why threshold += threshold >> 1 ?
> + if (nr_free_pages() > threshold)
> + return madvise_free(vma, prev, start, end);
> + }
> /* passthrough */
> case MADV_DONTNEED:
> return madvise_dontneed(vma, prev, start, end);
> --
> 1.9.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists