Message-ID: <CANcMJZB27S2DK_05WTfRAd40iacBr+hF0ivxAxh5Hs5eqaPyNA@mail.gmail.com>
Date: Tue, 1 Dec 2015 14:30:04 -0800
From: John Stultz <john.stultz@...aro.org>
To: Shaohua Li <shli@...nel.org>
Cc: Minchan Kim <minchan@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
Michael Kerrisk <mtk.manpages@...il.com>,
linux-api@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Jason Evans <je@...com>, Daniel Micay <danielmicay@...il.com>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Michal Hocko <mhocko@...e.cz>, yalin.wang2010@...il.com,
bmaurer@...com
Subject: Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
On Wed, Nov 4, 2015 at 12:00 PM, Shaohua Li <shli@...nel.org> wrote:
> Compared to MADV_DONTNEED, MADV_FREE's lazy memory freeing is a huge win for
> reducing page faults. But one issue remains: the TLB flush. Both MADV_DONTNEED
> and MADV_FREE do a TLB flush, and TLB flush overhead is quite high in
> contemporary multi-threaded applications. In our production workload, we
> sometimes observed 80% of CPU time spent on TLB flushes triggered by jemalloc's
> madvise(MADV_DONTNEED). We haven't tested MADV_FREE yet, but the result should
> be similar. It's hard to avoid the TLB flush with MADV_FREE, because the flush
> is what prevents data corruption.
>
> The new proposal tries to fix the TLB issue. We introduce two madvise verbs:
>
> MARK_FREE. Userspace notifies the kernel that the memory range can be
> discarded. The kernel just records the range at this stage. Should memory
> pressure occur, page reclaim can free the memory directly regardless of the
> pte state.
>
> MARK_NOFREE. Userspace notifies the kernel that the memory range will be
> reused soon. The kernel deletes the record and prevents page reclaim from
> discarding the memory. If the memory hasn't been reclaimed, userspace will
> access the old memory; otherwise normal page fault handling takes place.
>
> The point is to let userspace notify the kernel whether memory can be
> discarded, instead of depending on the pte dirty bit used by MADV_FREE. With
> these, no TLB flush is required until page reclaim actually frees the memory
> (page reclaim needs to do the TLB flush for MADV_FREE too). It still preserves
> the lazy memory freeing merit of MADV_FREE.
>
> Compared to MADV_FREE, reusing memory with the new proposal isn't transparent,
> e.g. userspace must call MARK_NOFREE. But it's easy to utilize the new API in
> jemalloc.
>
> We don't have code to back this up yet, sorry. We'd like to discuss whether it
> makes sense.
Sorry to be so slow to reply here!

As Minchan mentioned, this is very similar in concept to the volatile
ranges work Minchan and I tried to push for a few years.

Here's some of the coverage (in reverse chronological order):
https://lwn.net/Articles/602650/
https://lwn.net/Articles/592042/
https://lwn.net/Articles/590991/
http://permalink.gmane.org/gmane.linux.kernel.mm/98848
http://permalink.gmane.org/gmane.linux.kernel.mm/98676
https://lwn.net/Articles/522135/
https://lwn.net/Kernel/Index/#Volatile_ranges
If you are interested in reviving the patch set, I'd love to hear
about it. I think it's a really compelling feature for kernel
right-sizing of userspace caches.
thanks
-john
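
For readers following the thread, here is a rough userspace sketch of the
MARK_FREE / MARK_NOFREE contract described in the quoted proposal. It is a
toy model only: neither verb exists as a real madvise() flag, no patch was
posted, and every name below (struct range, mark_free, mark_nofree, reclaim,
demo_allocator_cycle) is hypothetical. Zeroing a range stands in for the
zero-filled page a later fault would map after reclaim.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model: MARK_FREE/MARK_NOFREE are proposed verbs, not real madvise()
 * flags. All names here are hypothetical and exist only for illustration. */
#define RANGE_SIZE 4096

struct range {
    unsigned char data[RANGE_SIZE];
    bool discardable;   /* the "record" the kernel would keep per range */
};

/* MARK_FREE: only record that the range may be discarded.
 * Nothing is unmapped or flushed at this point. */
static void mark_free(struct range *r)
{
    r->discardable = true;
}

/* MARK_NOFREE: delete the record so reclaim leaves the range alone. */
static void mark_nofree(struct range *r)
{
    r->discardable = false;
}

/* Stand-in for page reclaim under memory pressure: discard every range
 * that is still marked and return how many were discarded. The record is
 * consumed once the range has been discarded. */
static size_t reclaim(struct range *ranges, size_t n)
{
    size_t discarded = 0;

    for (size_t i = 0; i < n; i++) {
        if (ranges[i].discardable) {
            memset(ranges[i].data, 0, RANGE_SIZE);
            ranges[i].discardable = false;
            discarded++;
        }
    }
    return discarded;
}

/* Walks through one allocator free/reuse cycle.
 * Returns 0 if the model behaves as the proposal describes. */
int demo_allocator_cycle(void)
{
    static struct range pool[2];

    memset(pool, 0, sizeof pool);
    memset(pool[0].data, 0xAA, RANGE_SIZE);
    memset(pool[1].data, 0xBB, RANGE_SIZE);

    /* The allocator frees both ranges. */
    mark_free(&pool[0]);
    mark_free(&pool[1]);

    /* Range 0 is reused before any memory pressure occurs. */
    mark_nofree(&pool[0]);

    /* Memory pressure: only range 1 is still marked, so only it goes. */
    if (reclaim(pool, 2) != 1)
        return 1;

    /* Range 0 kept its old contents. */
    if (pool[0].data[0] != 0xAA)
        return 2;

    /* Range 1 is reused after reclaim: the caller sees fresh zeroed
     * memory, as it would after normal page fault handling. */
    mark_nofree(&pool[1]);
    if (pool[1].data[0] != 0x00)
        return 3;

    return 0;
}
```

Calling demo_allocator_cycle() from a small main() exercises both outcomes:
a range reused before reclaim keeps its data, and a range reused after
reclaim comes back zeroed. The sketch leaves out the part that makes the
real problem hard, namely what happens if userspace touches a marked range
without calling MARK_NOFREE first.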