Message-ID: <20170801193341.GA24406@redhat.com>
Date: Tue, 1 Aug 2017 21:33:41 +0200
From: Andrea Arcangeli <aarcange@...hat.com>
To: Minchan Kim <minchan@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kernel-team <kernel-team@....com>,
Nadav Amit <nadav.amit@...il.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH v2 4/4] mm: fix KSM data corruption
Hello,
On Tue, Aug 01, 2017 at 02:56:17PM +0900, Minchan Kim wrote:
> CPU0            CPU1            CPU2            CPU3
> ----            ----            ----            ----
> Write the same
> value on page
>
> [cache PTE as
>  dirty in TLB]
>
>                 MADV_FREE
>                 pte_mkclean()
>
>                                 4 > clear_refs
>                                 pte_wrprotect()
>
>                                                 write_protect_page()
>                                                 [ success, no flush ]
>
>                                                 pages_identical()
>                                                 [ ok ]
>
> Write to page
> different value
>
> [Ok, using stale
>  PTE]
>
>                                                 replace_page()
>
> Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late. CPU0
> already wrote on the page, but KSM ignored this write, and it got lost.
> "
>
> In the above scenario, the MADV_FREE case is fixed by changing the TLB
> batching API, including [set|clear]_tlb_flush_pending. What remains is the
> soft-dirty part.
>
> This patch changes soft-dirty to use the TLB batching API instead of
> flush_tlb_mm, and makes KSM check for a pending TLB flush using
> mm_tlb_flush_pending, so that it flushes the TLB to avoid data loss if
> other parallel threads have a TLB flush pending.
>
> [1] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
>
> Note:
> I failed to reproduce this problem with Nadav's test program, which needs
> its timing tuned to my system's speed, so I have not confirmed that the
> patch works. Nadav, could you test this patch on your test machine?
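
For illustration only, the KSM-side check described above can be sketched as
a userspace toy model. This is not the kernel code: struct toy_mm and every
toy_* function are hypothetical names made up for this sketch, a single bool
stands in for CPU0's cached TLB entry, and an int counter stands in for
whatever state mm_tlb_flush_pending() reads. The model only covers the
soft-dirty (CPU2) and KSM (CPU3) steps of the scenario.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model (assumption, not kernel code): one page, one PTE, one TLB entry. */
struct toy_mm {
    int  tlb_flush_pending; /* stand-in for what mm_tlb_flush_pending() checks */
    bool pte_writable;      /* permission recorded in the page table */
    bool tlb_writable;      /* permission CPU0 still has cached in its TLB */
    int  page;              /* page contents */
};

/* CPU2 (soft-dirty path): write-protect the PTE but defer the TLB flush. */
static void toy_wrprotect_batched(struct toy_mm *mm)
{
    mm->tlb_flush_pending++;
    mm->pte_writable = false;   /* tlb_writable is now stale */
}

/* A TLB flush makes the cached permission match the page table again. */
static void toy_flush_tlb(struct toy_mm *mm)
{
    mm->tlb_writable = mm->pte_writable;
}

/* CPU0: a write succeeds whenever the cached TLB entry still allows it. */
static bool toy_cpu0_write(struct toy_mm *mm, int value)
{
    if (!mm->tlb_writable)
        return false;           /* would fault and be handled properly */
    mm->page = value;
    return true;
}

/* CPU3 (KSM): flush if the PTE was writable or, optionally, if another
 * thread still has a flush pending for this mm. */
static void toy_ksm_write_protect(struct toy_mm *mm, bool check_pending)
{
    bool was_writable = mm->pte_writable;

    mm->pte_writable = false;
    if (was_writable || (check_pending && mm->tlb_flush_pending > 0))
        toy_flush_tlb(mm);
}

/* Replays the CPU2/CPU3/CPU0 ordering from the diagram.
 * Returns true if CPU0's late write changed the page after KSM compared it. */
static bool toy_race(bool check_pending)
{
    struct toy_mm mm = { .pte_writable = true, .tlb_writable = true, .page = 1 };
    int snapshot;

    toy_wrprotect_batched(&mm);                 /* CPU2: 4 > clear_refs */
    assert(mm.tlb_flush_pending > 0);
    toy_ksm_write_protect(&mm, check_pending);  /* CPU3: write_protect_page() */
    snapshot = mm.page;                         /* CPU3: page comparison */
    toy_cpu0_write(&mm, 2);                     /* CPU0: write via stale TLB */
    return mm.page != snapshot;                 /* change KSM never saw */
}

/* Prints the outcome with and without the pending-flush check. */
void toy_report(void)
{
    printf("without pending check: lost write = %d\n", toy_race(false));
    printf("with pending check:    lost write = %d\n", toy_race(true));
}
```

In this model, toy_race(false) lets CPU0's write through after the comparison
because KSM sees an already write-protected PTE and skips the flush, while
toy_race(true) flushes first and the late write is blocked.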
Reviewed-by: Andrea Arcangeli <aarcange@...hat.com>