linux-kernel - Re: [PATCH v2 3/4] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170801105924.h4u4ocplofdpylh5@techsingularity.net>
Date:   Tue, 1 Aug 2017 11:59:24 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Minchan Kim <minchan@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        kernel-team <kernel-team@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Russell King <linux@...linux.org.uk>,
        Tony Luck <tony.luck@...el.com>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        "David S. Miller" <davem@...emloft.net>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Yoshinori Sato <ysato@...rs.sourceforge.jp>,
        Jeff Dike <jdike@...toit.com>, linux-arch@...r.kernel.org,
        Nadav Amit <nadav.amit@...il.com>
Subject: Re: [PATCH v2 3/4] mm: fix MADV_[FREE|DONTNEED] TLB flush miss
 problem

On Tue, Aug 01, 2017 at 02:56:16PM +0900, Minchan Kim wrote:
> Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
> problem and Mel fixed it[1] and found same problem on MADV_FREE[2].
> 
> Quote from Mel Gorman
> 
> "The race in question is CPU 0 running madv_free and updating some PTEs
> while CPU 1 is also running madv_free and looking at the same PTEs.
> CPU 1 may have writable TLB entries for a page but fail the pte_dirty
> check (because CPU 0 has updated it already) and potentially fail to flush.
> Hence, when madv_free on CPU 1 returns, there are still potentially writable
> TLB entries and the underlying PTE is still present so that a subsequent write
> does not necessarily propagate the dirty bit to the underlying PTE any more.
> Reclaim at some unknown time at the future may then see that the PTE is still
> clean and discard the page even though a write has happened in the meantime.
> I think this is possible but I could have missed some protection in madv_free
> that prevents it happening."
> 
> This patch aims for solving both problems all at once and is ready for
> other problem with KSM, MADV_FREE and soft-dirty story[3].
> 
> TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
> and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can catch
> there are parallel threads going on. In that case, forcefully, flush TLB
> to prevent for user to access memory via stale TLB entry although it fail
> to gather page table entry.
> 
> I confiremd this patch works with [4] test program Nadav gave so this patch
> supersedes "mm: Always flush VMA ranges affected by zap_page_range v2"
> in current mmotm.
> 
> NOTE:
> This patch modifies arch-specific TLB gathering interface(x86, ia64,
> s390, sh, um). It seems most of architecture are straightforward but s390
> need to be careful because tlb_flush_mmu works only if mm->context.flush_mm
> is set to non-zero which happens only a pte entry really is cleared by
> ptep_get_and_clear and friends. However, this problem never changes the
> pte entries but need to flush to prevent memory access from stale tlb.
> 
> Any thoughts?
> 

Acked-by: Mel Gorman <mgorman@...hsingularity.net>

-- 
Mel Gorman
SUSE Labs