linux-kernel - Re: [PATCH 4/6] mm: proc: Invalidate TLB after clearing soft-dirty page state

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20201123220457.GB12069@willie-the-truck>
Date:   Mon, 23 Nov 2020 22:04:57 +0000
From:   Will Deacon <will@...nel.org>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     linux-kernel@...r.kernel.org, kernel-team@...roid.com,
        Catalin Marinas <catalin.marinas@....com>,
        Minchan Kim <minchan@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 4/6] mm: proc: Invalidate TLB after clearing soft-dirty
 page state

On Fri, Nov 20, 2020 at 07:49:22PM -0700, Yu Zhao wrote:
> On Fri, Nov 20, 2020 at 01:22:53PM -0700, Yu Zhao wrote:
> > On Fri, Nov 20, 2020 at 02:35:55PM +0000, Will Deacon wrote:
> > > Since commit 0758cd830494 ("asm-generic/tlb: avoid potential double flush"),
> > > TLB invalidation is elided in tlb_finish_mmu() if no entries were batched
> > > via the tlb_remove_*() functions. Consequently, the page-table modifications
> > > performed by clear_refs_write() in response to a write to
> > > /proc/<pid>/clear_refs do not perform TLB invalidation. Although this is
> > > fine when simply aging the ptes, in the case of clearing the "soft-dirty"
> > > state we can end up with entries where pte_write() is false, yet a
> > > writable mapping remains in the TLB.
> > 
> > I don't think we need a TLB flush in this context, same reason as we
> > don't have one in copy_present_pte() which uses ptep_set_wrprotect()
> > to write-protect a src PTE.

Hmm. Afaict, copy_present_pte() is only called on the fork() path when
VM_WIPEONFORK is set. I think that's a bit different to the fault case,
and even then, there is a fullmm flush after the copy.

> > ptep_modify_prot_start/commit() and ptep_set_wrprotect() guarantee
> > either the dirty bit is set (when a PTE is still writable) or a PF
> > happens (when a PTE has become r/o) when h/w page table walker races
> > with kernel that modifies a PTE using the two APIs.
> 
> After we remove the writable bit, if we end up with a clean PTE, any
> subsequent write will trigger a page fault. We can't have a stale
> writable tlb entry. The architecture-specific APIs guarantee this.
> 
> If we end up with a dirty PTE, then yes, there will be a stale
> writable tlb entry. But this won't be a problem because when we
> write-protect a page (not PTE), we always check both pte_dirty()
> and pte_write(), i.e., write_protect_page() and page_mkclean_one().
> When they see this dirty PTE, they will flush. And generally, only
> callers of pte_mkclean() should flush tlb; otherwise we end up one
> extra if callers of pte_mkclean() and pte_wrprotect() both flush.

I just find this sort of analysis incredibly fragile: we're justifying the
lack of TLB invalidation on a case-by-case basis rather than some general
rules that mean it is not required by construction. Even if all current
users don't need it, what means that will still be true in six months time?
It's not like this stuff is easy to trigger in practice if we get it wrong.

Will