linux-kernel - Re: [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum

Message-ID: <57864A6F.6070202@sr71.net>
Date:	Wed, 13 Jul 2016 07:04:31 -0700
From:	Dave Hansen <dave@...1.net>
To:	Vlastimil Babka <vbabka@...e.cz>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	linux-kernel@...r.kernel.org
Cc:	x86@...nel.org, linux-mm@...ck.org, torvalds@...ux-foundation.org,
	akpm@...ux-foundation.org, bp@...en8.de, ak@...ux.intel.com,
	mhocko@...e.com
Subject: Re: [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits
 erratum

On 07/13/2016 04:37 AM, Vlastimil Babka wrote:
> On 07/02/2016 12:28 AM, Benjamin Herrenschmidt wrote:
>> With the errata, don't you have a situation where a processor in
>> the second category will write and set D despite P having been
>> cleared (due to the race) and thus causing us to miss the transfer
>> of that D to the struct page and essentially completely miss that
>> the physical page is dirty?
> 
> Seems to me like this is indeed possible, but...

No, this isn't possible with the erratum.

I had some off-list follow-up with Ben, and included this description in
the later post of the patch:
> These bits are truly "stray".  In the case of the Dirty bit, the
> thread associated with the stray set was *not* allowed to write to
> the page.  This means that we do not have to launder the bit(s); we
> can simply ignore them.
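
A minimal user-space model of "simply ignore them", assuming the standard
x86 PTE bit positions (Present = bit 0, Accessed = bit 5, Dirty = bit 6).
PTE_STRAY_MASK and pte_is_empty() are hypothetical names for illustration,
not the kernel's code: the check masks off A/D before deciding whether an
entry is empty, so a cleared PTE carrying only stray bits is still treated
as empty and no dirty state is transferred from it.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE flag bits: Present (0), Accessed (5), Dirty (6). */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/* Hypothetical mask: bits the erratum may set in an already-cleared PTE. */
#define PTE_STRAY_MASK (PTE_ACCESSED | PTE_DIRTY)

/*
 * Treat a PTE as empty when nothing but (possibly stray) A/D bits is set.
 * No laundering: the stray bits are not copied anywhere, only ignored.
 */
static bool pte_is_empty(uint64_t pte)
{
	return (pte & ~PTE_STRAY_MASK) == 0;
}
```

Ignoring is safe here because, per the description above, the thread that
set the stray D bit was never allowed to write the page, so there is no
data behind that bit to write back.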


>> (Leading to memory corruption).
> 
> ... what memory corruption, exactly?

In this (non-existent) scenario, we would lose writes to mmap()'d files
because we did not see the dirty bit during the "get" part of
ptep_get_and_clear().
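
A minimal user-space sketch of why the "get" has to be atomic, using C11
atomics. fake_page, model_get_and_clear() and model_unmap() are
hypothetical stand-ins for struct page and the kernel helpers, not kernel
API. The exchange returns the old PTE in the same step that clears it, so
a D bit set by a genuine write before the clear is always seen and moved
to the page's dirty state.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_DIRTY   (UINT64_C(1) << 6)

/* Hypothetical stand-in for the dirty flag kept in struct page. */
struct fake_page {
	bool dirty;
};

/*
 * Atomically replace the PTE with zero and return the previous value,
 * so a concurrent hardware D-bit update cannot fall between the read
 * and the clear.
 */
static uint64_t model_get_and_clear(_Atomic uint64_t *ptep)
{
	return atomic_exchange(ptep, 0);
}

/* Clear the mapping and transfer its D bit to the page. */
static void model_unmap(_Atomic uint64_t *ptep, struct fake_page *page)
{
	uint64_t old = model_get_and_clear(ptep);

	if (old & PTE_DIRTY)
		page->dirty = true;	/* writeback will now see the data */
}
```

The lost-write scenario would need a real write to set D *after* the
exchange; with P already clear, a correctly behaving CPU faults instead,
and the erratum's stray D does not correspond to a write.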

> If a process is writing to its
> memory from one thread and unmapping it from other thread at the same
> time, there are no guarantees anyway?

It's not just unmapping; it's also swap, NUMA migration, etc.  In all of
these, we clear the PTE, flush the TLB, then re-populate it.
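
The clear/flush/re-populate pattern can be sketched as a user-space model,
assuming x86 bit positions and a 4 KiB page frame shift of 12.
fake_flush_tlb() is a hypothetical no-op standing in for a TLB shootdown,
and model_remap() is a hypothetical helper, not the kernel's migration
code. The D bit has to be harvested from the value returned at the clear
step, because the new PTE starts out clean.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT   (UINT64_C(1) << 0)
#define PTE_DIRTY     (UINT64_C(1) << 6)
#define PTE_PFN_SHIFT 12

struct fake_page {
	bool dirty;
};

/* Hypothetical stand-in for a TLB shootdown; does nothing in this model. */
static void fake_flush_tlb(void)
{
}

/*
 * Point a mapping at a new frame, roughly as migration would:
 * 1. clear the PTE and keep its old value,
 * 2. flush so no CPU keeps writing through a stale TLB entry,
 * 3. move the old D bit to the old page, then install the new PTE.
 */
static void model_remap(_Atomic uint64_t *ptep, struct fake_page *old_page,
			uint64_t new_pfn)
{
	uint64_t old = atomic_exchange(ptep, 0);

	fake_flush_tlb();
	if (old & PTE_DIRTY)
		old_page->dirty = true;
	atomic_store(ptep, (new_pfn << PTE_PFN_SHIFT) | PTE_PRESENT);
}
```

If step 1 missed a D bit, the dirty data would be silently dropped during
the move, which is why the erratum matters wherever a present PTE is
changed, not only in zap_pte_range().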

> Would anything sensible rely on
> the guarantee that if the write in such racy scenario didn't end up as a
> segfault (i.e. unmapping was faster), then it must hit the disk? Or are
> there any other scenarios where zap_pte_range() is called? Hmm, but how
> does this affect the page migration scenario, can we lose the D bit there?

Yeah, it's not just zap_pte_range(); it's everywhere that we change a
present PTE.

> And maybe related thing that just occurred to me, what if page is made
> non-writable during fork() to catch COW? Any race in that one, or just
> the P bit? But maybe the argument would be the same as above...

Yeah, the argument is the same.
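
For the fork() case, the PTE stays present and only loses write
permission. A minimal user-space sketch, assuming the x86 R/W flag is
bit 1; model_wrprotect() is a hypothetical name, not the kernel's helper.
Using an atomic read-modify-write means a D bit that hardware sets during
the update is kept rather than overwritten by a plain load/modify/store.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_RW      (UINT64_C(1) << 1)
#define PTE_DIRTY   (UINT64_C(1) << 6)

/*
 * Write-protect a PTE for copy-on-write by atomically clearing only the
 * R/W bit. Returns the value the PTE held just before the update.
 */
static uint64_t model_wrprotect(_Atomic uint64_t *ptep)
{
	return atomic_fetch_and(ptep, ~PTE_RW);
}
```

Since P stays set, a D bit from a real write remains in the PTE and is
picked up later; a stray D from the erratum would again come from a
thread that was not allowed to write.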