Message-ID: <9c09c63c-5c2a-20a4-d68b-a6dc2f88ecaa@suse.cz>
Date: Wed, 13 Jul 2016 13:37:21 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Dave Hansen <dave@...1.net>, linux-kernel@...r.kernel.org
Cc: x86@...nel.org, linux-mm@...ck.org, torvalds@...ux-foundation.org,
akpm@...ux-foundation.org, bp@...en8.de, ak@...ux.intel.com,
mhocko@...e.com
Subject: Re: [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum

On 07/02/2016 12:28 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2016-07-01 at 10:46 -0700, Dave Hansen wrote:
>> The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
>> Landing) has an erratum where a processor thread setting the Accessed
>> or Dirty bits may not do so atomically against its checks for the
>> Present bit. This may cause a thread (which is about to page fault)
>> to set A and/or D, even though the Present bit had already been
>> atomically cleared.
>
> Interesting.... I always wondered where in the Intel docs did it specify
> that present was tested atomically with setting of A and D ... I couldn't
> find it.
>
> Isn't there a more fundamental issue however that you may actually lose
> those bits ? For example if we do an munmap, in zap_pte_range()
>
> We first exchange all the PTEs with 0 with ptep_get_and_clear_full()
> and we then transfer D that we just read into the struct page.
>
> We rely on the fact that D will never be set again, so what we got is a
> "final" D bit. I.e., we rely on the fact that a processor either:
>
> - Has a cached PTE in its TLB with D set, in which case it can still
> write to the page until we flush the TLB or
>
> - Doesn't have a cached PTE in its TLB with D set and so will fail
> to do so due to the atomic P check, thus never writing.
>
> With the erratum, don't you have a situation where a processor in the second
> category will write and set D despite P having been cleared (due to the
> race) and thus causing us to miss the transfer of that D to the struct
> page and essentially completely miss that the physical page is dirty ?

Seems to me like this is indeed possible, but...

> (Leading to memory corruption).

... what memory corruption, exactly? If a process is writing to its
memory from one thread and unmapping it from another thread at the same
time, there are no guarantees anyway? Would anything sensible rely on
the guarantee that if the write in such a racy scenario didn't end up as a
segfault (i.e. unmapping was faster), then it must hit the disk? Or are
there any other scenarios where zap_pte_range() is called? Hmm, but how
does this affect the page migration scenario, can we lose the D bit there?

And a maybe-related thing that just occurred to me: what if a page is made
non-writable during fork() to catch COW? Any race in that one, or just
the P bit? But maybe the argument would be the same as above...
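
To make the interleaving under discussion concrete, here is a rough
userspace sketch in C11. It is only a toy model, not kernel code: zap_pte(),
walker_set_dirty_atomic() and run_race() are made-up names, the "PTE" is a
plain atomic integer, and the erratum is modelled as a Present check and a
Dirty update done in two separate steps. It only tracks the bits in the PTE
and says nothing about whether the store itself reaches memory.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_P (UINT64_C(1) << 0)   /* Present bit (x86 bit 0) */
#define PTE_D (UINT64_C(1) << 6)   /* Dirty bit (x86 bit 6) */

struct race_result {
    uint64_t pte_after;   /* what is left in the PTE slot */
    bool page_dirty;      /* what was transferred to the "struct page" */
};

/* Unmap side: swap the PTE with 0 and carry D over to the page. */
static void zap_pte(_Atomic uint64_t *pte, bool *page_dirty)
{
    uint64_t old = atomic_exchange(pte, 0);
    if (old & PTE_D)
        *page_dirty = true;
}

/* Expected hardware behaviour: D is set only if P is still set,
 * as one atomic step. Returns false when it would fault instead. */
static bool walker_set_dirty_atomic(_Atomic uint64_t *pte)
{
    uint64_t old = atomic_load(pte);
    do {
        if (!(old & PTE_P))
            return false;
    } while (!atomic_compare_exchange_weak(pte, &old, old | PTE_D));
    return true;
}

/* Replays one fixed interleaving instead of using real threads,
 * so the outcome is deterministic. */
static struct race_result run_race(bool erratum)
{
    _Atomic uint64_t pte = PTE_P;   /* mapped, clean */
    bool page_dirty = false;

    if (erratum) {
        /* Assumed erratum model: check P, then set D later. */
        bool saw_present = atomic_load(&pte) & PTE_P;
        zap_pte(&pte, &page_dirty);          /* unmapper runs in between */
        if (saw_present)
            atomic_fetch_or(&pte, PTE_D);    /* stale D lands in a cleared PTE */
    } else {
        zap_pte(&pte, &page_dirty);
        (void)walker_set_dirty_atomic(&pte); /* sees P clear, would fault */
    }

    return (struct race_result){
        .pte_after = atomic_load(&pte),
        .page_dirty = page_dirty,
    };
}
```

In this model run_race(true) leaves D set in an otherwise cleared PTE while
the page was never marked dirty, and run_race(false) leaves the PTE at zero.
Whether that stray D corresponds to a write that really happened is exactly
the open question above.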