Message-Id: <20160708001909.FB2443E2@viggo.jf.intel.com>
Date: Thu, 07 Jul 2016 17:19:09 -0700
From: Dave Hansen <dave@...1.net>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org, linux-mm@...ck.org, torvalds@...ux-foundation.org,
akpm@...ux-foundation.org, bp@...en8.de, ak@...ux.intel.com,
mhocko@...e.com, dave.hansen@...el.com, Dave Hansen <dave@...1.net>
Subject: [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum

This series survived a bunch of testing over the past week, including
on hardware affected by the issue. A debugging patch showed the
"stray" bits being set, and no ill effects were noticed.

Barring any heartburn from folks, I think this is ready for the tip
tree.

--

The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
Landing) has an erratum where a processor thread setting the Accessed
or Dirty bits may not do so atomically against its checks for the
Present bit. This may cause a thread (which is about to page fault)
to set A and/or D, even though the Present bit had already been
atomically cleared.

These bits are truly "stray". In the case of the Dirty bit, the
thread associated with the stray set was *not* allowed to write to
the page. This means that we do not have to launder the bit(s); we
can simply ignore them.

More details can be found in the "Specification Update" under "KNL4":
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-phi-processor-specification-update.pdf

If the PTE is used for storing a swap index or a NUMA migration index,
the A bit could be misinterpreted as part of the swap type. More
generally, a stray bit makes a software-cleared PTE non-zero, so it
is interpreted as a swap entry. In some cases (like when the swap
index ends up being for a non-existent swapfile), the kernel detects
the stray value and WARN()s about it, but there is no guarantee that
the kernel can always detect it.
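
To make the failure mode concrete, here is a rough user-space sketch of
how a stray A bit turns a cleared PTE into something that decodes as a
swap entry. The Present/Accessed/Dirty positions (bits 0, 5 and 6) are
the architectural x86 ones; the swap layout and every naive_*() helper
name are assumptions made up for this illustration, not the kernel's
actual macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE flag positions. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)

/*
 * Hypothetical swap layout for this sketch only: a 5-bit swap type in
 * bits 1-5, so the type field overlaps the Accessed bit.
 */
#define NAIVE_TYPE_SHIFT 1
#define NAIVE_TYPE_MASK  0x1fULL

/* Strict check: only an all-zero PTE counts as empty. */
static bool naive_pte_none(uint64_t pte)
{
	return pte == 0;
}

/* Anything non-empty and non-present is assumed to be a swap entry. */
static bool naive_is_swap_pte(uint64_t pte)
{
	return !naive_pte_none(pte) && !(pte & PTE_PRESENT);
}

/* Extract the swap type field from a non-present PTE. */
static unsigned int naive_swp_type(uint64_t pte)
{
	return (unsigned int)((pte >> NAIVE_TYPE_SHIFT) & NAIVE_TYPE_MASK);
}
```

With these definitions, a PTE that software cleared to zero and that
then picked up a stray PTE_ACCESSED is no longer "none", passes
naive_is_swap_pte(), and decodes to a bogus non-zero swap type.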

This patch changes the kernel to attempt to ignore those stray bits
when they get set. We do this by making our swap PTE format
completely ignore the A/D bits, and also by ignoring them in our
pte_none() checks.
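
As a minimal sketch of that idea (not the code in the patches
themselves), the two halves could look roughly like this in user-space
C. The bit positions for Present/Accessed/Dirty are the architectural
x86 ones; the swap field positions and the tolerant_*()/safe_*() names
are hypothetical and chosen only so that nothing lands on bits 5 or 6.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE flag positions. */
#define PTE_PRESENT    (1ULL << 0)
#define PTE_ACCESSED   (1ULL << 5)
#define PTE_DIRTY      (1ULL << 6)
#define PTE_STRAY_MASK (PTE_ACCESSED | PTE_DIRTY)

/*
 * Hypothetical swap layout for this sketch only: type in bits 9-13,
 * offset from bit 14 upward, leaving the A and D bits unused.
 */
#define SAFE_TYPE_SHIFT   9
#define SAFE_TYPE_MASK    0x1fULL
#define SAFE_OFFSET_SHIFT 14

/* A PTE is empty if nothing other than stray A/D bits is set. */
static bool tolerant_pte_none(uint64_t pte)
{
	return (pte & ~PTE_STRAY_MASK) == 0;
}

/* Build a non-present swap PTE that never touches bits 5 and 6. */
static uint64_t safe_swp_encode(unsigned int type, uint64_t offset)
{
	return ((type & SAFE_TYPE_MASK) << SAFE_TYPE_SHIFT) |
	       (offset << SAFE_OFFSET_SHIFT);
}

/* Decoding shifts the A/D bits away, so stray sets have no effect. */
static unsigned int safe_swp_type(uint64_t pte)
{
	return (unsigned int)((pte >> SAFE_TYPE_SHIFT) & SAFE_TYPE_MASK);
}

static uint64_t safe_swp_offset(uint64_t pte)
{
	return pte >> SAFE_OFFSET_SHIFT;
}
```

The point of the layout choice is that OR-ing PTE_STRAY_MASK into any
encoded entry changes neither the decoded type nor the offset, and a
cleared PTE with stray bits still reads as "none".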

Andi Kleen wrote the original version of this patch. Dave Hansen
wrote the later ones.

v4: complete rework: let the bad bits stay around, but try to
    ignore them
v3: huge rework to keep batching working in unmap case
v2: out of line. avoid single thread flush. cover more clear
    cases