linux-kernel - Re: [PATCH 14/15] mm: numa: Flush TLB if NUMA hinting faults race with PTE scan update

Message-ID: <20131206092400.GJ11295@suse.de>
Date:	Fri, 6 Dec 2013 09:24:00 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Rik van Riel <riel@...hat.com>
Cc:	Alex Thorlton <athorlton@....com>, Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>, hhuang@...hat.com
Subject: Re: [PATCH 14/15] mm: numa: Flush TLB if NUMA hinting faults race
 with PTE scan update

On Thu, Dec 05, 2013 at 03:05:19PM -0500, Rik van Riel wrote:
> On 12/05/2013 02:54 PM, Mel Gorman wrote:
> 
> >I think that's a better fit and a neater fix. Thanks! I think it barriers
> >more than it needs to (definite cost vs maybe cost), the flush can be
> >deferred until we are definitely trying to migrate and the pte case is
> >not guaranteed to be flushed before migration due to pte_mknonnuma causing
> >a flush in ptep_clear_flush to be avoided later. Mashing the two patches
> >together yields this.
> 
> I think this would fix the numa migrate case.
> 

Good. So far I have not been seeing any problems with it at least.

> However, I believe the same issue is also present in
> mprotect(..., PROT_NONE) vs. compaction, for programs
> that trap SIGSEGV for garbage collection purposes.
> 

I'm not 100% convinced we need to be concerned with races with
mprotect(PROT_NONE) and a parallel reference to that area from userspace. I
would consider it to be a buggy application if two threads were not
co-ordinating the protection of a region and referencing it.  I would also
expect garbage collectors to be managing smart pointers and using reference
counting to copy between heap generations (or similar mechanisms) instead
of trapping SIGSEGV.
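
For reference, the trapping pattern under discussion looks roughly like
the userspace sketch below. It is a minimal illustration only, not taken
from any real collector: trap_demo(), on_segv() and the guarded_* variables
are names made up for this example. It maps one page, revokes access with
mprotect(PROT_NONE), and lets a SIGSEGV handler reopen the page so that the
faulting store is restarted.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names, used only for this illustration. */
static volatile sig_atomic_t fault_count;
static void *guarded_page;
static size_t guarded_len;

/*
 * SIGSEGV handler: count the fault and reopen the page so that the
 * faulting instruction is restarted when the handler returns.
 * mprotect() is not on the POSIX async-signal-safe list; on Linux it is
 * a plain system call, which is what this sketch relies on.
 */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t addr = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)guarded_page;

	(void)sig;
	(void)ctx;
	if (addr < base || addr >= base + guarded_len)
		_exit(2);	/* fault outside the guarded page: give up */
	fault_count++;
	if (mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE) != 0)
		_exit(3);
}

/* Returns the number of faults trapped (1 is expected), or -1 on error. */
int trap_demo(void)
{
	struct sigaction sa, old;
	volatile char *p;
	long page = sysconf(_SC_PAGESIZE);
	int faults, ok;

	if (page <= 0)
		return -1;
	guarded_len = (size_t)page;
	guarded_page = mmap(NULL, guarded_len, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (guarded_page == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_segv;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) != 0) {
		munmap(guarded_page, guarded_len);
		return -1;
	}

	p = guarded_page;
	fault_count = 0;
	p[0] = 1;			/* page is writable: no fault */
	if (mprotect(guarded_page, guarded_len, PROT_NONE) != 0) {
		sigaction(SIGSEGV, &old, NULL);
		munmap(guarded_page, guarded_len);
		return -1;
	}
	p[0] = 2;			/* faults, handler reopens, store retried */

	faults = fault_count;
	ok = (p[0] == 2);
	sigaction(SIGSEGV, &old, NULL);
	munmap(guarded_page, guarded_len);
	return ok ? faults : -1;
}
```

This is single-threaded, so the store after mprotect() always faults. The
race discussed in this thread only appears when a second thread touches the
page while the protection change is still in flight.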

Intel's architecture manual, volume 3A, covers what happens with delayed
TLB invalidations in section 4.10.4.4 (in the version I'm looking at, at
least). The following two snippets are the most important:

	Software developers should understand that, between the modification
	of a paging-structure entry and execution of the invalidation
	instruction recommended in Section 4.10.4.2, the processor may
	use translations based on either the old value or the new value
	of the paging-structure entry. The following items describe some
	of the potential consequences of delayed invalidation:

	o If a paging-structure entry is modified to change the P flag from
	1 to 0, an access to a linear address whose translation is
	controlled by this entry may or may not cause a page-fault exception.

	o If a paging-structure entry is modified to change the R/W flag
	from 0 to 1, write accesses to linear addresses whose translation is
	controlled by this entry may or may not cause a page-fault exception.

An access after the PROT_NONE update may still go through until the
deferred TLB flush has happened. In a race with mprotect(PROT_NONE), the
access will either complete or receive a SIGSEGV signal due to failed
protections, but this unpredictability is pretty much expected.
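
To make the window concrete, here is a deliberately simplified toy model
in C. It is not kernel code and not how any real MMU is implemented: the
toy_pte and toy_tlb types and the toy_write() and toy_flush() helpers are
invented for this sketch. It only shows why a write can still complete
between the page table update and the flush.

```c
#include <stdbool.h>

/* Toy stand-ins for a page table entry and a one-entry TLB. */
struct toy_pte {
	bool present;
	bool writable;
};

struct toy_tlb {
	bool valid;
	struct toy_pte cached;
};

/*
 * Returns true if the write completes, false if it would page fault.
 * A valid TLB entry is used without looking at the page table again,
 * which is where a stale translation comes from.
 */
bool toy_write(struct toy_tlb *tlb, const struct toy_pte *pte)
{
	if (!tlb->valid) {
		/* TLB miss: walk the "page table". */
		if (!pte->present)
			return false;	/* non-present entries are not cached */
		tlb->cached = *pte;
		tlb->valid = true;
	}
	return tlb->cached.present && tlb->cached.writable;
}

/* Stand-in for the invalidation that ends the window. */
void toy_flush(struct toy_tlb *tlb)
{
	tlb->valid = false;
}
```

In this model a write that hits a cached entry always succeeds until
toy_flush() is called. Real hardware is less predictable: per the text
quoted above, the processor may use either the old or the new value during
the window.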

I do not think the present bit gets cleared on mprotect(PROT_NONE) due
to the relevant bits being:

#define _PAGE_CHG_MASK  (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
                         _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
#define PAGE_NONE   __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)

If the present bit remains then compaction should flush the TLB on the
call to ptep_clear_flush as pte_accessible check is based on the present
bit. So even though it is possible for a write to complete during a call
to mprotect(PROT_NONE), the same is not true for compaction.

> They could lose modifications done in-between when
> the pte was set to PROT_NONE, and the actual TLB
> flush, if compaction moves the page around in-between
> those two events.
> 
> I don't know if this is a case we need to worry about
> at all, but I think the same fix would apply to that
> code path, so I guess we might as well make it...

I might be going "la la la la we're fine" and deluding myself but we
appear to be covered here and it would be a shame to add expense to a
path unnecessarily.

-- 
Mel Gorman
SUSE Labs