linux-kernel - Re: [PATCH 3/7] mm/rmap: map_pte() was not handling private ZONE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180831161935.GB4111@redhat.com>
Date:   Fri, 31 Aug 2018 12:19:35 -0400
From:   Jerome Glisse <jglisse@...hat.com>
To:     Balbir Singh <bsingharora@...il.com>
Cc:     linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org,
        Ralph Campbell <rcampbell@...dia.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        stable@...r.kernel.org
Subject: Re: [PATCH 3/7] mm/rmap: map_pte() was not handling private
 ZONE_DEVICE page properly v2

On Fri, Aug 31, 2018 at 07:27:24PM +1000, Balbir Singh wrote:
> On Thu, Aug 30, 2018 at 10:41:56AM -0400, jglisse@...hat.com wrote:
> > From: Ralph Campbell <rcampbell@...dia.com>
> > 
> > Private ZONE_DEVICE pages use a special pte entry and thus are not
> > present. Properly handle this case in map_pte(), it is already handled
> > in check_pte(), the map_pte() part was lost in some rebase most probably.
> > 
> > Without this patch the slow migration path can not migrate back private
> > ZONE_DEVICE memory to regular memory. This was found after stress
> > testing migration back to system memory. This ultimatly can lead the
> > CPU to an infinite page fault loop on the special swap entry.
> > 
> > Changes since v1:
> >     - properly lock pte directory in map_pte()
> > 
> > Signed-off-by: Ralph Campbell <rcampbell@...dia.com>
> > Signed-off-by: Jérôme Glisse <jglisse@...hat.com>
> > Cc: Andrew Morton <akpm@...ux-foundation.org>
> > Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> > Cc: Balbir Singh <bsingharora@...il.com>
> > Cc: stable@...r.kernel.org
> > ---
> >  mm/page_vma_mapped.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > index ae3c2a35d61b..bd67e23dce33 100644
> > --- a/mm/page_vma_mapped.c
> > +++ b/mm/page_vma_mapped.c
> > @@ -21,7 +21,14 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
> >  			if (!is_swap_pte(*pvmw->pte))
> >  				return false;
> >  		} else {
> > -			if (!pte_present(*pvmw->pte))
> > +			if (is_swap_pte(*pvmw->pte)) {
> > +				swp_entry_t entry;
> > +
> > +				/* Handle un-addressable ZONE_DEVICE memory */
> > +				entry = pte_to_swp_entry(*pvmw->pte);
> > +				if (!is_device_private_entry(entry))
> > +					return false;
> 
> OK, so we skip this pte from unmap since it's already unmapped? This prevents
> try_to_unmap from unmapping it and it gets restored with MIGRATE_PFN_MIGRATE
> flag cleared?
> 
> Sounds like the right thing, if I understand it correctly

Well not exactly we do not skip it, we replace it with a migration
pte see try_to_unmap_one() which get call with TTU_MIGRATION flag
set (which do not translate in PVMW_MIGRATION being set on contrary).

>From migration point of view even if this is a swap pte, it is still
a valid mapping of the page and is counted as such for all intent and
purposes. The only thing we don't need is flushing CPU tlb or cache.

So this all happens when we are migrating something back to regular
memory either because of CPU fault or because the device driver want
to make room in its memory and decided to evict that page back to
regular memory.

Cheers,
Jérôme