Message-ID: <CADnJP=tOJbFR2hq_P+PvR0dxsrr6HR6iE5BMybEx_3zWjV4+Ng@mail.gmail.com>
Date: Tue, 26 Feb 2019 10:46:18 +0100
From: Lars Persson <lists@...h.nu>
To: Anshuman Khandual <anshuman.khandual@....com>
Cc: Lars Persson <lars.persson@...s.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-mips@...r.kernel.org,
Lars Persson <larper@...s.com>
Subject: Re: [PATCH] mm: migrate: add missing flush_dcache_page for non-mapped
page migrate
On Tue, Feb 26, 2019 at 10:23 AM Anshuman Khandual
<anshuman.khandual@....com> wrote:
> On 02/19/2019 06:02 PM, Lars Persson wrote:
> > Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
> > and SIGSEGV that could not be traced back to a userspace code
> > bug. They had all the magic signs of an I/D cache coherency issue.
> >
> > Now recently we noticed that the /proc/sys/vm/compact_memory interface
> > was quite efficient at provoking this class of userspace crashes.
> >
> > Studying the code in mm/migrate.c there is a distinction made between
> > migrating a page that is mapped at the instant of migration and one
> > that is not mapped. Our problem turned out to be the non-mapped pages.
> >
> > For the non-mapped page the code performs a copy of the page content
> > and all relevant meta-data of the page without doing the required
> > D-cache maintenance. This leaves dirty data in the D-cache of the CPU
> > and on the 1004K cores this data is not visible to the I-cache. A
> > subsequent page-fault that triggers a mapping of the page will happily
> > serve the process with potentially stale code.
>
> Just curious: shouldn't the code path that maps this page do the
> invalidation just before setting it up in the page table via
> set_pte_at() or a similar variant? How does it get mapped without the
> necessary flush?
In fact, this is what happens when the flush_dcache_page API is used
correctly, but that is an arch implementation detail. All kernel code
that writes to a page cache page must also call flush_dcache_page
before the page becomes eligible for mapping. The arch code has the
option to postpone the actual flush until set_pte_at maps the page.
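
To make the contract concrete, here is a small userspace model of the
deferred flush. It is only a sketch, not kernel code: every model_*
name is hypothetical, the "flush" is a counter rather than a real cache
operation, and the dirty marker stands in for whatever arch-private
page flag an architecture might use. It shows why a writer that skips
the flush_dcache_page call leaves set_pte_at with nothing to act on.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy userspace model. NOT kernel code; all model_* names are hypothetical. */
struct model_page {
    bool mapped;        /* page currently mapped into userspace? */
    bool dcache_dirty;  /* deferred-flush marker (stand-in for an arch page flag) */
    int  flushes;       /* count of simulated hardware flushes */
    char data[16];      /* page contents */
};

/* Stand-in for the expensive D-cache writeback. */
static void model_hw_flush(struct model_page *p)
{
    p->flushes++;
    p->dcache_dirty = false;
}

/* Models flush_dcache_page(): flush now if mapped, otherwise only mark. */
static void model_flush_dcache_page(struct model_page *p)
{
    if (p->mapped)
        model_hw_flush(p);
    else
        p->dcache_dirty = true;
}

/* Models the arch side of set_pte_at(): perform any postponed flush. */
static void model_set_pte_at(struct model_page *p)
{
    if (p->dcache_dirty)
        model_hw_flush(p);
    p->mapped = true;
}

/* Models a kernel writer, such as the page copy done during migration. */
static void model_copy_page(struct model_page *dst,
                            const struct model_page *src, bool call_flush)
{
    memcpy(dst->data, src->data, sizeof dst->data);
    if (call_flush)
        model_flush_dcache_page(dst);
}

/* Copy into a non-mapped page, then map it; return flushes performed. */
static int model_migrate_then_map(bool call_flush)
{
    struct model_page src = { .data = "code" };
    struct model_page dst = { 0 };

    model_copy_page(&dst, &src, call_flush);
    model_set_pte_at(&dst);
    return dst.flushes;
}
```

In this model, model_migrate_then_map(true) performs exactly one flush,
at map time, while model_migrate_then_map(false) maps the page without
any flush at all, which mirrors the stale I-cache case described above.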