Message-ID: <20201207023429.GA8986@hori.linux.bs1.fc.nec.co.jp>
Date: Mon, 7 Dec 2020 02:34:30 +0000
From: HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@....com>
To: Oscar Salvador <osalvador@...e.de>
CC: Vlastimil Babka <vbabka@...e.cz>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"n-horiguchi@...jp.nec.com" <n-horiguchi@...jp.nec.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED

On Sat, Dec 05, 2020 at 04:34:23PM +0100, Oscar Salvador wrote:
> On Fri, Dec 04, 2020 at 06:25:31PM +0100, Vlastimil Babka wrote:
> > OK, so that means we don't introduce this race for MADV_SOFT_OFFLINE, but it's
> > already (and still) there for MADV_HWPOISON since Dan's 23e7b5c2e271 ("mm,
> > madvise_inject_error: Let memory_failure() optionally take a page reference") no?
>
> What about the following?
> CCing Dan as well.
Hi Oscar, Vlastimil,
Thanks for mentioning this. I agree with that direction.
>
> From: Oscar Salvador <osalvador@...e.de>
> Date: Sat, 5 Dec 2020 16:14:40 +0100
> Subject: [PATCH] mm,memory_failure: Always pin the page in
> madvise_inject_error
>
> madvise_inject_error() uses get_user_pages_fast to get the page
> from the addr we specified.
> After [1], we drop such extra reference for memory_failure() path.
> That commit says that memory_failure wanted to keep the pin in order
> to take the page out of circulation.
>
> The truth is that we need to keep the page pinned, otherwise the
> page might be re-used after the put_page(), and we can end up messing
> with someone else's memory.
> E.g:
>
> CPU0
> process X                         CPU1
>  madvise_inject_error
>   get_user_pages
>   put_page
>                                   page gets reclaimed
>                                   process Y allocates the page
>   memory_failure
>    // We mess with process Y memory
>
> madvise() is meant to operate on a self address space, so messing with
> pages that do not belong to us seems the wrong thing to do.
> To avoid that, let us keep the page pinned for memory_failure as well.
>
> Pages for DAX mappings will release this extra refcount in
> memory_failure_dev_pagemap.
>
> [1] ("23e7b5c2e271: mm, madvise_inject_error:
> Let memory_failure() optionally take a page reference")
>
> Signed-off-by: Oscar Salvador <osalvador@...e.de>
> Suggested-by: Vlastimil Babka <vbabka@...e.cz>
> Fixes: 23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
> ---
> mm/madvise.c | 9 +--------
> mm/memory-failure.c | 6 ++++++
> 2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c6b5524add58..19edddba196d 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -907,14 +907,7 @@ static int madvise_inject_error(int behavior,
>  		} else {
>  			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
>  				 pfn, start);
> -			/*
> -			 * Drop the page reference taken by get_user_pages_fast(). In
> -			 * the absence of MF_COUNT_INCREASED the memory_failure()
> -			 * routine is responsible for pinning the page to prevent it
> -			 * from being released back to the page allocator.
> -			 */
> -			put_page(page);
> -			ret = memory_failure(pfn, 0);
> +			ret = memory_failure(pfn, MF_COUNT_INCREASED);
>  		}
>
>  		if (ret)
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 869ece2a1de2..ba861169c9ae 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1269,6 +1269,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  	if (!cookie)
>  		goto out;
>
> +	if (flags & MF_COUNT_INCREASED)
> +		/*
> +		 * Drop the extra refcount in case we come from madvise().
> +		 */
> +		put_page(page);
> +
Should this if-block come before the dax_lock_page() block?
It seems that if dax_lock_page() returns NULL, memory_failure_dev_pagemap()
returns without releasing the refcount.
memory_failure() on dev_pagemap does not use the page refcount (unlike other
types of memory), so we can release it unconditionally.
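
To make the ordering concern concrete, here is a minimal userspace sketch, not kernel code. Everything in it is a hypothetical stand-in: struct fake_page, fake_put_page(), fake_dax_lock_page() and the two handle_*() functions are invented for illustration, and only the name MF_COUNT_INCREASED is taken from the patch (its value here is arbitrary). It assumes the early exit on a failed lock skips everything that follows, as in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the kernel flag name only; the value is arbitrary in this model. */
#define MF_COUNT_INCREASED 1

/* Hypothetical stand-in for struct page: only the reference count matters. */
struct fake_page {
	int refcount;
};

/* Hypothetical stand-in for put_page(): drop one reference. */
void fake_put_page(struct fake_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Hypothetical stand-in for dax_lock_page(): false models a NULL cookie. */
bool fake_dax_lock_page(bool lock_succeeds)
{
	return lock_succeeds;
}

/* Ordering as in the quoted hunk: the put comes after the lock check. */
int handle_put_after_lock(struct fake_page *page, int flags, bool lock_succeeds)
{
	if (!fake_dax_lock_page(lock_succeeds))
		return -1;	/* early exit: the caller's reference is never dropped */

	if (flags & MF_COUNT_INCREASED)
		fake_put_page(page);
	return 0;
}

/* Ordering suggested above: drop the reference before trying the lock. */
int handle_put_before_lock(struct fake_page *page, int flags, bool lock_succeeds)
{
	if (flags & MF_COUNT_INCREASED)
		fake_put_page(page);

	if (!fake_dax_lock_page(lock_succeeds))
		return -1;
	return 0;
}
```

With a page that starts at refcount 2 and a failing lock, the first variant returns with the count still at 2 (the reference taken by the caller leaks), while the second returns with the count at 1.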
Thanks,
Naoya Horiguchi