[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <840d4669-ae3f-b7c4-6132-e20d1bf9e952@suse.cz>
Date: Fri, 4 Dec 2020 18:25:31 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Oscar Salvador <osalvador@...e.de>
Cc: akpm@...ux-foundation.org, n-horiguchi@...jp.nec.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Naoya Horiguchi <naoya.horiguchi@....com>,
Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH 3/7] mm,madvise: call soft_offline_page() without
MF_COUNT_INCREASED
On 12/1/20 12:35 PM, Oscar Salvador wrote:
> On Wed, Nov 25, 2020 at 07:20:33PM +0100, Vlastimil Babka wrote:
>> On 11/19/20 11:57 AM, Oscar Salvador wrote:
>> > From: Naoya Horiguchi <naoya.horiguchi@....com>
>> >
>> > The call to get_user_pages_fast is only to get the pointer to a struct
>> > page of a given address, pinning it is memory-poisoning handler's job,
>> > so drop the refcount grabbed by get_user_pages_fast().
>> >
>> > Note that the target page is still pinned after this put_page() because
>> > the current process should have refcount from mapping.
>>
>> Well, but can't it go away due to reclaim, migration or whatever?
>
> Yes, it can.
>
>> > @@ -900,20 +900,23 @@ static int madvise_inject_error(int behavior,
>> > */
>> > size = page_size(compound_head(page));
>> > + /*
>> > + * The get_user_pages_fast() is just to get the pfn of the
>> > + * given address, and the refcount has nothing to do with
>> > + * what we try to test, so it should be released immediately.
>> > + * This is racy but it's intended because the real hardware
>> > + * errors could happen at any moment and memory error handlers
>> > + * must properly handle the race.
>>
>> Sure they have to. We might just be unexpectedly messing with other process'
>> memory. Or does anything else prevent that?
>
> No, nothing does, and I have to confess that I managed to confuse myself here.
> If we release such page and that page ends up in buddy, nothing prevents someone
> else to get that page, and then we would be messing with other process memory.
>
> I guess the right thing to do is just to make sure we got that page and that
> that page remains pinned as long as the memory failure handling goes.
OK, so that means we don't introduce this race for MADV_SOFT_OFFLINE, but it's
already (and still) there for MADV_HWPOISON since Dan's 23e7b5c2e271 ("mm,
madvise_inject_error: Let memory_failure() optionally take a page reference") no?
> I will remove those patches from the patchset and re-submit with only the
> refactoring and pcp-disabling.
>
> Thanks Vlastimil
>
Powered by blists - more mailing lists