lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 6 Jan 2021 12:22:57 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Dan Williams <dan.j.williams@...el.com>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: Teach pfn_to_online_page() about ZONE_DEVICE section
 collisions

On 06.01.21 11:42, Michal Hocko wrote:
> On Wed 06-01-21 10:56:19, David Hildenbrand wrote:
> [...]
>> Note that this is not sufficient in the general case. I already
>> mentioned that we effectively override an already initialized memmap.
>>
>> ---
>>
>> [        SECTION             ]
>> Before:
>> [ ZONE_NORMAL ][    Hole     ]
>>
>> The hole has some node/zone (currently 0/0, discussions ongoing on how
>> to optimize that to e.g., ZONE_NORMAL in this example) and is
>> PG_reserved - looks like an ordinary memory hole.
>>
>> After memremap:
>> [ ZONE_NORMAL ][ ZONE_DEVICE ]
>>
>> The already initialized memmap was converted to ZONE_DEVICE. Your
>> slowpath will work.
>>
>> After memunmap (no poisioning):
>> [ ZONE_NORMAL ][ ZONE_DEVICE ]
>>
>> The slow path is no longer working. pfn_to_online_page() might return
>> something that is ZONE_DEVICE.
>>
>> After memunmap (poisioning):
>> [ ZONE_NORMAL ][ POISONED    ]
>>
>> The slow path is no longer working. pfn_to_online_page() might return
>> something that will BUG_ON via page_to_nid() etc.
>>
>> ---
>>
>> Reason is that pfn_to_online_page() does no care about sub-sections. And
>> for now, it didn't had to. If there was an online section, it either was
>>
>> a) Completely present. The whole memmap is initialized to sane values.
>> b) Partially present. The whole memmap is initialized to sane values.
>>
>> memremap/memunmap messes with case b)
> 
> I do not see we ever clear the newly added flag and my understanding is
> that the subsection removed would lead to get_dev_pagemap returning a
> NULL. Which would obviously need to be checked for pfn_to_online_page.
> Or do I miss anything and the above is not the case and we could still
> get false positives?

See my example above ("After memunmap").

We're still in the slow pathg. pfn_to_online_page() will return a struct
page as get_dev_pagemap() is now  NULL.

Yet page_zone(page) will either
- BUG_ON (memmap was poisoned)
- return ZONE_DEVICE zone (memmap not poisoned when memunmapping)

As I said, can be tackled by checking for pfn_section_valid() at least
on the slow path. Ideally also on the fast path.

> 
>> Well have to further tweak pfn_to_online_page(). You'll have to also
>> check pfn_section_valid() *at least* on the slow path. Less-hacky would
>> be checking it also in the "somehwat-faster" path - that would cover
>> silently overriding a memmap that's visible via pfn_to_online_page().
>> Might slow down things a bit.
>>
>>
>> Not completely opposed to this, but I would certainly still prefer just
>> avoiding this corner case completely instead of patching around it. Thanks!
> 
> Well, I would love to have no surprises either. So far there was not
> actual argument why the pmem reserved space cannot be fully initialized.

Yes, I'm still hoping Dan can clarify that.

> On the other hand making sure that pfn_to_online_page sounds like the
> right thing to do. And having an explicit check for zone device there in
> a slow path makes sense to me.

As I said, I'd favor to simplify and just get rid of the special case,
instead of coming up with increasingly complex ways to deal with it.
pfn_to_online_page() used to be simple, essentially checking a single
flag was sufficient in most setups.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ