Message-Id: <5833D922.1070900@linux.vnet.ibm.com>
Date:   Tue, 22 Nov 2016 11:05:30 +0530
From:   Anshuman Khandual <khandual@...ux.vnet.ibm.com>
To:     Jerome Glisse <jglisse@...hat.com>
Cc:     akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, John Hubbard <jhubbard@...dia.com>,
        Russell King <linux@...linux.org.uk>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Paul Mackerras <paulus@...ba.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Yoshinori Sato <ysato@...rs.sourceforge.jp>,
        Rich Felker <dalias@...c.org>,
        Chris Metcalf <cmetcalf@...lanox.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Dan Williams <dan.j.williams@...el.com>
Subject: Re: [HMM v13 01/18] mm/memory/hotplug: convert device parameter bool
 to set of flags

On 11/21/2016 05:57 PM, Jerome Glisse wrote:
> On Mon, Nov 21, 2016 at 12:11:50PM +0530, Anshuman Khandual wrote:
>> On 11/18/2016 11:48 PM, Jérôme Glisse wrote:
> 
> [...]
> 
>>> @@ -956,7 +963,7 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
>>>  	remove_pagetable(start, end, true);
>>>  }
>>>  
>>> -int __ref arch_remove_memory(u64 start, u64 size)
>>> +int __ref arch_remove_memory(u64 start, u64 size, int flags)
>>>  {
>>>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>>>  	unsigned long nr_pages = size >> PAGE_SHIFT;
>>> @@ -965,6 +972,12 @@ int __ref arch_remove_memory(u64 start, u64 size)
>>>  	struct zone *zone;
>>>  	int ret;
>>>  
>>> +	/* Need to add support for device and unaddressable memory if needed */
>>> +	if (flags & MEMORY_UNADDRESSABLE) {
>>> +		BUG();
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>  	/* With altmap the first mapped page is offset from @start */
>>>  	altmap = to_vmem_altmap((unsigned long) page);
>>>  	if (altmap)
>>
>> So with this patch none of the architectures support un-addressable
>> memory, but support will be added through later patches? The
>> zone_for_memory() function's flag now takes the MEMORY_DEVICE
>> parameter. Do we then need to change all the previous ZONE_DEVICE
>> callers which took "for_device" to accommodate this new flag? Just
>> curious.
> 
> Yes correct.
> 
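Okay. So I presume the existing ZONE_DEVICE callers end up being converted
along these lines in the later patches (just my guess at the shape of the
change, not code from this series):

	/* hotplug of regular system RAM, no device flags */
	-	arch_add_memory(nid, start, size, false);
	+	arch_add_memory(nid, start, size, MEMORY_FLAGS_NONE);

	/* ZONE_DEVICE hotplug, e.g. from devm_memremap_pages() */
	-	arch_add_memory(nid, align_start, align_size, true);
	+	arch_add_memory(nid, align_start, align_size, MEMORY_DEVICE);
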
> 
>>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>>> index 01033fa..ba9b12e 100644
>>> --- a/include/linux/memory_hotplug.h
>>> +++ b/include/linux/memory_hotplug.h
>>> @@ -103,7 +103,7 @@ extern bool memhp_auto_online;
>>>  
>>>  #ifdef CONFIG_MEMORY_HOTREMOVE
>>>  extern bool is_pageblock_removable_nolock(struct page *page);
>>> -extern int arch_remove_memory(u64 start, u64 size);
>>> +extern int arch_remove_memory(u64 start, u64 size, int flags);
>>>  extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
>>>  	unsigned long nr_pages);
>>>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>>> @@ -275,7 +275,20 @@ extern int add_memory(int nid, u64 start, u64 size);
>>>  extern int add_memory_resource(int nid, struct resource *resource, bool online);
>>>  extern int zone_for_memory(int nid, u64 start, u64 size, int zone_default,
>>>  		bool for_device);
>>> -extern int arch_add_memory(int nid, u64 start, u64 size, bool for_device);
>>> +
>>> +/*
>>> + * For device memory we want more informations than just knowing it is device
>>> + * memory. We want to know if we can migrate it (ie it is not storage memory
>>> + * use by DAX). Is it addressable by the CPU ? Some device memory like GPU
>>> + * memory can not be access by CPU but we still want struct page so that we
>>> + * can use it like regular memory.
>>
>> Some typos here. Needs to be cleaned up as well. But please have a
>> look at the comment below about the classification itself.
>>
>>> + */
>>> +#define MEMORY_FLAGS_NONE 0
>>> +#define MEMORY_DEVICE (1 << 0)
>>> +#define MEMORY_MOVABLE (1 << 1)
>>> +#define MEMORY_UNADDRESSABLE (1 << 2)
>>
>> It should be DEVICE_MEMORY_* instead of MEMORY_* as we are trying to
>> classify device memory (though it is represented with struct page),
>> not regular system RAM. This should attempt to classify device memory
>> which is backed by struct pages. arch_add_memory/arch_remove_memory
>> does not come into play for traditional device memory which is just a
>> PFN range and has no struct page associated with it.
> 
> Good idea i will change that.
> 
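For reference, something along these lines is what I had in mind (just a
sketch of the rename; the exact names are of course up to you):

	/* Flags for device memory that is backed by struct page (sketch) */
	#define DEVICE_MEMORY_FLAGS_NONE	0
	#define DEVICE_MEMORY			(1 << 0) /* device, not system RAM */
	#define DEVICE_MEMORY_MOVABLE		(1 << 1) /* pages may be migrated */
	#define DEVICE_MEMORY_UNADDRESSABLE	(1 << 2) /* not CPU addressable */
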
> 
>> Broadly they are either CPU accessible or inaccessible. Storage
>> memory like persistent memory represented through ZONE_DEVICE falls
>> under the accessible (coherent) category. IIUC, right now such pages
>> are not movable, one reason being that page->pgmap replaces page->lru
>> in struct page, so they cannot sit on the standard LRU lists. Since
>> struct page was added precisely to exploit more core VM features on
>> this memory, going forward it will have to become migratable one way
>> or another to accommodate features like compaction and HW poison
>> handling for this storage memory. Hence my point here is: let's not
>> classify any of these memories as non-movable. Addressable or not
>> should be the only classification.
> 
> Being on the lru or not is not an issue with respect to migration. Being

Right, provided we create separate migration interfaces for these non-LRU
pages (preferably through the HMM migration API layer). But as it stands
today, non-LRU device memory is a problem for the NUMA migrate_pages()
interface and we cannot use it for migration. Hence I brought up the
non-LRU issue here.
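
To make that concrete, a rough illustration (sketch only, not code from
this series) of why the existing path trips up, assuming the current
helpers is_zone_device_page() and PageLRU():

	/*
	 * migrate_pages() works on pages isolated from an LRU list, but
	 * ZONE_DEVICE pages reuse the page->lru space for page->pgmap and
	 * are never on an LRU, so isolate_lru_page() can never pick them up.
	 */
	static bool lru_based_migration_possible(struct page *page)
	{
		if (is_zone_device_page(page))
			return false;		/* lru field is really pgmap */
		return PageLRU(page);		/* required for LRU isolation */
	}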

> on the lru was used as an indication that the page is managed through the
> standard mm code and thus that many assumptions hold which in turn
> allow migration. But if one uses device memory following all the rules of
> regular memory then migration can be done no matter whether the page is
> on the lru or not.

Right.

> 
> I still think that MOVABLE is an important distinction, as I am pretty
> sure that the persistent memory folks do not want to see their pages
> migrated in any way. I might rename it to DEVICE_MEMORY_ALLOW_MIGRATION.

We should not classify memory based on whether there is a *requirement*
for migration at this point in time; the classification should be based
on whether it is inherently migratable or not. I don't see any reason why
persistent memory cannot be migrated. I am not very familiar with the DAX
filesystem and its use of persistent memory, but I would guess that its
requirements for compaction and error handling are met way up in the
filesystem layers, hence it never needed this support at the struct page
level. I am just guessing.
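
Just to make the contrast concrete (purely illustrative, reusing the flag
names suggested earlier in this thread):

	/* As the series proposes, with movability as an explicit flag: */
	dax_pmem_flags = DEVICE_MEMORY;
	hmm_gpu_flags  = DEVICE_MEMORY | DEVICE_MEMORY_MOVABLE |
			 DEVICE_MEMORY_UNADDRESSABLE;

	/* What I am suggesting: addressability as the only distinction,
	 * with migratability treated as an inherent property we work
	 * towards for both kinds of memory: */
	dax_pmem_flags = DEVICE_MEMORY;
	hmm_gpu_flags  = DEVICE_MEMORY | DEVICE_MEMORY_UNADDRESSABLE;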

Added Dan J Williams to this thread; he might be able to give us some
more details regarding persistent memory migration requirements and
their current state.
