linux-kernel - Re: [RFC PATCH 1/3] mm, numa: rework do_pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <32bec0c9-60e2-0362-9446-feb4de1b119c@linux.vnet.ibm.com>
Date:   Wed, 3 Jan 2018 15:06:49 +0530
From:   Anshuman Khandual <khandual@...ux.vnet.ibm.com>
To:     Michal Hocko <mhocko@...nel.org>,
        Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Cc:     linux-mm@...ck.org, Zi Yan <zi.yan@...rutgers.edu>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Vlastimil Babka <vbabka@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrea Reale <ar@...ux.vnet.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 1/3] mm, numa: rework do_pages_move

On 01/03/2018 02:28 PM, Michal Hocko wrote:
> On Wed 03-01-18 14:12:17, Anshuman Khandual wrote:
>> On 12/08/2017 09:45 PM, Michal Hocko wrote:
> [...]

[...]

>>
>> This reuses the existing page allocation helper from migrate_pages() system
>> call. But all these allocator helper names for migrate_pages() function are
>> really confusing. Even in this case alloc_new_node_page and the original
>> new_node_page() which is still getting used in do_migrate_range() sounds
>> similar even if their implementation is quite different. IMHO either all of
>> them should be moved to the header file with proper differentiating names
>> or let them be there in their respective files with these generic names and
>> clean them up later.
> 
> I believe I've tried that but I couldn't make them into a single header
> file easily because of header file dependencies. I agree that their
> names are quite confusing. Feel free to send a patch to clean this up.

Sure. Will try once this one gets into mmotm.

[...]

>>
>>
>> Just a nit. new_page_node() and store_status() seems different. Then why
>> the git diff looks so clumsy.
> 
> Kirill was suggesting to use --patience to general the diff which leads
> to a slightly better output. It has been posted as a separate email [1].
> Maybe you will find that one easier to review.
> 
> [1] http://lkml.kernel.org/r/20171213143948.GM25185@dhcp22.suse.cz

Yeah it does look better.

[...]

>>> -		return thp;
>>> -	} else
>>> -		return __alloc_pages_node(pm->node,
>>> -				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
>>> +	err = migrate_pages(pagelist, alloc_new_node_page, NULL, node,
>>> +			MIGRATE_SYNC, MR_SYSCALL);
>>> +	if (err)
>>> +		putback_movable_pages(pagelist);
>>> +	return err;
>>>  }
>>
>> Even this one. IIUC, do_move_pages_to_node() migrate a chunk of pages
>> at a time which belong to the same target node. Perhaps the name should
>> suggest so. All these helper page migration helper functions sound so
>> similar.
> 
> What do you suggest? I find do_move_pages_to_node quite explicit on its
> purpose.

Sure. Not a big deal.

[...]

>>>  {
>>> +	struct vm_area_struct *vma;
>>> +	struct page *page;
>>> +	unsigned int follflags;
>>>  	int err;
>>> -	struct page_to_node *pp;
>>> -	LIST_HEAD(pagelist);
>>>  
>>>  	down_read(&mm->mmap_sem);
>>
>> Holding mmap_sem for individual pages makes sense. Current
>> implementation is holding it for an entire batch.
> 
> I didn't bother to optimize this path to be honest. It is true that lock
> batching can lead to improvements but that would complicate the code
> (how many patches to batch?) so I've left that for later if somebody
> actually sees any problem.
> 
>>> +	err = -EFAULT;
>>> +	vma = find_vma(mm, addr);
>>> +	if (!vma || addr < vma->vm_start || !vma_migratable(vma))
>>
>> While here, should not we add 'addr > vma->vm_end' into this condition ?
> 
> No. See what find_vma does.
> 

Right.

> [...]
> 
> Please cut out the quoted reply to minimum

Sure will do.

> 
>>> @@ -1593,79 +1556,80 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>>>  			 const int __user *nodes,
>>>  			 int __user *status, int flags)
>>>  {
>>> -	struct page_to_node *pm;
>>> -	unsigned long chunk_nr_pages;
>>> -	unsigned long chunk_start;
>>> -	int err;
>>> -
>>> -	err = -ENOMEM;
>>> -	pm = (struct page_to_node *)__get_free_page(GFP_KERNEL);
>>> -	if (!pm)
>>> -		goto out;
>>> +	int chunk_node = NUMA_NO_NODE;
>>> +	LIST_HEAD(pagelist);
>>> +	int chunk_start, i;
>>> +	int err = 0, err1;
>>
>> err init might not be required, its getting assigned to -EFAULT right away.
> 
> No, nr_pages might be 0 AFAICS.

Right but there is another err = 0 after the for loop.

> 
> [...]
>>> +		if (chunk_node == NUMA_NO_NODE) {
>>> +			chunk_node = node;
>>> +			chunk_start = i;
>>> +		} else if (node != chunk_node) {
>>> +			err = do_move_pages_to_node(mm, &pagelist, chunk_node);
>>> +			if (err)
>>> +				goto out;
>>> +			err = store_status(status, chunk_start, chunk_node, i - chunk_start);
>>> +			if (err)
>>> +				goto out;
>>> +			chunk_start = i;
>>> +			chunk_node = node;
>>>  		}

[...]

>>> +		err = do_move_pages_to_node(mm, &pagelist, chunk_node);
>>> +		if (err)
>>> +			goto out;
>>> +		if (i > chunk_start) {
>>> +			err = store_status(status, chunk_start, chunk_node, i - chunk_start);
>>> +			if (err)
>>> +				goto out;
>>> +		}
>>> +		chunk_node = NUMA_NO_NODE;
>>
>> This block of code is bit confusing.
> 
> I believe this is easier to grasp when looking at the resulting code.
>>
>> 1) Why attempt to migrate when just one page could not be isolated ?
>> 2) 'i' is always greater than chunk_start except the starting page
>> 3) Why reset chunk_node as NUMA_NO_NODE ?
> 
> This is all about flushing the pending state on an error and
> distinguising a fresh batch.

Okay. Will test it out on a multi node system once I get hold of one.