Date:   Wed, 21 Apr 2021 10:44:38 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...e.com>, Oscar Salvador <osalvador@...e.de>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Pavel Tatashin <pasha.tatashin@...een.com>,
        Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v9 4/8] mm,memory_hotplug: Allocate memmap from the added
 memory range

On 21.04.21 10:39, Michal Hocko wrote:
> On Wed 21-04-21 10:15:46, Oscar Salvador wrote:
>> On Tue, Apr 20, 2021 at 12:56:03PM +0200, Michal Hocko wrote:
> [...]
>>> necessary. Using two different iteration styles is also hurting the code
>>> readability. I would go with the following
>>> 	for (pfn = start_pfn; pfn < end_pfn; ) {
>>> 		unsigned long order = min(MAX_ORDER - 1UL, __ffs(pfn));
>>>
>>> 		while (pfn + (1UL << order) > end_pfn)
>>> 			order--;
>>> 		(*online_page_callback)(pfn_to_page(pfn), order);
>>> 		pfn += 1 << order;
>>> 	}
>>>
>>> which is what __free_pages_memory does already.
>>
>> This is kind of what I had in the early versions, but it was agreed
>> with David to split it into two loops to make it explicit.
>> I can go back to that if it is preferred.
> 
> Not that I would insist, but I find it better to use common constructs
> when it doesn't hurt readability. The order evaluation can even be done
> in a trivial helper.
> 
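For reference, such a helper could look as follows, with the order
evaluation lifted out of the loop (the helper name is illustrative; the
logic mirrors __free_pages_memory() in mm/memblock.c):

	/* Illustrative helper; the name is hypothetical. */
	static unsigned long pfn_max_order(unsigned long pfn, unsigned long end_pfn)
	{
		unsigned long order = min(MAX_ORDER - 1UL, __ffs(pfn));

		while (pfn + (1UL << order) > end_pfn)
			order--;
		return order;
	}
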
>>>> +	if (memmap_on_memory) {
>>>> +		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
>>>> +						      get_nr_vmemmap_pages_cb);
>>>> +		if (nr_vmemmap_pages) {
>>>> +			if (size != memory_block_size_bytes()) {
>>>> +				pr_warn("Refuse to remove %#llx - %#llx, "
>>>> +					"wrong granularity\n",
>>>> +					start, start + size);
>>>> +				return -EINVAL;
>>>> +			}
>>>> +
>>>> +			/*
>>>> +			 * Let remove_pmd_table->free_hugepage_table do the
>>>> +			 * right thing if we used vmem_altmap when hot-adding
>>>> +			 * the range.
>>>> +			 */
>>>> +			mhp_altmap.alloc = nr_vmemmap_pages;
>>>> +			altmap = &mhp_altmap;
>>>> +		}
>>>> +	}
>>>> +
>>>>   	/* remove memmap entry */
>>>>   	firmware_map_remove(start, start + size, "System RAM");
>>>
>>> I have to say I still dislike this, and I would just wrap it inside out
>>> and do the operation from within walk_memory_blocks, but I will not
>>> insist.
>>
>> I have to confess I forgot the details of that discussion, as we were
>> quite focused on decoupling vmemmap pages from the {online,offline} interface.
>> Would you mind elaborating a bit more?
> 
> As I've said, I will not insist, and this can be done in a follow-up.
> You are iterating over memory blocks just to refuse an operation
> which spans several memory blocks. See
> http://lkml.kernel.org/r/YFtPxH0CT5QZsnR1@dhcp22.suse.cz and follow
> walk_memory_blocks(start, size, NULL, remove_memory_block_cb)
> 
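For reference, the inside-out shape being suggested would roughly be to
move the per-block operation into the walker callback (the callback name
comes from the call quoted above; its body here is only an illustrative
stub):

	static int remove_memory_block_cb(struct memory_block *mem, void *arg)
	{
		/* do the actual removal work for exactly one memory block */
		return 0;
	}

	/* walker signature as in drivers/base/memory.c */
	walk_memory_blocks(start, size, NULL, remove_memory_block_cb);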

We'll have to be careful in general when removing memory at a different 
granularity than it was added, especially when calling arch_remove_memory() 
at a smaller granularity than was used for arch_add_memory(). We might 
fail to tear down the direct map: imagine having mapped a 1 GiB page but 
then deciding to remove individual 128 MiB chunks -- that won't work, and 
the direct map would currently remain.

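To make the granularity mismatch concrete, a trivial user-space sketch of 
the arithmetic (illustrative only, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		unsigned long long map = 1ULL << 30;	 /* 1 GiB direct-map page */
		unsigned long long chunk = 128ULL << 20; /* 128 MiB removal chunk */

		/* eight chunks share one mapping; removing a single chunk
		 * leaves seven users, so the 1 GiB page cannot be unmapped */
		printf("%llu chunks per 1 GiB mapping\n", map / chunk);
		return 0;
	}
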
So this should be handled separately in the future.

-- 
Thanks,

David / dhildenb
