Message-ID: <YF2ct/UZUBG1GcM3@dhcp22.suse.cz>
Date:   Fri, 26 Mar 2021 09:35:03 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Oscar Salvador <osalvador@...e.de>
Cc:     David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Pavel Tatashin <pasha.tatashin@...een.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 1/5] mm,memory_hotplug: Allocate memmap from the added
 memory range

On Thu 25-03-21 23:06:50, Oscar Salvador wrote:
> On Thu, Mar 25, 2021 at 05:47:30PM +0100, Michal Hocko wrote:
> > On Thu 25-03-21 17:36:22, Michal Hocko wrote:
> > > If all it takes is to make pfn_to_online_page work (and my
> > > previous attempt is incorrect because it should consult block rather
> > > than section pfn range)
> > 
> > This should work.
> 
> Sorry, but while this solves some of the issues with that approach, I really
> think that it overcomplicates things and buys us little in return.
> To me it seems that we are just patching things up to make that approach
> work.

I do agree that special casing vmemmap areas is something I do not
really like, but we are in a Schrödinger situation: this memory is not
onlineable unless it shares a memory section with onlineable memory.
There are two ways around that: either special-case it in
pfn_to_online_page, or mark the vmemmap section online even though it
is not really.
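To illustrate the first option in user-space terms (a toy model, not kernel code; all names and sizes here are invented): the special case amounts to consulting the memory block's vmemmap pfn range instead of relying only on the per-section online bit, which was never set for a section holding only vmemmap pages.

```c
#include <assert.h>
#include <stdbool.h>

#define SECTION_PAGES 512UL /* toy section size, not the kernel's */

/* Toy model: one memory block whose leading pages back its memmap. */
struct toy_block {
	bool section_online[8];      /* per-section online bits */
	unsigned long vmemmap_start; /* first pfn of the vmemmap range */
	unsigned long vmemmap_pages;
	bool block_online;           /* the usable part was onlined */
};

/* Section-only check: misses vmemmap pages whose section was never
 * marked online. */
static bool pfn_online_naive(const struct toy_block *b, unsigned long pfn)
{
	return b->section_online[pfn / SECTION_PAGES];
}

/* First option from the thread: special-case the vmemmap range and
 * consult the block it describes rather than the section bit. */
static bool pfn_online_blockwise(const struct toy_block *b, unsigned long pfn)
{
	if (pfn >= b->vmemmap_start &&
	    pfn < b->vmemmap_start + b->vmemmap_pages)
		return b->block_online;
	return pfn_online_naive(b, pfn);
}
```

The toy makes the Schrödinger point visible: a vmemmap pfn is reported offline or online depending on whether the lookup consults the section bit or the block that the vmemmap describes.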

> To be honest, I dislike this, and I guess we can only agree to disagree
> here.

No problem there. I will not insist on my approach unless I can convince
you that it is a better solution. It seems I have failed and I can live
with that.

> I find the following much easier, cleaner, and less likely to run into
> pitfalls in the future:
> 
> (!!!It is untested and incomplete, and I would be surprised if it even
> compiles, but it is enough as a PoC !!!)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 5ea2b3fbce02..799d14fc2f9b 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -169,6 +169,60 @@ int memory_notify(unsigned long val, void *v)
>  	return blocking_notifier_call_chain(&memory_chain, val, v);
>  }
> 
> +static int memory_block_online(unsigned long start_pfn, unsigned long nr_pages,
> +			       unsigned long nr_vmemmap_pages, int online_type,
> +			       int nid)
> +{
> +	int ret;
> +	unsigned long flags;
> +	/*
> +	 * Despite vmemmap pages having a different lifecycle than the pages
> +	 * they describe, initializing and accounting vmemmap pages at the
> +	 * online/offline stage eases things a lot.

This requires quite some explaining.

> +	 * We do it out of {online,offline}_pages, so those routines only have
> +	 * to deal with pages that are actual usable memory.
> +	 */
> +	if (nr_vmemmap_pages) {
> +		struct zone *z;
> +
> +		z = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
> +		move_pfn_range_to_zone(z, start_pfn, nr_vmemmap_pages, NULL,
> +				       MIGRATE_UNMOVABLE);
> +		/*
> +		 * The below could go to a helper to make it less bulky here,
> +		 * so {online,offline}_pages could also use it.
> +		 */
> +		z->present_pages += nr_vmemmap_pages;
> +		pgdat_resize_lock(z->zone_pgdat, &flags);
> +		z->zone_pgdat->node_present_pages += nr_vmemmap_pages;
> +		pgdat_resize_unlock(z->zone_pgdat, &flags);
> +	}
> +
> +	ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages,
> +			   online_type);
> +
> +	/*
> +	 * In case online_pages() failed for some reason, we should cleanup vmemmap
> +	 * accounting as well.
> +	 */
> +	return ret;
> +}

Yes this is much better! Just a minor suggestion: push the memory_block
all the way down to memory_block_online (it onlines a memory block). I
would also slightly prefer to provide 2 helpers that would make it clear
that this is to reserve/clean up the vmemmap space (defined in the
memory_hotplug proper).
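In user-space terms (names invented here, a sketch of the suggestion rather than the eventual kernel API), the helper pair plus the rollback that the PoC's comment asks for could look like:

```c
#include <assert.h>

/* Toy stand-in for the zone/node counters the PoC touches. */
struct toy_zone {
	unsigned long present_pages;
	unsigned long node_present_pages;
};

/* Hypothetical helper pair: one to account the vmemmap pages when the
 * block goes online, one to undo it on failure or offline. */
static void vmemmap_space_reserve(struct toy_zone *z, unsigned long n)
{
	z->present_pages += n;
	z->node_present_pages += n;
}

static void vmemmap_space_cleanup(struct toy_zone *z, unsigned long n)
{
	z->present_pages -= n;
	z->node_present_pages -= n;
}

/* Block-online flow with rollback: if onlining the usable range fails,
 * the vmemmap accounting is undone, leaving the counters untouched. */
static int toy_block_online(struct toy_zone *z, unsigned long nr_vmemmap,
			    int (*online)(void))
{
	int ret;

	vmemmap_space_reserve(z, nr_vmemmap);
	ret = online();
	if (ret)
		vmemmap_space_cleanup(z, nr_vmemmap);
	return ret;
}
```

Pairing reserve with cleanup keeps the accounting symmetric, so the failure path of online_pages() and the later offline path can share the same undo helper.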

Thanks!
-- 
Michal Hocko
SUSE Labs
