Message-ID: <Pine.LNX.4.64.0905011202530.8513@blonde.anvils>
Date:	Fri, 1 May 2009 12:30:03 +0100 (BST)
From:	Hugh Dickins <hugh@...itas.com>
To:	Mel Gorman <mel@....ul.ie>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Andi Kleen <andi@...stfloor.org>,
	David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH mmotm] mm: alloc_large_system_hash check order

On Thu, 30 Apr 2009, Mel Gorman wrote:
> On Wed, Apr 29, 2009 at 10:09:48PM +0100, Hugh Dickins wrote:
> > On an x86_64 with 4GB ram, tcp_init()'s call to alloc_large_system_hash(),
> > to allocate tcp_hashinfo.ehash, is now triggering an mmotm WARN_ON_ONCE on
> > order >= MAX_ORDER - it's hoping for order 11.  alloc_large_system_hash()
> > had better make its own check on the order.
> > 
> > Signed-off-by: Hugh Dickins <hugh@...itas.com>
> 
> Looks good
> 
> Reviewed-by: Mel Gorman <mel@....ul.ie>

Thanks.

> 
> As I was looking there, it seemed that alloc_large_system_hash() should be
> using alloc_pages_exact() instead of having its own "give back the spare
> pages at the end of the buffer" logic. If alloc_pages_exact() was used, then
> the check for an order >= MAX_ORDER can be pushed down to alloc_pages_exact()
> where it may catch other unwary callers.
> 
> How about adding the following patch on top of yours?

Well observed, yes indeed.  In fact, it even looks as if, shock horror,
alloc_pages_exact() was _plagiarized_ from alloc_large_system_hash().
Blessed be the GPL, I'm sure we can skip the lengthy lawsuits!

> 
> ==== CUT HERE ====
> Use alloc_pages_exact() in alloc_large_system_hash() to avoid duplicated logic
> 
> alloc_large_system_hash() has logic for freeing unused pages at the end
> of a power-of-two-pages-aligned buffer that is a duplicate of what is in
> alloc_pages_exact(). This patch converts alloc_large_system_hash() to use
> alloc_pages_exact().
> 
> Signed-off-by: Mel Gorman <mel@....ul.ie>
> --- 
>  mm/page_alloc.c |   27 +++++----------------------
>  1 file changed, 5 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1b3da0f..c94b140 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1942,6 +1942,9 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
>  	unsigned int order = get_order(size);
>  	unsigned long addr;
>  
> +	if (order >= MAX_ORDER)
> +		return NULL;
> +

I suppose there could be an argument about whether we do or do not
want to skip the WARN_ON when it's in alloc_pages_exact().

I have no opinion on that; but DaveM's reply on large_system_hash
does make it clear that we're not interested in the warning there.
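
For illustration only, here is a userspace sketch of the guard being
discussed. The model_* names are hypothetical, malloc() stands in for
__get_free_pages(), and the 4 KiB page size and MAX_ORDER of 11 are
assumptions chosen to match the x86_64 case above, where an order 11
request works out to 8 MiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SHIFT 12				/* assumption: 4 KiB pages */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)
#define MAX_ORDER  11				/* assumption: orders 0..10 are valid */

/* Smallest order with (PAGE_SIZE << order) >= size; models get_order(). */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Models the guarded entry point: refuse quietly instead of warning. */
static void *model_alloc_pages_exact(size_t size)
{
	if (model_get_order(size) >= MAX_ORDER)
		return NULL;
	return malloc(size);	/* stand-in for __get_free_pages() */
}
```

Under these assumptions an 8 MiB request is refused with NULL before the
page allocator is reached, while a 4 MiB request (order 10) goes through.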

>  	addr = __get_free_pages(gfp_mask, order);
>  	if (addr) {
>  		unsigned long alloc_end = addr + (PAGE_SIZE << order);
> @@ -4755,28 +4758,8 @@ void *__init alloc_large_system_hash(const char *tablename,
>  			table = alloc_bootmem_nopanic(size);
>  		else if (hashdist)
>  			table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
> -		else {
> -			unsigned long order = get_order(size);
> -
> -			if (order < MAX_ORDER)
> -				table = (void *)__get_free_pages(GFP_ATOMIC,
> -								order);
> -			/*
> -			 * If bucketsize is not a power-of-two, we may free
> -			 * some pages at the end of hash table.
> -			 */

That's actually a helpful comment, it's easy to think we're dealing
in powers of two here when we may not be.  Maybe retain it with your
alloc_pages_exact call?
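
To make the point of that comment concrete, a small userspace sketch of
the arithmetic follows. It assumes 4 KiB pages; model_get_order() and
spare_pages() are hypothetical helpers, and the bucket sizes used in the
note below are made-up values, not taken from any real caller.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE     ((size_t)4096)	/* assumption: 4 KiB pages */
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Smallest order with (PAGE_SIZE << order) >= size; models get_order(). */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Pages of the 2^order block that lie beyond PAGE_ALIGN(size). */
static size_t spare_pages(size_t bucketsize, unsigned int log2qty)
{
	size_t size = bucketsize << log2qty;
	size_t block = PAGE_SIZE << model_get_order(size);

	return (block - PAGE_ALIGN(size)) / PAGE_SIZE;
}
```

With a 16-byte bucket and 1024 buckets the table is exactly 4 pages and
nothing is spare; with a 24-byte bucket it is 6 pages inside an order 3
block, so 2 pages can be handed back.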

> -			if (table) {
> -				unsigned long alloc_end = (unsigned long)table +
> -						(PAGE_SIZE << order);
> -				unsigned long used = (unsigned long)table +
> -						PAGE_ALIGN(size);
> -				split_page(virt_to_page(table), order);
> -				while (used < alloc_end) {
> -					free_page(used);
> -					used += PAGE_SIZE;
> -				}
> -			}
> -		}
> +		else
> +			table = alloc_pages_exact(PAGE_ALIGN(size), GFP_ATOMIC);

Do you actually need that PAGE_ALIGN on the size?
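
A quick userspace check of why the answer may well be no. This sketch
assumes that alloc_pages_exact() derives the order from get_order(size)
(as the hunk context shows) and the in-use extent from PAGE_ALIGN(size)
(an assumption, mirroring the code being removed). The helper names are
hypothetical and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE     ((size_t)4096)	/* assumption: 4 KiB pages */
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Smallest order with (PAGE_SIZE << order) >= size; models get_order(). */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Returns 1 if aligning size before the call changes neither the
 * order requested nor the extent treated as in use.
 */
static int prealign_is_noop(size_t size)
{
	size_t aligned = PAGE_ALIGN(size);

	return model_get_order(size) == model_get_order(aligned) &&
	       PAGE_ALIGN(aligned) == PAGE_ALIGN(size);
}
```

If the assumption about alloc_pages_exact() holds, passing the raw size
gives the same order and the same number of pages freed.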

>  	} while (!table && size > PAGE_SIZE && --log2qty);
>  
>  	if (!table)

Andrew noticed another oddity: that if it goes the hashdist __vmalloc()
way, it won't be limited by MAX_ORDER.  Makes one wonder whether it
ought to fall back to __vmalloc() if the alloc_pages_exact() fails.
I think that's a change we could make _if_ the large_system_hash
users ever ask for it, but _not_ one we should make surreptitiously.
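
The difference between the two paths can be modelled in userspace as
below. This is a sketch only: it assumes the loop recomputes
size = bucketsize << log2qty on each pass (not shown in the hunk), that
every allocation succeeds unless the order check rejects it, and 4 KiB
pages with MAX_ORDER 11. settle_log2qty() and model_get_order() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE ((size_t)4096)	/* assumption: 4 KiB pages */
#define MAX_ORDER 11			/* assumption: orders 0..10 are valid */

/* Smallest order with (PAGE_SIZE << order) >= size; models get_order(). */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Models the retry loop:
 *   do { ... } while (!table && size > PAGE_SIZE && --log2qty);
 * Returns the log2qty at which the request is accepted, or 0 if none.
 */
static unsigned int settle_log2qty(size_t bucketsize, unsigned int log2qty,
				   int hashdist)
{
	size_t size;
	int ok;

	assert(bucketsize > 0 && log2qty > 0);
	do {
		size = bucketsize << log2qty;
		/* vmalloc path has no order limit; page path needs order < MAX_ORDER */
		ok = hashdist || model_get_order(size) < MAX_ORDER;
	} while (!ok && size > PAGE_SIZE && --log2qty);

	return ok ? log2qty : 0;
}
```

For a hypothetical 16-byte bucket and log2qty of 19 (8 MiB, order 11), the
page allocator path settles at 18, half the table, while the hashdist
path keeps the full 19.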

Hugh