linux-kernel - Re: [PATCH] hugetlb: Fix pool resizing corner case

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1191436392.19775.43.camel@localhost.localdomain>
Date:	Wed, 03 Oct 2007 13:33:12 -0500
From:	Adam Litke <agl@...ibm.com>
To:	Dave Hansen <haveblue@...ibm.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hugetlb: Fix pool resizing corner case

On Wed, 2007-10-03 at 10:40 -0700, Dave Hansen wrote:
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 84c795e..7af3908 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -224,14 +224,14 @@ static void try_to_free_low(unsigned long count)
> >  	for (i = 0; i < MAX_NUMNODES; ++i) {
> >  		struct page *page, *next;
> >  		list_for_each_entry_safe(page, next, &hugepage_freelists[i], lru) {
> > +			if (count >= nr_huge_pages)
> > +				return;
> >  			if (PageHighMem(page))
> >  				continue;
> >  			list_del(&page->lru);
> >  			update_and_free_page(page);
> >  			free_huge_pages--;
> >  			free_huge_pages_node[page_to_nid(page)]--;
> > -			if (count >= nr_huge_pages)
> > -				return;
> >  		}
> >  	}
> >  }
> 
> That's an excellent problem description.  I'm just a bit hazy on how the
> patch fixes it. :)
> 
> What is the actual error in this loop?  The fact that we can go trying
> to free pages when the count is actually OK?

The above hunk serves only to change the behavior of try_to_free_low()
so that rather than always freeing _at_least_ one huge page, it can
return without having freed any pages. 

> BTW, try_to_free_low(count) kinda sucks for a function name.  Is that
> count the number of pages we're trying to end up with, or the total
> number of low pages that we're trying to free?

I agree the name sucks, but this is a bugfix patch.

> Also, as I look at try_to_free_low(), why do we need to #ifdef it out in
> the case of !HIGHMEM?  If we have CONFIG_HIGHMEM=yes, we still might not
> have any _actual_ high memory.  So, they loop obviously doesn't *hurt*
> when there is no high memory.

Maybe, but not really in-scope of what this patch is trying to
accomplish.

> > @@ -251,7 +251,7 @@ static unsigned long set_max_huge_pages(unsigned long count)
> >  		return nr_huge_pages;
> > 
> >  	spin_lock(&hugetlb_lock);
> > -	count = max(count, resv_huge_pages);
> > +	count = max(count, resv_huge_pages + nr_huge_pages - free_huge_pages);
> >  	try_to_free_low(count);
> >  	while (count < nr_huge_pages) {
> >  		struct page *page = dequeue_huge_page(NULL, 0);
> 
> The real problem with this line is that "count" is too ambiguous. :)

I agree, count is almost always a bad variable name :)

> We could rewrite the original max() line this way:
> 
> 	if (resv_huge_pages > nr_of_pages_to_end_up_with)
> 		nr_of_pages_to_end_up_with = resv_huge_pages;
> 	try_to_make_the_total_nr_of_hpages(nr_of_pages_to_end_up_with);
> 
> Which makes it more clear that you're setting the number of total pages
> to the number of reserved pages, which is obviously screwy.
> 
> OK, so this is actually saying: "count can never go below
> resv_huge_pages+nr_huge_pages"?

Not quite.  Count can never go below the number of reserved pages plus
pages allocated to MAP_PRIVATE mappings.  That number is computed by:
(resv + (total - free)).

> Could we change try_to_free_low() to free a distinct number of pages?
> 
> 	if (count > free_huge_pages)
> 		count = free_huge_pages;
> 	try_to_free_nr_huge_pages(count);
> 
> I feel a bit sketchy about the "resv_huge_pages + nr_huge_pages -
> free_huge_pages" logic.  Could you elaborate a bit there on what the
> rules are?

The key is that we don't want to shrink the pool below the number of
pages we are committed to keeping around.  Before this patch, we only
accounted for the pages we plan to hand out (reserved huge pages) but
not the ones we've already handed out (total - free).  Does that make
sense?

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/