linux-kernel - Re: [PATCH 02/22] Do not sanity check order in the fast path


Message-Id: <1240421415.10627.93.camel@nimitz>
Date:	Wed, 22 Apr 2009 10:30:15 -0700
From:	Dave Hansen <dave@...ux.vnet.ibm.com>
To:	Mel Gorman <mel@....ul.ie>
Cc:	Linux Memory Management List <linux-mm@...ck.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Nick Piggin <npiggin@...e.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Lin Ming <ming.m.lin@...el.com>,
	Zhang Yanmin <yanmin_zhang@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 02/22] Do not sanity check order in the fast path

On Wed, 2009-04-22 at 18:11 +0100, Mel Gorman wrote:
> On Wed, Apr 22, 2009 at 09:13:11AM -0700, Dave Hansen wrote:
> > On Wed, 2009-04-22 at 14:53 +0100, Mel Gorman wrote:
> > > No user of the allocator API should be passing in an order >= MAX_ORDER
> > > but we check for it on each and every allocation. Delete this check and
> > > make it a VM_BUG_ON check further down the call path.
> > 
> > Should we get the check re-added to some of the upper-level functions,
> > then?  Perhaps __get_free_pages() or things like alloc_pages_exact()? 
> 
> I don't think so, no. It just moves the source of the text bloat, and
> only for the few callers that are asking for something that will never
> succeed.

Well, it's a matter of figuring out when it can succeed.  Some of this
stuff, we can figure out at compile-time.  Others are a bit harder.
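
To make the compile-time versus run-time split concrete, here is a minimal userspace sketch. It is not kernel code: the `sketch_` names and the `SKETCH_` constants are hypothetical, and the values (4K pages, MAX_ORDER of 11, so a largest usable order of 10) are assumptions taken from the figures discussed in this thread. A constant size can be rejected by the compiler, while a size only known at run time needs an explicit order check.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only: 4K pages, MAX_ORDER of 11. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_ORDER  11

/* True when a constant size fits in the largest block (order MAX_ORDER - 1). */
#define SKETCH_FITS_PAGE_ALLOCATOR(size) \
	((size) <= ((size_t)1 << (SKETCH_PAGE_SHIFT + SKETCH_MAX_ORDER - 1)))

/* Compile-time case: the size is a constant, so the compiler can check it. */
static_assert(SKETCH_FITS_PAGE_ALLOCATOR(64 * 1024),
	      "constant-sized buffer must fit under MAX_ORDER");

/*
 * Run-time case: smallest order whose block (page size << order) holds
 * `size` bytes. `size` must be greater than zero.
 */
static inline unsigned int sketch_get_order(size_t size)
{
	size_t pages = (size - 1) >> SKETCH_PAGE_SHIFT;	/* pages needed, minus one */
	unsigned int order = 0;

	while (pages) {
		order++;
		pages >>= 1;
	}
	return order;
}

/* Non-zero when the page allocator could, in principle, satisfy `size`. */
static inline int sketch_order_is_valid(size_t size)
{
	return sketch_get_order(size) < SKETCH_MAX_ORDER;
}
```

A caller whose size depends on something like kernel text size would have to use the run-time form, for example `if (sketch_order_is_valid(buffer_bytes))` before trying the page allocator.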

> > I'm selfishly thinking of what I did in profile_init().  Can I slab
> > alloc it?  Nope.  Page allocator?  Nope.  Oh, well, try vmalloc():
> > 
> >         prof_buffer = kzalloc(buffer_bytes, GFP_KERNEL);
> >         if (prof_buffer)
> >                 return 0;
> > 
> >         prof_buffer = alloc_pages_exact(buffer_bytes, GFP_KERNEL|__GFP_ZERO);
> >         if (prof_buffer)
> >                 return 0;
> > 
> >         prof_buffer = vmalloc(buffer_bytes);
> >         if (prof_buffer)
> >                 return 0;
> > 
> >         free_cpumask_var(prof_cpu_mask);
> >         return -ENOMEM;
> > 
> 
> Can this ever actually be asking for an order larger than MAX_ORDER
> though? If so, you're condemning it to always behave poorly.

Yeah.  It is based on text size.  Smaller kernels with trimmed configs
and no modules have no problem fitting under MAX_ORDER, as do kernels
with larger base page sizes.  

> > Same thing in __kmalloc_section_memmap():
> > 
> >         page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
> >         if (page)
> >                 goto got_map_page;
> > 
> >         ret = vmalloc(memmap_size);
> >         if (ret)
> >                 goto got_map_ptr;
> > 
> 
> If I'm reading that right, the order will never be a stupid order. It can fail
> for higher orders, in which case it falls back to vmalloc(). For example,
> to hit that limit on a kernel with 4K pages and a maximum usable order
> of 10, the section size would need to be 256MB (assuming a struct page size
> of 64 bytes). I don't think it's ever that size and if so, it'll always be
> sub-optimal which is a poor choice to make.

I think the section size default used to be 512M on x86 because we
concentrate on removing whole DIMMs.  
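
The section-size arithmetic can be checked with a small userspace sketch. It only uses the figures quoted in this thread (4K pages, maximum usable order of 10, 64-byte struct page); the function names are hypothetical and nothing here is taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of struct page needed to describe every page in one memory section. */
static uint64_t memmap_bytes(uint64_t section_bytes, uint64_t page_bytes,
			     uint64_t struct_page_bytes)
{
	assert(page_bytes > 0);
	return (section_bytes / page_bytes) * struct_page_bytes;
}

/* Largest physically contiguous block available at the given order. */
static uint64_t max_block_bytes(uint64_t page_bytes, unsigned int max_usable_order)
{
	return page_bytes << max_usable_order;
}
```

Under those assumptions, `memmap_bytes(256 MB, 4096, 64)` is 4MB, which is exactly `max_block_bytes(4096, 10)`, so a 256MB section sits right at the limit. A 512MB section needs 8MB of memmap, which is twice the largest block, so the alloc_pages() attempt could never succeed and the code would always end up in vmalloc().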

> > I depend on the allocator to tell me when I've fed it too high of an
> > order.  If we really need this, perhaps we should do an audit and then
> > add a WARN_ON() for a few releases to catch the stragglers.
> 
> I consider it buggy to ask for something so large that you always end up
> with the worst option - vmalloc(). How about leaving it as a VM_BUG_ON
> to get as many reports as possible on who is depending on this odd
> behaviour?
> 
> If there are users with good reasons, then we could convert this to WARN_ON
> to fix up the callers. I suspect that the allocator can already cope with
> receiving a stupid order silently but slowly. It should go all the way to the
> bottom and just never find anything useful and return NULL.  zone_watermark_ok
> is the most dangerous looking part but even it should never get to MAX_ORDER
> because it should always find there are not enough free pages and return
> before it overruns.

Whatever we do, I'd agree that it's fine that this is a degenerate case
that gets handled very slowly and as far out of hot paths as possible.
Anybody who can fall back to a vmalloc is not doing these things very
often.
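
As a rough illustration of handling the degenerate case outside the hot path, here is a userspace sketch of a caller-side check followed by a fallback. Everything in it is an assumption for illustration: the `sketch_` names are hypothetical, `calloc()` stands in for both the page allocator and vmalloc(), and the constants assume 4K pages with a MAX_ORDER of 11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_ORDER 11u

enum sketch_source { SKETCH_NONE, SKETCH_CONTIGUOUS, SKETCH_VMALLOC };

struct sketch_buf {
	void *ptr;
	enum sketch_source source;	/* which path produced the buffer */
};

static struct sketch_buf sketch_alloc_large(size_t size)
{
	struct sketch_buf buf = { NULL, SKETCH_NONE };
	/* Largest block the contiguous path could ever return. */
	size_t max_contig = (size_t)SKETCH_PAGE_SIZE << (SKETCH_MAX_ORDER - 1);

	assert(size > 0);

	/* Caller-side check: skip the contiguous attempt if it can never succeed. */
	if (size <= max_contig) {
		buf.ptr = calloc(1, size);	/* stand-in for the page allocator */
		if (buf.ptr) {
			buf.source = SKETCH_CONTIGUOUS;
			return buf;
		}
	}

	/* Slow fallback, reached only for oversized or failed requests. */
	buf.ptr = calloc(1, size);		/* stand-in for vmalloc() */
	if (buf.ptr)
		buf.source = SKETCH_VMALLOC;
	return buf;
}
```

With the check in the caller, the allocator's fast path never sees an order it has to reject; only callers that can fall back pay for the comparison.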

-- Dave
