lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 1 Apr 2009 14:32:20 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Pekka Enberg <penberg@...helsinki.fi>,
	Eduard - Gabriel Munteanu <eduard.munteanu@...ux360.ro>,
	linux-kernel@...r.kernel.org, mm-commits@...r.kernel.org,
	alexn@....su.se, akpm@...ux-foundation.org, alexn@...ia.com,
	apw@...dowen.org, cl@...ux-foundation.org, haveblue@...ibm.com,
	kamezawa.hiroyu@...fujitu.com,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Steven Rostedt <rostedt@...dmis.org>,
	Fr?d?ric Weisbecker <fweisbec@...il.com>
Subject: Re: + page-owner-tracking.patch added to -mm tree

On Wed, Apr 01, 2009 at 03:17:13PM +0200, Ingo Molnar wrote:
> 
> * Mel Gorman <mel@....ul.ie> wrote:
> 
> > On Wed, Apr 01, 2009 at 01:15:40PM +0200, Ingo Molnar wrote:
> > > > <VAST AMOUNTS OF SNIPPAGE>
> > > >
> > > > +static inline void __stack_trace(struct page *page, unsigned long *stack,
> > > > +			unsigned long bp)
> > > > +{
> > > > +	int i = 0;
> > > > +	unsigned long addr;
> > > > +	struct thread_info *tinfo = (struct thread_info *)
> > > > +		((unsigned long)stack & (~(THREAD_SIZE - 1)));
> > > > +
> > > > +	memset(page->trace, 0, sizeof(long) * 8);
> > > > +
> > > > +#ifdef CONFIG_FRAME_POINTER
> > > > +	if (bp) {
> > > > +		while (valid_stack_ptr(tinfo, (void *)bp)) {
> > > > +			addr = *(unsigned long *)(bp + sizeof(long));
> > > > +			page->trace[i] = addr;
> > > > +			if (++i >= 8)
> > > > +				break;
> > > > +			bp = *(unsigned long *)bp;
> > > > +		}
> > > > +		return;
> > > > +	}
> > > > +#endif /* CONFIG_FRAME_POINTER */
> > > > +	while (valid_stack_ptr(tinfo, stack)) {
> > > > +		addr = *stack++;
> > > > +		if (__kernel_text_address(addr)) {
> > > > +			page->trace[i] = addr;
> > > > +			if (++i >= 8)
> > > > +				break;
> > > > +		}
> > > > +	}
> > > > +}
> > > 
> > > Uhm, this is not acceptable and broken, we have generic stacktrace 
> > > saving facilities for this precise purpose. It has other problems 
> > > too.
> > > 
> > > The purpose of the patch seems genuinely useful and i support the 
> > > concept, but the current design is limiting (we could do much 
> > > better) and the implementation is awful. Please.
> > > 
> > > Has this patch been reviewed by or Cc:-ed to anyone versed in kernel 
> > > instrumentation details, before it was applied to -mm? Those folks 
> > > hang around the tracing tree usually so they are easy to find. :)
> > > 
> > 
> > This patch is ancient, predating most of the instrumentation stuff 
> > by years. It was dropped from -mm a few months ago because of 
> > changes in proc and this is a rebase because it came up as being 
> > potentially useful pinning down memory leaks when they occur.
> > 
> > I'm not sure when exactly it was introduced to -mm, but I see 
> > references going back as far as 2.6.12-rc1 so it's no surprise 
> > this is now extremly odd looking. However, there is no plan to 
> > merge this to mainline making the effort of redoing it from 
> > scratch a questionable expenditure of time.
> 
> Hm, why not merge the concept upstream?
> 

I suspect at the time the patch was put together, it was not merged upstream
because it severely bloated struct page and would be something that was never
enabled by default in distros. Anyone wanted it for debugging could easily
apply the patch. It's a decision that simply has never been revisited as
it's not used that often - it's just seriously useful when you do need it.

I'm guessing though, I wasn't active in kernel development at the time
and I haven't dug through the archives to see the history.

> I'm sure the kmemtrace maintainers (Eduardo and Pekka) would be 
> interested in this too: they are already tracking slab allocation 
> events in a very finegrained way, extending that scheme to the page 
> allocator seems genuinely useful to me.
> 

In light of kmemtrace's success, it does make sense to revisit if someone
has the cycles to spare. Again bear in mind that this owner patch was in
-mm long before kmemtrace came along so it was a good idea at the time whose
implementation has not quite stood the test of time.

> And that way the rather ugly bloating of struct page by this patch 
> would be solved too: it's not needed and there's no need to actually 
> save the callchain permanently, it's usually enough to _trace_ it - 
> user-space can then save it into a permanent map if it's interested. 
> 

It's picky but you lose details from boot-time allocations that way but
that's a relatively small detail. If it's leaks you really care about though,
then tracing is probably sufficient.

> So this feature could go into Linux distros, available by default 
> and integrated into the user-space side of kmemtrace.
> 
> ( You could trace the gfp-type like kmemtrace does already - that 
>   way user-space can do GFP_MOVABLE / GFP_RECLAIMABLE summary counts 
>   (and more). )
> 

I concede that there is overlap but GFP_MOVABLE and GFP_RECLAIMABLE tracking
would be useful I have to say. In the past, I modified this patch to track
that information so I could figure out what was going on in the system. It'd
be nice to do similar in the field.

> So hooking up this info into tracepoints and ftrace seems genuinely 
> useful to me. You could trace the whole system, or you could use 
> filter expressions to do things like:
> 
>    echo "comm == Xorg" > /debug/tracing/events/mm/page_alloc/filter
> 
> To only track the memory allocations of Xorg task(s) for example. 
> 
> Later on a tracepoint based sw event could be made out of it too, 
> hooked up to sw-perfcounters, to dynamically profile page usage by 
> owning function(s) and task(s).
> 
> Later on we could add more info to the tracepoints - for example the 
> d_path() of the vma that is attached to the page could be traced (if 
> it's not an anonymous vma and not a kernel-internal allocation).
> 
> etc.
> 

Ok, I fully concede that this would all be great to have and I've taken
note to investigate it at some time in the future but it won't be any
time soon I'm afraid. Maybe this thread will prod someone familiar with
tracing with some time to spare to take a look at what that provides
reimplement in a sensible manner as you suggest?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ