linux-kernel - Re: kmemleak not always catching stuff

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170908174518.3i74qvdbruf3lynl@armageddon.cambridge.arm.com>
Date:   Fri, 8 Sep 2017 18:45:20 +0100
From:   Catalin Marinas <catalin.marinas@....com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        kasan-dev@...glegroups.com
Subject: Re: kmemleak not always catching stuff

On Tue, Sep 05, 2017 at 10:15:45AM -0400, Steven Rostedt wrote:
> On Mon, 4 Sep 2017 11:09:05 +0100
> Catalin Marinas <catalin.marinas@....com> wrote:
> > On Fri, Sep 01, 2017 at 06:33:11PM -0400, Steven Rostedt wrote:
> > > Now I was thinking that it may be due to the fact that the trampoline
> > > is allocated with module_alloc(), and that has some magic kasan goo in
> > > it. But when forcing the issue with adding the following code:
> > > 
> > > 	void **pblah;
> > > 	void *blah;
> > > 
> > > 	pblah = kmalloc(sizeof(*pblah), GFP_KERNEL);	
> > > 	blah = module_alloc(PAGE_SIZE);
> > > 	*pblah = blah;
> > > 	printk("allocated blah %p\n", blah);
> > > 	kfree(pblah);
> > > 
> > > in a path that I could control, it would catch it only after doing it
> > > several times. I was never able to have kmemleak catch the actual bug
> > > from user space no matter how many times I triggered it.  
> > 
> > module_alloc() uses vmalloc_exec(), so it is tracked by kmemleak but you
> > probably hit a false negative with the blah pointer lingering somewhere
> > on some stack.
> 
> Hmm, could this also be what causes the miss of catching the lingering
> ftrace trampoline?

Not sure (not without some additional support in kmemleak to help track
down the source of false negatives; I'm on a long flight to LA next
week, maybe I manage to hack something up ;)).

BTW, I had a quick look at the trace_selftest_ops() function (without
pretending I understand the ftrace code) and there is one case where
this function can exit without freeing dyn_ops. Is this intentional?

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index cb917cebae29..b17ec642793b 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -273,7 +273,7 @@ static int trace_selftest_ops(struct trace_array *tr, int cnt)
 		goto out_free;
 	if (cnt > 1) {
 		if (trace_selftest_test_global_cnt == 0)
-			goto out;
+			goto out_free;
 	}
 	if (trace_selftest_test_dyn_cnt == 0)
 		goto out_free;

> > >  [do the test]
> > > 
> > >  # echo scan > /sys/kernel/debug/kmemleak  
> > 
> > Some heuristics in kmemleak cause the first leak of an object not to be
> > reported (too many false positives). You'd need to do "echo scan" at
> > least twice after an allocation.
> 
> OK, is this documented somewhere?

Apparently not. I'll add some notes to the kmemleak documentation.

> > > Most of the times it found nothing. Even when I switched the above from
> > > module_alloc() to kmalloc().
> > > 
> > > Is this normal?  
> > 
> > In general, a leak would eventually appear after a few scans or in time
> > when some memory location is overridden.
> > 
> > Yet another heuristics in kmemleak is to treat pointers at some offset
> > inside an object as valid references (because of the container_of
> > tricks). However, the downside is that the bigger the object, the
> > greater chances of finding some random data that looks like a pointer.
> > We could change this logic to require precise pointers above a certain
> > size (e.g. PAGE_SIZE) where the use of container_of() is less likely.
> > 
> > Kmemleak doesn't have a way to inspect false negatives but if you are
> > interested in digging further, I could add a "find=0x..." command to
> > print all references to an object during scanning. I also need to find
> > some time to implement a "stopscan" command which uses stop_machine()
> > and skips the heuristics for reducing false positives.
> 
> Without the patch in the link above, there's a memory leak with the
> ftrace trampoline with the following commands:
> 
> Requires: CONFIG_FUNCTION_TRACER and CONFIG_SCHED_TRACER
> 
>  # cd /sys/kernel/debug/tracing
>  # mkdir instances/foo
>  # echo wakeup > instances/foo/current_tracer
>  # echo 0 > /proc/sys/kernel/ftrace_enabled
>  # echo nop > instances/foo/current_tracer
>  # rmdir instances/foo
> 
> What the above does, is creates a new (allocated/non-static) buffer in
> the instance directory. Then we enable the wakeup tracer which
> enables function tracing and also creates a dynamic ftrace trampoline
> for it. We disable function tracing for all tracers with the proc
> sysctl ftrace_enabled set to zero. The nop removes the wakeup tracer
> and unregisters its function tracing handler. This is where the leak
> happens. The unregister path sees that function tracing is disabled and
> exits out early, without releasing the trampoline.

Are the ftrace_ops allocated dynamically in this case (and freed when
unregistered)? Otherwise, you may have an ops->trampoline still around
that kmemleak finds.

-- 
Catalin