Message-ID: <20171206141213.GD12234@kernel.org>
Date: Wed, 6 Dec 2017 11:12:13 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Namhyung Kim <namhyung@...nel.org>,
Fengguang Wu <fengguang.wu@...el.com>,
linux-kernel@...r.kernel.org, Wang Nan <wangnan0@...wei.com>,
Ingo Molnar <mingo@...hat.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Will Deacon <will.deacon@....com>, lkp@...org,
Dmitry Vyukov <dvyukov@...gle.com>, kasan-dev@...glegroups.com,
kernel-team@....com
Subject: Re: BUG: KASAN: slab-out-of-bounds in perf_callchain_user+0x494/0x530
On Wed, Dec 06, 2017 at 02:47:06PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 11:47:18PM +0900, Namhyung Kim wrote:
> > Sure, I mean the following code:
> >
> > 	mutex_lock(&callchain_mutex);
> >
> > 	count = atomic_inc_return(&nr_callchain_events);
> > 	if (WARN_ON_ONCE(count < 1)) {
> > 		err = -EINVAL;
> > 		goto exit;
> > 	}
> >
> > 	if (count > 1) {
> > 		/* If the allocation failed, give up */
> > 		if (!callchain_cpus_entries)
> > 			err = -ENOMEM;
> >
> > 		goto exit;
> > 	}
> >
> > 	err = alloc_callchain_buffers();
> > exit:
> > 	if (err)
> > 		atomic_dec(&nr_callchain_events);
> >
> > 	mutex_unlock(&callchain_mutex);
> >
> >
> > The callchain_cpus_entries is allocated in alloc_callchain_buffers()
> > only when the count is 1. But if the allocation fails, the count is
> > decremented, so the next event would try to allocate it again. Thus it
> > does not seem possible to see callchain_cpus_entries being NULL in the
> > 'if (count > 1)' block. If you want to make the next event give up,
> > it'd need to take an additional count IMHO.
>
> There's also a race against put_callchain_buffers() there, consider:
>
>
> 	get_callchain_buffers()		put_callchain_buffers()
> 	  mutex_lock();
> 	  inc()
> 					  dec_and_test() // false
>
> 	  dec() // 0
>
>
> And the buffers leak.
Yeah, this code is complicated, and there are several csets to consider,
by Frédéric, that may help to understand why the code ended up like
that. I started from git blame, going first to
9251f904f95175b4a1d8cbc0449e748f9edd7629, where the test seemed to make
sense, to then go back, but I'm still reading this...

commit fc3b86d673e41ac66b4ba5b75a90c2fcafb90089
Author: Frederic Weisbecker <fweisbec@...il.com>
Date:   Fri Aug 2 18:29:54 2013 +0200

    perf: Roll back callchain buffer refcount under the callchain mutex

commit 90983b16078ab0fdc58f0dab3e8e3da79c9579a2
Author: Frederic Weisbecker <fweisbec@...il.com>
Date:   Tue Jul 23 02:31:00 2013 +0200

    perf: Sanitize get_callchain_buffer()

commit fd45c15f13e754f3c106427e857310f3e0813951
Author: Namhyung Kim <namhyung.kim@....com>
Date:   Fri Jan 20 10:12:45 2012 +0900

    perf: Don't call release_callchain_buffers() if allocation fails

commit 9251f904f95175b4a1d8cbc0449e748f9edd7629
Author: Borislav Petkov <borislav.petkov@....com>
Date:   Sun Oct 16 17:15:04 2011 +0200

    perf: Carve out callchain functionality