lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87381rg9zx.fsf@ashishki-desk.ger.corp.intel.com>
Date:	Tue, 16 Jun 2015 14:37:06 +0300
From:	Alexander Shishkin <alexander.shishkin@...ux.intel.com>
To:	Peter Zijlstra <peterz@...radead.org>,
	Vince Weaver <vincent.weaver@...ne.edu>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Stephane Eranian <eranian@...il.com>
Subject: Re: perf: aux area related crash and warnings

Alexander Shishkin <alexander.shishkin@...ux.intel.com> writes:

> Peter Zijlstra <peterz@...radead.org> writes:
>
>> Alex, any clue?
>
> Let me look into it. Definitely haven't seen anything like that in my
> tests.
>
>> On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
>>> On Thu, 11 Jun 2015, Vince Weaver wrote:
>>> 
>>> > and while I was trying to cut and paste that, the locked haswell just took 
>>> > down the network switch so I can't get the rest until tomorrow.
>>> 
>>> here are the full dumps if anyone is interested
>>> 
>>> the warning is reproducible, the spinlock disaster is not.
>>> 
>>> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
>>> [36298.992915]  lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
>>> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
>>> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>>> [36299.020606]  ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
>>> [36299.029199]  ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
>>> [36299.037796]  ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
>>> [36299.046338] Call Trace:
>>> [36299.049501]  <NMI>  [<ffffffff816d7229>] dump_stack+0x45/0x57
>>> [36299.056284]  [<ffffffff810c2895>] spin_dump+0x85/0xe0
>>> [36299.062282]  [<ffffffff810c2916>] spin_bug+0x26/0x30
>>> [36299.068111]  [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
>>> [36299.074897]  [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
>>> [36299.081276]  [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
>>> [36299.088340]  [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
>>> [36299.095182]  [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
>>> [36299.102291]  [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
>>> [36299.108987]  [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
>>> [36299.115850]  [<ffffffff8117aa47>] __free_pages+0x37/0x50
>>> [36299.121991]  [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
>
> This one goes to free aux pages from nmi context, looks like aux buffer
> was unmapped while the event was running, so here it dropped the last
> reference.

Ok, here's what I propose for this one.

>From b8cd18bc440c318e8b30825bf654c815b42aa1e0 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
Date: Tue, 16 Jun 2015 14:14:04 +0300
Subject: [PATCH] perf: Free AUX area pages in rcu callback

Currently, if the user unmaps AUX area while the corresponding event
is active, perf_aux_output_end() may be the last one to drop the aux
area refcount, and end up freeing the pages in NMI context or scheduler's
fast path. Same can happen in the error path of perf_aux_output_begin().

To avoid the bug, this patch moves actual freeing code to rb_free_rcu(),
which will know whether it is called for AUX area or the ring buffer
proper and act accordingly.

Signed-off-by: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
Reported-by: Vince Weaver <vincent.weaver@...ne.edu>
---
 kernel/events/core.c        | 31 ++++++++++++++++++++++++++++++-
 kernel/events/internal.h    |  1 +
 kernel/events/ring_buffer.c |  8 +-------
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index eddf1ed415..5f1cc5976f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4381,7 +4381,36 @@ static void rb_free_rcu(struct rcu_head *rcu_head)
 	struct ring_buffer *rb;
 
 	rb = container_of(rcu_head, struct ring_buffer, rcu_head);
-	rb_free(rb);
+
+	/*
+	 * are we called for AUX or the rb:
+	 * AUX always goes first, then if rb::refcount drops to zero,
+	 * free rb synchronously
+	 */
+	if (atomic_read(&rb->refcount)) {
+		__rb_free_aux(rb);
+
+		/* matches the increment in rb_free_aux() */
+		if (atomic_dec_and_test(&rb->refcount))
+			rb_free(rb);
+	} else {
+		rb_free(rb);
+	}
+}
+
+void rb_free_aux(struct ring_buffer *rb)
+{
+	/*
+	 * hold rb::refcount to make sure rb doesn't disappear
+	 * before aux pages are freed
+	 */
+	if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
+		return;
+
+	if (atomic_dec_and_test(&rb->aux_refcount))
+		call_rcu(&rb->rcu_head, rb_free_rcu);
+	else
+		ring_buffer_put(rb);	/* matches the increment above */
 }
 
 struct ring_buffer *ring_buffer_get(struct perf_event *event)
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 9f6ce9ba4a..7f8242ed85 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -61,6 +61,7 @@ extern void perf_event_wakeup(struct perf_event *event);
 extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 			pgoff_t pgoff, int nr_pages, long watermark, int flags);
 extern void rb_free_aux(struct ring_buffer *rb);
+extern void __rb_free_aux(struct ring_buffer *rb);
 extern struct ring_buffer *ring_buffer_get(struct perf_event *event);
 extern void ring_buffer_put(struct ring_buffer *rb);
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 725c416085..343121e943 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -537,7 +537,7 @@ out:
 	return ret;
 }
 
-static void __rb_free_aux(struct ring_buffer *rb)
+void __rb_free_aux(struct ring_buffer *rb)
 {
 	int pg;
 
@@ -554,12 +554,6 @@ static void __rb_free_aux(struct ring_buffer *rb)
 	rb->aux_nr_pages = 0;
 }
 
-void rb_free_aux(struct ring_buffer *rb)
-{
-	if (atomic_dec_and_test(&rb->aux_refcount))
-		__rb_free_aux(rb);
-}
-
 #ifndef CONFIG_PERF_USE_VMALLOC
 
 /*
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ