Message-ID: <20201001212757.339a5520@oasis.local.home>
Date: Thu, 1 Oct 2020 21:27:57 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Naresh Kamboju <naresh.kamboju@...aro.org>
Subject: [PATCH v2] tracepoint: Fix out of sync data passing by static
caller
From: Steven Rostedt (VMware) <rostedt@...dmis.org>
Naresh reported a bug discovered in linux-next that I can reliably
trigger myself. It appears to be a side effect of the static calls. It
happens when going from more than one tracepoint callback to a single
one, and removing the first callback on the list. Each entry on the
list of tracepoint callbacks holds a function to call with the
parameters of that tracepoint, along with a pointer to the data
associated with that callback.
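For reference, each entry on that list is (roughly) the following
structure from include/linux/tracepoint-defs.h:

    struct tracepoint_func {
            void *func;     /* the callback to invoke */
            void *data;     /* the data registered with that callback */
            int prio;       /* priority, used to order the list */
    };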
 old_list:
	0: func = foo; data = NULL;
	1: func = bar; data = &bar_struct;

 new_list:
	0: func = bar; data = &bar_struct;

       CPU 0                           CPU 1
       -----                           -----
   tp_funcs = old_list;
   tp_static_caller = tp_iterator

   __DO_TRACE()

       data = tp_funcs[0].data = NULL;

                                   tp_funcs = new_list;
                                   tracepoint_update_call()
                                      tp_static_caller = tp_funcs[0] = bar;
       tp_static_caller(data)
        bar(data)
         x = data->item = NULL->item

   BOOM!
[ Update: The current code is actually worse, which is probably why it
is so easy to trigger. It does the tp_funcs = new_list *after* the
static_caller update! ]
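In other words, the tracing fast path does, in essence, the following
(a simplified sketch of what __DO_TRACE() in include/linux/tracepoint.h
boils down to, not the literal macro):

    it_func_ptr = rcu_dereference_raw(tp->funcs);
    if (it_func_ptr) {
            /* May still see old_list here: data == NULL */
            data = it_func_ptr->data;
            /* <-- window: static caller can be switched to bar() --> */
            tp_static_caller(data, args);   /* now calls bar(NULL, args) */
    }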
Funny enough, I was able to reliably trigger this bug, and always on
the sched_switch tracepoint, which does make sense: the sched_switch
tracepoint is a utility tracepoint that is attached to collect
information about tasks when tracing is enabled (like mapping pids to
comms), and most of these utility helpers do not have a data item
attached. But the trace events that attach to tracepoints do have a
data item, which is used to find their state and to know which tracing
buffer to write to.
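For instance, the pid-to-comm recorder registers its sched_switch
probe with a NULL data pointer (kernel/trace/trace_sched_switch.c does
essentially this):

    /* utility helper: no data item attached */
    register_trace_sched_switch(probe_sched_switch, NULL);

whereas a trace event registers its callback with a pointer to the
event's state as the data item.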
The race window is probably extended by any synchronization the text
poke may do, which would cause the sched switch to be triggered at
vulnerable times.
I've seen this:
Testing event sched_migrate_task: OK
Testing event sched_switch:
BUG: kernel NULL pointer dereference, address: 0000000000000048
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 4 PID: 158 Comm: kworker/4:2 Not tainted 5.9.0-rc7-test-next-20201001+ #12
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Workqueue: 0x0 (events)
RIP: 0010:trace_event_raw_event_sched_switch+0x1d/0x160
Code: 75 c3 e9 4e ff ff ff e8 01 bd 9f 00 90 55 48 89 e5 41 57 49 89 cf 41 56 49 89 d6 41 55 41 89 f5 41 54 49 89 fc 53 48 83 ec 40 <48> 8b 5f 48 65 48 8b 04 25 28 00 00 00 48 1 c0 f6 c7
RSP: 0018:ffffa93680487dc8 EFLAGS: 00010082
RAX: ffff98a053129bb0 RBX: ffff98a05ab2df98 RCX: ffff98a059a70000
RDX: ffff98a052ce5180 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffa93680487e30 R08: 0000000000000000 R09: 0000000000000001
R10: ffff98a052ce5180 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff98a052ce5180 R15: ffff98a059a70000
FS: 0000000000000000(0000) GS:ffff98a05ab00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000048 CR3: 000000003b612002 CR4: 00000000001706e0
Call Trace:
? trace_event_raw_event_sched_move_numa+0x100/0x100
__schedule+0x5dd/0xa40
schedule+0x45/0xe0
worker_thread+0xc6/0x3a0
? process_one_work+0x570/0x570
kthread+0x128/0x170
? kthread_park+0x90/0x90
ret_from_fork+0x22/0x30
And that callback (trace_event_raw_event_sched_switch) is called
directly by the static call to the sched_switch trace event callback
(not via the iterator), and it triggers with the data pointer passed
in as NULL.
To solve this, add a tracepoint_synchronize_unregister() between
changing tp_funcs and updating the static call, which does both a
synchronize_rcu() and a synchronize_srcu(). This ensures that when the
static call is updated to the single callback, it will receive the
data that it was registered with.
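For reference, that helper already exists in kernel/tracepoint.c and
is essentially:

    void tracepoint_synchronize_unregister(void)
    {
            synchronize_srcu(&tracepoint_srcu);
            synchronize_rcu();
    }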
Note, to avoid calling the synchronization functions more often than
necessary, this is only needed when going from the iterator back to a
single caller, and only if that single caller wasn't the first one on
the list before the update.
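For example, with old_list = { foo, bar }: removing bar leaves
{ foo }, so tp_funcs[0] is unchanged and the data the static caller
will pass already matches; no synchronization is needed. Removing foo
leaves { bar }, so tp_funcs[0] did change, and the update must wait
out any __DO_TRACE() that may have already loaded foo's (NULL) data.
That is what the tp_funcs[0].func != old[0].func check in the patch
decides.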
Link: https://lore.kernel.org/linux-next/CA+G9fYvPXVRO0NV7yL=FxCmFEMYkCwdz7R=9W+_votpT824YJA@mail.gmail.com
Reported-by: Naresh Kamboju <naresh.kamboju@...aro.org>
Fixes: d25e37d89dd2f ("tracepoint: Optimize using static_call()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@...dmis.org>
---
Changes since v1:
- Noticed that tp_funcs wasn't assigned before doing the static update.
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 1b4be44d1d2b..3f659f855074 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -221,7 +221,7 @@ static void *func_remove(struct tracepoint_func **funcs,
 	return old;
 }
 
-static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func *tp_funcs)
+static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func *tp_funcs, bool sync)
 {
 	void *func = tp->iterator;
 
@@ -229,8 +229,17 @@ static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func
 	if (!tp->static_call_key)
 		return;
 
-	if (!tp_funcs[1].func)
+	if (!tp_funcs[1].func) {
 		func = tp_funcs[0].func;
+		/*
+		 * If going from the iterator back to a single caller,
+		 * we need to synchronize with __DO_TRACE to make sure
+		 * that the data passed to the callback is the one that
+		 * belongs to that callback.
+		 */
+		if (sync)
+			tracepoint_synchronize_unregister();
+	}
 
 	__static_call_update(tp->static_call_key, tp->static_call_tramp, func);
 }
@@ -265,7 +274,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
 	 * include/linux/tracepoint.h using rcu_dereference_sched().
 	 */
 	rcu_assign_pointer(tp->funcs, tp_funcs);
-	tracepoint_update_call(tp, tp_funcs);
+	tracepoint_update_call(tp, tp_funcs, false);
 	static_key_enable(&tp->key);
 
 	release_probes(old);
@@ -297,11 +306,12 @@ static int tracepoint_remove_func(struct tracepoint *tp,
 			tp->unregfunc();
 
 		static_key_disable(&tp->key);
+		rcu_assign_pointer(tp->funcs, tp_funcs);
 	} else {
-		tracepoint_update_call(tp, tp_funcs);
+		rcu_assign_pointer(tp->funcs, tp_funcs);
+		tracepoint_update_call(tp, tp_funcs,
+				       tp_funcs[0].func != old[0].func);
 	}
-
-	rcu_assign_pointer(tp->funcs, tp_funcs);
 	release_probes(old);
 	return 0;
 }