lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <52A22FD6.8070105@broadcom.com>
Date:	Fri, 6 Dec 2013 12:13:10 -0800
From:	Sherman Yin <syin@...adcom.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: timer: catching new timers in migrated tvec_base

Hi all,

I'm looking for feedback regarding explicitly invalidating the 
list_heads in tvec_base.tv1 .. tv5 after a migration, for example by 
doing a list_del(head) at the end of migrate_timer_list().  This is not 
meant to fix any bugs, but to help catch attempts to add timers to the 
migrated tvec_base.  Bugs like this could be difficult to catch without 
this change (example below).

If I understand correctly, after migration of timers from one cpu (the 
one being shut down, say cpu1) to another (say cpu0), no one should be 
adding any timers to the tvec_base of cpu1.  When cpu1 later comes back 
up, new timers can be added only after init_timers_cpu(1).

If a buggy driver tries to add a timer to cpu1 after it has shut down, 
eg with add_timer_on(1), there is no indication that this is an invalid 
operation (no BUG_ON, no return code).  The following is what happened 
in my case:

1. cpu1 shuts down, all timers migrated to cpu0
2. driverA adds a timerX to cpu1's tvec_base.tv1[n] (bug)
3. cpu1 returns, all lists in tvec_base.tv1 re-initialized without 
checking if the lists are indeed empty.
4. timerX is now "orphaned"; its entry->next and entry->prev points to 
cpu1's tvec_base.tv1[n], but tvec_base.tv1[n] is an empty list.
5. driverB adds a timerY to cpu1's tvec_base.tv[n] and becomes the only 
entry in that list.
6. driverA removes timerX, which eventually calls __list_del on 
timerX.entry.
7. Since timerX.entry points to cpu1's tvec_base.tv1[n], 
tvec_base.tv1[n] will now become an empty list, and timerY is now 
"orphaned".
8. timerY never fires

This bug was difficult to catch because it will only happen if timerX 
and timerY are both added to the same slot tv1[n].  The likelihood may 
be low so the bug might be very difficult to reproduce.

By invalidating the list_heads in tv[1..5] right after migration, we can 
catch this bug at step 2.  A patch might look like:
------------------------
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1585,6 +1585,14 @@ static void migrate_timer_list(struct tvec_base 
*new_base, struct list_head *hea
  		timer_set_base(timer, new_base);
  		internal_add_timer(new_base, timer);
  	}
+
+	/*
+	 * After existing timers are migrated, new timers should never be added
+	 * to this list until after the next init_timers_cpu().  Deleting
+	 * list_head here will not affect timer behavior but will help catch
+	 * anyone trying to add to this list when they shouldn't.
+	 */
+	list_del(head);
  }

  static void __cpuinit migrate_timers(int cpu)
------------------------

Is there anything I missed or failed to consider here?

Thanks,
Sherman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ