Date:   Wed, 22 Nov 2017 18:26:59 +0000
From:   Will Deacon <will.deacon@....com>
To:     Sebastian Ott <sebott@...ux.vnet.ibm.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [bisected] system hang after boot

Hi Sebastian,

Thanks for the report.

On Wed, Nov 22, 2017 at 06:46:01PM +0100, Sebastian Ott wrote:
> One of my test systems (s390) hangs after boot. One of the cpus is idle;
> the other is in arch_spin_relax. Bisect pointed to this one:
> 
> a8a217c22116eff6c120d753c9934089fb229af0 is the first bad commit
> commit a8a217c22116eff6c120d753c9934089fb229af0
> Author: Will Deacon <will.deacon@....com>
> Date:   Tue Oct 3 19:25:27 2017 +0100
> 
>     locking/core: Remove {read,spin,write}_can_lock()
> 
>     Outside of the locking code itself, {read,spin,write}_can_lock() have no
>     users in tree. Apparmor (the last remaining user of write_can_lock()) got
>     moved over to lockdep by the previous patch.
> 
>     This patch removes the use of {read,spin,write}_can_lock() from the
>     BUILD_LOCK_OPS macro, deferring to the trylock operation for testing the
>     lock status, and subsequently removes the unused macros altogether. They
>     aren't guaranteed to work in a concurrent environment and can give
>     incorrect results in the case of qrwlock.
> 
>     Signed-off-by: Will Deacon <will.deacon@....com>
>     Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>     Cc: Linus Torvalds <torvalds@...ux-foundation.org>
>     Cc: Peter Zijlstra <peterz@...radead.org>
>     Cc: Thomas Gleixner <tglx@...utronix.de>
>     Cc: paulmck@...ux.vnet.ibm.com
>     Link: http://lkml.kernel.org/r/1507055129-12300-2-git-send-email-will.deacon@arm.com
>     Signed-off-by: Ingo Molnar <mingo@...nel.org>
> 
> 
> Bisect log and config attached. The same config but with lockdep enabled
> works ok.

I suspect this is related to CONFIG_GENERIC_LOCKBREAK and the use of the
lock->break_lock field when that is enabled. I've always been suspicious
about the lack of ordering guarantees there, and I think my patch makes
that worse.

The old code was:

#define BUILD_LOCK_OPS(op, locktype)					\
void __lockfunc __raw_##op##_lock(locktype##_t *lock)			\
{									\
	for (;;) {							\
		preempt_disable();					\
		if (likely(do_raw_##op##_trylock(lock)))		\
			break;						\
		preempt_enable();					\
									\
		if (!(lock)->break_lock)				\
			(lock)->break_lock = 1;				\
		while (!raw_##op##_can_lock(lock) && (lock)->break_lock)\
			arch_##op##_relax(&lock->raw_lock);		\
	}								\
	(lock)->break_lock = 0;						\
}									\

which is dodgy for a couple of reasons:

  1. There is a write-write race on break_lock, so if >1 CPU tries to take
     the lock concurrently then break_lock can end up as either 0 or 1.

  2. break_lock is just a plain variable, so it can happily sit in a
     local store buffer (for a finite time), meaning that the waiters will
     almost certainly see it as 1 in the while loop, while the lock holder
     will almost certainly see it as 0.
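
To make (2) concrete, here's a rough userspace model of the same handshake
(illustrative only, not kernel code): a plain shared flag with none of the
{READ,WRITE}_ONCE() accessors or barriers. The hardware only has to make the
store visible eventually, and the compiler is even free to hoist the load out
of the loop, so the waiter may never notice the flag changing at all:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int break_lock;			/* plain variable, like the real field */

static void *waiter(void *arg)
{
	(void)arg;
	break_lock = 1;			/* plain store, no WRITE_ONCE() */
	while (break_lock)		/* plain load, no READ_ONCE() */
		;			/* stands in for arch_spin_relax() */
	puts("waiter saw break_lock == 0");
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, waiter, NULL);
	sleep(1);
	break_lock = 0;			/* the "holder" clears the flag */
	pthread_join(t, NULL);		/* may never return when optimised */
	return 0;
}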

Now, the raw_##op##_can_lock macro is also problematic, which is why I
removed it: it suffers from (2) above, but not from (1), and will eventually
cause the waiters to break out of the loop and redo the trylock() once the
unlocker's store buffer has drained.

The new code removes the raw_##op##_can_lock() test, so the waiters get
stuck.
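
Concretely, after that commit the generated slowpath looks like this
(expanded by hand for the spin case, comments mine):

void __lockfunc __raw_spin_lock(raw_spinlock_t *lock)
{
	for (;;) {
		preempt_disable();
		if (likely(do_raw_spin_trylock(lock)))
			break;			/* got the lock */
		preempt_enable();

		if (!lock->break_lock)
			lock->break_lock = 1;	/* flag contention; see raw_spin_is_contended() */

		/*
		 * Without the raw_spin_can_lock() test, the only way out of
		 * this loop is somebody storing 0 to break_lock -- and the
		 * only store of 0 is done by a CPU that has just *acquired*
		 * the lock. If the current holder simply releases the lock,
		 * nothing here notices and we never get back to the trylock
		 * above.
		 */
		while (lock->break_lock)
			arch_spin_relax(&lock->raw_lock);
	}
	lock->break_lock = 0;
}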

Now, I can't see what the break_lock is doing here other than causing
problems. Is there a good reason for it, or can you just try removing it
altogether? Patch below.
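
With break_lock gone, the generated slowpath for each op reduces to a plain
trylock/relax loop; for the spin case that's roughly:

void __lockfunc __raw_spin_lock(raw_spinlock_t *lock)
{
	for (;;) {
		preempt_disable();
		if (likely(do_raw_spin_trylock(lock)))
			break;
		preempt_enable();
		arch_spin_relax(&lock->raw_lock);
	}
}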

Will

--->8

diff --git a/include/linux/rwlock_types.h b/include/linux/rwlock_types.h
index cc0072e93e36..857a72ceb794 100644
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -10,9 +10,6 @@
  */
 typedef struct {
 	arch_rwlock_t raw_lock;
-#ifdef CONFIG_GENERIC_LOCKBREAK
-	unsigned int break_lock;
-#endif
 #ifdef CONFIG_DEBUG_SPINLOCK
 	unsigned int magic, owner_cpu;
 	void *owner;
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index a39186194cd6..3bf273538840 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -107,16 +107,11 @@ do {								\
 
 #define raw_spin_is_locked(lock)	arch_spin_is_locked(&(lock)->raw_lock)
 
-#ifdef CONFIG_GENERIC_LOCKBREAK
-#define raw_spin_is_contended(lock) ((lock)->break_lock)
-#else
-
 #ifdef arch_spin_is_contended
 #define raw_spin_is_contended(lock)	arch_spin_is_contended(&(lock)->raw_lock)
 #else
 #define raw_spin_is_contended(lock)	(((void)(lock), 0))
 #endif /*arch_spin_is_contended*/
-#endif
 
 /*
  * This barrier must provide two things:
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 73548eb13a5d..24b4e6f2c1a2 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -19,9 +19,6 @@
 
 typedef struct raw_spinlock {
 	arch_spinlock_t raw_lock;
-#ifdef CONFIG_GENERIC_LOCKBREAK
-	unsigned int break_lock;
-#endif
 #ifdef CONFIG_DEBUG_SPINLOCK
 	unsigned int magic, owner_cpu;
 	void *owner;
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 1fd1a7543cdd..7d0a58399c1d 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -65,13 +65,8 @@ void __lockfunc __raw_##op##_lock(locktype##_t *lock)			\
 		if (likely(do_raw_##op##_trylock(lock)))		\
 			break;						\
 		preempt_enable();					\
-									\
-		if (!(lock)->break_lock)				\
-			(lock)->break_lock = 1;				\
-		while ((lock)->break_lock)				\
-			arch_##op##_relax(&lock->raw_lock);		\
+		arch_##op##_relax(&lock->raw_lock);			\
 	}								\
-	(lock)->break_lock = 0;						\
 }									\
 									\
 unsigned long __lockfunc __raw_##op##_lock_irqsave(locktype##_t *lock)	\
@@ -85,13 +80,8 @@ unsigned long __lockfunc __raw_##op##_lock_irqsave(locktype##_t *lock)	\
 			break;						\
 		local_irq_restore(flags);				\
 		preempt_enable();					\
-									\
-		if (!(lock)->break_lock)				\
-			(lock)->break_lock = 1;				\
-		while ((lock)->break_lock)				\
-			arch_##op##_relax(&lock->raw_lock);		\
+		arch_##op##_relax(&lock->raw_lock);			\
 	}								\
-	(lock)->break_lock = 0;						\
 	return flags;							\
 }									\
 									\
