lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 20 Aug 2020 09:30:31 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     linux-kernel@...r.kernel.org, mingo@...nel.org, will@...nel.org
Cc:     npiggin@...il.com, elver@...gle.com, jgross@...e.com,
        paulmck@...nel.org, rostedt@...dmis.org, rjw@...ysocki.net,
        joel@...lfernandes.org, svens@...ux.ibm.com, tglx@...utronix.de,
        peterz@...radead.org
Subject: [PATCH 0/9] TRACE_IRQFLAGS wreckage


TRACE_IRQFLAGS

local_irq_*() keeps a software state that mirrors the hardware state,
used for lockdep, includes tracepoints.

raw_local_irq_*() does not update the software state, no tracing.

---

Problem 1:

	raw_local_irq_save(); // software state on
	local_irq_save(); // software state off
	...
	local_irq_restore(); // software state still off, because we don't enable IRQs
	raw_local_irq_restore(); // software state still off, *whoopsie*

existing instances:

 - lock_acquire()
     raw_local_irq_save()
     __lock_acquire()
       arch_spin_lock(&graph_lock)
         pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
           local_irq_save()

 - trace_clock_global()
     raw_local_irq_save()
     arch_spin_lock()
       pv_wait() := kvm_wait()
	 local_irq_save()

 - apic_retrigger_irq()
     raw_local_irq_save()
     apic->send_IPI() := default_send_IPI_single_phys()
       local_irq_save()

Possible solutions:

 A) make it work by enabling the tracing inside raw_*()
 B) make it work by keeping tracing disabled inside raw_*()
 C) call it broken and clean it up now

Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.

So we pick B) and declare any code that ends up doing:

	raw_local_irq_save()
	local_irq_save()
	lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever, the only reason it came
up is because I changed IRQ tracing vs lockdep recursion and the first
instance is fairly common, the other cases hardly ever happen.

---

Problem 2:

	raw_local_irq_save(); // software state on
	trace_*()
          ...
          perf_tp_event()
            ...
            perf_callchain()
	    <#PF>
	      trace_hardirqs_off(); // software state off
	      ...
	      if (regs_irqs_disabled(regs)) // false
	        trace_hardirqs_on();
	    </#PF>
	raw_local_irq_restore(); // software state stays off, *whoopsie*

existing instances:

 - lock_acquire() / lock_release()
     raw_local_irq_save()
     trace_lock_acquire() / trace_lock_release()

 - function tracing

Possible solutions:

 A) fix every architecture's entry code
 B) only fix kernel/entry/common.c
 C) fix lockdep tracepoints and pray

This series does C, AFAICT this problem has existed forever.

---

Problem 3:

	raw_local_irq_save(); // software state on
	<#NMI>
	  trace_hardirqs_off(); // software state off
	  ...
	  if (regs_irqs_disabled(regs)) // false
	    trace_hardirqs_on();
	</#NMI>
	raw_local_irq_restore(); // software state stays off, *whoopsie*

Possible solutions:

This *should* not be a problem if an architecture has it's entry ordering
right. In particular we rely on the architecture doing nmi_enter() before
trace_hardirqs_off().

In that case, in_nmi() will be true, and lockdep_hardirqs_*() should NO-OP,
except if CONFIG_TRACE_IRQFLAGS_NMI (x86).

There might be a problem with using lockdep_assert_irqs_disabled() from NMI
context, if so, those needs a little TLC.

---

The patches in this series do (in reverse order):

 - 2C
 - 1B
 - fix fallout in idle due to the trace_lock_*() tracepoints suddenly
   being visible to rcu-lockdep.

---
 arch/arm/mach-omap2/pm34xx.c      |    4 --
 arch/arm64/kernel/process.c       |    2 -
 arch/nds32/include/asm/irqflags.h |    5 ++
 arch/powerpc/include/asm/hw_irq.h |   11 ++---
 arch/s390/kernel/idle.c           |    3 -
 arch/x86/entry/thunk_32.S         |    5 --
 arch/x86/include/asm/mmu.h        |    1 
 arch/x86/kernel/process.c         |    4 --
 arch/x86/mm/tlb.c                 |   13 +-----
 drivers/cpuidle/cpuidle.c         |   19 +++++++--
 drivers/idle/intel_idle.c         |   16 --------
 include/linux/cpuidle.h           |   13 +++---
 include/linux/irqflags.h          |   73 ++++++++++++++++++++------------------
 include/linux/lockdep.h           |   18 ++++++---
 include/linux/mmu_context.h       |    5 ++
 kernel/locking/lockdep.c          |   18 +++++----
 kernel/sched/idle.c               |   21 ++++------
 17 files changed, 112 insertions(+), 119 deletions(-)



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ