lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190822152945.GC28441@linux.ibm.com>
Date:   Thu, 22 Aug 2019 08:29:45 -0700
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     Scott Wood <swood@...hat.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Clark Williams <williams@...hat.com>
Subject: Re: [PATCH RT v2 3/3] rcu: Disable use_softirq on PREEMPT_RT

On Thu, Aug 22, 2019 at 09:59:53AM -0400, Joel Fernandes wrote:
> On Wed, Aug 21, 2019 at 06:19:06PM -0500, Scott Wood wrote:
> > Besides restoring behavior that used to be default on RT, this avoids
> > a deadlock on scheduler locks:
> > 
> > [  136.894657] 039: ============================================
> > [  136.900401] 039: WARNING: possible recursive locking detected
> > [  136.906146] 039: 5.2.9-rt3.dbg+ #174 Tainted: G            E
> > [  136.912152] 039: --------------------------------------------
> > [  136.917896] 039: rcu_torture_rea/13474 is trying to acquire lock:
> > [  136.923990] 039: 000000005f25146d
> > [  136.927310] 039:  (
> > [  136.929414] 039: &p->pi_lock
> > [  136.932303] 039: ){-...}
> > [  136.934840] 039: , at: try_to_wake_up+0x39/0x920
> > [  136.939461] 039:
> > but task is already holding lock:
> > [  136.944425] 039: 000000005f25146d
> > [  136.947745] 039:  (
> > [  136.949852] 039: &p->pi_lock
> > [  136.952738] 039: ){-...}
> > [  136.955274] 039: , at: try_to_wake_up+0x39/0x920
> > [  136.959895] 039:
> > other info that might help us debug this:
> > [  136.965555] 039:  Possible unsafe locking scenario:
> > 
> > [  136.970608] 039:        CPU0
> > [  136.973493] 039:        ----
> > [  136.976378] 039:   lock(
> > [  136.978918] 039: &p->pi_lock
> > [  136.981806] 039: );
> > [  136.983911] 039:   lock(
> > [  136.986451] 039: &p->pi_lock
> > [  136.989336] 039: );
> > [  136.991444] 039:
> >  *** DEADLOCK ***
> > 
> > [  136.995194] 039:  May be due to missing lock nesting notation
> > 
> > [  137.001115] 039: 3 locks held by rcu_torture_rea/13474:
> > [  137.006341] 039:  #0:
> > [  137.008707] 039: 000000005f25146d
> > [  137.012024] 039:  (
> > [  137.014131] 039: &p->pi_lock
> > [  137.017015] 039: ){-...}
> > [  137.019558] 039: , at: try_to_wake_up+0x39/0x920
> > [  137.024175] 039:  #1:
> > [  137.026540] 039: 0000000011c8e51d
> > [  137.029859] 039:  (
> > [  137.031966] 039: &rq->lock
> > [  137.034679] 039: ){-...}
> > [  137.037217] 039: , at: try_to_wake_up+0x241/0x920
> > [  137.041924] 039:  #2:
> > [  137.044291] 039: 00000000098649b9
> > [  137.047610] 039:  (
> > [  137.049714] 039: rcu_read_lock
> > [  137.052774] 039: ){....}
> > [  137.055314] 039: , at: cpuacct_charge+0x33/0x1e0
> > [  137.059934] 039:
> > stack backtrace:
> > [  137.063425] 039: CPU: 39 PID: 13474 Comm: rcu_torture_rea Kdump: loaded Tainted: G            E     5.2.9-rt3.dbg+ #174
> > [  137.074197] 039: Hardware name: Intel Corporation S2600BT/S2600BT, BIOS SE5C620.86B.01.00.0763.022420181017 02/24/2018
> > [  137.084886] 039: Call Trace:
> > [  137.087773] 039:  <IRQ>
> > [  137.090226] 039:  dump_stack+0x5e/0x8b
> > [  137.093997] 039:  __lock_acquire+0x725/0x1100
> > [  137.098358] 039:  lock_acquire+0xc0/0x240
> > [  137.102374] 039:  ? try_to_wake_up+0x39/0x920
> > [  137.106737] 039:  _raw_spin_lock_irqsave+0x47/0x90
> > [  137.111534] 039:  ? try_to_wake_up+0x39/0x920
> > [  137.115910] 039:  try_to_wake_up+0x39/0x920
> > [  137.120098] 039:  rcu_read_unlock_special+0x65/0xb0
> > [  137.124977] 039:  __rcu_read_unlock+0x5d/0x70
> > [  137.129337] 039:  cpuacct_charge+0xd9/0x1e0
> > [  137.133522] 039:  ? cpuacct_charge+0x33/0x1e0
> > [  137.137880] 039:  update_curr+0x14b/0x420
> > [  137.141894] 039:  enqueue_entity+0x42/0x370
> > [  137.146080] 039:  enqueue_task_fair+0xa9/0x490
> > [  137.150528] 039:  activate_task+0x5a/0xf0
> > [  137.154539] 039:  ttwu_do_activate+0x4e/0x90
> > [  137.158813] 039:  try_to_wake_up+0x277/0x920
> > [  137.163086] 039:  irq_exit+0xb6/0xf0
> > [  137.166661] 039:  smp_apic_timer_interrupt+0xe3/0x3a0
> > [  137.171714] 039:  apic_timer_interrupt+0xf/0x20
> > [  137.176249] 039:  </IRQ>
> > [  137.178785] 039: RIP: 0010:__schedule+0x0/0x8e0
> > [  137.183319] 039: Code: 00 02 48 89 43 20 e8 0f 5a 00 00 48 8d 7b 28 e8 86 f2 fd ff 31 c0 5b 5d 41 5c c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 41 57 41 56 49 c7 c6 c0 ca 1e 00 41 55 41 89 fd 41 54
> > [  137.202498] 039: RSP: 0018:ffffc9005835fbc0 EFLAGS: 00000246
> > [  137.208158] 039:  ORIG_RAX: ffffffffffffff13
> > [  137.212428] 039: RAX: 0000000000000000 RBX: ffff8897c3e1bb00 RCX: 0000000000000001
> > [  137.219994] 039: RDX: 0000000080004008 RSI: 0000000000000006 RDI: 0000000000000001
> > [  137.227560] 039: RBP: ffff8897c3e1bb00 R08: 0000000000000000 R09: 0000000000000000
> > [  137.235126] 039: R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff81001fd1
> > [  137.242694] 039: R13: 0000000000000044 R14: 0000000000000000 R15: ffffc9005835fcac
> > [  137.250259] 039:  ? ___preempt_schedule+0x16/0x18
> > [  137.254969] 039:  preempt_schedule_common+0x32/0x80
> > [  137.259846] 039:  ___preempt_schedule+0x16/0x18
> > [  137.264379] 039:  rcutorture_one_extend+0x33a/0x510 [rcutorture]
> > [  137.270397] 039:  rcu_torture_one_read+0x18c/0x450 [rcutorture]
> > [  137.276334] 039:  rcu_torture_reader+0xac/0x1f0 [rcutorture]
> > [  137.281998] 039:  ? rcu_torture_reader+0x1f0/0x1f0 [rcutorture]
> > [  137.287920] 039:  kthread+0x106/0x140
> > [  137.291591] 039:  ? rcu_torture_one_read+0x450/0x450 [rcutorture]
> > [  137.297681] 039:  ? kthread_bind+0x10/0x10
> > [  137.301783] 039:  ret_from_fork+0x3a/0x50
> > 
> > Signed-off-by: Scott Wood <swood@...hat.com>
> > ---
> > I think the prohibition on use_softirq can be dropped once RT gets the
> > latest RCU code, but the question of what use_softirq should default
> > to on PREEMPT_RT remains.
> 
> Independent of the question of what use_softirq should default to, could we
> test RT with latest RCU code now to check if the deadlock goes away?  That
> way, maybe we can find any issues in current RCU that cause scheduler
> deadlocks in the situation you pointed. The reason I am asking is because
> recently additional commits [1] try to prevent deadlock and it'd be nice to
> ensure that other conditions are not lingering (I don't think they are but
> it'd be nice to be sure).
> 
> I am happy to do such testing myself if you want, however what does it take
> to apply the RT patchset to the latest mainline? Is it an achievable feat?

I suggest just using the -rt git tree, which I believe lives here:

https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git

I would guess that branch linux-5.2.y-rt is the one you want, but I
would ask Scott instead of blindly trusting my guess.  ;-)

							Thanx, Paul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ