linux-kernel - Re: [PATCH PREEMPT RT] rt-mutex: fix deadlock in device mapper


Message-ID: <alpine.LRH.2.02.1711201706190.24006@file01.intranet.prod.int.rdu2.redhat.com>
Date:   Mon, 20 Nov 2017 17:11:06 -0500 (EST)
From:   Mikulas Patocka <mpatocka@...hat.com>
To:     Sebastian Siewior <bigeasy@...utronix.de>
cc:     Mike Galbraith <efault@....de>, linux-kernel@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        linux-rt-users@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH PREEMPT RT] rt-mutex: fix deadlock in device mapper



On Mon, 20 Nov 2017, Mikulas Patocka wrote:

> 
> 
> On Mon, 20 Nov 2017, Sebastian Siewior wrote:
> 
> > On 2017-11-18 19:37:10 [+0100], Mike Galbraith wrote:
> > > Below is my 2012 3.0-rt version of that for reference; at that time we
> > > were using slab, and slab_lock referenced below was a local_lock.  The
> > > comment came from crash analysis of a deadlock I met before adding the
> > > (yeah, hacky) __migrate_disabled() blocker.  At the time, that was not
> > > an optional hack, you WOULD deadlock if you ground disks to fine powder
> > > the way SUSE QA did.  Pulling the plug before blocking cured the
> > > xfs/ext[34] IO deadlocks they griped about, but I had to add that hack
> > > to not trade their nasty IO deadlocks for the more mundane variety.  So
> > > my question is: are we sure that in the here and now flush won't want
> > > any lock we may be holding?  In days of yore, it most definitely would
> > > turn your box into a doorstop if you tried hard enough.
> > 
> > I have a deadlock in ftest01/LTP which is cured by that.
> > The root problem (as I understand it) is that !RT does
> >   schedule() -> sched_submit_work() -> blk_schedule_flush_plug()
> > 
> > on lock contention (that is, the bit_spinlock or rwsem in jbd2/ext4,
> > for instance). On RT this does not happen because of the tsk_is_pi_blocked()
> > check in sched_submit_work(). That check is needed because an additional
> > (rtmutex) lock cannot be acquired at this point.
> 
> bit_spin_lock on a non-RT kernel doesn't call blk_schedule_flush_plug(). So
> if not calling it causes a deadlock, that deadlock should be fixed in the
> non-RT kernel as well.
> 
> It is highly questionable: how could bit_spin_lock depend on
> blk_schedule_flush_plug() at all? bit_spin_lock spins in a loop until the
> specified bit is clear; blk_schedule_flush_plug() submits disk requests to
> the disk driver.
> 
> If some part of the kernel spins until disk requests are completed, it is
> already seriously misdesigned. Spinning until disk requests complete
> wouldn't work on a uniprocessor non-preemptive kernel at all.
> 
> So, I suspect that the cause of the deadlock is something completely 
> different.
> 
> Mikulas

BTW, if you have a deadlock on jbd_lock_bh_state,
jbd_lock_bh_journal_head or bh_uptodate_lock_irqsave (these functions use
bit_spin_lock on a non-RT kernel), then there must be some other task that
holds this lock, so identify that task and send its stack trace for
analysis.

It may be a generic bug in the jbd2 code that is merely triggered because
the RT patch changes timings, or it may be jbd2 code that the RT patch
modified incorrectly.

Spinlocks have nothing to do with blk_schedule_flush_plug().

Mikulas