lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <alpine.LRH.2.02.1711151023150.5420@file01.intranet.prod.int.rdu2.redhat.com>
Date:   Wed, 15 Nov 2017 12:08:32 -0500 (EST)
From:   Mikulas Patocka <mpatocka@...hat.com>
To:     Sebastian Siewior <bigeasy@...utronix.de>
cc:     Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH PREEMPT RT] rt-mutex: fix deadlock in device mapper



On Tue, 14 Nov 2017, Sebastian Siewior wrote:

> [ minimize CC ]
> 
> On 2017-11-13 12:56:53 [-0500], Mikulas Patocka wrote:
> > Hi
> > 
> > I'm submitting this patch for the CONFIG_PREEMPT_RT patch. It fixes 
> > deadlocks in device mapper when real time preemption is used.
> 
> I run into a EXT4 deadlock which is fixed by your patch. I think we had
> earlier issues which were duct taped via
>    fs-jbd2-pull-your-plug-when-waiting-for-space.patch
> 
> What I observed from that deadlock I meant to debug is that I had one
> task owning a bit_spinlock (buffer lock) and wanting W i_data_sem and
> another task owning W i_data_sem and waiting for the same bit_spinlock.
> Here it was wb_writeback() vs vfs_truncate() (to keep it short).

So, send the stacktraces to VFS maintainers. This could deadlock on non-RT 
kernel too.

> Could you please explain how this locking is supposed to work

The scenario for the deadlock is explained here 
https://www.redhat.com/archives/dm-devel/2014-May/msg00089.html
It was fixed by commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755

> and why RT deadlocks while !RT does not? Or does !RT rely on the flush 
> in sched_submit_work() which is often skipped in RT because most locks 
> are rtmutex based?

Yes - that's the reason. The non-rt kernel uses rt mutexes very rarely 
(they are only used in kernel/rcu/tree.h, include/linux/i2c.h and 
kernel/futex.c).

If non-rt kernel used rt mutexes in the I/O stack, the deadlock would also 
happen there.

> Because if that is
> the case then we might get the deadlock upstream, too once we get a
> rtmutex somewhere in VFS (since I doubt it would be possible with a
> futex based test case).
> 
> > Mikulas
> 
> Sebastian

Mikulas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ