linux-kernel - Re: possible deadlock in shmem

Message-ID: <alpine.LSU.2.11.2004161718030.16488@eggly.anvils>
Date:   Thu, 16 Apr 2020 17:19:43 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Yang Shi <shy828301@...il.com>
cc:     Hugh Dickins <hughd@...gle.com>,
        syzbot <syzbot+c8a8197c8852f566b9d9@...kaller.appspotmail.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>, syzkaller-bugs@...glegroups.com,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: possible deadlock in shmem_uncharge

On Wed, 15 Apr 2020, Hugh Dickins wrote:
> On Wed, 15 Apr 2020, Yang Shi wrote:
> > On Wed, Apr 15, 2020 at 7:04 PM Hugh Dickins <hughd@...gle.com> wrote:
> > > On Mon, 13 Apr 2020, Yang Shi wrote:
> > > >
> > > > It looks like shmem_uncharge() is only called by __split_huge_page()
> > > > and collapse_file(). collapse_file() has acquired xa_lock with irq
> > > > disabled before acquiring info->lock, so it is safe.
> > > > __split_huge_page() is called holding xa_lock with irq enabled,
> > > > but lru_lock is acquired with irq disabled before acquiring xa_lock.
> > > >
> > > > So, it is unnecessary to acquire info->lock with irq disabled in
> > > > shmem_uncharge(). Can syzbot try the below patch?
> > >
> > > But I disagree with the patch below.  You're right that IRQ-disabling
> > > here is unnecessary, given its two callers; but I'm not sure that we
> > > want it to look different from shmem_charge() and all other info->lock
> > > takers; and, more importantly, I don't see how removing the redundant
> > > IRQ-saving below could make it any less liable to deadlock.
> > 
> > Yes, I realized the patch can't suppress the lockdep splat. But,
> > actually I didn't understand how this deadlock could happen, because
> > info->lock is acquired with IRQ disabled before acquiring
> > shmlock_user_lock in user_shm_lock(). So an interrupt can't come in
> > at all, unless I missed something.
> 
> I think the story it's trying to tell is this (but, like most of us,
> I do find Mr Lockdep embarrassingly difficult to understand; and I'm
> not much good at drawing race diagrams either):
> 
> CPU0 was in user_shm_unlock(), it's got shmlock_user_lock, then an
> interrupt comes in. It's an endio kind of interrupt, which goes off
> to test_clear_page_writeback(), which wants the xa_lock on i_pages.
> 
> Meanwhile, CPU1 was doing some SysV SHM locking, it's got as far as
> shmem_lock(), it has acquired info->lock, and goes off to user_shm_lock()
> which wants shmlock_user_lock.
> 
> But sadly, CPU2 is splitting a shmem THP, calling shmem_uncharge()
> that wants info->lock while outer level holds xa_lock on i_pages:
> with interrupts properly disabled, but that doesn't help.
> 
> Now, that story doesn't quite hold up as a deadlock, because shmem
> doesn't use writeback tags; and (unless you set shmem_enabled "force")
> I don't think there's a way to get shmem THPs in SysV SHM (and are
> they hole-punchable? maybe through MADV_REMOVE); so it looks like
> we're talking about different inodes.
> 
> But lockdep is right to report it, and more thought might arrive at
> a more convincing scenario.  Anyway, easily fixed and best fixed.
> 
> (But now I think my patch must wait until tomorrow.)

https://lore.kernel.org/lkml/alpine.LSU.2.11.2004161707410.16322@eggly.anvils/
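
The three-CPU story quoted above can be sketched as a lock-order graph: each "holds A, wants B" step becomes an edge A -> B, and a cycle in the graph is the kind of inversion lockdep reports. This is only an illustration in Python, not kernel code and not how lockdep is implemented; the helper names (build_graph, find_cycle, EDGES) are made up for the example, and the edges are taken solely from the description in this thread.

```python
from collections import defaultdict

# Each edge is (lock already held, lock then wanted), as described in the thread.
EDGES = [
    # CPU0: user_shm_unlock() holds shmlock_user_lock; an endio interrupt
    # runs test_clear_page_writeback(), which wants xa_lock on i_pages.
    ("shmlock_user_lock", "xa_lock"),
    # CPU1: shmem_lock() holds info->lock, then user_shm_lock() wants
    # shmlock_user_lock.
    ("info->lock", "shmlock_user_lock"),
    # CPU2: THP split holds xa_lock, then shmem_uncharge() wants info->lock.
    ("xa_lock", "info->lock"),
]


def build_graph(edges):
    """Map each held lock to the set of locks acquired while holding it."""
    graph = defaultdict(set)
    for held, wanted in edges:
        graph[held].add(wanted)
    return graph


def find_cycle(graph):
    """Depth-first search; return one cycle as a list of names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every lock starts WHITE (unvisited)
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            if color[nxt] == GREY:
                # Back edge to a lock on the current path: a cycle.
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


cycle = find_cycle(build_graph(EDGES))
if cycle:
    print("lock-order cycle: " + " -> ".join(cycle))
```

Note that the sketch, like lockdep, treats each lock as a class rather than a per-inode instance. That is why a cycle shows up here even though, as the quoted text says, the info->lock and xa_lock involved may in practice belong to different inodes. Dropping any one of the three edges leaves the graph acyclic.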