linux-kernel - Re: [PATCH] fs/pstore: Perform erase from a worker

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGXu5jLML8yUM4bi=pKikwGUcyDapwgw6Wh_YCtU_BTMJ+OgQA@mail.gmail.com>
Date:   Tue, 21 Mar 2017 11:16:19 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Namhyung Kim <namhyung@...nel.org>
Cc:     Chris Wilson <chris@...is-wilson.co.uk>,
        LKML <linux-kernel@...r.kernel.org>,
        intel-gfx@...ts.freedesktop.org,
        Tomi Sarvela <tomi.p.sarvela@...el.com>,
        Anton Vorontsov <anton@...msg.org>,
        Colin Cross <ccross@...roid.com>,
        Tony Luck <tony.luck@...el.com>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        "# v4 . 10+" <stable@...r.kernel.org>, kernel-team@....com
Subject: Re: [PATCH] fs/pstore: Perform erase from a worker

On Mon, Mar 20, 2017 at 10:58 PM, Namhyung Kim <namhyung@...nel.org> wrote:
> Hello,
>
> On Mon, Mar 20, 2017 at 10:49:16AM -0700, Kees Cook wrote:
>> On Fri, Mar 17, 2017 at 2:52 AM, Chris Wilson <chris@...is-wilson.co.uk> wrote:
>> > In order to prevent a cyclic recursion between psi->read_mutex and the
>> > inode_lock, we need to move the pse->erase to a worker.
>> >
>> > [  605.374955] ======================================================
>> > [  605.381281] [ INFO: possible circular locking dependency detected ]
>> > [  605.387679] 4.11.0-rc2-CI-CI_DRM_2352+ #1 Not tainted
>> > [  605.392826] -------------------------------------------------------
>> > [  605.399196] rm/7298 is trying to acquire lock:
>> > [  605.403720]  (&psinfo->read_mutex){+.+.+.}, at: [<ffffffff813e183f>] pstore_unlink+0x3f/0xa0
>> > [  605.412300]
>> > [  605.412300] but task is already holding lock:
>> > [  605.418237]  (&sb->s_type->i_mutex_key#14){++++++}, at: [<ffffffff812157ec>] vfs_unlink+0x4c/0x19
>> > 0
>> > [  605.427397]
>> > [  605.427397] which lock already depends on the new lock.
>> > [  605.427397]
>> > [  605.435770]
>> > [  605.435770] the existing dependency chain (in reverse order) is:
>> > [  605.443396]
>> > [  605.443396] -> #1 (&sb->s_type->i_mutex_key#14){++++++}:
>> > [  605.450347]        lock_acquire+0xc9/0x220
>> > [  605.454551]        down_write+0x3f/0x70
>> > [  605.458484]        pstore_mkfile+0x1f4/0x460
>> > [  605.462835]        pstore_get_records+0x17a/0x320
>> > [  605.467664]        pstore_fill_super+0xa4/0xc0
>> > [  605.472205]        mount_single+0x89/0xb0
>> > [  605.476314]        pstore_mount+0x13/0x20
>> > [  605.480411]        mount_fs+0xf/0x90
>> > [  605.484122]        vfs_kern_mount+0x66/0x170
>> > [  605.488464]        do_mount+0x190/0xd50
>> > [  605.492397]        SyS_mount+0x90/0xd0
>> > [  605.496212]        entry_SYSCALL_64_fastpath+0x1c/0xb1
>> > [  605.501496]
>> > [  605.501496] -> #0 (&psinfo->read_mutex){+.+.+.}:
>> > [  605.507747]        __lock_acquire+0x1ac0/0x1bb0
>> > [  605.512401]        lock_acquire+0xc9/0x220
>> > [  605.516594]        __mutex_lock+0x6e/0x990
>> > [  605.520755]        mutex_lock_nested+0x16/0x20
>> > [  605.525279]        pstore_unlink+0x3f/0xa0
>> > [  605.529465]        vfs_unlink+0xb5/0x190
>> > [  605.533477]        do_unlinkat+0x24c/0x2a0
>> > [  605.537672]        SyS_unlinkat+0x16/0x30
>> > [  605.541781]        entry_SYSCALL_64_fastpath+0x1c/0xb1
>>
>> If I'm reading this right it's a race between mount and unlink...
>> that's quite a corner case. :)
>>
>> > [  605.547067]
>> > [  605.547067] other info that might help us debug this:
>> > [  605.547067]
>> > [  605.555221]  Possible unsafe locking scenario:
>> > [  605.555221]
>> > [  605.561280]        CPU0                    CPU1
>> > [  605.565883]        ----                    ----
>> > [  605.570502]   lock(&sb->s_type->i_mutex_key#14);
>> > [  605.575217]                                lock(&psinfo->read_mutex);
>> > [  605.581803]                                lock(&sb->s_type->i_mutex_key#14);
>> > [  605.589159]   lock(&psinfo->read_mutex);
>>
>> I haven't had time to dig much yet, but I wonder if the locking order
>> on unlink could just be reversed, and the deadlock would go away?
>
> IIUC, the unlink path locks a file in the root directory, while the
> mount path locks the root directory.  Maybe we can use a subclass?
> (not tested)
>
> Thanks,
> Namhyung
>
>
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index 06504b69575b..6eea6bcf90c8 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -394,7 +394,7 @@ int pstore_mkfile(struct pstore_record *record)
>                 break;
>         }
>
> -       inode_lock(d_inode(root));
> +       inode_lock_nested(d_inode(root), I_MUTEX_PARENT);
>
>         dentry = d_alloc_name(root, name);
>         if (!dentry)

No luck, unfortunately...

In looking at other examples, I don't think the inode_lock is needed
at all? I see other uses of d_alloc_name() and d_add() without an
inode lock (proc, libfs, etc), and the locking documentation doesn't
seem to imply it either? This solves the lockdep, though it's unclear
to me if it is somehow unsafe (apologies for whitespace damage...):


diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 06504b69575b..3d83b13d2338 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -394,11 +394,9 @@ int pstore_mkfile(struct pstore_record *record)
                break;
        }

-       inode_lock(d_inode(root));
-
        dentry = d_alloc_name(root, name);
        if (!dentry)
-               goto fail_lockedalloc;
+               goto fail_private;

        inode->i_size = private->total_size = size;

@@ -413,16 +411,12 @@ int pstore_mkfile(struct pstore_record *record)
        list_add(&private->list, &allpstore);
        spin_unlock_irqrestore(&allpstore_lock, flags);

-       inode_unlock(d_inode(root));
-
        return 0;

-fail_lockedalloc:
-       inode_unlock(d_inode(root));
+fail_private:
        free_pstore_private(private);
 fail_alloc:
        iput(inode);
-
 fail:
        return rc;
 }


-Kees

-- 
Kees Cook
Pixel Security