linux-kernel - Re: BUG_ON in rcu_sync

Message-ID: <CAJFSNy4EZzL7+aWC8xD63rTgcQ3OaBokNB_scpzRDRA53sukEA@mail.gmail.com>
Date:   Fri, 23 Sep 2016 16:35:04 +0300
From:   Nikolay Borisov <kernel@...p.com>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     LKML <linux-kernel@...r.kernel.org>
Subject: Re: BUG_ON in rcu_sync_func triggered

On Wed, Sep 14, 2016 at 3:58 PM, Oleg Nesterov <oleg@...hat.com> wrote:
> On 09/14, Nikolay Borisov wrote:
>>
>> [  557.006656]  [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
>> [  557.012737]  [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
>> [  557.019781]  [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
>> [  557.026645]  [<ffffffff810ab9a8>] rcu_sync_enter+0x148/0x1a0
>> [  557.033309]  [<ffffffff8109c9be>] percpu_down_write+0x1e/0xf0
>> [  557.040074]  [<ffffffff81315683>] ? call_rwsem_down_write_failed+0x13/0x20
>> [  557.048092]  [<ffffffff811a868b>] freeze_super+0xab/0x1b0
>> [  557.054456]  [<ffffffff811b7c0d>] do_vfs_ioctl+0x29d/0x560
>> [  557.060920]  [<ffffffff811aae7e>] ? SYSC_newfstat+0x2e/0x40
>> [  557.067480]  [<ffffffff811b7f62>] SyS_ioctl+0x92/0xa0
>> [  557.073465]  [<ffffffff8163c357>] entry_SYSCALL_64_fastpath+0x12/0x6a
>> [  557.081015] ---[ end trace fc087420ac1d8f16 ]---
>> [  557.086507] XXX: ffff880473326b08 gp=2 cnt=-1 cb=1
>> [  557.092326] rbd: rbd19: added with size 0x500000000
>>
>> This is: if (WARN_ON(rsp->gp_count < 0)) xxx(rsp);
>
> Thanks a lot. This is what I wanted to see. However, I can't understand why
> you did not hit the similar WARN_ON(rsp->gp_count <= 0) in rcu_sync_exit()
> before that.
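Oleg's reasoning relies on the rcu_sync writer count staying balanced. Each rcu_sync_enter(), reached here from percpu_down_write() as the trace shows, should be matched by exactly one rcu_sync_exit(), which is called from percpu_up_write(). The C sketch below is a hypothetical user-space model, not kernel code. struct sync_model and the model_* functions are invented for illustration, and the spinlock, grace-period state and RCU callback handling are left out. Under those assumptions, it shows why a stray extra exit would be expected to trip the exit-side check before the enter-side one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the rcu_sync gp_count balance (NOT kernel code):
 * locking, grace-period state and callbacks are deliberately omitted. */
struct sync_model {
	int gp_count;	/* writers currently between enter and exit */
	int warnings;	/* how many debug checks have fired */
};

/* Stand-in for WARN_ON(): report and count, return the condition, keep going. */
static bool model_warn(struct sync_model *s, bool cond, const char *what)
{
	if (cond) {
		s->warnings++;
		fprintf(stderr, "WARN: %s (gp_count=%d)\n", what, s->gp_count);
	}
	return cond;
}

/* Models the writer-side entry reached via percpu_down_write(). */
static void model_enter(struct sync_model *s)
{
	model_warn(s, s->gp_count < 0, "enter: gp_count < 0");
	s->gp_count++;
}

/* Models the writer-side exit reached via percpu_up_write(). */
static void model_exit(struct sync_model *s)
{
	model_warn(s, s->gp_count <= 0, "exit: gp_count <= 0");
	s->gp_count--;
}
```

In this model, a single unbalanced model_exit() on an idle counter fires the exit warning and leaves gp_count at -1. Only after that does the next model_enter() fire the enter warning. The quoted log excerpt shows only the enter-side warning, which is the discrepancy Oleg points out.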
>
> OK, in any case this doesn't look like a bug in rcu/sync.c. Could you please
> try the fix below? Not sure it will help; perhaps there is something else...
> No need to revert the previous debugging patch.
>
> Thanks,
>
> Oleg.
>
>
> diff --git a/fs/super.c b/fs/super.c
> index d78b984..a90bdff 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1344,7 +1344,9 @@ int thaw_super(struct super_block *sb)
>         int error;
>
>         down_write(&sb->s_umount);
> -       if (sb->s_writers.frozen == SB_UNFROZEN) {
> +       if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
> +               if (sb->s_writers.frozen != SB_UNFROZEN)
> +                       pr_crit("THAW: hit the race: %d\n", sb->s_writers.frozen);
>                 up_write(&sb->s_umount);
>                 return -EINVAL;
>         }
>
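The patch changes which values of s_writers.frozen let thaw_super() continue. A plausible reading, which is an assumption and is not confirmed in this thread, is that freeze_super() moves through the intermediate levels SB_FREEZE_WRITE, SB_FREEZE_PAGEFAULT and SB_FREEZE_FS before it reaches SB_FREEZE_COMPLETE. A concurrent thaw could then observe a half-finished freeze and release freeze locks that were never taken, which would leave the rcu_sync counter unbalanced. The sketch below restates only the two checks as plain C. enum thaw_outcome, thaw_check_old() and thaw_check_new() are hypothetical names, and the freeze levels copy the SB_* values from include/linux/fs.h.

```c
#include <assert.h>

/* Freeze levels, mirroring the SB_* values in include/linux/fs.h. */
enum {
	SB_UNFROZEN = 0,
	SB_FREEZE_WRITE = 1,
	SB_FREEZE_PAGEFAULT = 2,
	SB_FREEZE_FS = 3,
	SB_FREEZE_COMPLETE = 4,
};

/* Hypothetical outcome codes for the check at the top of thaw_super(). */
enum thaw_outcome {
	THAW_EINVAL_UNFROZEN,	/* not frozen: return -EINVAL silently */
	THAW_EINVAL_RACE,	/* mid-freeze: -EINVAL plus "THAW: hit the race" */
	THAW_PROCEED,		/* go on and release the freeze locks */
};

/* Pre-patch check: anything other than SB_UNFROZEN proceeds. */
static enum thaw_outcome thaw_check_old(int frozen)
{
	return frozen == SB_UNFROZEN ? THAW_EINVAL_UNFROZEN : THAW_PROCEED;
}

/* Patched check, following the diff: only a completed freeze proceeds. */
static enum thaw_outcome thaw_check_new(int frozen)
{
	if (frozen != SB_FREEZE_COMPLETE) {
		if (frozen != SB_UNFROZEN)
			return THAW_EINVAL_RACE;	/* where pr_crit() fires */
		return THAW_EINVAL_UNFROZEN;
	}
	return THAW_PROCEED;
}
```

For each intermediate level, the old check returns THAW_PROCEED while the patched one returns THAW_EINVAL_RACE. That is the case the pr_crit() in the diff is meant to report. SB_UNFROZEN and SB_FREEZE_COMPLETE behave as before.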

I was away on holiday, which is why I was silent. With this patch
applied, I could neither reproduce the issue nor trigger the pr_crit.
Have you had any moments of epiphany regarding this issue? Should
some FS people be involved in the discussion?