Message-ID: <HK0PR02MB2563F5F9BE8E368DB9571EDCAB310@HK0PR02MB2563.apcprd02.prod.outlook.com>
Date:   Fri, 17 Jan 2020 08:33:09 +0000
From:   陈安庆 <chenanqing@...o.com>
To:     "Yan, Zheng" <zyan@...hat.com>, Jeff Layton <jlayton@...nel.org>,
        Sage Weil <sage@...hat.com>, Ilya Dryomov <idryomov@...il.com>,
        "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: thank you all for your reply: [PATCH] fix the race
 condition in remove_session_caps() of ceph which triggers BUG_ON

Thank you all for your replies. This is my first time submitting a patch. The issue is already fixed by:
commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15

I'm sorry for my carelessness and for wasting some of your time.



On 1/16/20 9:31 PM, Jeff Layton wrote:
> On Thu, 2020-01-16 at 02:02 +0000, 陈安庆 wrote:
>> Hi all, I think I have found a bug in ceph.
>>
>>
>> background:
>> [3418687.123610] kernel BUG at fs/ceph/mds_client.c:1325!
>> [3418687.124102] invalid opcode: 0000 [#1] SMP
>> [3418687.130132] CPU: 27 PID: 453692 Comm: kworker/27:2 Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.27.2.el7.x86_64 #1
>> [3418687.131427] Hardware name: Inspur NF5288M5/YZMB-00834-101, BIOS 4.0.08 09/19/2019
>> [3418687.132109] Workqueue: ceph-msgr ceph_con_workfn [libceph]
>> [3418687.132792] task: ffff94c932652080 ti: ffff94b73dae4000 task.ti: ffff94b73dae4000
>> [3418687.133488] RIP: 0010:[<ffffffffc1acdc9f>]  [<ffffffffc1acdc9f>] remove_session_caps+0x1bf/0x1d0 [ceph]
>> [3418687.134213] RSP: 0018:ffff94b73dae7be0  EFLAGS: 00010202
>> [3418687.134933] RAX: 0000000000000001 RBX: ffff94a09a49ed38 RCX: ffff94b73dae7be8
>> [3418687.135668] RDX: ffff94c409f78118 RSI: ffff94b73dae7be8 RDI: ffff94a09a49e800
>> [3418687.136407] RBP: ffff94b73dae7c38 R08: ffff94c409f78118 R09: 0000000000000001
>> [3418687.137147] R10: 0000000000000000 R11: 0000000000000000 R12: ffff94a09a49e800
>> [3418687.137895] R13: ffff94b73dae7be8 R14: ffff94c409f78000 R15: ffff94a09a49ed40
>> [3418687.138648] FS:  0000000000000000(0000) GS:ffff94e0bc0c0000(0000) knlGS:0000000000000000
>> [3418687.139414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [3418687.140175] CR2: 00007f49c861a330 CR3: 0000002e4eec4000 CR4: 00000000007607e0
>> [3418687.140939] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [3418687.141694] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [3418687.142439] PKRU: 00000000
>> [3418687.143173] Call Trace:
>> [3418687.143907]  [<ffffffffc1ad399c>] dispatch+0x39c/0xb00 [ceph]
>> [3418687.144635]  [<ffffffffbd81b56a>] ? kernel_recvmsg+0x3a/0x50
>> [3418687.145361]  [<ffffffffc1a60fb4>] try_read+0x514/0x12c0 [libceph]
>> [3418687.146081]  [<ffffffffc1a61f64>] ceph_con_workfn+0xe4/0x1530 [libceph]
>> [3418687.146795]  [<ffffffffbd2d1b60>] ? finish_task_switch+0xe0/0x1c0
>> [3418687.147502]  [<ffffffffbd969aba>] ? __schedule+0x42a/0x860
>> [3418687.148201]  [<ffffffffbd2baf9f>] process_one_work+0x17f/0x440
>> [3418687.148895]  [<ffffffffbd2bc036>] worker_thread+0x126/0x3c0
>> [3418687.149579]  [<ffffffffbd2bbf10>] ? manage_workers.isra.25+0x2a0/0x2a0
>> [3418687.150261]  [<ffffffffbd2c2e81>] kthread+0xd1/0xe0
>> [3418687.150933]  [<ffffffffbd2c2db0>] ? insert_kthread_work+0x40/0x40
>> [3418687.151603]  [<ffffffffbd976c1d>] ret_from_fork_nospec_begin+0x7/0x21
>> [3418687.152266]  [<ffffffffbd2c2db0>] ? insert_kthread_work+0x40/0x40
>> [3418687.152921] Code: 5d 41 5e 41 5f 5d c3 48 89 fa 48 c7 c6 68 31 ae c1 48 c7 c7 98 0f af c1 31 c0 e8 8d 5c ad fb e9 96 fe ff ff e8 03 a9 7c fb 0f 0b <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
>> [3418687.154318] RIP  [<ffffffffc1acdc9f>] remove_session_caps+0x1bf/0x1d0 [ceph]
>> [3418687.154998] RSP <ffff94b73dae7be0>
>>
>>
>> And I found another thread that is waiting for the spinlock session->s_cap_lock:
>> PID: 512130  TASK: ffff94bcb5afc100  CPU: 62  COMMAND: "kworker/62:4"
>>  #0 [ffff94e0bc588e48] crash_nmi_callback at ffffffffbd256027
>>  #1 [ffff94e0bc588e58] nmi_handle at ffffffffbd96e91c
>>  #2 [ffff94e0bc588eb0] do_nmi at ffffffffbd96eb3d
>>  #3 [ffff94e0bc588ef0] end_repeat_nmi at ffffffffbd96dd89
>>     [exception RIP: native_queued_spin_lock_slowpath+462]
>>     RIP: ffffffffbd3135de  RSP: ffff94cfbf2cf800  RFLAGS: 00000202
>>     RAX: 0000000000000001  RBX: ffff94c409f78000  RCX: 0000000000000001
>>     RDX: 0000000000000101  RSI: 0000000000000001  RDI: ffff94a09a49ed38
>>     RBP: ffff94cfbf2cf800   R8: 0000000000000101   R9: ffff94a1e6d2e180
>>     R10: ffff94e1bfbba0e0  R11: ffffffffffffffff  R12: ffff94a6c763e9b0
>>     R13: ffff94a09a49ed38  R14: ffff94a09a49e800  R15: 0000000000000400
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> --- <NMI exception stack> ---
>>  #4 [ffff94cfbf2cf800] native_queued_spin_lock_slowpath at ffffffffbd3135de
>>  #5 [ffff94cfbf2cf808] queued_spin_lock_slowpath at ffffffffbd95e2cb
>>  #6 [ffff94cfbf2cf818] _raw_spin_lock at ffffffffbd96c7a0
>>  #7 [ffff94cfbf2cf828] __ceph_remove_cap at ffffffffc1ac1366 [ceph]   <-- this function is waiting for the spinlock session->s_cap_lock
>>  #8 [ffff94cfbf2cf870] ceph_queue_caps_release at ffffffffc1ac156c [ceph]
>>  #9 [ffff94cfbf2cf890] ceph_destroy_inode at ffffffffc1aaabd5 [ceph]
>> #10 [ffff94cfbf2cf8d0] destroy_inode at ffffffffbd45fe4b
>> #11 [ffff94cfbf2cf8e8] evict at ffffffffbd45ff85
>> #12 [ffff94cfbf2cf910] iput at ffffffffbd46082c
>> #13 [ffff94cfbf2cf940] ceph_put_wrbuffer_cap_refs at ffffffffc1ac46e4 [ceph]
>> #14 [ffff94cfbf2cf9b0] writepages_finish at ffffffffc1ab9ecc [ceph]
>> #15 [ffff94cfbf2cfa30] __complete_request at ffffffffc1a67ebe [libceph]
>> #16 [ffff94cfbf2cfa50] handle_reply at ffffffffc1a70d1d [libceph]
>> #17 [ffff94cfbf2cfc10] dispatch at ffffffffc1a72ae3 [libceph]
>> #18 [ffff94cfbf2cfcd0] try_read at ffffffffc1a60fb4 [libceph]
>> #19 [ffff94cfbf2cfd90] ceph_con_workfn at ffffffffc1a61f64 [libceph]
>> #20 [ffff94cfbf2cfe20] process_one_work at ffffffffbd2baf9f
>> #21 [ffff94cfbf2cfe68] worker_thread at ffffffffbd2bc036
>> #22 [ffff94cfbf2cfec8] kthread at ffffffffbd2c2e81
>> #23 [ffff94cfbf2cff50] ret_from_fork_nospec_begin at ffffffffbd976c1d
>>
>> After reading the ceph module in https://github.com/torvalds/linux.git
>> and kernel/git/next/linux-next.git, I think the bug still exists there.
>>
>> So I made a patch. Maybe I should replace the pr_warn_ratelimited() with some delay function such as udelay()?
>>
>

I think this bug is fixed by:

commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15
Author: Yan, Zheng <zyan@...hat.com>
Date:   Sun Jun 2 09:45:38 2019 +0800

ceph: use ceph_evict_inode to cleanup inode's resource

remove_session_caps() relies on __wait_on_freeing_inode(), to wait for freeing inode to remove its caps. But VFS wakes freeing inode waiters before calling destroy_inode().

Cc: stable@...r.kernel.org
Link: https://tracker.ceph.com/issues/40102
Signed-off-by: "Yan, Zheng" <zyan@...hat.com>
Reviewed-by: Jeff Layton <jlayton@...hat.com>
Signed-off-by: Ilya Dryomov <idryomov@...il.com>
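
For context, the idea behind that fix (a rough sketch based on the commit title, not the literal diff) is to move the cap/resource teardown out of the ->destroy_inode() path and into a ->evict_inode() handler. The VFS runs ->evict_inode() before it wakes the __wait_on_freeing_inode() waiters and only calls ->destroy_inode() afterwards, so remove_session_caps() no longer proceeds while the inode's cap cleanup is still pending. Assuming the upstream names ceph_super_ops, ceph_alloc_inode and ceph_evict_inode, the hookup looks roughly like this:

        /* fs/ceph/super.c (sketch only, other operations omitted) */
        static const struct super_operations ceph_super_ops = {
                .alloc_inode    = ceph_alloc_inode,
                /* cap/resource cleanup now happens here, before the VFS
                 * wakes I_FREEING waiters such as __wait_on_freeing_inode() */
                .evict_inode    = ceph_evict_inode,
        };

With the teardown in ->evict_inode(), the caps are already gone by the time remove_session_caps() is woken, so the BUG_ON(session->s_nr_caps > 0) race described below cannot fire.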




> Thanks for the patch.
>
> First, be sure you follow the patch submission guidelines when sending
> a patch. In particular, git-send-email is really the best way to
> submit patches.
>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index 4cfc4df9fc34..3734d2afb3c5 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -1503,8 +1503,14 @@ static void remove_session_caps(struct ceph_mds_session *session)
>>                  while (!list_empty(&session->s_caps)) {
>>                          cap = list_entry(session->s_caps.next,
>>                                           struct ceph_cap, session_caps);
>> -                       if (cap == prev)
>> -                                break;
>> +                       if (cap == prev) {
>> +                               spin_unlock(&session->s_cap_lock);
>> +                               pr_warn_ratelimited(
>> +                               "removing cap %p (inode %p) is delayed\n",
>> +                               cap,  inode);
>> +                               spin_lock(&session->s_cap_lock);
>> +                               continue;
>> +                       }
>
> (cc'ing Zheng)
>
> That seems like it might be prone to spinning for a very long time if
> we have an inode with an outstanding reference for a long time. I get
> that this is a problem though.
>
> I think the real bug is that this is not moving the cap to the end of
> the list, so that it can attempt to clean up more than one that might
> be there. Probably we just need to do a list_move_tail on the entry
> before dropping the spinlock?
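
A rough sketch of that idea, applied to the hunk above (untested; it only illustrates the list handling being suggested, not a complete fix):

        if (cap == prev) {
                /* Move the stuck cap to the tail of session->s_caps so the
                 * loop keeps making progress on the remaining entries instead
                 * of breaking out (or spinning) on this one. */
                list_move_tail(&cap->session_caps, &session->s_caps);
                spin_unlock(&session->s_cap_lock);
                pr_warn_ratelimited("removing cap %p (inode %p) is delayed\n",
                                    cap, inode);
                spin_lock(&session->s_cap_lock);
                continue;
        }

This lets the loop clean up every cap it can reach, though it could still spin if one cap's inode never finishes being evicted, which is why the real fix ended up in the eviction path instead.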
>
>
>
>>                          prev = cap;
>>                          vino = cap->ci->i_vino;
>>                          spin_unlock(&session->s_cap_lock);
>> @@ -1520,6 +1526,13 @@ static void remove_session_caps(struct ceph_mds_session *session)
>>          // drop cap expires and unlock s_cap_lock
>>          detach_cap_releases(session, &dispose);
>>
>> +       /*
>> +        * If ceph_async_iput() ends up in ceph_destroy_inode(), which
>> +        * finally calls __ceph_remove_cap() to decrement
>> +        * session->s_nr_caps, that decrement can be delayed until after
>> +        * the check below because it has to wait for session->s_cap_lock.
>> +        * In that case BUG_ON(session->s_nr_caps > 0) is triggered, even
>> +        * though it is just a race condition.
>> +        */
>>          BUG_ON(session->s_nr_caps > 0);
>>          BUG_ON(!list_empty(&session->s_cap_flushing));
>>          spin_unlock(&session->s_cap_lock);
>>
________________________________
OPPO

