Message-ID: <20211130184656.6958a442@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Tue, 30 Nov 2021 18:46:56 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Dust Li <dust.li@...ux.alibaba.com>
Cc: Karsten Graul <kgraul@...ux.ibm.com>,
"David S . Miller" <davem@...emloft.net>,
Ursula Braun <ubraun@...ux.ibm.com>,
Tony Lu <tonylu@...ux.alibaba.com>,
Wen Gu <guwen@...ux.alibaba.com>, linux-s390@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH net v2] net/smc: fix wrong list_del in smc_lgr_cleanup_early
On Wed, 1 Dec 2021 10:31:47 +0800 Dust Li wrote:
> smc_lgr_cleanup_early() is meant to delete the link
> group from the link group list, but it deletes
> the list head by mistake.
>
> This may cause memory corruption, since the real link
> group is never removed from the list and its structure
> is later cleared with memset().
> We got a list corruption panic when testing:
>
> [ 231.277259] list_del corruption. prev->next should be ffff8881398a8000, but was 0000000000000000
> [ 231.278222] ------------[ cut here ]------------
> [ 231.278726] kernel BUG at lib/list_debug.c:53!
> [ 231.279326] invalid opcode: 0000 [#1] SMP NOPTI
> [ 231.279803] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.46+ #435
> [ 231.280466] Hardware name: Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
> [ 231.281248] Workqueue: events smc_link_down_work
> [ 231.281732] RIP: 0010:__list_del_entry_valid+0x70/0x90
> [ 231.282258] Code: 4c 60 82 e8 7d cc 6a 00 0f 0b 48 89 fe 48 c7 c7 88 4c 60 82 e8 6c cc 6a 00 0f 0b 48 89 fe 48 c7 c7 c0 4c 60 82 e8 5b cc 6a 00 <0f> 0b 48 89 fe 48 c7 c7 00 4d 60 82 e8 4a cc 6a 00 0f 0b cc cc cc
> [ 231.284146] RSP: 0018:ffffc90000033d58 EFLAGS: 00010292
> [ 231.284685] RAX: 0000000000000054 RBX: ffff8881398a8000 RCX: 0000000000000000
> [ 231.285415] RDX: 0000000000000001 RSI: ffff88813bc18040 RDI: ffff88813bc18040
> [ 231.286141] RBP: ffffffff8305ad40 R08: 0000000000000003 R09: 0000000000000001
> [ 231.286873] R10: ffffffff82803da0 R11: ffffc90000033b90 R12: 0000000000000001
> [ 231.287606] R13: 0000000000000000 R14: ffff8881398a8000 R15: 0000000000000003
> [ 231.288337] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
> [ 231.289160] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 231.289754] CR2: 0000000000e72058 CR3: 000000010fa96006 CR4: 00000000003706f0
> [ 231.290485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 231.291211] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 231.291940] Call Trace:
> [ 231.292211] smc_lgr_terminate_sched+0x53/0xa0
> [ 231.292677] smc_switch_conns+0x75/0x6b0
> [ 231.293085] ? update_load_avg+0x1a6/0x590
> [ 231.293517] ? ttwu_do_wakeup+0x17/0x150
> [ 231.293907] ? update_load_avg+0x1a6/0x590
> [ 231.294317] ? newidle_balance+0xca/0x3d0
> [ 231.294716] smcr_link_down+0x50/0x1a0
> [ 231.295090] ? __wake_up_common_lock+0x77/0x90
> [ 231.295534] smc_link_down_work+0x46/0x60
> [ 231.295933] process_one_work+0x18b/0x350
>
> Fixes: a0a62ee15a829 ("net/smc: separate locks for SMCD and SMCR link group lists")
> Signed-off-by: Dust Li <dust.li@...ux.alibaba.com>
> Acked-by: Karsten Graul <kgraul@...ux.ibm.com>
> ---
>  net/smc/smc_core.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index bb52c8b5f148..8759f9fd8113 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -625,18 +625,16 @@ int smcd_nl_get_lgr(struct sk_buff *skb, struct netlink_callback *cb)
>  void smc_lgr_cleanup_early(struct smc_connection *conn)
>  {
>  	struct smc_link_group *lgr = conn->lgr;
> -	struct list_head *lgr_list;
>  	spinlock_t *lgr_lock;
>
>  	if (!lgr)
>  		return;
>
>  	smc_conn_free(conn);
> -	lgr_list = smc_lgr_list_head(lgr, &lgr_lock);
>  	spin_lock_bh(lgr_lock);
>  	/* do not use this link group for new connections */
> -	if (!list_empty(lgr_list))
> -		list_del_init(lgr_list);
> +	if (!list_empty(&lgr->list))
> +		list_del_init(&lgr->list);
>  	spin_unlock_bh(lgr_lock);
>  	__smc_lgr_terminate(lgr, true);
>  }

clang has something to say about that:
net/smc/smc_core.c:634:15: warning: variable 'lgr_lock' is uninitialized when used here [-Wuninitialized]
        spin_lock_bh(lgr_lock);
                     ^~~~~~~~
net/smc/smc_core.c:628:22: note: initialize the variable 'lgr_lock' to silence this warning
        spinlock_t *lgr_lock;
                            ^
                             = NULL
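
For illustration only, here is a minimal userspace sketch of why the
warning fires. Judging from the removed line, smc_lgr_list_head() also
sets lgr_lock through its second argument, so dropping the call leaves
the pointer unset. The model below keeps the call for that side effect
and deletes &lgr->list rather than the list head. Everything in it (the
list and spinlock stand-ins, model_lgr_list_head(),
model_cleanup_early(), run_demo()) is hypothetical scaffolding rather
than kernel code, and it shows only one possible way to address the
warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; hypothetical, for illustration. */
struct list_head { struct list_head *next, *prev; };
typedef struct { int locked; } spinlock_t;

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

static void spin_lock_bh(spinlock_t *l) { l->locked = 1; }
static void spin_unlock_bh(spinlock_t *l) { l->locked = 0; }

struct smc_link_group { struct list_head list; };

static struct list_head lgr_list_head;	/* models the global lgr list */
static spinlock_t lgr_list_lock;	/* models the lock protecting it */

/* Assumed shape of the helper: returns the head, sets *lock as a side effect. */
static struct list_head *model_lgr_list_head(struct smc_link_group *lgr,
					     spinlock_t **lock)
{
	(void)lgr;
	*lock = &lgr_list_lock;
	return &lgr_list_head;
}

static void model_cleanup_early(struct smc_link_group *lgr)
{
	spinlock_t *lgr_lock;

	/* Keep the call: it is what initializes lgr_lock. */
	model_lgr_list_head(lgr, &lgr_lock);
	spin_lock_bh(lgr_lock);
	/* Unlink the group itself, not the list head. */
	if (!list_empty(&lgr->list))
		list_del_init(&lgr->list);
	spin_unlock_bh(lgr_lock);
}

/* Returns 0 if the group was unlinked and the head is still a valid list. */
int run_demo(void)
{
	struct smc_link_group lgr;

	INIT_LIST_HEAD(&lgr_list_head);
	INIT_LIST_HEAD(&lgr.list);
	list_add(&lgr.list, &lgr_list_head);

	model_cleanup_early(&lgr);

	assert(list_empty(&lgr.list));
	return list_empty(&lgr_list_head) ? 0 : 1;
}
```

Initializing lgr_lock to NULL, as the clang note suggests, would only
silence the diagnostic: spin_lock_bh() would then be handed a NULL
pointer.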