[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8dcc39ab-be6d-6bb8-9f43-ff8695908f15@sangfor.com.cn>
Date: Fri, 8 Sep 2023 09:03:52 +0800
From: Ding Hui <dinghui@...gfor.com.cn>
To: Mike Christie <michael.christie@...cle.com>,
Huang Cun <huangcun@...gfor.com.cn>, jejb@...ux.ibm.com,
martin.petersen@...cle.com, linux-scsi@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: pengdonglin@...gfor.com.cn
Subject: Re: [PATCH] scsi: scsi_dh_rdac: Avoid crash when a disk attach failed
On 2023/9/7 9:45, Ding Hui wrote:
> On 2023/9/6 23:51, Mike Christie wrote:
>> On 8/3/23 6:28 AM, Huang Cun wrote:
>>> When a disk fails to attach, the struct rdac_dh_data is released,
>>> but it is not removed from the ctlr->dh_list. When attaching another
>>> disk, the released rdac_dh_data will be accessed and the following
>>> BUG_ON() may be observed:
>>>
>>> [ 414.696167] scsi 5:0:0:7: rdac: Attach failed (8)
>>> ...
>>> [ 423.615364] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
>>> [ 423.615731] invalid opcode: 0000 [#1] SMP NOPTI
>>> ...
>>> [ 423.623247] Call Trace:
>>> [ 423.623598] rdac_bus_attach+0x203/0x4c0
>>> [ 423.623949] ? scsi_dh_handler_attach+0x2d/0x90
>>> [ 423.624300] scsi_dh_handler_attach+0x2d/0x90
>>> [ 423.624652] scsi_sysfs_add_sdev+0x88/0x270
>>> [ 423.625004] scsi_probe_and_add_lun+0xc47/0xd50
>>> [ 423.625354] scsi_report_lun_scan+0x339/0x3b0
>>> [ 423.625705] __scsi_scan_target+0xe9/0x220
>>> [ 423.626056] scsi_scan_target+0xf6/0x100
>>> [ 423.626404] fc_scsi_scan_rport+0xa5/0xb0
>>> [ 423.626757] process_one_work+0x15e/0x3f0
>>> [ 423.627106] worker_thread+0x4c/0x440
>>> [ 423.627453] ? rescuer_thread+0x350/0x350
>>> [ 423.627804] kthread+0xf8/0x130
>>> [ 423.628153] ? kthread_destroy_worker+0x40/0x40
>>> [ 423.628509] ret_from_fork+0x1f/0x40
>>>
>>> Fixes: 1a5dc166cd88 ("scsi_dh_rdac: update 'access_state' field")
>>> Signed-off-by: Huang Cun <huangcun@...gfor.com.cn>
>>> Signed-off-by: Ding Hui <dinghui@...gfor.com.cn>
>>> Cc: Donglin Peng <pengdonglin@...gfor.com.cn>
>>> ---
>>> drivers/scsi/device_handler/scsi_dh_rdac.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
>>> index c5538645057a..9d487c2b7708 100644
>>> --- a/drivers/scsi/device_handler/scsi_dh_rdac.c
>>> +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
>>> @@ -762,8 +762,10 @@ static int rdac_bus_attach(struct scsi_device *sdev)
>>> clean_ctlr:
>>> spin_lock(&list_lock);
>>> + list_del_rcu(&h->node);
>>> kref_put(&h->ctlr->kref, release_controller);
>>> spin_unlock(&list_lock);
>>> + synchronize_rcu();
>>
>> Should this be:
>>
>> spin_lock(&list_lock);
>> list_del_rcu(&h->node);
>> spin_unlock(&list_lock);
>>
>> synchronize_rcu();
>>
>> kref_put(&h->ctlr->kref, release_controller);
>>
>>
>> ?
>>
>> If you do the synchronize_rcu after the kref_put, then the kref_put
>> could free the rdac_dh_data, while check_ownership is still
>> accessing the rdac_dh_data, right?
>>
>
> You are right.
>
> But I think we should keep the kref_put() and release callback be protected by list_lock, and only free
> the ctlr after synchronize_rcu().
>
Sorry, I thought again, maybe we don't need to worry about it.
The ctlr->kref is protected by list_lock, and release_controller() DO free the ctlr, NOT rdac_dh_data,
kfree(rdac_dh_data) is already after synchronize_rcu().
If check_ownership() shared the same ctlr, the kref must be greater than or equal to 2, so the ctlr will
not be released by kref_put(). On the other hand, if the ctlr is different, releasing it will not affect
others.
--
Thanks,
- Ding Hui
Powered by blists - more mailing lists