Message-ID: <257dcd6c-2ffd-f518-9b13-c309348088d9@oracle.com>
Date: Mon, 19 Dec 2022 15:55:01 +0000
From: John Garry <john.g.garry@...cle.com>
To: Jason Yan <yanaijie@...wei.com>,
Xingui Yang <yangxingui@...wei.com>, jejb@...ux.ibm.com,
martin.petersen@...cle.com, damien.lemoal@...nsource.wdc.com,
linux-ide@...r.kernel.org, hare@...e.com, hch@....de
Cc: linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
linuxarm@...wei.com, prime.zeng@...ilicon.com,
kangfenglong@...wei.com
Subject: Re: [PATCH V2] scsi: libsas: Directly kick-off EH when ATA device
fell off
On 19/12/2022 15:28, Jason Yan wrote:
>>> + if (test_bit(SAS_DEV_GONE, &dev->state) && dev_is_sata(dev))
>>> + sas_ata_device_link_abort(dev, false);
>>
>> Firstly, I think that there is a bug in sas_ata_device_link_abort() ->
>> ata_link_abort() code in that the host lock is not grabbed, as the
>> comment in ata_port_abort() mentions. Having said that, libsas already
>> had some dodgy host locking usage - specifically dropping the lock
>> for the queuing path (that's something else to be fixed up ... I think
>
> Taking big locks in the queuing path is not a good idea. This will bring
> down performance.
But ata_qc_issue() is expected to be called with the host lock held, and
the lock should stay held throughout the call.
I think that the reason libsas drops the lock is that some LLDD
queuecommand callbacks call task_done() in some error paths. If we kept
the lock held, we could deadlock, for example:
sas_ata_qc_issue() (holds host lock)
  -> lldd_execute_task() = pm8001_queue_command()
  -> task_done() = sas_ata_task_done()
  -> tries to grab the host lock again => deadlock
Thanks,
John