Date: Tue, 18 Jun 2024 21:29:00 +0800
From: Yihang Li <liyihang9@...wei.com>
To: <dlemoal@...nel.org>
CC: <cassel@...nel.org>, <James.Bottomley@...senPartnership.com>,
	<martin.petersen@...cle.com>, <john.g.garry@...cle.com>,
	<yanaijie@...wei.com>, <linux-kernel@...r.kernel.org>,
	<linux-scsi@...r.kernel.org>, <linuxarm@...wei.com>, <liyihang9@...wei.com>,
	<chenxiang66@...ilicon.com>, <prime.zeng@...wei.com>
Subject: [bug report] scsi: SATA devices missing after FLR is triggered during HBA suspended

Hi Damien,

I found that two issues are caused by commits 0c76106cb975 ("scsi: sd:
Fix TCG OPAL unlock on system resume") and 626b13f015e0 ("scsi: Do not
rescan devices with a suspended queue").

Both issues occur when ATA disks are connected to a SAS controller:
(1) If an FLR is triggered after all disks and the controller have been
suspended, the number of disks is abnormal (some SATA disks go missing).
(2) If all disks and the controller are suspended and all disks are then
resumed again, the driver reference count is not 0 (the "Used" value in
the lsmod output is not 0).

For issue 1: after all disks and the controller are suspended, the FLR
resumes the controller and all SAS ports. The libsas layer calls
ata_sas_port_resume() to resume the ATA port and schedules EH to recover
it. The libata standard error handler, ata_std_error_handler(), calls the
ATA reset function, revalidates the ATA devices, and issues the ATA command
ATA_CMD_READ_NATIVE_MAX_EXT to read the native max address. This command
fails because the controller has entered the suspended state again, and
libata eventually disables the device. The controller suspends again
because the FLR completes while its runtime PM usage counter is 0.

Commits 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
and 626b13f015e0 ("scsi: Do not rescan devices with a suspended queue")
use blk_queue_pm_only() to check the state of the device request queue;
if the queue is not running, the device is not rescanned. As a result,
the runtime PM usage counter of the controller is never incremented, and
the controller enters the suspended state again.
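To make the issue 1 sequence concrete, below is a simplified userspace
model in C. It is not kernel code: every type and function is
hypothetical. The counter stands in for the controller's runtime PM usage
counter, and queue_pm_only stands in for blk_queue_pm_only() returning
true. It only illustrates the reasoning above, not the real libsas/libata
code path.

```c
#include <stdbool.h>

/*
 * Simplified userspace model of issue 1. All types and functions here are
 * hypothetical stand-ins, not kernel APIs.
 */
struct model_controller {
    int usage_count;     /* stands in for the runtime PM usage counter */
    bool suspended;
};

struct model_disk {
    bool queue_pm_only;  /* stands in for blk_queue_pm_only() == true */
};

/* Rescan takes a controller reference only if the disk queue is running. */
static void model_rescan(struct model_controller *c, const struct model_disk *d)
{
    if (d->queue_pm_only)
        return;          /* rescan skipped, so no reference is taken */
    c->usage_count++;    /* reference held while the disk is rescanned */
}

/* Once the FLR completes, an idle controller (counter 0) suspends again. */
static void model_flr_complete(struct model_controller *c)
{
    if (c->usage_count == 0)
        c->suspended = true;
}

/* EH's READ NATIVE MAX EXT succeeds only if the controller is awake. */
static bool model_eh_read_native_max(const struct model_controller *c)
{
    return !c->suspended;
}

/* Runs the reported sequence; returns whether EH's command succeeds. */
bool model_issue1(bool queue_pm_only)
{
    /* The FLR has just resumed the controller; nothing holds a reference. */
    struct model_controller c = { .usage_count = 0, .suspended = false };
    struct model_disk d = { .queue_pm_only = queue_pm_only };

    model_rescan(&c, &d);
    model_flr_complete(&c);
    return model_eh_read_native_max(&c);
}
```

In this model, model_issue1(true) returns false (the command fails, as
reported), while model_issue1(false), which mimics a rescan that does
take a reference, returns true.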

For issue 2, the cause is still unknown.

How should these two issues be solved?

regards,
Yihang
