Date:   Mon, 27 Feb 2023 21:17:49 +0800
From:   yangxingui <yangxingui@...wei.com>
To:     "Martin K. Petersen" <martin.petersen@...cle.com>,
        <jejb@...ux.ibm.com>, <linux-scsi@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, Linuxarm <linuxarm@...wei.com>,
        "Zengtao (B)" <prime.zeng@...ilicon.com>,
        Kangfenglong <kangfenglong@...wei.com>,
        John Garry <john.g.garry@...cle.com>
Subject: [bug report] scsi: libsas: Fix hung when disable phys


Hi, All

If a remote PHY is disabled just after all local PHYs have been disabled
in an expander environment, as follows:
echo 0 > /sys/class/sas_phy/phy-4\:0/enable
echo 0 > /sys/class/sas_phy/phy-4\:1/enable
echo 0 > /sys/class/sas_phy/phy-4\:2/enable
echo 0 > /sys/class/sas_phy/phy-4\:3/enable
echo 0 > /sys/class/sas_phy/phy-4\:4/enable
echo 0 > /sys/class/sas_phy/phy-4\:5/enable
echo 0 > /sys/class/sas_phy/phy-4\:6/enable
echo 0 > /sys/class/sas_phy/phy-4\:7/enable
echo 0 > /sys/class/sas_phy/phy-4:0:7/enable

the following hang occurs:

[  245.564088] INFO: task kworker/u256:1:883 blocked for more than 120 seconds.
[  245.571115]       Tainted: G           O      5.16.0-rc4+ #1
[  245.576759] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  245.584557] task:kworker/u256:1  state:D stack:    0 pid:  883 ppid:     2 flags:0x00000008
[  245.592878] Workqueue: 0000:74:02.0_event_q sas_phy_event_worker [libsas]
[  245.599652] Call trace:
[  245.602092]  __switch_to+0xd8/0x114
[  245.605574]  __schedule+0x2f0/0x85c
[  245.609054]  schedule+0x60/0x100
[  245.612273]  __kernfs_remove.part.0+0x288/0x2e0
[  245.616791]  kernfs_remove_by_name_ns+0x70/0xc0
[  245.621307]  sysfs_remove_file_ns+0x24/0x30
[  245.625477]  device_remove_file+0x24/0x34
[  245.629475]  attribute_container_remove_attrs+0x50/0x8c
[  245.634684]  attribute_container_class_device_del+0x24/0x3c
[  245.640237]  transport_remove_classdev+0x64/0x80
[  245.644839]  attribute_container_device_trigger+0x11c/0x124
[  245.650393]  transport_remove_device+0x24/0x30
[  245.654823]  sas_phy_delete+0x34/0x60
[  245.658475]  do_sas_phy_delete+0x60/0x70
[  245.662385]  device_for_each_child+0x68/0xb0
[  245.666640]  sas_remove_children+0x44/0x54
[  245.670723]  sas_destruct_devices+0x5c/0xa0 [libsas]
[  245.675676]  sas_deform_port+0x178/0x1bc [libsas]
[  245.680371]  sas_phye_loss_of_signal+0x28/0x34 [libsas]
[  245.685583]  sas_phy_event_worker+0x3c/0x60 [libsas]
[  245.690536]  process_one_work+0x1e0/0x46c
[  245.694534]  worker_thread+0x15c/0x464
[  245.698272]  kthread+0x188/0x194
[  245.701491]  ret_from_fork+0x10/0x20
[  245.705120] INFO: task bash:25579 blocked for more than 120 seconds.
[  245.711450]       Tainted: G           O      5.16.0-rc4+ #1
[  245.717087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  245.724883] task:bash            state:D stack:    0 pid:25579 ppid: 25113 flags:0x00000200
[  245.733202] Call trace:
[  245.735639]  __switch_to+0xd8/0x114
[  245.739117]  __schedule+0x2f0/0x85c
[  245.742595]  schedule+0x60/0x100
[  245.745814]  schedule_timeout+0x180/0x1bc
[  245.749811]  wait_for_completion+0x8c/0x100
[  245.753984]  flush_workqueue+0x108/0x3d4
[  245.757896]  drain_workqueue+0xc8/0x16c
[  245.761722]  __sas_drain_work+0x54/0x90 [libsas]
[  245.766328]  sas_drain_work+0x68/0x70 [libsas]
[  245.770760]  queue_phy_enable+0x9c/0xec [libsas]
[  245.775368]  store_sas_phy_enable+0xf0/0x10c
[  245.779624]  dev_attr_store+0x24/0x40
[  245.783275]  sysfs_kf_write+0x50/0x60
[  245.786930]  kernfs_fop_write_iter+0x124/0x1b4
[  245.791361]  new_sync_write+0xf0/0x190
[  245.795098]  vfs_write+0x23c/0x2a0
[  245.798490]  ksys_write+0x78/0x104
[  245.801882]  __arm64_sys_write+0x28/0x3c
[  245.805794]  invoke_syscall.constprop.0+0x58/0xf0
[  245.810483]  do_el0_svc+0x19c/0x1b0
[  245.813962]  el0_svc+0x28/0xec
[  245.817009]  el0t_64_sync_handler+0x1a8/0x1ac
[  245.821351]  el0t_64_sync+0x1a0/0x1a4

We found that when all local PHYs are disabled, all the devices are
removed in the PHY_LOSS_OF_SIGNAL work, which waits for kn->active of
the device to be deactivated (in kernfs_drain()). However, kn->active
may still be held, because we are using the sysfs interface to disable
the remote PHY at the same time, and that sysfs write in turn drains the
libsas work, including the PHY_LOSS_OF_SIGNAL work. Each side waits for
the other, so the hang occurs.
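
To make the circular wait in the analysis above concrete, here is a
minimal userspace sketch in C. It is not libsas or kernfs code: the
task labels, struct wait_graph and the helper functions are all
hypothetical names chosen for illustration. It only records who waits
on whom, as described above, and checks whether the chain loops back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two blocked tasks in the report. */
enum task {
	TASK_SYSFS_WRITER,	/* bash in store_sas_phy_enable() */
	TASK_EVENT_WORKER,	/* kworker running the PHY_LOSS_OF_SIGNAL work */
	TASK_COUNT
};

#define NO_WAIT (-1)

/* waits_on[t] is the task that t is blocked on, or NO_WAIT. */
struct wait_graph {
	int waits_on[TASK_COUNT];
};

static void wait_graph_init(struct wait_graph *g)
{
	for (int t = 0; t < TASK_COUNT; t++)
		g->waits_on[t] = NO_WAIT;
}

/* Follow the wait chain from start; a return to start means a deadlock. */
static bool wait_graph_deadlocked(const struct wait_graph *g, int start)
{
	int cur = start;

	assert(start >= 0 && start < TASK_COUNT);
	for (int step = 0; step < TASK_COUNT; step++) {
		cur = g->waits_on[cur];
		if (cur == NO_WAIT)
			return false;
		if (cur == start)
			return true;
	}
	return false;
}

/* Build the wait-for edges described in the report. */
static struct wait_graph reported_scenario(void)
{
	struct wait_graph g;

	wait_graph_init(&g);
	/* The writer holds kn->active and drains the libsas event workqueue. */
	g.waits_on[TASK_SYSFS_WRITER] = TASK_EVENT_WORKER;
	/* The worker removes sysfs files and waits in kernfs_drain(). */
	g.waits_on[TASK_EVENT_WORKER] = TASK_SYSFS_WRITER;
	return g;
}
```

Calling wait_graph_deadlocked() on the result of reported_scenario()
reports a cycle from either task, which matches the two blocked tasks in
the traces. Removing either edge (for example, not draining the work
from the sysfs store path) breaks the cycle in this model.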

How should the problem be fixed in this scenario?

regards,

Xingui
