Message-ID: <cc52e18a-41d2-cff1-a86c-de114d8a140e@hisilicon.com>
Date:   Thu, 9 Mar 2023 09:01:47 +0800
From:   "chenxiang (M)" <chenxiang66@...ilicon.com>
To:     yangxingui <yangxingui@...wei.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        <jejb@...ux.ibm.com>, <linux-scsi@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, Linuxarm <linuxarm@...wei.com>,
        "Zengtao (B)" <prime.zeng@...ilicon.com>,
        Kangfenglong <kangfenglong@...wei.com>,
        John Garry <john.g.garry@...cle.com>
Subject: Re: [bug report] scsi: libsas: Fix hung when disable phys

Hi,


On 2023/2/27 21:17, yangxingui wrote:
>
> Hi, All
>
> If a remote PHY is disabled just after all local PHYs have been disabled
> in an expander environment, as follows:
> echo 0 > /sys/class/sas_phy/phy-4\:0/enable
> echo 0 > /sys/class/sas_phy/phy-4\:1/enable
> echo 0 > /sys/class/sas_phy/phy-4\:2/enable
> echo 0 > /sys/class/sas_phy/phy-4\:3/enable
> echo 0 > /sys/class/sas_phy/phy-4\:4/enable
> echo 0 > /sys/class/sas_phy/phy-4\:5/enable
> echo 0 > /sys/class/sas_phy/phy-4\:6/enable
> echo 0 > /sys/class/sas_phy/phy-4\:7/enable
> echo 0 > /sys/class/sas_phy/phy-4:0:7/enable
>
> the following hang occurs:
>
> [  245.564088] INFO: task kworker/u256:1:883 blocked for more than 120 seconds.
> [  245.571115]       Tainted: G           O      5.16.0-rc4+ #1
> [  245.576759] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  245.584557] task:kworker/u256:1  state:D stack:    0 pid:  883 ppid:    2 flags:0x00000008
> [  245.592878] Workqueue: 0000:74:02.0_event_q sas_phy_event_worker [libsas]
> [  245.599652] Call trace:
> [  245.602092]  __switch_to+0xd8/0x114
> [  245.605574]  __schedule+0x2f0/0x85c
> [  245.609054]  schedule+0x60/0x100
> [  245.612273]  __kernfs_remove.part.0+0x288/0x2e0
> [  245.616791]  kernfs_remove_by_name_ns+0x70/0xc0
> [  245.621307]  sysfs_remove_file_ns+0x24/0x30
> [  245.625477]  device_remove_file+0x24/0x34
> [  245.629475]  attribute_container_remove_attrs+0x50/0x8c
> [  245.634684]  attribute_container_class_device_del+0x24/0x3c
> [  245.640237]  transport_remove_classdev+0x64/0x80
> [  245.644839]  attribute_container_device_trigger+0x11c/0x124
> [  245.650393]  transport_remove_device+0x24/0x30
> [  245.654823]  sas_phy_delete+0x34/0x60
> [  245.658475]  do_sas_phy_delete+0x60/0x70
> [  245.662385]  device_for_each_child+0x68/0xb0
> [  245.666640]  sas_remove_children+0x44/0x54
> [  245.670723]  sas_destruct_devices+0x5c/0xa0 [libsas]
> [  245.675676]  sas_deform_port+0x178/0x1bc [libsas]
> [  245.680371]  sas_phye_loss_of_signal+0x28/0x34 [libsas]
> [  245.685583]  sas_phy_event_worker+0x3c/0x60 [libsas]
> [  245.690536]  process_one_work+0x1e0/0x46c
> [  245.694534]  worker_thread+0x15c/0x464
> [  245.698272]  kthread+0x188/0x194
> [  245.701491]  ret_from_fork+0x10/0x20
> [  245.705120] INFO: task bash:25579 blocked for more than 120 seconds.
> [  245.711450]       Tainted: G           O      5.16.0-rc4+ #1
> [  245.717087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  245.724883] task:bash            state:D stack:    0 pid:25579 ppid: 25113 flags:0x00000200
> [  245.733202] Call trace:
> [  245.735639]  __switch_to+0xd8/0x114
> [  245.739117]  __schedule+0x2f0/0x85c
> [  245.742595]  schedule+0x60/0x100
> [  245.745814]  schedule_timeout+0x180/0x1bc
> [  245.749811]  wait_for_completion+0x8c/0x100
> [  245.753984]  flush_workqueue+0x108/0x3d4
> [  245.757896]  drain_workqueue+0xc8/0x16c
> [  245.761722]  __sas_drain_work+0x54/0x90 [libsas]
> [  245.766328]  sas_drain_work+0x68/0x70 [libsas]
> [  245.770760]  queue_phy_enable+0x9c/0xec [libsas]
> [  245.775368]  store_sas_phy_enable+0xf0/0x10c
> [  245.779624]  dev_attr_store+0x24/0x40
> [  245.783275]  sysfs_kf_write+0x50/0x60
> [  245.786930]  kernfs_fop_write_iter+0x124/0x1b4
> [  245.791361]  new_sync_write+0xf0/0x190
> [  245.795098]  vfs_write+0x23c/0x2a0
> [  245.798490]  ksys_write+0x78/0x104
> [  245.801882]  __arm64_sys_write+0x28/0x3c
> [  245.805794]  invoke_syscall.constprop.0+0x58/0xf0
> [  245.810483]  do_el0_svc+0x19c/0x1b0
> [  245.813962]  el0_svc+0x28/0xec
> [  245.817009]  el0t_64_sync_handler+0x1a8/0x1ac
> [  245.821351]  el0t_64_sync+0x1a0/0x1a4
>
> We found that when all local PHYs are disabled, all devices are removed
> in the PHY_LOSS_OF_SIGNAL work, which waits for kn->active of the device
> to be deactivated (in kernfs_drain()). However, kn->active may still be
> active because we are using the sysfs interface to disable a remote PHY
> at the same time, and that sysfs write in turn drains the libsas work,
> including the PHY_LOSS_OF_SIGNAL work. Each side waits for the other, so
> the hang occurs.
>
> How to fix the problem in this scenario?
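
The wait cycle described in the report can be modelled in user space as a
tiny wait-for graph. This is only an illustrative sketch, not libsas code:
the task labels and the helpers has_wait_cycle() and model_hangs() are
hypothetical names. It assumes the sysfs writer holds the active reference
on the attribute until the write returns, and that the event worker cannot
finish device removal until that reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two blocked tasks seen in the trace. */
enum task { SYSFS_WRITER, PHY_EVENT_WORKER, NUM_TASKS };

/* waits_on[t] is the task that t is blocked on, or -1 if t can run. */
static bool has_wait_cycle(const int waits_on[NUM_TASKS], int start)
{
	int cur = start;

	for (int steps = 0; steps < NUM_TASKS; steps++) {
		cur = waits_on[cur];
		if (cur < 0)
			return false;	/* chain ends: someone can make progress */
		if (cur == start)
			return true;	/* back at the start: nobody can proceed */
	}
	return false;
}

/* Model the scenario with or without the drain in the sysfs write path. */
static bool model_hangs(bool writer_drains_event_queue)
{
	int waits_on[NUM_TASKS];

	/* Worker: device removal waits for the writer's active reference. */
	waits_on[PHY_EVENT_WORKER] = SYSFS_WRITER;
	/* Writer: waits for the event queue only if it drains it. */
	waits_on[SYSFS_WRITER] = writer_drains_event_queue ? PHY_EVENT_WORKER : -1;

	return has_wait_cycle(waits_on, SYSFS_WRITER);
}
```

In this model the cycle exists only while the writer drains the event
queue; without that edge the writer returns, drops its reference, and the
worker can complete.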

This seems to be a common issue in the libsas layer.
What about calling the callback functions of phy_enable_work and
phy_reset_work directly in queue_phy_enable/queue_phy_reset, instead of
queuing those works and then calling sas_drain_work?
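
A rough user-space sketch of that idea, for the enable path only, might look
like the code below. Everything here is a mock: struct mock_phy_data,
mock_phy_enable_handler() and mock_queue_phy_enable_direct() are
hypothetical stand-ins, not the real libsas structures or functions, and
the sketch does not settle whether skipping the drain is safe with respect
to other libsas events.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-PHY data used by the enable work. */
struct mock_phy_data {
	int enable;		/* requested state */
	int enable_result;	/* result reported by the handler */
	int hw_enabled;		/* mock hardware state */
};

/* Stand-in for the body of the phy_enable_work callback. */
static void mock_phy_enable_handler(struct mock_phy_data *d)
{
	d->hw_enabled = d->enable;
	d->enable_result = 0;
}

/*
 * Direct variant: run the handler in the caller's context instead of
 * queuing it and draining the event queue, so the sysfs writer never
 * waits on other queued work.
 */
static int mock_queue_phy_enable_direct(struct mock_phy_data *d, int enable)
{
	if (d == NULL)
		return -1;

	d->enable = enable;
	d->enable_result = 0;
	mock_phy_enable_handler(d);

	return d->enable_result;
}
```

The reset path would follow the same pattern with its own handler.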


>
> regards,
>
> Xingui
>
