Message-ID: <d477d789-3e73-9d00-1daf-ff0ed6f18e6c@easystack.cn>
Date:   Tue, 18 Jul 2023 14:51:08 +0800
From:   Jirong Feng <jirong.feng@...ystack.cn>
To:     nab@...ux-iscsi.org
Cc:     linux-scsi@...r.kernel.org, target-devel@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Close connection aborting an out-of-order cmd will hang

Hi,

I recently encountered a hang, as follows:
[root@...e-6 ~]# ps -aux | grep ' D '
root      8648  0.4  0.0      0     0 ?        D    Jul12  21:04 [iscsi_np]
root     17572  0.0  0.0      0     0 ?        D    Jul12   0:09 [kworker/7:3+events]
root     56555  0.0  0.0 216576  1536 pts/1    S+   14:57   0:00 grep --color=auto  D
root     59853  0.0  0.0      0     0 ?        D    Jul12   0:04 [iscsi_trx]

The call stacks:
kworker:
PID: 17572  TASK: ffff862470df0e00  CPU: 7   COMMAND: "kworker/7:3"
  #0 [ffff0000528afab0] __switch_to at ffff4a49c69e74b8
  #1 [ffff0000528afad0] __schedule at ffff4a49c72b60f4
  #2 [ffff0000528afb60] schedule at ffff4a49c72b6754
  #3 [ffff0000528afb70] schedule_timeout at ffff4a49c72ba980
  #4 [ffff0000528afc30] wait_for_common at ffff4a49c72b7504
  #5 [ffff0000528afcb0] wait_for_completion at ffff4a49c72b7594
  #6 [ffff0000528afcd0] target_put_cmd_and_wait at ffff4a49a3dad38c [target_core_mod]
  #7 [ffff0000528afd30] core_tmr_abort_task at ffff4a49a3da55c8 [target_core_mod]
  #8 [ffff0000528afd80] target_tmr_work at ffff4a49a3daa1c8 [target_core_mod]
  #9 [ffff0000528afdb0] process_one_work at ffff4a49c6a603c0
#10 [ffff0000528afe00] worker_thread at ffff4a49c6a60640
#11 [ffff0000528afe60] kthread at ffff4a49c6a67474

iscsi_trx:
PID: 59853  TASK: ffff8624fe0b5200  CPU: 7   COMMAND: "iscsi_trx"
  #0 [ffff000095f6fa50] __switch_to at ffff4a49c69e74b8
  #1 [ffff000095f6fa70] __schedule at ffff4a49c72b60f4
  #2 [ffff000095f6fb00] schedule at ffff4a49c72b6754
  #3 [ffff000095f6fb10] schedule_timeout at ffff4a49c72ba870
  #4 [ffff000095f6fbd0] wait_for_common at ffff4a49c72b7504
  #5 [ffff000095f6fc50] wait_for_completion_timeout at ffff4a49c72b75d0
  #6 [ffff000095f6fc70] __transport_wait_for_tasks at ffff4a49a3da9c28 [target_core_mod]
  #7 [ffff000095f6fcb0] transport_generic_free_cmd at ffff4a49a3da9dd0 [target_core_mod]
  #8 [ffff000095f6fd20] iscsit_free_cmd at ffff4a49a3fc4464 [iscsi_target_mod]
  #9 [ffff000095f6fd50] iscsit_close_connection at ffff4a49a3fccf48 [iscsi_target_mod]
#10 [ffff000095f6fdf0] iscsit_take_action_for_connection_exit at ffff4a49a3fb7614 [iscsi_target_mod]
#11 [ffff000095f6fe20] iscsi_target_rx_thread at ffff4a49a3fcc064 [iscsi_target_mod]
#12 [ffff000095f6fe60] kthread at ffff4a49c6a67474

Inspecting the cmd being aborted in the kworker:
crash> struct iscsi_cmd FFFFA62592F4B400
struct iscsi_cmd {
   dataout_timer_flags = (unknown: 0),
   dataout_timeout_retries = 0 '\000',
   error_recovery_count = 0 '\000',
   deferred_i_state = ISTATE_NEW_CMD,
   i_state = ISTATE_DEFERRED_CMD,
   immediate_cmd = 0 '\000',
   immediate_data = 0 '\000',
   iscsi_opcode = 1 '\001',
   iscsi_response = 0 '\000',
   logout_reason = 0 '\000',
   logout_response = 0 '\000',
   maxcmdsn_inc = 0 '\000',
   unsolicited_data = 0 '\000',
   reject_reason = 0 '\000',
   logout_cid = 0,
   cmd_flags = ICF_OOO_CMDSN,
   init_task_tag = 2415919152,
   targ_xfer_tag = 205,
   cmd_sn = 2860352639,
   exp_stat_sn = 2502541166,
   stat_sn = 0,
   data_sn = 0,
...

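For context on what "out-of-order" means here: per RFC 7143, an iSCSI target executes commands in CmdSN order, and a command whose CmdSN is ahead of the session's ExpCmdSN (but not beyond MaxCmdSN) has to be held until the missing lower-numbered commands arrive. The cmd_flags = ICF_OOO_CMDSN and i_state = ISTATE_DEFERRED_CMD values in the dump above appear consistent with that. The sketch below shows such a classification using RFC 1982 serial-number arithmetic (SERIAL_BITS = 32). It is an illustration under those assumptions, not the LIO target's actual code: the enum, sn_gt() and classify_cmdsn() are hypothetical names, and because the session's ExpCmdSN/MaxCmdSN are not in the dump, any values fed to it are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not the LIO target's API. */
enum cmdsn_class {
    CMDSN_IN_ORDER,     /* CmdSN == ExpCmdSN: can be executed now               */
    CMDSN_OUT_OF_ORDER, /* ExpCmdSN < CmdSN <= MaxCmdSN: held until gap fills   */
    CMDSN_STALE,        /* CmdSN < ExpCmdSN: already consumed, outside window   */
    CMDSN_OVERRUN       /* CmdSN > MaxCmdSN: beyond the command window          */
};

/* RFC 1982 serial-number "greater than" for 32-bit sequence numbers.
 * A difference of exactly 2^31 is undefined by the RFC; treat it as false. */
static bool sn_gt(uint32_t s1, uint32_t s2)
{
    uint32_t d = s1 - s2;   /* unsigned subtraction wraps modulo 2^32 */
    return d != 0 && d < 0x80000000u;
}

static enum cmdsn_class classify_cmdsn(uint32_t cmd_sn, uint32_t exp_cmd_sn,
                                       uint32_t max_cmd_sn)
{
    /* Sanity check: ExpCmdSN may be at most MaxCmdSN + 1 (closed window). */
    assert(!sn_gt(exp_cmd_sn, max_cmd_sn + 1));

    if (sn_gt(cmd_sn, max_cmd_sn))
        return CMDSN_OVERRUN;
    if (cmd_sn == exp_cmd_sn)
        return CMDSN_IN_ORDER;
    if (sn_gt(cmd_sn, exp_cmd_sn))
        return CMDSN_OUT_OF_ORDER;  /* waits for the missing lower CmdSNs */
    return CMDSN_STALE;
}
```

For example, with a hypothetical ExpCmdSN of 100 and MaxCmdSN of 131, a command with CmdSN 102 classifies as CMDSN_OUT_OF_ORDER: it cannot run until CmdSN 100 and 101 have been received. The comparisons go through sn_gt() rather than plain `<` so that the window still works when the 32-bit counter wraps around.
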
So this is an out-of-order cmd. My conclusion is that iscsi_trx is
waiting for the kworker to finish aborting the cmd, while the kworker is
waiting for someone to complete the cmd. That is never going to happen,
hence the hang.
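The reasoning above is a wait-for chain: iscsi_trx cannot proceed until the kworker finishes the abort, and the kworker cannot proceed until the cmd completes, which a deferred out-of-order cmd never does. A minimal sketch of that reasoning, assuming each task blocks on at most one thing, follows the chain and reports whether it ever reaches something able to run. A chain that ends in an event nobody signals, or that loops back on itself, means every task along it hangs. The types and can_finish() are hypothetical and only model the dependency; they are not target_core_mod or iscsi_target_mod code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each node either runs to completion on its own
 * (READY), blocks on an event nobody will ever signal (NEVER), or blocks
 * until another node finishes (WAITS_ON). */
enum wait_kind { READY, NEVER, WAITS_ON };

struct waiter {
    const char *name;     /* label for readability only */
    enum wait_kind kind;
    size_t target;        /* index of the awaited node when kind == WAITS_ON */
};

/* Follow the wait-for chain starting at node i. Returns true if it ends in
 * a READY node, false if it ends in a NEVER node or revisits a node (a
 * circular wait). Without a cycle the chain visits at most n distinct nodes,
 * so more than n hops implies a loop. */
static bool can_finish(const struct waiter *w, size_t n, size_t i)
{
    for (size_t steps = 0; steps <= n; steps++) {
        assert(i < n);
        switch (w[i].kind) {
        case READY:
            return true;
        case NEVER:
            return false;
        case WAITS_ON:
            i = w[i].target;   /* hop to the node being waited on */
            break;
        }
    }
    return false;
}
```

Modeling this report as { iscsi_trx waits on kworker, kworker waits on the ooo cmd, the ooo cmd NEVER } makes can_finish() return false for both threads, matching the kworker and iscsi_trx tasks stuck in D state above. Hypothetically marking the cmd READY lets both finish, which is why the fix presumably has to either complete or release the deferred cmd on this path.
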

Could someone please help me confirm this analysis?

Regards,
Jirong Feng

