Message-ID: <20250324084933.15932-1-a.kovaleva@yadro.com>
Date: Mon, 24 Mar 2025 11:49:32 +0300
From: Anastasia Kovaleva <a.kovaleva@...ro.com>
To: <James.Bottomley@...senPartnership.com>, <martin.petersen@...cle.com>,
	<hare@...e.de>, <axboe@...nel.dk>
CC: <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<linux@...ro.com>
Subject: [PATCH 0/1] Fix not fully initialized SCSI commands

We have seen log messages of the following kind on initiators:

kernel: sd 16:0:1:84: [sdts] tag#405 timing out command, waited 720s
kernel: sd 16:0:1:84: [sdts] tag#405 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=66636s

The initiator uses dm-mpath for multipathing, the SCSI mid layer, and
the QLogic FC HBA driver (qla2xxx). After debugging, the following call
stack was identified:

blk_mq_sched_dispatch_requests()
  blk_mq_dispatch_rq_list()
    dm_mq_queue_rq()
      map_request()
        ti->type->clone_and_map_rq()    // New cloned request with tag 405
        blk_insert_cloned_request()
          scsi_queue_rq()
            qla2xxx_mqueuecommand()
              qla2xxx_dif_start_scsi_mq()

If qla2xxx_dif_start_scsi_mq() returns an error for any reason (e.g.,
because extremely heavy traffic has exhausted the driver's handles),
qla2xxx_mqueuecommand() does not call scsi_done() ->
scsi_end_request(). As a result, the SCMD_INITIALIZED flag remains set.
Next, map_request() releases the cloned request and requeues the
original request. Although the cloned request is released, the
associated SCSI command retains stale data from the previous command.
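
The flag lifecycle described above can be illustrated with a small
userspace sketch. This is not kernel code: every toy_* name is
hypothetical, and the model assumes only what is stated here, namely
that initialization is skipped while the flag is set and that the flag
is cleared only on the completion path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; all names here are hypothetical. */
#define TOY_SCMD_INITIALIZED (1u << 0)

struct toy_scsi_cmnd {
	unsigned int flags;
	unsigned long jiffies_at_alloc;	/* timestamp set at initialization */
	int retries;
};

/* Plays the role of scsi_initialize_rq(): reset per-command state. */
static void toy_initialize_rq(struct toy_scsi_cmnd *cmd, unsigned long now)
{
	cmd->jiffies_at_alloc = now;
	cmd->retries = 0;
}

/* Plays the role of scsi_init_command(): initialize only if flag is clear. */
static void toy_init_command(struct toy_scsi_cmnd *cmd, unsigned long now)
{
	assert(cmd != NULL);
	if (!(cmd->flags & TOY_SCMD_INITIALIZED)) {
		cmd->flags |= TOY_SCMD_INITIALIZED;
		toy_initialize_rq(cmd, now);
	}
}

/* Plays the role of the completion path: the only place the flag is cleared. */
static void toy_end_request(struct toy_scsi_cmnd *cmd)
{
	assert(cmd != NULL);
	cmd->flags &= ~TOY_SCMD_INITIALIZED;
}

/* Plays the role of scsi_queue_rq() plus the driver's queuecommand hook. */
static bool toy_queue_rq(struct toy_scsi_cmnd *cmd, unsigned long now,
			 bool driver_fails)
{
	toy_init_command(cmd, now);
	if (driver_fails)
		return false;	/* error return: no completion, flag stays set */
	return true;		/* accepted; completion arrives later */
}
```

In this model, a call that fails at time 100 leaves the flag set, so a
later successful call at time 5000 on the same command keeps
jiffies_at_alloc at 100. Only after toy_end_request() does the next
call refresh the timestamp.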

If all I/O traffic stops for an extended period and later resumes, the
following scenario may occur:

blk_mq_sched_dispatch_requests()
  blk_mq_dispatch_rq_list()
    dm_mq_queue_rq()
      map_request()
        ti->type->clone_and_map_rq()    // New cloned request uses tag 405 again
        blk_insert_cloned_request()
          scsi_queue_rq()


Within scsi_queue_rq(), scsi_init_command() does not call
scsi_initialize_rq() because the SCMD_INITIALIZED flag is already set.
Consequently, the command still carries the state of the earlier,
failed attempt, and when it completes in scsi_complete(), the
scsi_cmd_runtime_exceeded() check returns true, causing the command to
fail.
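
To make the arithmetic behind the log lines concrete, here is a
userspace sketch of a runtime check of this kind. It assumes the check
compares jiffies_at_alloc + (allowed + 1) * timeout against the current
time, which is our reading of scsi_cmd_runtime_exceeded(). The toy_*
names, TOY_HZ, and the values allowed = 5 and timeout = 120s are
assumptions chosen to reproduce the 720s in the log; they are not taken
from the affected system.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HZ 1000UL	/* assumed tick rate, for illustration only */

/* Wrap-safe "a is before b", in the spirit of the kernel's time_before(). */
static bool toy_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Total time budget in ticks: every allowed retry plus the first attempt. */
static unsigned long toy_wait_for(unsigned int allowed, unsigned long timeout)
{
	assert(timeout > 0);
	return (allowed + 1UL) * timeout;
}

/* True once the budget, counted from initialization time, has run out. */
static bool toy_cmd_runtime_exceeded(unsigned long jiffies_at_alloc,
				     unsigned int allowed,
				     unsigned long timeout,
				     unsigned long now)
{
	unsigned long deadline = jiffies_at_alloc +
				 toy_wait_for(allowed, timeout);

	return toy_time_before(deadline, now);
}
```

With these assumed values the budget is (5 + 1) * 120s = 720s. A
command whose timestamp was refreshed 10s ago passes the check, while
one whose timestamp is 66636s old (the cmd_age in the log) exceeds the
budget no matter how quickly the current attempt completed.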

This issue appeared after commit 4abafdc4360d ("block: remove the
initialize_rq_fn blk_mq_ops method"). Before that change, the
initialize_rq_fn method unconditionally initialized the SCSI command in
blk_get_request(). There may be other paths where a command is queued
in scsi_queue_rq() but scsi_done() is never called.
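
The difference between the two behaviours can be sketched in userspace
as follows. This is a simplified model with hypothetical toy_* names,
not the kernel implementation: the first helper stands for an
allocation-time hook that always initializes, the second for
queue-time initialization gated by the flag.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SCMD_INITIALIZED (1u << 0)	/* hypothetical stand-in flag */

struct toy_cmd {
	unsigned int flags;
	unsigned long jiffies_at_alloc;
};

/* Old behaviour as described: the hook runs on every request allocation. */
static void toy_get_request_with_hook(struct toy_cmd *cmd, unsigned long now)
{
	assert(cmd != NULL);
	cmd->jiffies_at_alloc = now;	/* unconditional, ignores the flag */
}

/* New behaviour: initialization happens at queue time, only if flag clear. */
static void toy_queue_time_init(struct toy_cmd *cmd, unsigned long now)
{
	assert(cmd != NULL);
	if (cmd->flags & TOY_SCMD_INITIALIZED)
		return;			/* stale state is left untouched */
	cmd->flags |= TOY_SCMD_INITIALIZED;
	cmd->jiffies_at_alloc = now;
}
```

Starting from a command left with the flag set and a timestamp of 100,
the allocation-time hook overwrites the timestamp with the current
time, whereas the flag-gated path keeps the stale value of 100.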

Anastasia Kovaleva (1):
  scsi: uninit not completed scsi cmd

 drivers/scsi/scsi_lib.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--
2.40.3

