lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1482199347-9128-1-git-send-email-mauricfo@linux.vnet.ibm.com>
Date:   Tue, 20 Dec 2016 00:02:27 -0200
From:   Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>
To:     jejb@...ux.vnet.ibm.com, martin.petersen@...cle.com,
        linux-scsi@...r.kernel.org
Cc:     linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        gpiccoli@...ux.vnet.ibm.com
Subject: [PATCH] scsi: do not requeue requests unaligned with device sector size

When a SCSI command (e.g., read operation) is partially completed
with good status and residual bytes (i.e., not all the bytes from
the specified transfer length were transferred) the SCSI midlayer
will update the request/bios with the completed bytes and requeue
the request in order to complete the remainder/pending bytes.

However, when the device sector size is greater than the 512-byte
default/kernel sector size, alignment restrictions and validation
apply (both to the starting logical block address, and the number
of logical blocks to transfer) -- values must be multiples of the
device sector size, otherwise the kernel fails the request in the
preparation stage (e.g., sd_setup_read_write_cmnd() at sd.c file):

  [...] sd 0:0:0:0: [sda] Bad block number requested

Hence, this error message (and the respective failed request) can
be observed on devices with larger sector sizes which may respond
the SCSI command with a SCSI residual size that is unaligned with
the device sector size -- because the requeued request's starting
logical block and number of logical blocks are based on the value
of the remainder/pending bytes.

In order to address this problem, introduce a check for this case
in scsi_io_completion() (before it calls scsi_end_request() which
in turn calls blk_update_request() which is the site that changes
the request's __sector and __data_len fields, which are validated
by sd_setup_read_write_cmnd()).

This check verifies whether there is any residual/remainder bytes
in the (potentially partially) completed requested and calculates
the correctly aligned values for the number of completed bytes to
pass up to scsi_end_request()/blk_update_request() that guarantee
that the requeued request is aligned with the device sector size.

The corner case is when one sector is requested and the response
is partially complete, for which the remainder/pending bytes are
unaligned and no further request would be valid.  On such a case,
the original request is retried after a delay (in case the error
is hopefully due to a temporary condition in the device), but up
to the retry limit (in case the condition is permanent, e.g. bad
sector in the medium), after which the request is finally failed.

In order to reproduce and verify this problem, the virtio_scsi.c
file can be modified to respond to 3 'special' SCSI commands, on
which partial completions are introduced (described in the patch).

This is the guest's disk test image and libvirt XML snippets for
the 4k-sector disk using the virtio scsi driver:

    # qemu-img create -f qcow2 /var/lib/libvirt/images/test.qcow2 128G

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/test.qcow2'/>
      <blockio logical_block_size='4096' physical_block_size='4096'/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05'
function='0x0'/>
    </controller>

And the verification in the guest:

    # cat /sys/block/sda/queue/physical_block_size
    4096

    # cat /sys/block/sda/queue/hw_sector_size
    4096

This is the patch to virtio_scsi.c (lines prefixed with ' ___ '):

 ___ --- a/drivers/scsi/virtio_scsi.c
 ___ +++ b/drivers/scsi/virtio_scsi.c
 ___ @@ -153,11 +153,45 @@
 ___  	struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
 ___  	struct virtio_scsi_target_state *tgt =
 ___  				scsi_target(sc->device)->hostdata;
 ___ +	static int debug_failures = 0;
 ___
 ___  	dev_dbg(&sc->device->sdev_gendev,
 ___  		"cmd %p response %u status %#02x sense_len %u\n",
 ___  		sc, resp->response, resp->status, resp->sense_len);
 ___
 ___ +	// DEBUG: filter this CDB for testing purposes.
 ___ +	// CDB: Read(10) 28 00 01 02 03 xx 00 00 yy 00
 ___ +	// (xx: LSB of the LBA, and yy: LSB of the LEN)
 ___ +	if ((sc->cmnd[0] == 0x28) && (sc->cmnd[1] == 0x00)
 ___ +	&&  (sc->cmnd[2] == 0x01) && (sc->cmnd[3] == 0x02)
 ___ +	&&  (sc->cmnd[4] == 0x03) && (sc->cmnd[6] == 0x00)
 ___ +	&&  (sc->cmnd[7] == 0x00) && (sc->cmnd[9] == 0x00)) {
 ___ +
 ___ +		// Test 1:
 ___ +		// - LBA: 01 02 03 _04_
 ___ +		// - LEN: two sectors (2 * 4k = 8k)
 ___ +		// - Action: complete 5k out of 8k (3k residual)
 ___ +		if ((sc->cmnd[5] == 0x04) && (sc->cmnd[8] == 0x02))
 ___ +			resp->resid = 6 * 512;
 ___ +
 ___ +		// Test 2:
 ___ +		// - LBA: 01 02 03 _04_
 ___ +		// - LEN: one sector (1 * 4k = 4k)
 ___ +		// - Action: complete 3k out of 4k (1k residual)
 ___ +		//           always.
 ___ +		if ((sc->cmnd[5] == 0x04) && (sc->cmnd[8] == 0x01))
 ___ +			resp->resid = 2 * 512;
 ___ +
 ___ +		// Test 3:
 ___ +		// - LBA: 01 02 03 _08_
 ___ +		// - LEN: one sector (1 * 4k = 4k)
 ___ +		// - Action: complete 3k out of 4k (1k residual)
 ___ +		//           but on every 4th attempt (complete 4k)
 ___ +		if ((sc->cmnd[5] == 0x08) &&  (sc->cmnd[8] == 0x01)
 ___ +		&&  (++debug_failures % 4 != 0))
 ___ +			resp->resid = 2 * 512;
 ___ +	}
 ___ +
 ___  	sc->result = resp->status;
 ___  	virtscsi_compute_resid(sc, virtio32_to_cpu(vscsi->vdev,
 ___ resp->resid));
 ___  	switch (resp->response) {

Without this patch all the 3 tests fail w/ 'Bad block number requested'.

  Test 1)

  # dd if=/dev/sda of=/dev/null bs=8192 count=1 skip=8454530 iflag=direct
  dd: error reading ‘/dev/sda’: Input/output error
  0+0 records in
  0+0 records out
  0 bytes (0 B) copied, 0.00260138 s, 0.0 kB/s

  # dmesg
  [...] sd 0:0:0:0: [sda] Bad block number requested

And if you look with more detail using scsi_logging_level (HLQUEUE = 2
and HLCOMPLETE = 2) the unaligned requeued request is clearly visible
(on a 4k-sector device, block/count must be multiples of 8 512-byte sectors)
then it fails:

    # echo $(( (2<<21) + (2<<24) )) > /proc/sys/dev/scsi/logging_level;

    Test 1)

    # dmesg -c >/dev/null; dd if=/dev/sda of=/dev/null bs=8192 count=1
      skip=8454530 iflag=direct 2>/dev/null; dmesg
    [...] sd 0:0:0:0: [sda] sd_setup_read_write_cmnd: block=135272480, count=16
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 8192 of 8192 bytes
    [...] sd 0:0:0:0: [sda] tag#0 16 sectors total, 5120 bytes done.
    [...] sd 0:0:0:0: [sda] sd_setup_read_write_cmnd: block=135272490, count=6
    [...] sd 0:0:0:0: [sda] Bad block number requested
    [...] sd 0:0:0:0: tag#0 0 sectors total, 0 bytes done.

With the patch, this happily changes into another 4k-sized request:

    # dmesg -c >/dev/null; dd if=/dev/sda of=/dev/null bs=8192 count=1
      skip=8454530 iflag=direct 2>/dev/null; dmesg
    [...] sd 0:0:0:0: [sda] sd_setup_read_write_cmnd: block=135272480, count=16
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 8192 of 8192 bytes
    [...] sd 0:0:0:0: [sda] tag#0 16 sectors total, 5120 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 5120 bytes for alignment
                            (sector size 4096, remainder 1024, resid 3072)
    [...] sd 0:0:0:0: [sda] sd_setup_read_write_cmnd: block=135272488, count=8
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 4096 bytes done.
    [...] sd 0:0:0:0: tag#0 0 sectors total, 0 bytes done.

And verifying the one sector corner case in which all retries are failed
by the driver (when we must finally fail the request up the stack), we
observe the original request plus five retries, then the I/O error occurs:

    Test 2)

    # dmesg -c >/dev/null; dd if=/dev/sda of=/dev/null bs=4096 count=1
      skip=16909060 iflag=direct 2>/dev/null; dmesg
    [...] sd 0:0:0:0: [sda] sd_setup_read_write_cmnd: block=135272480, count=8
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 3072 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 3072 bytes for alignment
                            (sector size 4096, remainder 3072, resid 1024)
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 3072 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 3072 bytes for alignment
                            (sector size 4096, remainder 3072, resid 1024)
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 3072 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 3072 bytes for alignment
                            (sector size 4096, remainder 3072, resid 1024)
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 3072 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 3072 bytes for alignment
                            (sector size 4096, remainder 3072, resid 1024)
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 3072 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 3072 bytes for alignment
                            (sector size 4096, remainder 3072, resid 1024)
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 3072 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 3072 bytes for alignment
                            (sector size 4096, remainder 3072, resid 1024)
    [...] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
    [...] sd 0:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 01 02 03 04 00 00 01 00
    [...] blk_update_request: I/O error, dev sda, sector 135272480
    [...] sd 0:0:0:0: tag#0 0 sectors total, 0 bytes done.

And now the one sector corner case when eventually one retry succeeds
(the original request plus three retries, of which the last retry passes)
and the request is successfully passed up the stack and finished:

    Test 3)

    # dmesg -c >/dev/null; dd if=/dev/sda of=/dev/null bs=4096 count=1
      skip=16909064 iflag=direct 2>/dev/null; dmesg
    [...] sd 0:0:0:0: [sda] sd_setup_read_write_cmnd: block=135272512, count=8
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 3072 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 3072 bytes for alignment
                      (sector size 4096, remainder 3072, resid 1024)
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 3072 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 3072 bytes for alignment
                      (sector size 4096, remainder 3072, resid 1024)
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 3072 bytes done.
    [...] sd 0:0:0:0: [sda] tag#0 checking 3072 bytes for alignment
                      (sector size 4096, remainder 3072, resid 1024)
    [...] sd 0:0:0:0: [sda] tag#0 sd_done: completed 4096 of 4096 bytes
    [...] sd 0:0:0:0: [sda] tag#0 8 sectors total, 4096 bytes done.
    [...] sd 0:0:0:0: tag#0 0 sectors total, 0 bytes done.

Apologies for the ridiculously long commit message with description and
test-cases, but this problem has been relatively difficult to reproduce
and understand, so I thought the documentation/instructions for working
with that could be helpful for others too.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>
---
 drivers/scsi/scsi_lib.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2cca9cffc63f..190ab28cfb2d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -821,6 +821,45 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	}
 
 	/*
+	 * If the SCSI command is successful but has residual bytes,
+	 * align good_bytes to the device sector size to ensure the
+	 * requeued request (to complete the remainder transfer) is
+	 * also aligned and does not fail alignment/size validation.
+	 */
+	if (unlikely(!result && scsi_get_resid(cmd) &&
+		     req->cmd_type == REQ_TYPE_FS)) {
+
+		unsigned int sector_size = cmd->device->sector_size;
+		unsigned int remainder = good_bytes % sector_size;
+		int resid = scsi_get_resid(cmd);
+
+		SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, cmd,
+			"checking %d bytes for alignment "
+			"(sector size %u, remainder %u, resid %d)\n",
+			good_bytes, sector_size, remainder, resid));
+
+		if (remainder) {
+			good_bytes -= remainder;
+			resid += remainder;
+			scsi_set_resid(cmd, resid);
+
+			/*
+			 * If less than one device sector has completed, retry the
+			 * request after delay (up to the retry limit) or timeout.
+			 */
+			if (!good_bytes) {
+				if ((++cmd->retries) <= cmd->allowed
+				&& !scsi_noretry_cmd(cmd)) {
+					goto delayed_retry;
+				} else {
+					set_host_byte(cmd, DID_TIME_OUT);
+					goto error;
+				}
+			}
+		}
+	}
+
+	/*
 	 * special case: failed zero length commands always need to
 	 * drop down into the retry code. Otherwise, if we finished
 	 * all bytes in the request we are done now.
@@ -845,6 +884,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	if (result == 0)
 		goto requeue;
 
+	error:
 	error = __scsi_error_from_host_byte(cmd, result);
 
 	if (host_byte(result) == DID_RESET) {
@@ -985,6 +1025,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 		__scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY, 0);
 		break;
 	case ACTION_DELAYED_RETRY:
+	delayed_retry:
 		/* Retry the same command after a delay */
 		__scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY, 0);
 		break;
-- 
1.8.3.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ