linux-kernel - RE: Hyper-V stalls on device errors

Message-ID: <8024ef25bbed4216a0ce96ff4318610a@SN2PR03MB061.namprd03.prod.outlook.com>
Date:	Tue, 30 Apr 2013 16:17:51 +0000
From:	KY Srinivasan <kys@...rosoft.com>
To:	Sitsofe Wheeler <sitsofe@...oo.com>,
	Haiyang Zhang <haiyangz@...rosoft.com>
CC:	"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
	"James E.J. Bottomley" <JBottomley@...allels.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: Hyper-V stalls on device errors

Thanks Sitsofe; we will look into this.

Regards,

K. Y

> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@...oo.com]
> Sent: Tuesday, April 30, 2013 12:12 PM
> To: KY Srinivasan; Haiyang Zhang
> Cc: devel@...uxdriverproject.org; James E.J. Bottomley;
> linux-kernel@...r.kernel.org
> Subject: Re: Hyper-V stalls on device errors
> 
> Apologies for the previous empty mail.
> 
> While testing a Windows 2012 host with a Fedora 18 guest running a 3.9
> kernel, I've found that Hyper-V stalls all access to (para)virtualised
> disk devices when an underlying disk device returns an error. Every ten
> seconds a small amount of I/O gets through before everything stalls
> again, and it also plays havoc with asynchronous I/O to disk devices.
> 
> To reproduce this, I created a device-mapper device containing a single
> error sector using:
> 
> dd if=/dev/zero of=/tmp/fakeblock0 bs=100M count=1
> losetup --find --show /tmp/fakeblock0
> # Assuming losetup uses /dev/loop0
> cat << EOF | dmsetup create oneerror
> 0 13443 linear /dev/loop0 0
> 13443 1 error
> 13444 191356 linear /dev/loop0 13444
> EOF
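
The table above hard-codes the sector arithmetic (13443 + 1 + 191356 = 204800 sectors, i.e. the 100 MiB file in 512-byte sectors). As a minimal sketch, a bash function could derive a table of this shape for any device size and bad sector, mapping the sectors after the bad one to the same offsets on the loop device. The name make_error_table is hypothetical and not part of the original report; the sketch assumes the size is given in 512-byte sectors and that the bad sector has at least one good sector on each side.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the original report: print a dmsetup
# table that maps <dev> 1:1 except for a single sector that returns errors.
# All sizes and offsets are in 512-byte sectors, as dmsetup expects.
make_error_table() {
    local dev=$1 total=$2 bad=$3
    local after=$(( bad + 1 ))

    # Keep the sketch simple: require at least one good sector on each side.
    if (( bad <= 0 || bad >= total - 1 )); then
        echo "bad sector must be in 1..$(( total - 2 ))" >&2
        return 1
    fi

    # Sectors 0..bad-1 pass straight through to the backing device.
    echo "0 ${bad} linear ${dev} 0"
    # The chosen sector is backed by the device-mapper "error" target.
    echo "${bad} 1 error"
    # Everything after it maps to the same offsets on the backing device.
    echo "${after} $(( total - after )) linear ${dev} ${after}"
}

# 100 MiB loop file = 204800 sectors; print the table for the layout above.
make_error_table /dev/loop0 204800 13443
```

In practice the output could be piped straight into device-mapper, e.g. make_error_table /dev/loop0 "$(blockdev --getsz /dev/loop0)" 13443 | dmsetup create oneerror, where blockdev --getsz reports the size in 512-byte sectors.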
> 
> After installing scsi-target-utils, the /dev/mapper/oneerror device was
> turned into an iSCSI target by adding
> 
> <target iqn.2013-04.com.stormagic:oneerror>
>      backing-store /dev/mapper/oneerror
>      write-cache off
> </target>
> 
> to /etc/tgt/targets.conf. The iSCSI target service was started with
> systemctl start tgtd.service (watch out for
> https://bugzilla.redhat.com/show_bug.cgi?id=848942; you may also need to
> disable the firewall with systemctl stop firewalld.service).
> 
> The Windows 2012 iSCSI initiator on the hypervisor host was used to
> connect to the target (the usual discovery against the Linux box serving
> the iSCSI target should work). Once connected, the disk was added to the
> Linux guest's Hyper-V settings via the SCSI controller. A spare disk was
> also added via the IDE controller.
> 
> In the Linux guest, a badblocks run was started on the spare IDE disk so
> that ongoing I/O was visible. Then
> dd if=/dev/zero of=/dev/sdc oflag=direct
> (where /dev/sdc is the erroring block device added earlier) was run to
> trigger access to the bad sector.
> 
> The following appeared in dmesg:
> 
> [  160.718836] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [  170.991312] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [  181.039597] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [  191.081242] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [  201.116790] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [  211.127741] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [  221.140338] sd 3:0:0:2: [sdc] Unhandled error code
> [  221.140346] sd 3:0:0:2: [sdc]
> [  221.140349] Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> [  221.140352] sd 3:0:0:2: [sdc] CDB:
> [  221.140354] Write(10): 2a 00 00 00 34 00 00 01 00 00
> [  221.140366] end_request: critical target error, dev sdc, sector 13312
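
The CDB in the log can be decoded to confirm that the failing write covered the bad sector. In the storvsc messages, scsi status 0x2 is CHECK CONDITION and srb status 0x4 is SRB_STATUS_ERROR. For a 10-byte WRITE (opcode 0x2a), bytes 2 to 5 hold the big-endian starting LBA and bytes 7 to 8 the big-endian transfer length in logical blocks. A small bash sketch of that decoding follows; the function name decode_cdb10 is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a 10-byte READ/WRITE CDB, given as the ten
# space-separated hex bytes the kernel prints, into opcode, LBA and length.
decode_cdb10() {
    local -a b=( "$@" )
    if (( ${#b[@]} != 10 )); then
        echo "expected 10 CDB bytes, got ${#b[@]}" >&2
        return 1
    fi
    # Bytes 2..5: starting logical block address, big-endian.
    local lba=$(( (16#${b[2]} << 24) | (16#${b[3]} << 16) | (16#${b[4]} << 8) | 16#${b[5]} ))
    # Bytes 7..8: transfer length in logical blocks, big-endian.
    local len=$(( (16#${b[7]} << 8) | 16#${b[8]} ))
    echo "opcode=0x${b[0]} lba=${lba} blocks=${len}"
}

# The Write(10) CDB from the Hyper-V log above.
decode_cdb10 2a 00 00 00 34 00 00 01 00 00
```

For the Hyper-V CDB this yields LBA 13312 and 256 blocks, so the write spans sectors 13312 to 13567 and includes the error sector at 13443 (assuming 512-byte logical blocks, which the matching "sector 13312" in the end_request line suggests). The VMware CDB further down decodes to the same LBA with 1024 blocks.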
> 
> A Fedora 18 guest on VMware ESXi returned the error in under a second
> and logged only the following in dmesg:
> 
> [  293.917383] sd 2:0:1:0: [sdb] Unhandled sense code
> [  293.917391] sd 2:0:1:0: [sdb]
> [  293.917394] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [  293.917408] sd 2:0:1:0: [sdb]
> [  293.917414] Sense Key : Medium Error [current]
> [  293.917418] sd 2:0:1:0: [sdb]
> [  293.917421] Add. Sense: Unrecovered read error
> [  293.917424] sd 2:0:1:0: [sdb] CDB:
> [  293.917428] Write(10): 2a 00 00 00 34 00 00 04 00 00
> [  293.917436] end_request: critical target error, dev sdb, sector 13312
> 
> The stalls do not occur when the bad block device is created directly in
> the Linux guest. From the log messages above, it looks like Hyper-V
> retries for up to a minute before returning an error, and the I/O stalls
> on separate (but virtualised) devices on different buses look like an
> unintended side effect...
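
The "up to a minute" estimate can be checked against the dmesg timestamps. A rough bash/awk sketch, assuming the default "[seconds.microseconds]" dmesg prefix and using a hypothetical function name storvsc_retry_gaps, prints the gap between successive hv_storvsc messages and the time from the first one to the end_request error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read dmesg output on stdin and report the spacing of
# hv_storvsc error messages plus the time until the final end_request error.
storvsc_retry_gaps() {
    awk '
        {
            # Timestamp is the text between the first "[" and the first "]".
            lb = index($0, "["); rb = index($0, "]")
            ts = substr($0, lb + 1, rb - lb - 1) + 0
        }
        /hv_storvsc/ {
            if (n > 0) printf "retry gap: %.1f s\n", ts - prev
            else first = ts
            prev = ts
            n++
        }
        /end_request:/ && n > 0 {
            printf "first storvsc error to end_request: %.1f s\n", ts - first
        }
    '
}
```

In the guest it would be run as dmesg | storvsc_retry_gaps. Applied to the Hyper-V log above, it reports five gaps of roughly 10 s (the first is 10.3 s) and about 60.4 s from the first hv_storvsc message to the end_request error, which matches the ten-second stall pattern and the one-minute delay described in the report.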
> 
> --
> Sitsofe | http://sucs.org/~sits/
> 

