Message-ID: <x49fwxtqsh8.fsf@segfault.boston.devel.redhat.com>
Date:	Wed, 01 Sep 2010 13:33:07 -0400
From:	Jeff Moyer <jmoyer@...hat.com>
To:	Doug Neal <dneallkml@...il.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: I/O scheduler deadlocks on Xen virtual block devices

Doug Neal <dneallkml@...il.com> writes:

> Hello all,
>
> I believe I have found a bug in the I/O scheduler which results in
> access to a block device being blocked indefinitely.
>
> The setup:
>  * Xen dom0 version 3.4.2.
>  * domU: Ubuntu 10.04, x86_64, with kernel 2.6.32.15+drm33.5.
>  * Paravirtual disks and network interfaces.
>  * Root filesystem on /dev/xvdb1
>  * A scratch filesystem for the purposes of my tests on /dev/xvda
> mounted on /mnt
>  * Both filesystems are ext3, formatted and mounted with defaults
>  * XVBDs are backed by LVM on top of an iSCSI SAN in the dom0.
>
>
> Activities leading up to the incident:
>
> To reproduce the bug, I run the VM on a Xen host which has a moderate
> workload of other VMs. (The bug seems to manifest more readily there
> than when the host is otherwise idle.)
>
> I repeatedly rsync the contents of a Linux install CD to an arbitrary
> location on the scratch filesystem, e.g. /mnt/test, then rm -rf the
> lot, and rsync again. It can sometimes take a few iterations before
> the bug is triggered. Sometimes it's triggered on the rsync, sometimes
> on the rm.
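>
> Roughly, the test loop looks like this (the CD mount point and the
> target path are just examples):
>
>   while true; do
>       rsync -a /media/cdrom/ /mnt/test/   # copy the install CD onto the scratch fs
>       rm -rf /mnt/test                    # then delete the lot and start over
>   done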
>
> At some point during either the rsync or the rm, all the processes
> accessing /dev/xvda (rsync, kjournald, flush) become stuck in the D
> state. After 120 seconds the warnings start to appear in dmesg:
>
>   [  840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
>   [  840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
>
> Full dmesg output is below, including call traces which show that each
> of these processes is stuck in io_schedule or sync_buffer called from
> io_schedule.
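>
> In case it helps, the same state can also be inspected on demand with
> something like the following (sketch; <pid> is any of the tasks shown
> in D state):
>
>   ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'   # list tasks in D state
>   cat /proc/<pid>/stack                            # kernel stack of one stuck task
>   echo w > /proc/sysrq-trigger                     # dump all blocked tasks to dmesg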
>
>
> I believe I have eliminated:
>  * Problems with the underlying physical device
>    - The same bug has manifested itself on two completely separate
> sets of hardware, with different servers, switches, and SANs.
>
>  * Problems with the host's iSCSI initiator
>    - Other VMs depending on the same iSCSI session are unaffected
>    - Other XVBDs within the same VM (in this case: /dev/xvdb1 mounted
> on /) are unaffected
>
> Things I've tried:
>  * Noop, deadline, cfq and anticipatory elevators.

Did you try these different I/O schedulers in the domU or on the dom0?
Does switching I/O schedulers in either place make the problem go away
when it happens?
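
FWIW, both can be switched at runtime through sysfs, e.g. (device names
are examples; in the dom0 the elevator applies to the physical disk
backing the LVM volume, not to the dm device itself):

  # in the domU, for the virtual block device
  cat /sys/block/xvda/queue/scheduler
  echo deadline > /sys/block/xvda/queue/scheduler

  # in the dom0, for the disk underneath the LVM volume group
  echo noop > /sys/block/sda/queue/scheduler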

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
