linux-kernel - Re: [Qemu-devel] Massive read only kvm guests when backing file was missing

Message-ID: <87y4zwt7mu.fsf@blackfin.pond.sub.org>
Date:	Thu, 27 Mar 2014 08:36:57 +0100
From:	Markus Armbruster <armbru@...hat.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	Alejandro Comisario <alejandro.comisario@...cadolibre.com>,
	kvm@...r.kernel.org, ghammer@...hat.com,
	Stefan Hajnoczi <stefanha@...il.com>,
	Jason Wang <jasowang@...hat.com>, linux-kernel@...r.kernel.org,
	qemu-devel@...gnu.org
Subject: Re: [Qemu-devel] Massive read only kvm guests when backing file was missing

"Michael S. Tsirkin" <mst@...hat.com> writes:

> On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
>> Hi List!
>> I hope someone can help me. We had a big issue in our cloud the other
>> day: a couple of our OpenStack regions (2000+ KVM guests on qcow2)
>> went to a read-only filesystem on the guest side because the backing
>> files directory (the OpenStack _base directory) was compromised and
>> the data was lost. Once we realized the data was lost, it took us 5
>> minutes to restore the backing files from backup, but by that time all
>> the KVM guests had received some kind of I/O error from the hypervisor
>> layer and gone read-only on the root filesystem.
>> 
>> My question is: is there a way to hold the I/O operations against
>> the backing files (I assume those are 99% READ operations) for
>> a little longer when the backing files are missing (no I/O possible)
>> but recoverable within minutes? I ask because I don't quite understand
>> how the process works and when it raises the error.
>> 
>> Any tip on how to achieve this, if possible, or information about how
>> backing files work in KVM, would be amazing.
>> Waiting for feedback!
>> 
>> kindest regards.
>> Alejandro Comisario
>
>
> I'm guessing this is what happened: guests timed out meanwhile.
> You can increase the timeout within the guest:
> echo 600 > /sys/block/sda/device/timeout
> to time out after 10 minutes.
>
> If you have installed the QEMU guest agent on your system, you can do this
> from the host. Unfortunately, by default its memory can be pushed out to swap,
> and then, on a disk error, accessing it might fail :(
> Maybe we should consider mlock on all its memory, at least as an option.
>
> You could pause your guests and restart them after the issue is resolved,
> and I guess we could add functionality to pause the VM on disk errors
> automatically.
> Stefan?

Would -drive rerror=stop do?
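
For illustration, a minimal sketch of how that option might be spelled on
a QEMU command line. The image path, memory size, and if=virtio are
placeholders, not taken from the report; werror=stop is added here as an
assumption, so that write errors are handled the same way as read errors.
The script only prints the command instead of launching a guest.

```shell
#!/bin/sh
# Hypothetical image path; substitute the guest's real qcow2 overlay.
img="/var/lib/libvirt/images/guest.qcow2"

# rerror/werror select what QEMU does on a read/write error.
# "stop" pauses the VM instead of reporting the error to the guest.
drive_opts="file=${img},format=qcow2,if=virtio,rerror=stop,werror=stop"

# Print the command rather than running it, so the sketch is safe to try.
echo "qemu-system-x86_64 -enable-kvm -m 2048 -drive ${drive_opts}"
```

With a policy of stop, the guest should stay paused until it is resumed,
for example with "cont" in the QEMU monitor, once the backing files are
back. Whether this covers every failure mode of a missing backing file
would need testing.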