Date:	Wed, 30 Sep 2015 09:49:21 +0000
Subject: [Bug 102731] I have a cough.

--- Comment #14 from John Hughes <> ---
On 28/09/15 19:06, wrote:
> --- Comment #13 from Theodore Tso <> ---
> So it's been 12 days, and previously when you were using the Debian 3.16
> kernel, it was triggering once every four days, right?  Can I assume that your
> silence indicates that you haven't seen a problem to date?

I haven't seen the problem, but unfortunately I'm running 3.18.19 at the 
moment (I screwed up on the last boot and let it boot the default 
kernel).  I haven't had time to reboot.  So I'd like to give it a bit 
more time.

> If so, then it really does seem that it might be an interaction between LVM/MD
> and KVM.
>
> So if that's the case, then the next thing to ask is to try to figure out what
> might be the triggering cause.   A couple of things come to mind:
>
> 1) Some failure to properly handle a flush cache command being sent to the MD
> device.  This, combined with either a power failure or a crash of the guest OS
> (depending on how KVM is configured), might explain a block update getting
> lost.   The fact that the block bitmap is out of sync with the block group
> descriptor is consistent with this failure.  However, if you were seeing
> failures once every four days, that would imply that the guest OS and/or host
> OS would be crashing at that or about that level of frequency, and you haven't
> reported that.

I haven't had any host or guest crashes.

> 2) Some kind of race between a 4k write and a RAID1 resync leading to a block
> write getting lost.  Again, this reported data corruption is consistent with
> this theory --- but this also requires the guest OS crashing due to some kind
> of kernel crash or KVM/qemu shutdown and/or host OS crash / power failure, as
> in (1) above.  If you weren't seeing these failures once every four days or so,
> then this isn't a likely explanation.

No crashes.

> 3)  Some kind of corruption caused by the TRIM command being sent to the
> RAID/MD device, possibly racing with a block bitmap update.  This could be
> caused either by the file system being mounted with the -o discard mount
> option, or by fstrim getting run out of cron, or by e2fsck explicitly being
> asked to discard unused blocks (with the "-E discard" option).

I'm not using "-o discard" or fstrim, and I've never used the "-E discard" 
option to fsck.
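
For the "-o discard" part of (3), the active mount options can be checked on a running system. The following is a minimal sketch, assuming a Linux system where /proc/mounts lists the options in its fourth field; the function names are hypothetical, and it does not cover fstrim run from cron or the e2fsck "-E discard" option.

```python
def mounts_with_discard(mounts_text):
    """Return (device, mount_point) pairs whose options include 'discard'.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Options are comma-separated; match the exact 'discard' token.
        if "discard" in options.split(","):
            hits.append((device, mount_point))
    return hits


def check_proc_mounts(path="/proc/mounts"):
    """Read the given mounts file (assumed path) and report discard mounts."""
    with open(path) as f:
        return mounts_with_discard(f.read())
```

Calling check_proc_mounts() in the guest and on the host returns an empty list when no file system is mounted with discard.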

> 4)  Some kind of bug which happens rarely either in qemu, the host kernel or
> the guest kernel depending on how it communicates with the virtual disk.
> (i.e., virtio, scsi, ide, etc.)   Virtio is the most likely use case, and so
> trying to change to use scsi emulation might be interesting.  (OTOH, if the
> problem is specific to the MD layer, then this possibility is less likely.)
>
> So as far as #3 is concerned, can you check to see if you had fstrim enabled,
> or are mounting the file system with -o discard?

I'm a bit overwhelmed with work at the moment so I haven't had time to 
read this message with the care it deserves.  I'll get back to you with 
more detail next week.
