Date:	Thu, 15 Jan 2009 08:57:02 -0500
From:	Theodore Tso <tytso@....edu>
To:	Alex Buell <alex.buell@...ted.org.uk>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.27, ext4 and bad USB disks

On Thu, Jan 15, 2009 at 11:06:44AM +0000, Alex Buell wrote:
> I've got a couple of bad disks here which I just tested with ext4 over
> USB 2.0. Bad disk errors don't appear to be handled gracefully at
> all - I had this in the logs:
>  
> Jan 15 10:31:47 lithium end_request: I/O error, dev sda, sector 19626288
> Jan 15 10:31:47 lithium Buffer I/O error on device sda1, logical block 2453282

These are warnings from fs/buffer.c.
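
As a rough illustration of how the two numbers in the log lines quoted
above can relate: the end_request message gives a 512-byte sector on the
whole disk (sda), while the buffer message gives a block number relative
to the partition (sda1). The sketch below maps one to the other. The
function name is made up for this example, and the 4 KiB block size and
the partition start at sector 32 are assumed values that the log does not
state; they are simply one combination that makes the two numbers line up.

```python
SECTOR_SIZE = 512  # bytes per sector as counted by the block layer


def block_to_disk_sector(block, block_size, part_start_sector):
    """Map a partition-relative filesystem block to an absolute disk sector.

    block_size and part_start_sector are inputs the caller has to know;
    nothing here reads them from a real device.
    """
    sectors_per_block = block_size // SECTOR_SIZE
    return part_start_sector + block * sectors_per_block


if __name__ == "__main__":
    # Assumed values: 4 KiB blocks, partition starting at sector 32.
    print(block_to_disk_sector(2453282, 4096, 32))
```

With a different block size or partition offset the same block would map
to a different sector, so this is only a consistency check, not proof of
the actual disk layout.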

> Jan 15 10:33:59 lithium INFO: task rsync:31719 blocked for more than 120 seconds.
> Jan 15 10:33:59 lithium "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 15 10:33:59 lithium rsync         D f7b9cbb0     0 31719  31718

Softlockup warning: rsync has been blocked for more than 120 seconds.

> Jan 15 10:55:55 lithium JBD2: I/O error detected when updating journal superblock for sda1:8.
> Jan 15 10:55:55 lithium usb 1-3.3: USB disconnect, address 6
> Jan 15 10:55:55 lithium ext4_abort called.
> Jan 15 10:55:55 lithium EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
> Jan 15 10:55:55 lithium Remounting filesystem read-only

This is the point where things went badly enough that we remounted the
filesystem read-only.  An interesting question is whether we could have
given up much earlier.  We are reflecting the I/O errors back up to
userspace, but if we had some way of asking the block layer whether the
device is *gone*, or if the block layer called some callback function to
tell us that the device is *gone*, maybe we would be better off
invalidating all of the file descriptors and then force-unmounting the
filesystem right away.  It would avoid a lot of the noise in the log.
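
To make the idea concrete, here is a toy model of that policy in Python.
It is only a sketch: ToyFilesystem, handle_io_error and the device_gone
flag are hypothetical names invented for this example, and none of it
corresponds to an existing kernel interface. It shows the intended
difference in behaviour: an ordinary I/O error is reflected to userspace,
while a "device is gone" signal drops all open descriptors and unmounts
at once, so later errors produce no further log lines.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFilesystem:
    """Hypothetical stand-in for a mounted filesystem, not a kernel structure."""
    open_fds: set = field(default_factory=set)
    mounted: bool = True
    log: list = field(default_factory=list)

    def handle_io_error(self, device_gone):
        """React to one failed I/O; device_gone is the assumed block-layer hint."""
        if not self.mounted:
            # Already torn down: nothing left to report, so no extra log noise.
            return "ignored"
        if device_gone:
            # Give up immediately: invalidate descriptors, then force-unmount.
            self.open_fds.clear()
            self.mounted = False
            self.log.append("device gone: force-unmounted")
            return "unmounted"
        # Device still present: reflect the error back to the caller.
        self.log.append("I/O error reflected to userspace")
        return "EIO"


if __name__ == "__main__":
    fs = ToyFilesystem(open_fds={3, 4})
    print(fs.handle_io_error(device_gone=False))
    print(fs.handle_io_error(device_gone=True))
    print(fs.handle_io_error(device_gone=True))
    print(fs.log)
```

The model leaves out everything hard about the real problem, such as
dirty pages, in-flight journal transactions and processes blocked in
uninterruptible sleep.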

> Jan 15 10:55:55 lithium EXT4-fs error (device sda1) in ext4_da_writepages: IO failure
> Jan 15 10:55:55 lithium ext4_da_writepages: jbd2_start: 63307 pages, ino 86552; err -30
> Jan 15 10:55:55 lithium Pid: 2126, comm: sync Tainted: P          2.6.27-gentoo-r7 #1
> Jan 15 10:55:55 lithium [<c01cfddd>] ext4_da_writepages+0x118/0x2c7
> Jan 15 10:55:55 lithium [<c011675c>] __wake_up+0x29/0x39
> Jan 15 10:55:55 lithium [<c01cfddd>] ext4_da_writepages+0x118/0x2c7

This error we've already toned down in commit 2a21e37e (merged for
2.6.29).  The problem with the log noise is that it tends to obscure
the original root cause of the filesystem getting remounted read-only.
Furthermore, the stack trace really wasn't useful.  It's not a
*critical* bug fix, per se, but it would make it a lot easier to debug
problem reports from users who are trying out ext4 with 2.6.27 and
2.6.28, so I'll try to get the -stable kernel maintainers to accept
it.
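
In the meantime, a small userspace filter can help when digging the root
cause out of a noisy log.  The following Python sketch drops stack-trace
frames and keeps only lines that hint at an error, so that the earliest
one stands out.  The function names and the patterns are assumptions
based on the messages quoted in this thread; they are not a complete list
of what the kernel can print.

```python
import re

# Stack-trace frames look like "[<c01cfddd>] ext4_da_writepages+0x118/0x2c7".
STACK_FRAME = re.compile(r"\[<[0-9a-f]+>\]")
# Substrings taken from the log lines quoted in this thread (assumed list).
ERROR_HINT = re.compile(r"I/O error|EXT4-fs error|ext4_abort|USB disconnect")


def triage(lines):
    """Return error-looking lines in their original order, without stack frames."""
    return [ln for ln in lines
            if ERROR_HINT.search(ln) and not STACK_FRAME.search(ln)]


def root_cause(lines):
    """Return the earliest error-looking line, or None if there is none."""
    hits = triage(lines)
    return hits[0] if hits else None


if __name__ == "__main__":
    sample = [
        "Jan 15 10:31:47 lithium end_request: I/O error, dev sda, sector 19626288",
        "Jan 15 10:55:55 lithium [<c01cfddd>] ext4_da_writepages+0x118/0x2c7",
        "Jan 15 10:55:55 lithium EXT4-fs error (device sda1) in ext4_da_writepages: IO failure",
    ]
    print(root_cause(sample))
```

The earliest matching line is only a candidate for the root cause; it
still has to be read in context.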

						- Ted