linux-kernel - Re: xfs|loop|raid: attempt to access beyond end of device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20071225091347.GA155407@sgi.com>
Date:	Tue, 25 Dec 2007 20:13:47 +1100
From:	David Chinner <dgc@....com>
To:	Janos Haar <djani22@...center.hu>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: xfs|loop|raid: attempt to access beyond end of device

On Sun, Dec 23, 2007 at 08:21:08PM +0100, Janos Haar wrote:
> Hello, list,
> 
> I have a little problem on one of my productive system.
> 
> The system sometimes crashed, like this:
> 
> Dec 23 08:53:05 Albohacen-global kernel: attempt to access beyond end of
> device
> Dec 23 08:53:05 Albohacen-global kernel: loop0: rw=1, want=50552830649176,
> limit=3085523200
> Dec 23 08:53:05 Albohacen-global kernel: Buffer I/O error on device loop0,
> logical block 6319103831146
> Dec 23 08:53:05 Albohacen-global kernel: lost page write due to I/O error on
> loop0

So a long way beyond the end of the device.

[snip soft lockup warnings]

> Dec 23 09:08:19 Albohacen-global kernel: Filesystem "loop0": Access to block
> zero in inode 397821447 start_block: 0 start_off: 0 blkcnt: 0 extent-state:
> 0 lastx: e4

And that's to block zero of the filesystem. Sure signs of a corupted inode
extent btree. We've seen a few of these corruptions on loopback device
reported recently.

You'll need to unmount and repair the filesystem to make this go away,
but it's hard to know what is causing the btree corruption.

> Dec 23 09:08:22 Albohacen-global last message repeated 19 times
> 
> some more info:
> 
> [root@...ohacen-global ~]# uname -a
> Linux Albohacen-global 2.6.21.1 #3 SMP Thu May 3 04:33:36 CEST 2007 x86_64
> x86_64 x86_64 GNU/Linux
> [root@...ohacen-global ~]# cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> [multipath] [faulty]
> md1 : active raid4 sdf2[1] sde2[0] sdd2[5] sdc2[4] sdb2[3] sda2[2]
>       19558720 blocks level 4, 64k chunk, algorithm 0 [6/6] [UUUUUU]
>       bitmap: 8/239 pages [32KB], 8KB chunk
> 
> md2 : active raid4 sdf3[1] sde3[0] sdd3[5] sdc3[4] sdb3[3] sda3[2]
>       1542761600 blocks level 4, 64k chunk, algorithm 0 [6/6] [UUUUUU]
>       bitmap: 0/148 pages [0KB], 1024KB chunk
> 
> md0 : active raid1 sdb1[1] sda1[0]
>       104320 blocks [2/2] [UU]
> 
> unused devices: <none>
> [root@...ohacen-global ~]# losetup /dev/loop0
> /dev/loop0: [0010]:6598 (/dev/md2), encryption blowfish (type 18)

You're using an encrypted block device? What mechanism are you using for
encryption (doesn't appear to be dmcrypt)? Does it handle readahead bio
cancellation correctly? We had similar XFS corruption problems on dmcrypt
between 2.6.14 and ~2.6.20 due to a bug in dmcrypt's failure to handle
aborted readahead I/O correctly....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/