linux-kernel - Re: Data corruption on software RAID

Message-ID: <Pine.LNX.4.64.0804081308080.27869@artax.karlin.mff.cuni.cz>
Date:	Tue, 8 Apr 2008 13:14:05 +0200 (CEST)
From:	Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>
To:	Helge Hafting <helge.hafting@...el.hist.no>
cc:	linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
	device-mapper development <dm-devel@...hat.com>,
	agk@...hat.com, mingo@...hat.com, neilb@...e.de
Subject: Re: Data corruption on software RAID

> > But with RAID (since 2.6.13), it can produce corruption because when the
> > buffer is modified while being written, different versions of data can be
> > written to devices in the RAID array. For example:
> >
> > 1. pdflush turns off a dirty bit on Ext2 bitmap buffer and starts writing
> > the buffer to RAID-1
> > 2. the kernel allocates some blocks in that Ext2 bitmap. One of RAID-1
> > devices writes new data, the other one gets old data.
> > 3. The kernel turns on the buffer dirty bit, so this buffer is scheduled for
> > next write.
> > 4. RAID-1 subsystem sees that both writes finished, it thinks that this
> > region is in-sync, turns off its dirty bit in its region bitmap and writes
> > the bitmap to disk.
> >   
> Would this help:
> RAID-1 sees that both writes finished. It checks the dirty bits on all
> relevant buffers/pages. If none got re-dirtied, then it is ok to
> turn off the dirty bit in the region bitmap and write that. Otherwise, it is
> not!
> 
> Or is such a check too time-consuming?

That is impossible. The page cache can answer questions like "where is 
page 0x1234 from inode 0x5678 located on disk?" But it can't answer the 
reverse question: "which inode and which page is using disk block 
0x12345678?"
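
The asymmetry can be illustrated with a toy model. This is only a sketch: 
forward_map, block_of and owner_of are hypothetical names, not kernel 
structures, and the numbers are the example values from the paragraph 
above. The map is indexed by (inode, page), so the forward question is a 
single lookup, while the reverse question has no index to use.

```python
# Toy model only: all names here are hypothetical, not kernel data structures.
from typing import Dict, Optional, Tuple

PageKey = Tuple[int, int]  # (inode number, page index within the file)

# Forward map, indexed the way a page cache is: by file and offset.
forward_map: Dict[PageKey, int] = {
    (0x5678, 0x1234): 0x12345678,
    (0x5678, 0x1235): 0x12345679,
    (0x9ABC, 0x0000): 0x00000042,
}


def block_of(inode: int, page_index: int) -> Optional[int]:
    """Forward question: one keyed lookup."""
    return forward_map.get((inode, page_index))


def owner_of(block: int) -> Optional[PageKey]:
    """Reverse question: nothing is indexed by block, so scan every entry."""
    for key, mapped_block in forward_map.items():
        if mapped_block == block:
            return key
    return None
```

Even in this toy, owner_of has to walk the whole map, and it can only see 
pages that happen to be in it. The RAID layer sits below the filesystem and 
receives only block numbers, so it has no such map to walk at all.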

Furthermore, with device mapper you can stack several mapping tables on 
top of each other, and again device mapper can't solve the reverse problem: 
it can't tell you which filesystem is using block X.
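
A minimal sketch of stacked tables, assuming simple linear segments. The 
device names ("top", "mid", "sda", "sdb"), the tuple layout and the 
resolve helper are all made up for illustration and are not device 
mapper's actual interfaces. Each table translates a sector on its own 
device to a sector on a lower device, and nothing in a table records who 
is using the device above it.

```python
# Toy model only: table layout and names are hypothetical.
from typing import Dict, List, Tuple

# (start sector, length, lower device, start sector on lower device)
Segment = Tuple[int, int, str, int]

tables: Dict[str, List[Segment]] = {
    "top": [(0, 100, "mid", 50), (100, 100, "mid", 300)],
    "mid": [(0, 200, "sda", 1000), (200, 200, "sdb", 0)],
}


def resolve(device: str, sector: int) -> Tuple[str, int]:
    """Follow the stacked tables downward until a device with no table is reached."""
    while device in tables:
        for start, length, lower, lower_start in tables[device]:
            if start <= sector < start + length:
                device, sector = lower, lower_start + (sector - start)
                break
        else:
            raise ValueError(f"sector {sector} is not mapped on {device}")
    return device, sector
```

resolve("top", 10) walks top -> mid -> sda. The tables describe only this 
downward direction. Going upward from a sector on "sda" would at best 
lead back to a sector on "top", and the tables hold no record of which 
filesystem, inode or page owns that sector.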

Mikulas

> Helge Hafting