Message-ID: <47FB477E.40502@aitel.hist.no>
Date:	Tue, 08 Apr 2008 12:22:54 +0200
From:	Helge Hafting <helge.hafting@...el.hist.no>
To:	Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>
CC:	linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
	device-mapper development <dm-devel@...hat.com>,
	agk@...hat.com, mingo@...hat.com, neilb@...e.de
Subject: Re: Data corruption on software RAID

Mikulas Patocka wrote:
> Hi
>
> During source code review, I found an improbable but possible data
> corruption on RAID-1 and on DM-RAID-1. (I'm not sure about RAID-4,5,6).
>
> The RAID code was enhanced with bitmaps in 2.6.13.
>
> The bitmap tracks regions on the device that may be out of sync.
> The purpose of the bitmap is to avoid resynchronizing the whole array
> after a crash. DM-RAID uses a similar bitmap.
>
> The write sequence is usually:
> 1. turn on the bit in the bitmap (if it isn't on already).
> 2. update the data.
> 3. when writes to all devices finish, the bit may be turned off.
>
> The developers assume that when all writes to the region finish, the 
> region is in-sync.
>
> This assumption is wrong.
>
> In many places the kernel writes data while it may still be modified.
> For example, the pdflush daemon periodically writes pages and buffers
> without locking them. Similarly, pages may be written while they are
> mapped writable into processes.
>
> Normally, there is no problem with modify-while-write. The write sequence 
> is something like:
> * turn off Dirty bit
> * write the buffer or page
> --- and if the buffer or page is modified while it's being written, the 
> Dirty bit is turned on again and the correct data are written later.
>
> But with RAID (since 2.6.13), it can produce corruption because when the 
> buffer is modified while being written, different versions of data can be 
> written to devices in the RAID array. For example:
>
> 1. pdflush turns off the dirty bit on an Ext2 bitmap buffer and starts
> writing the buffer to RAID-1.
> 2. the kernel allocates some blocks in that Ext2 bitmap. One of the RAID-1
> devices writes the new data, the other one gets the old data.
> 3. The kernel turns on the buffer dirty bit, so this buffer is scheduled
> for the next write.
> 4. The RAID-1 subsystem sees that both writes finished, so it thinks that
> this region is in-sync, turns off the dirty bit in its region bitmap and
> writes the bitmap to disk.
>   
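The four quoted steps can be replayed in a toy Python model. This is only
a sketch of the ordering described above: the Buffer class, simulate_race()
and the two-element mirror list are hypothetical names for illustration,
not anything from the md or dm-raid1 code.

```python
# Toy replay of steps 1-4 from the quoted message.
# All names are hypothetical; this is not kernel code.

class Buffer:
    def __init__(self, data):
        self.data = data
        self.dirty = True


def simulate_race():
    buf = Buffer(b"old")
    mirrors = [b"", b""]       # contents of the two RAID-1 devices
    region_dirty = False       # bit in the RAID region bitmap

    # Step 1: pdflush clears the buffer dirty bit and starts the write.
    buf.dirty = False
    region_dirty = True        # RAID marks the region before writing
    mirrors[0] = buf.data      # first device receives the old contents

    # Step 2: the buffer is modified while the write is still in flight,
    # so the second device receives the new contents.
    buf.data = b"new"
    mirrors[1] = buf.data

    # Step 3: the buffer is marked dirty again for a later writeback.
    buf.dirty = True

    # Step 4: both writes have finished, so RAID clears the region bit.
    region_dirty = False

    return mirrors, region_dirty, buf.dirty
```

At the end of the replay the two devices hold different data while the
region bit says they are in sync. A crash before the re-dirtied buffer is
written again would leave that mismatch unrepaired by the bitmap-driven
resync.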
Would this help:
RAID-1 sees that both writes finished. It checks the dirty bits on all
relevant buffers/pages. If none got re-dirtied, then it is ok to
turn off the dirty bit in the region bitmap and write that. Otherwise, 
it is not!

Or is such a check too time-consuming?
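
A minimal sketch of that check, written in Python purely for illustration.
The Buffer type and region_bit_after_writes() are hypothetical names, and
the sketch assumes the RAID layer can actually reach the dirty flags of
the buffers it just wrote, which is the open question here.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    data: bytes
    dirty: bool = False


def region_bit_after_writes(buffers):
    """Return the region bit to keep once all mirror writes have finished.

    True  -> keep the bit set (some buffer was re-dirtied during the write,
             so the mirrors may differ until the next writeback).
    False -> no buffer was re-dirtied, so the bit may be cleared.
    """
    return any(b.dirty for b in buffers)
```

With one re-dirtied buffer in the region the function returns True and the
region bit stays set; with only clean buffers it returns False. The sketch
ignores any window between the check and the bitmap update.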

Helge Hafting