linux-kernel - Re: ext2/3: document conditions when reliable operation is possible

Message-ID: <20090323104525.GA17969@elf.ucw.cz>
Date:	Mon, 23 Mar 2009 11:45:25 +0100
From:	Pavel Machek <pavel@....cz>
To:	Rob Landley <rob@...dley.net>
Cc:	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	tytso@....edu, rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org
Subject: Re: ext2/3: document conditions when reliable operation is possible

On Mon 2009-03-16 14:26:23, Rob Landley wrote:
> On Monday 16 March 2009 07:28:47 Pavel Machek wrote:
> > Hi!
> > > > +	Fortunately, write failures are very uncommon on traditional
> > > > +	spinning disks, as they have spare sectors they use when a write
> > > > +	fails.
> > >
> > > I vaguely recall that the behavior when a write error _does_ occur is
> > > to remount the filesystem read only?  (Is this VFS or per-fs?)
> >
> > Per-fs.
> 
> Might be nice to note that in the doc.

Ok, can you suggest a patch? I believe remount-ro is already
documented ... somewhere :-).

> > > I'm aware write errors shouldn't happen, and by the time they do it's too
> > > late to gracefully handle them, and all we can do is fail.  So how do we
> > > fail?
> >
> > Well, even remount-ro may be too late, IIRC.
> 
> Care to elaborate?  (When a filesystem is mounted RO, I'm not sure what 
> happens to the pages that have already been dirtied...)

Well, fsync() error reporting does not really work properly, but I
guess it will save you in the remount-ro case. So the data will be in
the journal, but it will be impossible to replay it...

> > > (Writes aren't always cleanly at the start of an erase block, so critical
> > > data _before_ what you touch is endangered too.)
> >
> > Well, flash devices do remap, so it is actually "random blocks".
> 
> Fun.

Yes.

> > > > +	otherwise, disks may write garbage during powerfail.
> > > > +	Not sure how common that problem is on generic PC machines.
> > > > +
> > > > +	Note that atomic writes are very hard to guarantee for RAID-4/5/6,
> > > > +	because the array needs to write both the changed data and the
> > > > +	parity, to different disks.
> > >
> > > These days instead of "atomic" it's better to think in terms of
> > > "barriers".
> >
> > This is not about barriers (that should be a different topic). Atomic
> > write means that either the whole sector is written, or nothing at all
> > is written. Because raid5 needs to update both the data and the parity
> > at the same time, I don't think it can guarantee this during powerfail.
> 
> Good point, but I thought that's what journaling was for?

I believe journaling operates on the assumption that "either the whole
sector is written, or nothing at all is written".

> I'm aware that any flash filesystem _must_ be journaled in order to work 
> sanely, and must be able to view the underlying erase granularity down to the 
> bare metal, through any remapping the hardware's doing.  Possibly what's 
> really needed is a "flash is weird" section, since flash filesystems can't be 
> mounted on arbitrary block devices.

> Although an "-O erase_size=128" option so they _could_ would be nice.  There's 
> "mtdram" which seems to be the only remaining use for ram disks, but why there 
> isn't an "mtdwrap" that works with arbitrary underlying block devices, I have 
> no idea.  (Layering it on top of a loopback device would be most
> useful.)

I don't think that works. CompactFlash (etc.) cards basically remap the
data at random, so you can't really run a flash filesystem over a
CompactFlash/USB/SD card: you don't know the details of the remapping.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html