lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 3 Dec 2008 18:37:45 +0100 (CET)
From:	Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
cc:	Pavel Machek <pavel@...e.cz>, Theodore Tso <tytso@....edu>,
	Chris Friesen <cfriesen@...tel.com>,
	kernel list <linux-kernel@...r.kernel.org>, aviro@...hat.com
Subject: Re: writing file to disk: not as easy as it looks

On Wed, 3 Dec 2008, Alan Cox wrote:

> > implemented in disk firmware. Write errors are reported for disk 
> > connection problems, not media problems.
> 
> Media errors are reported for writes when the drive knows there are
> problems. That may be deferred to the cache flush afterwards but the
> information is still generated and shipped back to us - eventually.

It a question, how to process cache flush errors correctly. A cache flush 
error reported for one filesystem may belong to the data written by other 
filesystem. So should some flag "there was an error" be set for all 
partitions and report it to every filesystem when it does cache flush? Or 
record the time of the last error in the driver and let the filesystem 
query it (so that the filesystem can tell if the error happened before or 
after it was mounted).

BTW. how does SCSI report cache flush errors? Does it report them on 
SYNCHRONIZE CACHE command or does it report them on defered senses?

Another point is that unless the sector remap table is full, there should 
be no cache flush errors.

> > For connection problems, another solution may be to retry writes 
> > indefinitely until the admin aborts it or reconnects the disk. But I don't 
> > know how common these recoverable disk connection errors are.
> 
> CRC errors, lost IRQs and the like are retried by the midlayer and
> drivers and the error handling strategies will also try things like
> reducing link speeds on repeated CRC errors.

I meant for example loose cable or so --- does it make sense to retry 
indefinitely (until the admin plugs the cable or unmounts the filesystem) 
or return error to the filesystem after few retries?

Mikulas

> Alan
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ