Message-ID: <20091127131302.GA5235@quack.suse.cz>
Date:	Fri, 27 Nov 2009 14:13:03 +0100
From:	Jan Kara <jack@...e.cz>
To:	tmhikaru@...il.com
Cc:	Alan Stern <stern@...land.harvard.edu>, Jan Kara <jack@...e.cz>,
	Boaz Harrosh <bharrosh@...asas.com>,
	Kernel development list <linux-kernel@...r.kernel.org>,
	USB list <linux-usb@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>,
	SCSI development list <linux-scsi@...r.kernel.org>,
	linux-ext4@...r.kernel.org
Subject: Re: Weird I/O errors with USB hard drive not remounting filesystem
 readonly

On Fri 27-11-09 04:43:39, tmhikaru@...il.com wrote:
> On Wed, Nov 25, 2009 at 11:10:48AM -0500, Alan Stern wrote:
> > On Wed, 25 Nov 2009, Jan Kara wrote:
> > 
> > > > > > > Okay, very good.  There remains the question of the disturbing error
> > > > > > > > messages in the system log.  Should they be suppressed for FAILFAST
> > > > > > > requests?
> > > > > >   I think it's useful they are there because ultimately, something really
> > > > > > went wrong and you should better investigate. BTW, "end_request: I/O error"
> > > > > > messages are in the log even for requests where we retried and succeeded...
> > 
> > That isn't true.  Take a look at the dmesg log accompanying Tim's 
> > usbmon log.  Although there were 5 read errors in the usbmon log, there 
> > were only 2 I/O error messages in dmesg, corresponding to the 2 reads 
> > that weren't retried successfully.
> > 
> > Personally, I think it makes little sense to print error messages in 
> > the system log for commands where retries are disallowed.  Unless we go 
> > ahead and print error messages for _all_ failures, including those 
> > which are retried successfully.
> > 
> > Perhaps a good compromise would be to set the REQ_QUIET flag in 
> > req->cmd_flags for readaheads.  That would suppress the error messages 
> > coming from the SCSI core.
> > 
> > >   Yeah, we might make it more obvious that read failed and whether or not
> > > we are going to retry. Just technically it's not so simple because a
> > > different layer prints messages about errors (generic block layer) and
> > > different (scsi disk driver) decides what to do (retry, don't retry, ...).
> > 
> > Actually the retry decisions (or many of them) are made by the SCSI 
> > core, and that's also where some of those error messages come from.
> > 
> > > > 	I should have asked since I'm here at the moment - do you need any
> > > > more information out of the buggy USB enclosure at the moment, or can I work
> > > > on trying to fix/replace it now?
> > >   No, feel free to do anything with it :). Thanks for your help with
> > > debugging this.
> > 
> > To clarify, the enclosure isn't really very buggy.  It _should_ have
> > carried out the failed commands, or if it had a valid reason for not
> > doing so then it _should_ have reported the reason.  Regardless, the
> > errors that occurred were harmless because they went away when the
> > commands were retried.  (Although if they weren't harmless, you
> > wouldn't be able to tell just from reading the system log...)
> > 
> > Alan Stern
> 
> Okay. Okay. Back up a moment here - Clarify a little. I have the filesystem
> set to remount readonly on errors. I have not seen any filesystem
> corruption or file corruption I could find. The filesystem *was* remounting
> readonly under 2.6.31.5, but has not since .6 came out. (and I reformatted
> and redid the entire backup under 2.6.31.6 without errors)
> 
> How do I know when it has generated an actual failure that was not
> corrected?
> 
> How do I know when errors have been detected but they were corrected?
> 
> I'm guessing in the former, it'll remount ro, and in the latter it won't. Am
> I correct?
  If only file data cannot be read or written, you see I/O error messages
in the log (as you do now) and some application may get an EIO error, but
the filesystem will not be remounted read-only. It gets remounted only if
some filesystem metadata cannot be read or written.
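
  To check from userspace whether the remount has actually happened, one
option is to look at the mount options the kernel reports. Below is a
minimal sketch, assuming the /proc/mounts layout (device, mount point,
type, comma-separated options, ...). The helper names and the /mnt/backup
mount point are made up for illustration:

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not listed.

    mounts_text is the content of /proc/mounts. Note that the kernel
    escapes spaces in mount points there as \\040; this sketch does not
    decode such escapes.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        if fields[1] == mount_point:
            # Keep the last match: later entries are mounted on top.
            options = fields[3].split(",")
    return options


def is_readonly(mounts_text, mount_point):
    """True if mount_point is currently mounted read-only."""
    options = mount_options(mounts_text, mount_point)
    if options is None:
        raise LookupError("not mounted: " + mount_point)
    return "ro" in options
```

  It could be called as is_readonly(open("/proc/mounts").read(),
"/mnt/backup"). A True result on a filesystem you mounted read-write would
mean it has since been remounted read-only.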
  It's true that retrying a command usually succeeds, but not always, so I
find it possible that eventually it fails several times in a row and the
kernel just gives up and discards the data block it was trying to write,
because it thinks it has no way of writing it. So I personally would ask
for a warranty exchange of the enclosure...
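
  For the backup application's side of this, here is a rough sketch of how
a program could tell the two cases apart by errno: EIO for a failed data
read / write, EROFS for a write attempted after the filesystem has been
remounted read-only. The function names are hypothetical, and the sketch
assumes a failed write is reported at write() or fsync() time at the
latest:

```python
import errno
import os


def classify_os_error(exc):
    """Map an OSError to a short label for the cases discussed above."""
    if exc.errno == errno.EIO:
        return "io-error"              # data block could not be read/written
    if exc.errno == errno.EROFS:
        return "read-only-filesystem"  # filesystem is (now) mounted read-only
    return "other"


def write_durably(path, data):
    """Write data to path and fsync it.

    Returns None on success, or a label from classify_os_error() on failure.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            view = memoryview(data)
            while view:
                written = os.write(fd, view)  # may write fewer bytes than asked
                view = view[written:]
            os.fsync(fd)  # push the data to the device so errors surface here
        finally:
            os.close(fd)
    except OSError as exc:
        return classify_os_error(exc)
    return None
```

  A None result only means the kernel reported no error for this file; it
says nothing about commands that failed and were retried successfully
underneath.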

									Honza