Message-ID: <f875e2fe1003041817u5d9f71calbcef5b0c5c63d8a0@mail.gmail.com>
Date:	Thu, 4 Mar 2010 21:17:56 -0500
From:	s ponnusa <foosaa@...il.com>
To:	Robert Hancock <hancockrwd@...il.com>
Cc:	Mark Lord <kernel@...savvy.com>,
	Greg Freemyer <greg.freemyer@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org,
	Jens Axboe <jens.axboe@...cle.com>, linux-mm@...ck.org
Subject: Re: Linux kernel - Libata bad block error handling to user mode program

Yes, this log file came from the read verification program. I will
send a fresh log file of the write failure once I am back at work. I
did not check the log before sending it.

On Thu, Mar 4, 2010 at 9:16 PM, Robert Hancock <hancockrwd@...il.com> wrote:
> On Thu, Mar 4, 2010 at 8:11 PM, s ponnusa <foosaa@...il.com> wrote:
>>> There's nothing in libata which will cause the operation to eventually
>>> return success if the drive keeps failing it (at least there definitely
>>> should not be and I very much doubt there is). My guess is that somehow what
>>> you think should be happening is not what the drive is actually doing (maybe
>>> one of the retries you're seeing is actually succeeding in writing to the
>>> disk, or at least the drive reports it was).
>>>
>>> You haven't posted any of the actual kernel output you're seeing, so it's
>>> difficult to say exactly what's going on. However, attempting to scan for
>>> disk errors using writes seems like a flawed strategy. As several people
>>> have mentioned, drives can't necessarily detect errors on a write.
>>>
>>
>> The scenario involves lots of bad drives with the known bad sectors
>> locations. Take MHDD for example, it sends an ATA write command to one
>> of the bad sectors, the drive returns failure / timeout, it tries
>> again, the drive still says failure / timeout, program comes out and
>> says failure. If we are not checking the errors during write process,
>> and continue to reallocate the sector or retry the write again, what
>> happens after all the available sectors are remapped? I still could
>> not visualise it for some reasons.
>>
>> Consider this scenario:
>> My write program says write passed. But when I used another
>> verification program (replica of the erasure program but does only
>> read / verify) it is unable to read the data and returns failure. No
>> other program (for example a Windows based hex editor or DOS based
>> disk editor) is able to read the information from that particular
>> sector. So, obviously the data written by linux is corrupted and
>> cannot be read back by any other means. And the program which wrote
>> the data is unaware of the error that has happened at the lower level.
>> But the error log clearly has the issue caught but is trying to handle
>> differently.
>>
>> I've attached a part of sample dmesg log which was logged during the
>> grinding of bad sector operation and eventually the write passed.
>
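
The scenario above (a write reported as passed, followed by a failed read
of the same sector) can be checked from user space with a write, flush,
and read-back sequence. The following is a minimal sketch, not the erasure
program discussed in this thread: the function name `write_and_verify` and
its return strings are hypothetical, and the read-back may still be served
from the page cache, since `POSIX_FADV_DONTNEED` is only a hint. A tool
that must hit the media would more likely open the device with `O_DIRECT`.

```python
import os


def write_and_verify(path, offset, data):
    """Write data at offset, flush it, then read it back and compare.

    Returns a short status string (hypothetical convention for this sketch).
    """
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        try:
            written = os.pwrite(fd, data, offset)
            # fsync surfaces I/O errors the kernel recorded for this file.
            os.fsync(fd)
        except OSError as exc:
            return f"write failed: {exc.strerror}"
        if written != len(data):
            return "short write"

        # Ask the kernel to drop cached pages so the read is more likely
        # to reach the device. This is advisory only (assumption: Linux).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)

        try:
            readback = os.pread(fd, len(data), offset)
        except OSError as exc:
            return f"read-back failed: {exc.strerror}"
        return "ok" if readback == data else "mismatch"
    finally:
        os.close(fd)
```

A result of "read-back failed" after a successful write would match the
behaviour described above, where the write is acknowledged but the sector
cannot be read afterwards.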
> [ 7671.006928] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [ 7671.006936] ata1.00: BMDMA stat 0x25
> [ 7671.006943] ata1.00: cmd c8/00:08:a8:56:75/00:00:00:00:00/e5 tag 0 dma 4096 in
> [ 7671.006945]          res 51/40:04:ac:56:75/10:02:05:00:00/e5 Emask 0x9 (media error)
> [ 7671.006949] ata1.00: status: { DRDY ERR }
> [ 7671.006951] ata1.00: error: { UNC }
> [ 7671.028606] ata1.00: configured for UDMA/100
> [ 7671.028617] ata1: EH complete
>
> Command C8 is a read that's failing. It looks like almost all of the
> failures in that log are from failed reads, I don't see any failed
> writes. From what I can see it sounds like the drive is apparently
> writing successfully but is unable to read the data back (the reads
> being due to read-modify-write operations being done or for some other
> reason).
>
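
The reading of the log above (command C8 is a read, and the result carries
an uncorrectable-error flag) can be reproduced mechanically. This is a
rough sketch, not a kernel tool: the function name `decode` and the
returned dictionary keys are hypothetical, the field order
(command or status, feature or error, sector count, three LBA bytes, five
high-order bytes, device) is my reading of the libata log format, and the
LBA is assembled as a 28-bit address, which fits C8 but not the EXT
commands. Only a few opcodes and flag bits are listed.

```python
import re

# A few ATA opcodes; the table is deliberately incomplete.
ATA_COMMANDS = {
    0x20: "READ SECTOR(S)",
    0x30: "WRITE SECTOR(S)",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

# Twelve hex bytes separated by '/' or ':' after "cmd" or "res".
TASKFILE = re.compile(r"\b(cmd|res) ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def flags(value, table):
    """Return the names of the bits in table that are set in value."""
    return [name for bit, name in table.items() if value & bit]


def decode(line):
    """Decode one libata 'cmd' or 'res' log line, or return None."""
    match = TASKFILE.search(line)
    if match is None:
        return None
    kind = match.group(1)
    tf = [int(byte, 16) for byte in re.split(r"[/:]", match.group(2))]
    first, second, nsect, lbal, lbam, lbah = tf[:6]
    device = tf[11]
    # LBA28: low nibble of the device register holds bits 24..27.
    lba28 = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    out = {"kind": kind, "nsect": nsect, "lba28": lba28}
    if kind == "cmd":
        out["opcode"] = first
        out["name"] = ATA_COMMANDS.get(first, "unknown")
    else:
        out["status"] = flags(first, STATUS_BITS)
        out["error"] = flags(second, ERROR_BITS)
    return out
```

Applied to the two lines quoted above, the `cmd` line decodes as READ DMA
of 8 sectors starting at LBA 0x57556a8, and the `res` line decodes as
status DRDY and ERR with error UNC at LBA 0x57556ac, which agrees with the
`status:` and `error:` lines the kernel printed.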
