linux-kernel - Re: Linux kernel - Libata bad block error handling to user mode program

Message-ID: <f875e2fe1003041811p5aa934ecob90836a8d0a6b605@mail.gmail.com>
Date:	Thu, 4 Mar 2010 21:11:43 -0500
From:	s ponnusa <foosaa@...il.com>
To:	Robert Hancock <hancockrwd@...il.com>
Cc:	Mark Lord <kernel@...savvy.com>,
	Greg Freemyer <greg.freemyer@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org,
	Jens Axboe <jens.axboe@...cle.com>, linux-mm@...ck.org
Subject: Re: Linux kernel - Libata bad block error handling to user mode 
	program

On Thu, Mar 4, 2010 at 8:58 PM, Robert Hancock <hancockrwd@...il.com> wrote:
> On 03/04/2010 12:20 PM, s ponnusa wrote:
>>
>> SMART data consists only of the count of remapped sectors, seek failures,
>> raw read error rate, uncorrectable sector counts, CRC errors etc., and
>> technically one should be made aware of an error during a write
>> operation as well.
>>
>> As per the ATAPI specifications, the media will report an error for both
>> read and write operations: it times out or sends an error code for
>> either. Correct me if I am wrong. What happens if all the available free
>> sectors are remapped and there are no more sectors to map to? In that
>> case, at least, the drive should return an error, right? When using
>> O_DIRECT mode, the i/o error, media bad, softreset and hardreset error
>> messages start to fill up dmesg almost immediately after the write call.
>>
>> It just retries in a continuous loop and then finally returns success
>> (even without remapping). I don't know how to change the behavior of
>> libata, or whichever driver is doing this. All I want is for my program
>> to know about the error while it is being reported in the syslog at the
>> kernel / driver level.
>
> There's nothing in libata which will cause the operation to eventually
> return success if the drive keeps failing it (at least there definitely
> should not be and I very much doubt there is). My guess is that somehow what
> you think should be happening is not what the drive is actually doing (maybe
> one of the retries you're seeing is actually succeeding in writing to the
> disk, or at least the drive reports it was).
>
> You haven't posted any of the actual kernel output you're seeing, so it's
> difficult to say exactly what's going on. However, attempting to scan for
> disk errors using writes seems like a flawed strategy. As several people
> have mentioned, drives can't necessarily detect errors on a write.
>

The scenario involves lots of bad drives with known bad-sector
locations. Take MHDD, for example: it sends an ATA write command to one
of the bad sectors, the drive returns failure / timeout, it tries
again, the drive still reports failure / timeout, and the program exits
and reports the failure. If we do not check for errors during the write
process, and keep reallocating the sector or retrying the write, what
happens once all the available sectors are remapped? I still cannot
visualise it.

Consider this scenario:
My write program says the write passed. But when I use a separate
verification program (a replica of the erasure program that only
reads / verifies), it is unable to read the data and returns failure. No
other program (for example a Windows-based hex editor or a DOS-based
disk editor) can read that particular sector either. So the data
written by Linux is evidently corrupted and cannot be read back by any
other means, and the program that wrote the data is unaware of the
error that occurred at the lower level, even though the kernel log
clearly shows the error being caught and handled in some other way.

I've attached part of a sample dmesg log, recorded while the drive was
grinding on the bad sector; the write eventually passed.

Please advise. Thank you.
Suresh Ponnusamy

>>
>> Thank you.
>>
>> On Thu, Mar 4, 2010 at 12:49 PM, Mark Lord <kernel@...savvy.com> wrote:
>>>
>>> On 03/04/10 10:33, foo saa wrote:
>>> ..
>>>>
>>>> hdparm is good, but I don't want to use the internal ATA SECURE ERASE
>>>> because I can never get the number of bad sectors the drive had.
>>>
>>> ..
>>>
>>> Oh.. but isn't that information in the S.M.A.R.T. data ??
>>>
>>> You'll not find the bad sectors by writing -- a true WRITE nearly never
>>> reports a media error.  Instead, the drive simply remaps to a good sector
>>> on the fly and returns success.
>>>
>>> Generally, only READs report media errors.
>>>
>>> Cheers
>>>
>>
>
>

Attachment: "dmesglog.txt" (text/plain, 124724 bytes)