Message-ID: <f875e2fe1003041012m680ffc87i50099ed011526440@mail.gmail.com>
Date:	Thu, 4 Mar 2010 13:12:59 -0500
From:	s ponnusa <foosaa@...il.com>
To:	Mike Hayward <hayward@...p.net>
Cc:	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	linux-ide@...r.kernel.org, jens.axboe@...cle.com,
	linux-mm@...ck.org
Subject: Re: Linux kernel - Libata bad block error handling to user mode 
	program

The write cache is turned off at the drive level. I am using O_DIRECT
with buffers aligned to the 4k page size. I have also turned off page
caching and read-ahead for reads using the fadvise function.
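
A minimal sketch of this kind of setup in C, for readers who want to
reproduce it. It is not the original program: the helper names
alloc_aligned_block and open_direct and the BLOCK_SIZE constant are
illustrative assumptions, and the posix_fadvise calls are only advisory
hints to the kernel.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096         /* assumed alignment: the 4k page size */

/* Allocate a zeroed buffer aligned to BLOCK_SIZE; returns NULL on failure.
 * O_DIRECT transfers generally need the length to be block-aligned too. */
void *alloc_aligned_block(size_t len)
{
    void *buf = NULL;

    assert(len % BLOCK_SIZE == 0);
    if (posix_memalign(&buf, BLOCK_SIZE, len) != 0)
        return NULL;
    memset(buf, 0, len);
    return buf;
}

/* Open path for direct I/O and hint the kernel not to cache or read ahead.
 * Returns a file descriptor, or -1 with errno set by open(). */
int open_direct(const char *path)
{
    int fd = open(path, O_RDWR | O_DIRECT);

    if (fd < 0)
        return -1;
    /* posix_fadvise returns an error number instead of setting errno;
     * the hints are best effort here, so failures are ignored. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);   /* no read-ahead */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED); /* drop cached pages */
    return fd;
}
```

Turning off the drive's own write cache is a separate step done at the
device level and is not covered by this sketch.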

As you mentioned, the program grinds the hdd when it hits a patch of
bad sectors. The hdd retries the remap / write until it fails. The
device is then found to be unresponsive and is finally reset. This
goes on, and the program eventually moves on to the next sector
because the write call returned success. No errno value was set. Is
this how a write is supposed to work in Linux? It does not propagate
the error to the user mode program for any disk failure during a
write, even with the O_DIRECT flag.

Is there a specific place in libata where I can turn off the sector
remapping / retrying and have the error propagated to user mode
programs? (I don't want to change it in the public repository; I only
want to change it in my own kernel for testing / debugging purposes.)
The messages in syslog come from the printk calls in libata-eh.c,
under drivers/ata in the kernel source. I have not spent much time
analysing it yet, though.

Thanks.

On Thu, Mar 4, 2010 at 11:31 AM, Mike Hayward <hayward@...p.net> wrote:
> I have seen a couple of your posts on this and thought I'd chime in
> since I know a bit about storage.
>
> I frequently see io errors come through to user space (both read and
> write requests) from usb flash drives, so there is a functioning error
> path there to some degree.  When I see the errors, the kernel is also
> logging the sector and eventually resetting the device.
>
> There is no doubt a disk drive will slow down when it hits a bad spot
> since it will retry numerous times, most likely trying to remap bad
> blocks.  Of course your write succeeded because you probably have the
> drive cache enabled.  Flush or a full cache hangs while the drive
> retries all of the sectors that are bad, remapping them until finally
> it can remap no more.  At some point it probably returns an error if
> flush is timing out or it can't remap any more sectors, but it won't
> include the bad sector.
>
> I would suggest turning the drive cache off.  Then the drive won't lie
> to you about completing writes and you'll at least know which sectors
> are bad.  Just a thought :-)
>
> - Mike
>
