linux-kernel - Re: IO error semantics

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87f94c371001250815t92cbce9t95df5c274745dae9@mail.gmail.com>
Date:	Mon, 25 Jan 2010 11:15:50 -0500
From:	Greg Freemyer <greg.freemyer@...il.com>
To:	Ric Wheeler <rwheeler@...hat.com>
Cc:	Anton Altaparmakov <aia21@....ac.uk>,
	Nick Piggin <npiggin@...e.de>,
	Dave Chinner <david@...morbit.com>, Jan Kara <jack@...e.cz>,
	Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>,
	linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andreas Dilger <adilger@....com>,
	"Theodore Ts'o" <tytso@....edu>,
	Satoshi OSHIMA <satoshi.oshima.fk@...achi.com>,
	linux-fsdevel@...r.kernel.org
Subject: Re: IO error semantics

On Mon, Jan 25, 2010 at 10:23 AM, Ric Wheeler <rwheeler@...hat.com> wrote:
> On 01/18/2010 06:33 PM, Anton Altaparmakov wrote:
>>
>> Hi,
>>
>> On 18 Jan 2010, at 14:00, Nick Piggin wrote:
>>
>>>
>>> For write errors, you could also do block re-allocation, which would be
>>> fun.
>>>
>>
>> Yes it would.  (-:
>>
>> FWIW, Windows does this with Microsoft's NTFS driver.  When a write fails
>> due to a bad block, the block is marked as bad (recorded in the bad cluster
>> list and marked as allocated in the in-use bitmap so no-one tries to
>> allocate it), a new block is allocated, inode metadata is updated to reflect
>> the change in the logical to physical block map of the file the block
>> belongs to, and the write is then re-tried to its new location.
>>
>> I have never bothered implementing it in NTFS on Linux partially because
>> there doesn't seem any obvious way to do it inside the file system.  I think
>> the VFS and/or the block layer would have to offer help there in some way.
>>  What I mean for example is that if ->writepage fails then the failure is
>> only detected inside the asynchronous i/o completion handler at which point
>> the page is not locked any more, it is marked as being under writeback, and
>> we are in IRQ context (or something) and thus it is not easy to see how we
>> can from there get to doing all the above needed actions that require memory
>> allocations, disk i/o, etc...  I suppose a separate thread could do it where
>> we just schedule the work to be done.  But problem with that is that that
>> work later on might fail so we can't simply pretend the block was written
>> successfully yet we do not want to report an error or the upper layers would
>> pick it up even though we hopefully will correct it in due course...
>>
>> Best regards,
>>
>>        Anton
>>
>
> For permanent write errors, I would expect any modern drive to do a sector
> remapping internally. We should never need to track this kind of information
> for any modern device that I know of (S-ATA, SAS, SSD's and raid arrays
> should all handle this).
>
> Would not seem to be worth the complexity.
>
> Also keep in mind that retrying IO errors is not always a good thing -
> devices retry failed IO multiple times internally. Adding additional retry
> loops up the stack only makes our unavoidable IO error take much longer to
> hit!
>
> Ric

I thought write errors returned by modern drives (last 15 years) in
general were caused by bad cables, controllers, power supplies, etc.

When a media error is returned on write it indicated the spare sector
area of the drive was full.

Thus a media write error is a major error.  I would think, if
anything, we should turn the filesystem readonly upon a write media
error.  Not try to hide such a major problem.

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/