linux-kernel - Re: [PATCH] ext4: Return EIO on read error in ext4_find

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170623232616.r3ffksjntjfbrzgb@thunk.org>
Date:   Fri, 23 Jun 2017 19:26:16 -0400
From:   Theodore Ts'o <tytso@....edu>
To:     Khazhismel Kumykov <khazhy@...gle.com>
Cc:     Andreas Dilger <adilger@...ger.ca>, adilger.kernel@...ger.ca,
        linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ext4: Return EIO on read error in ext4_find_entry

On Fri, Jun 23, 2017 at 03:33:46PM -0700, Khazhismel Kumykov wrote:
> 
> Giving up early or checking future blocks both work, critical thing
> here is not returning NULL after seeing a read error.
> Previously to this the behavior was to continue to check future blocks
> after a read error, and it seemed OK.

Whether or not it is OK probably depends on how big the directory is.
If we need to suffer through N long error retries, whether it is
caused by long SCSI error retries, or long iSCSI error retries, sooner
or later it's going to be problematic if the process which is taking
forever to search through the whole directory has a some kind health
monitoring service or other watchdog timer.

Still, I agree that there will be some cases where instead of "Fast
fail", having the file server try as hard as possible fetch the file
from the failing disk is worthwhile.  I tend to be focused on the
cluster file system case where if it's going to several hundred
milliseconds to fetch the file, you're better off getting it from the
one other replicated copies from another server, or start the
reed-solomon reconstruction from.  However, if you have an
architecture where the only copy of the file is on the particular file
server (perhaps because you are depending on RAID instead of n=3
replication or reed-solomon erasure codes), having the file server try
as hard as possible to find the file is a good thing.

I wonder if the right answer is to have "fastfail" and "nofastfail"
mount option.

					- Ted