lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170623232616.r3ffksjntjfbrzgb@thunk.org>
Date:   Fri, 23 Jun 2017 19:26:16 -0400
From:   Theodore Ts'o <tytso@....edu>
To:     Khazhismel Kumykov <khazhy@...gle.com>
Cc:     Andreas Dilger <adilger@...ger.ca>, adilger.kernel@...ger.ca,
        linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ext4: Return EIO on read error in ext4_find_entry

On Fri, Jun 23, 2017 at 03:33:46PM -0700, Khazhismel Kumykov wrote:
> 
> Giving up early or checking future blocks both work, critical thing
> here is not returning NULL after seeing a read error.
> Previously to this the behavior was to continue to check future blocks
> after a read error, and it seemed OK.

Whether or not it is OK probably depends on how big the directory is.
If we need to suffer through N long error retries, whether it is
caused by long SCSI error retries, or long iSCSI error retries, sooner
or later it's going to be problematic if the process which is taking
forever to search through the whole directory has a some kind health
monitoring service or other watchdog timer.

Still, I agree that there will be some cases where instead of "Fast
fail", having the file server try as hard as possible fetch the file
from the failing disk is worthwhile.  I tend to be focused on the
cluster file system case where if it's going to several hundred
milliseconds to fetch the file, you're better off getting it from the
one other replicated copies from another server, or start the
reed-solomon reconstruction from.  However, if you have an
architecture where the only copy of the file is on the particular file
server (perhaps because you are depending on RAID instead of n=3
replication or reed-solomon erasure codes), having the file server try
as hard as possible to find the file is a good thing.

I wonder if the right answer is to have "fastfail" and "nofastfail"
mount option.

					- Ted

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ