Date: Fri, 7 Feb 2014 14:15:31 +0800
From: Libo Chen <clbchenlibo.chen@...wei.com>
To: James Bottomley <James.Bottomley@...senPartnership.com>,
    Eiichi Tsukata <eiichi.tsukata.xh@...achi.com>
CC: <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
    <yrl.pp-manager.tt@...achi.com>
Subject: Re: [PATCH v2] scsi: Add 'retry_timeout' to avoid infinite command retry

On 2014/2/7 13:46, James Bottomley wrote:
> On Fri, 2014-02-07 at 09:22 +0900, Eiichi Tsukata wrote:
>> Currently, scsi error handling in scsi_io_completion() tries to
>> unconditionally requeue scsi command when device keeps some error state.
>> For example, UNIT_ATTENTION causes infinite retry with
>> action == ACTION_RETRY.
>> This is because retryable errors are thought to be temporary and the scsi
>> device will soon recover from those errors. Normally, such retry policy is
>> appropriate because the device will soon recover from temporary error state.
>
>
>
>> But there is no guarantee that device is able to recover from error state
>> immediately. Actually, we've experienced an infinite retry on some hardware.
>> Therefore hardware error can results in infinite command retry loop.
>
> Could you please add an analysis of the actual failure; which devices
> and what conditions.
>

same question, can you explain?

>> This patch adds 'retry_timeout' sysfs attribute which limits the retry time
>> of each scsi command. This attribute is located in scsi sysfs directory
>> for example "/sys/bus/scsi/devices/X:X:X:X/" and value is in seconds.
>> Once scsi command retry time is longer than this timeout,
>> the command is treated as failure. 'retry_timeout' is set to '0' by default
>> which means no timeout set.
>
> Don't do this ... you're mixing a feature (which you'd need to justify)
> with an apparent bug fix.
>
> Once you dump all the complexity, I think the patch boils down to a
> simple check before the action switch in scsi_io_completion():
>
> 	if (action != ACTION_FAIL &&
> 	    time_before(cmd->jiffies_at_alloc + wait_for, jiffies)) {
> 		action = ACTION_FAIL;
> 		description = "command timed out";
> 	}
>
>
> James
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
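For readers following the thread, the check James suggests can be tried
outside the kernel. Below is a minimal user-space sketch, not the kernel
code: the time_before() macro is modeled on the kernel's wraparound-safe
comparison, while apply_retry_deadline(), its parameters, and the explicit
"now" argument (standing in for the global jiffies counter) are hypothetical
names made up for illustration. The enum only mirrors the action names
mentioned in the thread.

```c
/*
 * User-space sketch of the deadline check quoted above.
 * Assumption: time_before(a, b) is true when a is earlier than b, computed
 * with a signed difference so it stays correct across counter wraparound.
 */
#define time_before(a, b) ((long)((a) - (b)) < 0)

/* Stand-ins for the action values used in scsi_io_completion(). */
enum action { ACTION_FAIL, ACTION_REPREP, ACTION_RETRY, ACTION_DELAYED_RETRY };

/*
 * Hypothetical helper: turn any non-failing action into ACTION_FAIL once
 * the deadline (allocation time + wait_for) lies in the past relative to
 * now. All times are in ticks, like jiffies.
 */
static enum action apply_retry_deadline(enum action action,
					unsigned long jiffies_at_alloc,
					unsigned long wait_for,
					unsigned long now,
					const char **description)
{
	if (action != ACTION_FAIL &&
	    time_before(jiffies_at_alloc + wait_for, now)) {
		action = ACTION_FAIL;
		*description = "command timed out";
	}
	return action;
}
```

With jiffies_at_alloc = 1000 and wait_for = 500, a retry decided at
now = 1200 is left alone, while the same retry at now = 2000 becomes
ACTION_FAIL. Because the comparison uses a signed difference, the result
is the same when the tick counter wraps between allocation and completion.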