linux-kernel - Re: SATA exceptions with 2.6.20-rc5

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <311601c90701221017v65866b6dm565d09bc756bdf3b@mail.gmail.com>
Date:	Mon, 22 Jan 2007 11:17:51 -0700
From:	"Eric D. Mudama" <edmudama@...il.com>
To:	"Jeff Garzik" <jeff@...zik.org>
Cc:	"Jens Axboe" <jens.axboe@...cle.com>,
	"Robert Hancock" <hancockr@...w.ca>,
	"Björn Steinbrink" <B.Steinbrink@....de>,
	linux-kernel@...r.kernel.org, htejun@...il.com
Subject: Re: SATA exceptions with 2.6.20-rc5

On 1/15/07, Jeff Garzik <jeff@...zik.org> wrote:
> Jens Axboe wrote:
> > On Mon, Jan 15 2007, Jeff Garzik wrote:
> >> Jens Axboe wrote:
> >>> I'd be surprised if the device would not obey the 7 second timeout rule
> >>> that seems to be set in stone and not allow more dirty in-drive cache
> >>> than it could flush out in approximately that time.
> >> AFAIK Windows flush-cache timeout is 30 seconds, not 7 as with other
> >> commands...
> >
> > Ok, 7 seconds for FLUSH_CACHE would have been nice for us too though, as
> > it would pretty much guarentee lower latencies for random writes and
> > write back caching. The concern is the barrier code, of course. I guess
> > I should do some timings on potential worst case patterns some day. Alan
> > may have done that sometime in the past, iirc.
>
> FWIW:  According to the drive guys (Eric M, among others), FLUSH CACHE
> will "probably" be under 30 seconds, but pathological cases might even
> extend beyond that.
>
> Definitely more than 7 seconds in less-than-pathological cases,
> unfortunately...

The mentioned Maxtor model (6Yxxx) isn't susceptible to the
large-buffer long completion times, due to architectural differences
and availability of only small buffers.  Any "real" long-completion
flush on this device would, I believe, involve damage to the disk that
hinders the ability to seek, settle, or write.  (e.g. 30-second
flushes are easy to hit if you mount the disk on a shaker-table with
sufficient amplitude)

Later in the thread I think people have pretty much isolated it as not
the disk's problem, but just wanted to point this out.

I assume that large enough customers can buy enterprise-type command
completion ("all commands within X seconds") from most any disk
vendor.  However, these firmwares require much smarter or more active
drivers or block layers, to handle the higher error rate when the data
on the device is valid, but it will take longer than allowed by the
arbitrary enterprise rules.  Most customers who are buying this many
devices have software engineers customizing the drivers or disk
management applications to handle this differing behavior.

--eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/