Message-ID: <20100309122029.GO28189@discord.disaster>
Date: Tue, 9 Mar 2010 23:20:29 +1100
From: Dave Chinner <david@...morbit.com>
To: Michael Tokarev <mjt@....msk.ru>
Cc: Karel Zak <kzak@...hat.com>, Mike Snitzer <snitzer@...hat.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Tejun Heo <tj@...nel.org>,
"linux-ide@...r.kernel.org" <linux-ide@...r.kernel.org>,
lkml <linux-kernel@...r.kernel.org>,
Daniel Taylor <Daniel.Taylor@....com>,
Jeff Garzik <jeff@...zik.org>, Mark Lord <kernel@...savvy.com>,
tytso@....edu, "H. Peter Anvin" <hpa@...or.com>,
hirofumi@...l.parknet.co.jp,
Andrew Morton <akpm@...ux-foundation.org>,
Alan Cox <alan@...rguk.ukuu.org.uk>, irtiger@...il.com,
Matthew Wilcox <matthew@....cx>, aschnell@...e.de,
knikanth@...e.de, jdelvare@...e.de,
Jim Meyering <jim@...ering.net>, Neil Brown <neilb@...e.de>
Subject: Re: ATA 4 KiB sector issues.
On Tue, Mar 09, 2010 at 02:38:57PM +0300, Michael Tokarev wrote:
> Dave Chinner wrote:
> > On Tue, Mar 09, 2010 at 01:16:01PM +0300, Michael Tokarev wrote:
> >> Karel Zak wrote:
> >>> I did almost all my tests with scsi_debug or MD RAID0 on scsi_debug.
> >>> It works as expected.
> >> Actually, for raid0, the alignment is questionable. Should it be a
> >> multiple of chunk size or whole stripe size? I'm not sure, both ways
> >> have good and bad sides. But if it is the latter, the same issue
> >> pops up again: do a 3-disk raid0 and you'll have to align to 3*2^N.
> >
> > Yes, alignment is still needed, especially for filesystems that can
> > do stripe unit aligned allocation like XFS. If you don't align the
> > filesystem properly, all the data IO will be mis-aligned to the
> > underlying disks and stripe unit sized IO will hit multiple disks
> > rather than just one....
>
> I understand alignment is needed, the question is if the alignment
> should be to chunk size or full-stripe size. In neither case it
> will be bad for underlying disks.
Depends on the RAID implementation. High end RAID arrays often have
cache bypass features that are triggered by stripe width aligned and
sized IOs. When receiving well formed IO they can more than double
write performance because they are not limited by internal cache
mirroring bandwidth (e.g. the controller magically switches to
write-through for those well formed IOs instead of writeback).
So from that perspective, alignment needs to be to stripe width,
not stripe unit. Similarly for RAID5/6 alignment needs to be to
stripe width, so that a well formed IO issued by the filesystem
only hits one RAID5/6 stripe.
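To make the stripe unit vs. stripe width distinction concrete, here is a minimal sketch of a simple RAID0-style layout. The function names and the geometry (64 KiB stripe unit, 3 data disks) are illustrative assumptions, not taken from any real RAID implementation, and parity rotation for RAID5/6 is not modelled.

```python
KIB = 1024

def disks_touched(offset, length, stripe_unit, ndisks):
    """Data-disk indices an IO touches in a simple striped layout (hypothetical model)."""
    if length <= 0:
        return set()
    first = offset // stripe_unit
    last = (offset + length - 1) // stripe_unit
    return {su % ndisks for su in range(first, last + 1)}

def stripes_touched(offset, length, stripe_unit, ndisks):
    """Number of full-width stripes an IO of length > 0 overlaps."""
    width = stripe_unit * ndisks
    return (offset + length - 1) // width - offset // width + 1

def is_well_formed(offset, length, stripe_unit, ndisks):
    """True if the IO is both stripe-width aligned and stripe-width sized."""
    width = stripe_unit * ndisks
    return length > 0 and offset % width == 0 and length % width == 0

su = 64 * KIB
# Stripe-unit-aligned, stripe-unit-sized IO: one disk only.
print(disks_touched(64 * KIB, 64 * KIB, su, 3))
# Same IO shifted by 4 KiB: now spans two disks.
print(disks_touched(68 * KIB, 64 * KIB, su, 3))
# Stripe-width-sized IO aligned only to the stripe unit: spans two stripes.
print(stripes_touched(64 * KIB, 192 * KIB, su, 3))
```

In this model only the IO at offset 0 with length 192 KiB passes is_well_formed(); the same length at a 64 KiB offset straddles two stripes, which is the case stripe width alignment is meant to avoid.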
FWIW, XFS takes great care to ensure that it doesn't place all its
allocation group headers on the same stripe unit. Failing to
distribute the AG headers across all the stripe units evenly loads
the disks/luns in the stripe unevenly. As soon as you have uneven
load on a stripe the performance tanks as a stripe is only as fast as
its slowest member.
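The effect of AG header placement can be illustrated with a simplified model. This is a sketch under assumed geometry, not XFS's actual allocation code: AG i is assumed to start at byte offset i * ag_size on a plain striped device, and ag_header_disks() is a hypothetical helper.

```python
from collections import Counter

KIB = 1024

def ag_header_disks(ag_count, ag_size, stripe_unit, ndisks):
    """Count how many AG start offsets land on each disk (simplified model)."""
    return Counter(((i * ag_size) // stripe_unit) % ndisks
                   for i in range(ag_count))

su = 64 * KIB
ndisks = 4
width = su * ndisks

# AG size is an exact multiple of the stripe width: every header hits disk 0.
print(ag_header_disks(8, 1024 * width, su, ndisks))

# AG size shortened by one stripe unit: headers rotate across all disks.
print(ag_header_disks(8, 1024 * width - su, su, ndisks))
```

With an AG size that is an exact multiple of the stripe width, all eight headers map to the same disk in this model; shortening the AG by one stripe unit spreads them two per disk.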
Also, while XFS prefers to align to stripe unit, there are mount
options to change the default allocation alignment to be stripe
width based. Hence if you have large files and applications that are
doing well formed IO, stripe width alignment of the filesystem to
the underlying block device is critical to achieving deterministic
throughput close to the maximum the hardware can support.....
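As a rough illustration of the mount options involved: XFS accepts sunit= and swidth= in 512-byte units, and swalloc requests stripe width aligned allocation for files larger than the stripe width. The helper below is a hypothetical sketch that only builds the option string from an assumed geometry; check xfs(5) and the real geometry of your array before using any values.

```python
SECTOR = 512  # sunit/swidth mount options are expressed in 512-byte units

def xfs_stripe_mount_options(stripe_unit_bytes, data_disks, width_aligned=False):
    """Build an XFS mount option string for a given stripe geometry (hypothetical helper)."""
    if stripe_unit_bytes <= 0 or stripe_unit_bytes % SECTOR:
        raise ValueError("stripe unit must be a positive multiple of 512 bytes")
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    sunit = stripe_unit_bytes // SECTOR
    swidth = sunit * data_disks
    opts = [f"sunit={sunit}", f"swidth={swidth}"]
    if width_aligned:
        opts.append("swalloc")  # prefer stripe width aligned allocation
    return ",".join(opts)

# Assumed geometry: 64 KiB stripe unit across 4 data disks.
print(xfs_stripe_mount_options(64 * 1024, 4, width_aligned=True))
```

For a RAID5/6 device, data_disks would be the number of data-bearing disks, excluding parity.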
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com