linux-kernel - Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors


Message-ID: <1390416421.2372.68.camel@dabdike.int.hansenpartnership.com>
Date:	Wed, 22 Jan 2014 10:47:01 -0800
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	Chris Mason <clm@...com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-ide@...r.kernel.org" <linux-ide@...r.kernel.org>,
	"lsf-pc@...ts.linux-foundation.org" 
	<lsf-pc@...ts.linux-foundation.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"rwheeler@...hat.com" <rwheeler@...hat.com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"mgorman@...e.de" <mgorman@...e.de>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going
 beyond 4096 bytes

On Wed, 2014-01-22 at 18:37 +0000, Chris Mason wrote:
> On Wed, 2014-01-22 at 10:13 -0800, James Bottomley wrote:
> > On Wed, 2014-01-22 at 18:02 +0000, Chris Mason wrote:
[agreement cut because it's boring for the reader]
> > Realistically, if you look at what the I/O schedulers output on a
> > standard (spinning rust) workload, it's mostly large transfers.
> > Obviously these are misaligned at the ends, but we can fix some of that
> > in the scheduler.  Particularly if the FS helps us with layout.  My
> > instinct tells me that we can fix 99% of this with layout on the FS + io
> > schedulers ... the remaining 1% goes to the drive as needing to do RMW
> > in the device, but the net impact to our throughput shouldn't be that
> > great.
> 
> There are a few workloads where the VM and the FS would team up to make
> this fairly miserable:
> 
> Small files.  Delayed allocation fixes a lot of this, but the VM doesn't
> realize that fileA, fileB, fileC, and fileD all need to be written at
> the same time to avoid RMW.  Btrfs and MD have set up plugging callbacks
> to accumulate full stripes as much as possible, but it still hurts.
> 
> Metadata.  These writes are very latency sensitive and we'll gain a lot
> if the FS is explicitly trying to build full sector IOs.

OK, so these two cases I buy ... the question is can we do something
about them today without increasing the block size?

The metadata problem, in particular, might be block independent: we
still have a lot of small chunks to write out at fractured locations.
With a large block size, the FS knows it's been bad and can expect the
rolled up newspaper, but it's not clear what it could do about it.

The small files issue looks like something we should be tackling today
since writing out adjacent files would actually help us get bigger
transfers.
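
As a rough illustration of the small files case (a toy model, not taken
from any kernel code), the sketch below counts how many large sectors a
submission covers only partially, since those are the ones the device
would have to read-modify-write.  The 16 KiB sector size, the 4 KiB file
size, the byte offsets and the helper name partial_sectors() are all
assumptions made up for the example, and extents are assumed not to
overlap.

```python
SECTOR = 16384  # hypothetical large physical sector size, in bytes


def partial_sectors(extents, sector_size=SECTOR):
    """Count sectors that a single submission covers only partially.

    extents is a list of non-overlapping (offset, length) byte ranges
    that reach the device together.  A partially covered sector is one
    the device would have to read-modify-write.
    """
    covered = {}  # sector index -> bytes of that sector being written
    for offset, length in extents:
        pos, end = offset, offset + length
        while pos < end:
            idx = pos // sector_size
            sector_end = (idx + 1) * sector_size
            chunk = min(end, sector_end) - pos
            covered[idx] = covered.get(idx, 0) + chunk
            pos += chunk
    return sum(1 for n in covered.values() if n < sector_size)


# Four 4 KiB files (fileA..fileD) laid out adjacently by the FS.
files = [(i * 4096, 4096) for i in range(4)]

# Written back one at a time: every write touches part of the sector.
separately = sum(partial_sectors([f]) for f in files)

# Written back together: the four files fill the sector exactly.
together = partial_sectors(files)

# A large transfer that is only misaligned at its two ends.
large = partial_sectors([(4096, 65536)])

print(separately, together, large)
```

In this model the four separate writes each hit a partial sector, the
combined write hits none, and the large misaligned transfer pays only for
its first and last sector.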

James


