Message-ID: <1390416421.2372.68.camel@dabdike.int.hansenpartnership.com>
Date:	Wed, 22 Jan 2014 10:47:01 -0800
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	Chris Mason <clm@...com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-ide@...r.kernel.org" <linux-ide@...r.kernel.org>,
	"lsf-pc@...ts.linux-foundation.org" 
	<lsf-pc@...ts.linux-foundation.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"rwheeler@...hat.com" <rwheeler@...hat.com>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"mgorman@...e.de" <mgorman@...e.de>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going
 beyond 4096 bytes

On Wed, 2014-01-22 at 18:37 +0000, Chris Mason wrote:
> On Wed, 2014-01-22 at 10:13 -0800, James Bottomley wrote:
> > On Wed, 2014-01-22 at 18:02 +0000, Chris Mason wrote:
[agreement cut because it's boring for the reader]
> > Realistically, if you look at what the I/O schedulers output on a
> > standard (spinning rust) workload, it's mostly large transfers.
> > Obviously these are misaligned at the ends, but we can fix some of that
> > in the scheduler.  Particularly if the FS helps us with layout.  My
> > instinct tells me that we can fix 99% of this with layout on the FS + io
> > schedulers ... the remaining 1% goes to the drive as needing to do RMW
> > in the device, but the net impact to our throughput shouldn't be that
> > great.
> 
> There are a few workloads where the VM and the FS would team up to make
> this fairly miserable
> 
> Small files.  Delayed allocation fixes a lot of this, but the VM doesn't
> realize that fileA, fileB, fileC, and fileD all need to be written at
> the same time to avoid RMW.  Btrfs and MD have set up plugging callbacks
> to accumulate full stripes as much as possible, but it still hurts.
> 
> Metadata.  These writes are very latency sensitive and we'll gain a lot
> if the FS is explicitly trying to build full sector IOs.

OK, so these two cases I buy ... the question is can we do something
about them today without increasing the block size?

The metadata problem, in particular, might be block independent: we
still have a lot of small chunks to write out at fractured locations.
With a large block size, the FS knows it's been bad and can expect the
rolled up newspaper, but it's not clear what it could do about it.

The small files issue looks like something we should be tackling today
since writing out adjacent files would actually help us get bigger
transfers.
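
To put rough numbers on the small files case, here is a minimal sketch in
Python (not kernel code). It assumes a hypothetical 16384-byte physical
sector and four 4096-byte files laid out back to back; the helper names
rmw_sectors and coalesce are made up for illustration. It counts how many
sectors a write only partially covers (each would need RMW in the device),
first with the files written separately and then merged into one transfer.

```python
def rmw_sectors(offset, length, sector_size):
    """Count the sectors a write only partially covers (each needs RMW)."""
    if length <= 0:
        return 0
    end = offset + length
    partial = set()
    if offset % sector_size:
        # Write starts in the middle of a sector.
        partial.add(offset // sector_size)
    if end % sector_size:
        # Write ends in the middle of a sector.
        partial.add((end - 1) // sector_size)
    return len(partial)


def coalesce(writes):
    """Merge (offset, length) writes that are exactly adjacent on disk."""
    merged = []
    for offset, length in sorted(writes):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((offset, length))
    return merged


SECTOR = 16384  # assumed large physical sector size, for illustration only
files = [(0, 4096), (4096, 4096), (8192, 4096), (12288, 4096)]

separate = sum(rmw_sectors(o, n, SECTOR) for o, n in files)
together = sum(rmw_sectors(o, n, SECTOR) for o, n in coalesce(files))
print(separate, together)
```

Under these assumptions every separate write touches a partial sector, while
the merged write covers the sector exactly and needs no RMW. Real layouts
are rarely this tidy, so treat it as an illustration of the effect rather
than a measurement.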

James


