Date:	Wed, 22 Aug 2012 01:14:42 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	NeilBrown <neilb@...e.de>
Cc:	Yuanhan Liu <yuanhan.liu@...ux.intel.com>,
	Fengguang Wu <fengguang.wu@...el.com>,
	Li Shaohua <shli@...ionio.com>, Theodore Ts'o <tytso@....edu>,
	Marti Raudsepp <marti@...fo.org>,
	Kernel hackers <linux-kernel@...r.kernel.org>,
	ext4 hackers <linux-ext4@...r.kernel.org>, maze@...gle.com,
	"Shi, Alex" <alex.shi@...el.com>, linux-fsdevel@...r.kernel.org,
	linux RAID <linux-raid@...r.kernel.org>
Subject: Re: ext4 write performance regression in 3.6-rc1 on RAID0/5

On 2012-08-22, at 12:00 AM, NeilBrown wrote:
> On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu <yuanhan.liu@...ux.intel.com>
> wrote:
>> 
>> -#define NR_STRIPES		256
>> +#define NR_STRIPES		1024
> 
> Changing one magic number into another magic number might help your case, but
> it is not really a general solution.

We've actually been carrying a patch for a few years in Lustre that
increases NR_STRIPES to 2048 and makes it a configurable module
parameter.  This made a noticeable improvement to performance on
fast systems.
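
For a sense of what these values cost in memory, here is a rough
sketch.  It relies on the formula given in the kernel's md
documentation for stripe_cache_size (page size * number of disks *
number of stripes).  The 4096-byte page size and the 8-disk array in
the usage note are assumptions for illustration, and the helper name
stripe_cache_bytes() is hypothetical, not taken from any patch.

```c
#include <stdint.h>

/*
 * Approximate memory held by the md RAID-5/6 stripe cache, in bytes.
 * Each cached stripe keeps one page per member device, so the total
 * is nr_stripes * page_size * nr_disks.
 */
static uint64_t stripe_cache_bytes(uint64_t nr_stripes, uint64_t page_size,
                                   uint64_t nr_disks)
{
    return nr_stripes * page_size * nr_disks;
}
```

With 4096-byte pages and 8 disks, 256 stripes come to 8 MiB, 1024
stripes to 32 MiB, and 2048 stripes to 64 MiB, so the larger values
are cheap on a machine with plenty of RAM.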

> Possibly making sure that max_nr_stripes is at least some multiple of the
> chunk size might make sense, but I wouldn't want to see a very large multiple.
> 
> I think the problems with RAID5 are deeper than that.  Hopefully I'll figure
> out exactly what the best fix is soon - I'm trying to look into it.

The other MD RAID-5/6 patches that we have change the page submission
order, so that the elevator has to merge pages far less often, and
allow zero-copy IO submission if the caller marks the page for direct
IO (indicating that it will not be modified until after the IO
completes).  This avoids a lot of overhead on fast systems.

This isn't really my area of expertise, but patches against RHEL6
could be seen at http://review.whamcloud.com/1142 if you want to
take a look.  I don't know if that code is at all relevant to what
is in 3.x today.

> I don't think the size of the cache is a big part of the solution.  I think
> correct scheduling of IO is the real answer.

My experience is that on fast systems the IO scheduler just gets in the
way.  Submitting larger contiguous IOs to each disk in the first place
is far better than trying to merge small IOs again at the back end.
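
To illustrate the idea of building large contiguous IOs before
submission, here is a minimal userspace sketch.  It is not the Lustre
patch and not md code: struct io_extent and coalesce_extents() are
hypothetical names, and the input is assumed to be already sorted by
start sector for a single disk.

```c
#include <stddef.h>
#include <stdint.h>

/* One pending IO on a single disk, in sectors (hypothetical type). */
struct io_extent {
    uint64_t start;   /* first sector */
    uint64_t len;     /* length in sectors */
};

/*
 * Merge adjacent extents in place.  The array must be sorted by start
 * sector.  Two extents are merged only when the second begins exactly
 * where the first ends.  Returns the number of extents that remain.
 */
static size_t coalesce_extents(struct io_extent *ext, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;

    for (size_t i = 1; i < n; i++) {
        if (ext[out].start + ext[out].len == ext[i].start)
            ext[out].len += ext[i].len;   /* contiguous: grow current */
        else
            ext[++out] = ext[i];          /* gap: start a new extent */
    }
    return out + 1;
}
```

Given five 8-sector requests at sectors 0, 8, 16, 64 and 72, this
leaves two extents: 24 sectors at 0 and 16 sectors at 64.  Doing this
once at submission time means the scheduler receives two large IOs
rather than five small ones it has to merge again.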

Cheers, Andreas
