Date:	Fri, 04 Mar 2011 13:27:57 -0500
From:	Jeff Moyer <jmoyer@...hat.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Corrado Zoccolo <czoccolo@...il.com>,
	"Alex\,Shi" <alex.shi@...el.com>,
	"Li\, Shaohua" <shaohua.li@...el.com>,
	Vivek Goyal <vgoyal@...hat.com>,
	"tytso\@mit.edu" <tytso@....edu>,
	"jaxboe\@fusionio.com" <jaxboe@...ionio.com>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
	"Chen\, Tim C" <tim.c.chen@...el.com>
Subject: Re: [performance bug] kernel building regression on 64 LCPUs machine

Jeff Moyer <jmoyer@...hat.com> writes:

> Jan Kara <jack@...e.cz> writes:
>
>> I'm not so happy with ext4 results. The difference between ext3 and ext4
>> might be that amount of data written by kjournald in ext3 is considerably
>> larger if it ends up pushing out data (because of data=ordered mode) as
>> well. With ext4, all data are written by filemap_fdatawrite() from fsync
>> because of delayed allocation. And thus maybe for ext4 WRITE_SYNC_PLUG
>> is hurting us with your fast storage and small amount of written data? With
>> WRITE_SYNC, data would already be on its way to storage before we get to
>> wait for them...
>
>> Or it could be that we really send more data in WRITE mode rather than in
>> WRITE_SYNC mode with the patch on ext4 (that should be verifiable with
>> blktrace). But I wonder how that could happen...
>
> It looks like this is the case: the I/O isn't coming down as
> synchronous.  I'm seeing a lot of writes and very few write syncs, which
> means the write stream will be preempted by the incoming reads.
>
> Time to audit that fsync path and make sure it's marked properly, I
> guess.
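
For reference, the write vs. write-sync split can be counted straight from
the blkparse output.  A minimal sketch (assuming blkparse's default
per-event format, where the action code and the RWBS flags are the 6th and
7th whitespace-separated fields; the script name is just an example):

#!/usr/bin/env python3
# count_ws.py: tally write vs. write-sync dispatches from a blkparse
# text dump read on stdin.
import sys
from collections import Counter

counts = Counter()
for line in sys.stdin:
    fields = line.split()
    if len(fields) < 7:
        continue          # skip blank lines and the summary section
    action, rwbs = fields[5], fields[6]
    if action == "D" and "W" in rwbs:
        counts["WS" if "S" in rwbs else "W"] += 1

print("write dispatches:     ", counts["W"])
print("write sync dispatches:", counts["WS"])

Something like `blkparse -i sde | python3 count_ws.py` then prints the two
totals directly.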

OK, I spoke too soon.  Here's the blktrace summary information (I re-ran
the tests using 3 samples; the blktrace is from the last run of the
three in each case):

Vanilla
-------
fs_mark: 307.288 files/sec
fio: 286509 KB/s

Total (sde):
 Reads Queued:     341,558,   84,994MiB  Writes Queued:       1,561K,    6,244MiB
 Read Dispatches:  341,493,   84,994MiB  Write Dispatches:  648,046,    6,244MiB
 Reads Requeued:         0               Writes Requeued:        27
 Reads Completed:  341,491,   84,994MiB  Writes Completed:  648,021,    6,244MiB
 Read Merges:           65,    2,780KiB  Write Merges:      913,076,    3,652MiB
 IO unplugs:       578,102               Timer unplugs:           0

Throughput (R/W): 282,797KiB/s / 20,776KiB/s
Events (sde): 16,724,303 entries

Patched
-------
fs_mark: 278.587 files/sec
fio: 298007 KB/s

Total (sde):
 Reads Queued:     345,407,   86,834MiB  Writes Queued:       1,566K,    6,264MiB
 Read Dispatches:  345,391,   86,834MiB  Write Dispatches:  327,404,    6,264MiB
 Reads Requeued:         0               Writes Requeued:        33
 Reads Completed:  345,391,   86,834MiB  Writes Completed:  327,371,    6,264MiB
 Read Merges:           16,    1,576KiB  Write Merges:        1,238K,    4,954MiB
 IO unplugs:       580,308               Timer unplugs:           0

Throughput (R/W): 288,771KiB/s / 20,832KiB/s
Events (sde): 14,030,610 entries

So it appears we flush out writes much more aggressively without the
patch in place (roughly twice as many write dispatches for about the same
amount of written data).  I'm not sure why the write bandwidth looks to be
higher in the patched case... odd.
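
To put a number on that, dividing the written volume by the write dispatch
count from the two summaries above gives the average dispatched write size
(a quick back-of-the-envelope check, not blktrace output):

runs = {
    "vanilla": {"write_mib": 6244, "write_dispatches": 648046},
    "patched": {"write_mib": 6264, "write_dispatches": 327404},
}
for name, r in runs.items():
    avg_kib = r["write_mib"] * 1024 / r["write_dispatches"]
    print("%-8s ~%.1f KiB per dispatched write" % (name, avg_kib))
# vanilla  ~9.9 KiB per dispatched write
# patched  ~19.6 KiB per dispatched write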

Cheers,
Jeff
