linux-ext4 - Re: readahead blocking on ext4 ( was: Something amiss with IO throttling )

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Date:	Mon, 25 Feb 2013 22:31:28 -0500
From:	Phillip Susi <psusi@...ntu.com>
To:	linux-mm@...ck.org
CC:	ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: readahead blocking on ext4 ( was: Something amiss with IO throttling
 )

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/25/2013 07:52 PM, Phillip Susi wrote:
> There seems to be something wrong with IO throttling in recent 
> kernels.  I have noticed two problems recently:
> 
> 1)  I noticed that the man page for readahead() states that it
> blocks until the data has been read, which is incorrect.  The whole
> point of the call is that it initiates background read(ahead), in
> the hopes that it will finish before the data is actually read,
> just like posix_fadvise(POSIX_FADV_WILLNEED).  Testing however,
> indicates that indeed, both readahead() and fadvise with
> POSIX_FADV_WILLNEED does appear to be blocking at least for most of
> the time it takes to read a large file.  In my case I'm reading a
> 400mb file after dropping cache and having 2+gb of free ram, and
> the readahead/fadvise blocks for nearly the 5 seconds it takes to
> read that.

Well, I solved this part and no longer think it is related to the
second problem.  It turns out that the file I was using for testing
had been downloaded via bit torrent, and while the data blocks were
100% contiguous, the extent tree was slightly fragmented.  The file
used 3 level 0 extent tree blocks, so it seems that the readahead
blocked until most of the file was read, and the third extent tree
block then could be read, and the remaining blocks mapped by it could
be queued, then readahead would return.

I'm adding linux-ext4 to Cc, and I'm hoping we can come up with some
ideas to improve this.  It seems to me that the extent tree blocks
should be read first, then all of the data blocks queued up so
readahead only has to block once and shortly before returning.  What
do you think?

Also it may be interesting to note that the three extent tree blocks
were allocated seemingly at random:

EXTENTS:
(ETB0):33795, (0-30975):370688-401663, (30976-40191):401664-410879,
(ETB0):483328, (40192-72575):410880-443263,
(72576-81919):443264-452607, (ETB0):483329, (81920-111578):452608-482266

Shouldn't the extent tree blocks be clustered a little better?  And
shouldn't the actual extents be merged a little better?  With 111578
logical blocks and a max of 32767 ( or was it 32768 initialized
blocks, and 32767 uninit? ) per extent, this whole file should be able
to fit within the 4 extents embedded in the inode, with no need for
any extent tree blocks.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJRLCyQAAoJEJrBOlT6nu75PEkIAK+SWfo6ugAV6cvgqGWiNkuz
eAn6IfaGxdg/6TLH+xsnQbI8KTqe8Zg9VXTZyenfwdC2lsgFz2WOI19yzMbR6eWi
fddzadKuvPbKN8BgpmF3I61wAOG1YdbpEcvJRHhyFU211+I10shOeowGnyIuj7II
uoyGeBaWN1MpIuTzLySFVLAYdbZOofYlTbrxrjiCavAHhjG91WfZExMVR60OkY0z
dxBQ0NF5B+YIx+egOx1fVbstXxfrFTIIqVSLpK6fF44DUN/rHWIRivT3vARJt1Pa
ZxFmqyGMLUgqaq71n9B4L5FmfQ+anYM33plufm4cUjFfoEqBbowZvSnwN2p1mQk=
=H8i2
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html