linux-kernel - Re: [PATCH] fix readahead pipeline break caused by block plug

Message-ID: <20120131220333.GD4378@redhat.com>
Date:	Tue, 31 Jan 2012 17:03:33 -0500
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Shaohua Li <shaohua.li@...el.com>
Cc:	lkml <linux-kernel@...r.kernel.org>, linux-mm <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jens Axboe <axboe@...nel.dk>,
	Herbert Poetzl <herbert@...hfloor.at>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Wu Fengguang <wfg@...ux.intel.com>
Subject: Re: [PATCH] fix readahead pipeline break caused by block plug

On Tue, Jan 31, 2012 at 03:59:40PM +0800, Shaohua Li wrote:
> Herbert Poetzl reported a performance regression since 2.6.39. The test
> is a simple dd read, but with a big block size. The reason is:
> 
> T1: ra (A, A+128k), (A+128k, A+256k)
> T2: lock_page for page A, submit the 256k
> T3: hit page A+128k, ra (A+256k, A+384k). The range isn't submitted
> because of the plug, and there isn't any lock_page until we hit page
> A+256k, because all pages from A to A+256k are in memory.
> T4: hit page A+256k, ra (A+384k, A+512k). Because of the plug, the range
> isn't submitted again.
> T5: lock_page A+256k, so (A+256k, A+512k) will be submitted. The task is
> waiting for (A+256k, A+512k) to finish.
> 
> No request reaches the disk during T3 and T4, so the readahead
> pipeline breaks.
> 
> We really don't need a block plug in generic_file_aio_read() for
> buffered I/O. Readahead already has its own plug and fine-grained
> control over when I/O should be submitted. Deleting the plug for
> buffered I/O fixes the regression.
> 
> One side effect is that the plug makes the request size 256k; without
> it, the size is 128k. This is because the default readahead size is
> 128k, and that is not a reason we need the plug here.

For me, this patch helps only so much and does not recover all the
performance lost in the raw disk read case. It does improve throughput
from around 85-90 MB/s to 110-120 MB/s, but running the same dd with
iflag=direct gets me more than 250 MB/s.

# echo 3 > /proc/sys/vm/drop_caches 
# dd if=/dev/sdb of=/dev/null bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.03305 s, 119 MB/s

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/sdb of=/dev/null bs=1M count=1K iflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.07426 s, 264 MB/s

I think this happens because, in the raw read case, we submit one page
at a time to the request queue, and by the time all the pages are
submitted and one big merged request is formed, a lot of time has been
wasted.

In the direct IO case, bigger IOs arrive at the request queue, so there
is less CPU overhead and less idling on the queue.

I created an ext4 filesystem on the same SSD and did a buffered read,
and that seems to work just fine. There I get bigger requests at the
request queue (128K, 256 sectors).

[root@...lli common]# echo 3 > /proc/sys/vm/drop_caches 
[root@...lli common]# dd if=zerofile-4G of=/dev/null bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.09186 s, 262 MB/s

Anyway, removing the top-level plug for buffered reads sounds
reasonable.

Thanks
Vivek