linux-kernel - Re: Big I/O requests are split into small ones due to unaligned ext4 partition boundary?


Date:   Fri, 16 Dec 2016 13:42:15 +0800
From:   Ming Lei <tom.leiming@...il.com>
To:     Dexuan Cui <decui@...rosoft.com>
Cc:     Jens Axboe <axboe@...nel.dk>, "Theodore Ts'o" <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Abel Hu <Chou.Hu@...rosoft.com>,
        Thomas Shao <huishao@...rosoft.com>,
        Matthew Wilcox <matthew@....cx>,
        Long Li <longli@...rosoft.com>,
        KY Srinivasan <kys@...rosoft.com>
Subject: Re: Big I/O requests are split into small ones due to unaligned ext4
 partition boundary?

On Thu, Dec 15, 2016 at 9:53 PM, Dexuan Cui <decui@...rosoft.com> wrote:
>> From: Ming Lei [mailto:tom.leiming@...il.com]
>> Sent: Thursday, December 15, 2016 20:43
>>
>> On Thu, Dec 15, 2016 at 7:47 PM, Dexuan Cui <decui@...rosoft.com> wrote:
>> > Hi, when I run "mkfs.ext4 /dev/sdc2" in a Linux virtual machine on Hyper-V,
>> > where a disk IOPS=500 limit is applied by me [0],  the command takes much
>> > more time, if the ext4 partition boundary is not properly aligned:
>> >
>> > Example 1 [1]: it takes ~7 minutes with average wMB/s = 0.3   (slow)
>> > Example 2 [2]: it takes ~3.5 minutes with average wMB/s = 0.6 (slow)
>> > Example 3 [3]: it takes ~0.5 minute with average wMB/s = 4 (expected)
>> >
>> > strace shows the mkfs.ext4 program calls seek()/write() a lot and most of
>> > the writes use 32KB buffers (this should be big enough), and the program
>> > only invokes fsync() once, after it issues all the writes -- the fsync() takes
>> > more than 99% of the time.
>> >
>> > By logging SCSI commands, the SCSI Write(10) command is used here for the
>> > userspace 32KB write:
>> > in example 1, *each* command writes 1 or 2 sectors only (1 sector = 512 bytes);
>> > in example 2, *each* command writes 2 or 4 sectors only;
>> > in example 3, each command writes 1024 sectors.
>> >
>> > It looks like the kernel block I/O layer can somehow split big user-space buffers
>> > into really small write requests (1, 2, and 4 sectors)?
>> > This looks really strange to me.
>> >
>> > Note: in my test, this strange issue happens on the 4.4 and the mainline 4.9 kernels,
>> > but the stable 3.18.45 kernel doesn't have the issue, i.e. all 3 of the above test
>> > examples can finish in ~0.5 minute.
>> >
>> > Any comment?
>>
>> I remember that we discussed this kind of issue before; please see the discussion[1]
>> and check if the patch[2] can fix your issue.
>>
>> [1] http://marc.info/?t=145805525500002&r=1&w=2
>> [2] http://marc.info/?l=linux-kernel&m=145934325429152&w=2
>>
>> Ming
>
> Thank you very much, Ming! The patch fixes my issue!
> It looks like your patch was not merged upstream.
> Would you please submit the patch again?

Yeah, will do, and thanks for your test!



Thanks,
Ming Lei
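
The throughput figures in the quoted report can be sanity-checked with a
little arithmetic. The sketch below assumes that every SCSI Write(10)
command counts as one I/O against the 500 IOPS limit and that 1 MB means
10**6 bytes; neither is stated in the thread, and the function name is
made up for illustration.

```python
SECTOR_BYTES = 512  # from the report: 1 sector = 512 bytes
IOPS_LIMIT = 500    # from the report: disk IOPS=500 limit


def max_write_mb_per_s(sectors_per_command, iops=IOPS_LIMIT):
    """Upper bound on write throughput in MB/s (1 MB = 10**6 bytes).

    Assumes each command is charged as exactly one I/O by the limiter.
    """
    return sectors_per_command * SECTOR_BYTES * iops / 1e6


# Command sizes reported for examples 1, 2 and 3 in the thread.
for sectors in (1, 2, 4, 1024):
    bound = max_write_mb_per_s(sectors)
    print(f"{sectors:5d} sectors/command -> at most {bound:.3f} MB/s")
```

Under those assumptions, 1 to 2 sectors per command caps throughput at
roughly 0.26 to 0.51 MB/s and 2 to 4 sectors at roughly 0.51 to 1.02 MB/s,
which brackets the reported 0.3 and 0.6 wMB/s. With 1024-sector commands
the cap is far above the reported 4 wMB/s, so in that case the IOPS limit
is not what bounds the run.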