linux-ext4 - Re: Help: ext4 jbd2 IO requests slow down fsync

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACZyaBvcMqZosWfvNwWJt2+dihRJGybe4O0_Q67a+wxn3ci4cA@mail.gmail.com>
Date:   Mon, 27 Jan 2020 20:55:04 -0800
From:   Colin Zou <colin.zou@...il.com>
To:     "Theodore Y. Ts'o" <tytso@....edu>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: Help: ext4 jbd2 IO requests slow down fsync

Thanks for the information and analysis. I then did more tests. My app
runs random 4KB workloads on SSD device, one write followed by one
fsync. Here are the FIO test simulating the workload and the test
results. Please help to take a look and let me know what you think.

=================
Ext3 (Kernel 3.2)
=================
# ./fio  --rw=randwrite --bs=4k --direct=0 --filesize=1G --numjobs=32
--runtime=240 --group_reporting --filename=/db/test.benchmark
--name=randomwrite --fsync=3 --invalidate=1

randomwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K,
ioengine=psync, iodepth=1

...

fio-2.16-5-g915ca

Starting 32 processes

randomwrite: Laying out IO file(s) (1 file(s) / 1024MB)

Jobs: 31 (f=31): [w(9),_(1),w(22)] [100.0% done] [0KB/294.4MB/0KB /s]
[0/75.4K/0 iops] [eta 00m:00s]

randomwrite: (groupid=0, jobs=32): err= 0: pid=9050: Tue Jan  7 18:00:16 2020

  write: io=32768MB, bw=272130KB/s, iops=68032, runt=123303msec
     <<<<<<<<<<<  iops is much higher than 4.4 kernel test below.

    clat (usec): min=1, max=21783, avg=35.40, stdev=137.63

     lat (usec): min=1, max=21783, avg=35.51, stdev=137.74

    clat percentiles (usec):

     |  1.00th=[    1],  5.00th=[    2], 10.00th=[    2], 20.00th=[    3],

     | 30.00th=[    4], 40.00th=[    6], 50.00th=[    7], 60.00th=[   10],

     | 70.00th=[   14], 80.00th=[   23], 90.00th=[   94], 95.00th=[  211],

     | 99.00th=[  354], 99.50th=[  580], 99.90th=[ 1144], 99.95th=[ 1336],

     | 99.99th=[ 1816]

    lat (usec) : 2=2.18%, 4=19.43%, 10=36.36%, 20=19.23%, 50=10.43%

    lat (usec) : 100=2.60%, 250=6.43%, 500=2.75%, 750=0.23%, 1000=0.19%

    lat (msec) : 2=0.16%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%

  cpu          : usr=0.35%, sys=8.89%, ctx=27987466, majf=0, minf=1120

  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%

     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

     issued    : total=r=0/w=8388608/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0

     latency   : target=0, window=0, percentile=100.00%, depth=1



Run status group 0 (all jobs):

  WRITE: io=32768MB, aggrb=272129KB/s, minb=272129KB/s,
maxb=272129KB/s, mint=123303msec, maxt=123303msec



Disk stats (read/write):

    dm-4: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=1/8378314, aggrmerge=0/10665, aggrticks=0/3661470,
aggrin_queue=3661970, aggrutil=97.88%

  sdb: ios=1/8378314, merge=0/10665, ticks=0/3661470,
in_queue=3661970, util=97.88%


===================
Kernel 4.4
===================
# ./fio  --rw=randwrite --bs=4k --direct=0 --filesize=1G --numjobs=32
--runtime=240 --group_reporting --filename=/db/test.benchmark
--name=randomwrite --fsync=3 --invalidate=1

randomwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K,
ioengine=psync, iodepth=1

...

fio-2.16-5-g915ca

Starting 32 processes

Jobs: 32 (f=32): [w(32)] [100.0% done] [0KB/219.3MB/0KB /s] [0/56.2K/0
iops] [eta 00m:00s]

randomwrite: (groupid=0, jobs=32): err= 0: pid=27703: Tue Jan  7 17:42:03 2020

  write: io=32768MB, bw=228297KB/s, iops=57074, runt=146977msec

    clat (usec): min=1, max=1647, avg=22.88, stdev=27.19

     lat (usec): min=1, max=1647, avg=22.96, stdev=27.22

    clat percentiles (usec):

     |  1.00th=[    2],  5.00th=[    3], 10.00th=[    5], 20.00th=[    8],

     | 30.00th=[   11], 40.00th=[   13], 50.00th=[   15], 60.00th=[   17],

     | 70.00th=[   22], 80.00th=[   33], 90.00th=[   52], 95.00th=[   72],

     | 99.00th=[  103], 99.50th=[  131], 99.90th=[  241], 99.95th=[  326],

     | 99.99th=[  852]

    lat (usec) : 2=0.68%, 4=5.79%, 10=18.95%, 20=40.42%, 50=22.92%

    lat (usec) : 100=9.96%, 250=1.19%, 500=0.06%, 750=0.01%, 1000=0.01%

    lat (msec) : 2=0.01%

  cpu          : usr=0.30%, sys=10.26%, ctx=24443801, majf=0, minf=295

  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%

     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

     issued    : total=r=0/w=8388608/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0

     latency   : target=0, window=0, percentile=100.00%, depth=1



Run status group 0 (all jobs):

  WRITE: io=32768MB, aggrb=228297KB/s, minb=228297KB/s,
maxb=228297KB/s, mint=146977msec, maxt=146977msec



Disk stats (read/write):

    dm-3: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=0/8555222, aggrmerge=0/88454, aggrticks=0/4851095,
aggrin_queue=4858514, aggrutil=91.66%

  sdb: ios=0/8555222, merge=0/88454, ticks=0/4851095,
in_queue=4858514, util=91.66%



On Fri, Jan 24, 2020 at 5:57 PM Theodore Y. Ts'o <tytso@....edu> wrote:
>
> On Thu, Jan 23, 2020 at 10:28:47PM -0800, Colin Zou wrote:
> >
> > I used to run my application on ext3 on SSD and recently switched to
> > ext4. However, my application sees performance regression. The root
> > cause is, iosnoop shows that the workload includes a lot of fsync and
> > every fsync does data IO and also jbd2 IO. While on ext3, it seldom
> > does journal IO. Is there a way to tune ext4 to increase fsync
> > performance? Say, by reducing jbd2 IO requests?
>
> If you're not seeing journal I/O from ext3 after an fsync, you're not
> looking at things correctly.  At the very *least* there will be
> journal I/O for the commit block, unless all of the work was done
> earlier in a previous journal commit.
>
> In general, ext4 and ext3 will be doing roughly the same amount of I/O
> to the journal.  In some cases, depending on the workload, ext4
> *might* need to do more data I/O for the file being synced.  That's
> because with ext3, if there is an intervening periodic 5 second
> journal commit, some or all of the data I/O may have been forced out
> to disk earlier due to said 5 second sync.
>
> What sort of workload does your application do?  How much data blocks
> are you writing before each fsync(), and how often are the fsync()
> operations?
>
>                                                 - Ted