Date:   Tue, 22 Dec 2020 15:59:29 +0100
From:   Matteo Croce <>
To:     "Theodore Y. Ts'o" <>
Subject: Re: discard and data=writeback

On Mon, Dec 21, 2020 at 4:04 AM Theodore Y. Ts'o <> wrote:
> So that implies that your experiment may not be repeatable; did you
> make sure the file system was freshly reformatted before you wrote out
> the files in the directory you are deleting?  And was the directory
> written out in exactly the same way?  And did you make sure all of the
> writes were flushed out to disk before you tried timing the "rm -rf"
> command?  And did you make sure that there weren't any other processes
> running that might be issuing other file system operations (either
> data or metadata heavy) that might be interfering with the "rm -rf"
> operation?  What kind of storage device were you using?  (An SSD; a
> USB thumb drive; some kind of Cloud emulated block device?)

I got another machine with a faster NVMe disk. I discarded the whole
drive before partitioning it; discarding is very fast on this device:
# time blkdiscard -f /dev/nvme0n1p1

real    0m1.356s
user    0m0.003s
sys     0m0.000s

Also, the drive is pretty big compared to the dataset, so the
filesystem is unlikely to be fragmented:

# lsblk /dev/nvme0n1
nvme0n1     259:0    0  1.7T  0 disk
└─nvme0n1p1 259:1    0  1.7T  0 part /media
# df -h /media
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  1.8T  1.2G  1.7T   1% /media
# du -sh /media/linux-5.10/
1.1G    /media/linux-5.10/

I'm issuing sync followed by sleep 10 after the extraction, so the
writes should all be flushed. I also repeated the test three times for
each data mode, with very similar results (a consolidated sketch of
the procedure follows the numbers below):

# dmesg | grep EXT4-fs
[12807.847559] EXT4-fs (nvme0n1p1): mounted filesystem with ordered
data mode. Opts: data=ordered,discard

# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    0m1.607s
user    0m0.048s
sys     0m1.559s
# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    0m1.634s
user    0m0.080s
sys     0m1.553s
# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    0m1.604s
user    0m0.052s
sys     0m1.552s
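
To switch data modes, the filesystem has to be unmounted and mounted
again (ext4 refuses to change the data mode on a plain remount). A
minimal sketch, assuming the mount point shown above:

# umount /media
# mount -o data=writeback,discard /dev/nvme0n1p1 /media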

# dmesg | grep EXT4-fs
[13133.953978] EXT4-fs (nvme0n1p1): mounted filesystem with writeback
data mode. Opts: data=writeback,discard

# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    1m29.443s
user    0m0.073s
sys     0m2.520s
# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    1m29.409s
user    0m0.081s
sys     0m2.518s
# tar xf ~/linux-5.10.tar ; sync ; sleep 10
# time rm -rf linux-5.10/

real    1m19.283s
user    0m0.068s
sys     0m2.505s
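
For reference, each run above boils down to this sketch (assuming the
working directory is the /media mount point, as in the df/du output):

cd /media                          # assumed: tests run from the mount point
for i in 1 2 3; do
        tar xf ~/linux-5.10.tar    # extract the dataset
        sync; sleep 10             # flush all writes before timing
        time rm -rf linux-5.10/    # the operation under test
done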

> Note that benchmarking the file system operations is *hard*.  When I
> worked with a graduate student working on a paper describing a
> prototype of a file system enhancement to ext4 to optimize ext4 for
> drive-managed SMR drives[1], the graduate student spent *way* more
> time getting reliable, repeatable benchmarks than making changes to
> ext4 for the prototype.  (It turns out the SMR GC operations caused
> variations in write speeds, which meant the writeback throughput
> measurements would fluctuate wildly, which then influenced the
> writeback cache ratio, which in turn massively influenced how
> aggressively the writeback threads would behave, which in turn
> massively influenced the filebench and postmark numbers.)
> [1]


per aspera ad upstream
