Date:   Wed, 21 Sep 2022 16:30:29 +0900
From:   Dominique MARTINET <dominique.martinet@...ark-techno.com>
To:     Filipe Manana <fdmanana@...nel.org>
Cc:     Pavel Tikhomirov <ptikhomirov@...tuozzo.com>,
        Josef Bacik <josef@...icpanda.com>, Chris Mason <clm@...com>,
        David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org,
        lkml <linux-kernel@...r.kernel.org>,
        Chen Liang-Chun <featherclc@...il.com>,
        Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com>,
        kernel@...nvz.org, Yu Kuai <yukuai3@...wei.com>,
        Theodore Ts'o <tytso@....edu>
Subject: Re: fiemap is slow on btrfs on files with multiple extents

Filipe Manana wrote on Thu, Sep 01, 2022 at 02:25:12PM +0100:
> It took me a bit more than I expected, but here is the patchset to make fiemap
> (and lseek) much more efficient on btrfs:
> 
> https://lore.kernel.org/linux-btrfs/cover.1662022922.git.fdmanana@suse.com/
> 
> And also available in this git branch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=lseek_fiemap_scalability

Thanks a lot!
Sorry for the slow reply, it took me a while to find time to get back to
my test setup.

There's still the odd behaviour that later calls to cp are slower than
the first, but the improvement is big enough that it matters much less.
I haven't been able to reproduce the rcu stalls in qemu, so I can't say
for sure, but they probably won't be a problem anymore.
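
For reference, to get back to a cache-cold page cache between runs I'd
use the usual incantation, something like this (from memory, not
re-checked on this exact setup):

# drop clean page cache and reclaimable slab before the next run
sync
echo 3 > /proc/sys/vm/drop_caches
time cp bigfile /dev/null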

From a quick look with perf record/report, the difference still seems
to stem from fiemap (the time spent there goes from 4.13% to 45.20%),
so there is still extra processing once the file is (at least
partially) in cache, but it has gotten much better.
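
The profile itself was gathered with something along these lines
(options from memory, so treat them as approximate):

# record a call-graph profile of one cp run
perf record -g -- cp bigfile /dev/null
# then inspect the text report
perf report --stdio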


(tests were run on a laptop, so assume some inconsistency from thermal
throttling etc.)

/mnt/t/t # compsize bigfile
Processed 1 file, 194955 regular extents (199583 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       15%      3.7G          23G          23G       
none       100%      477M         477M         514M       
zstd        14%      3.2G          23G          23G       
/mnt/t/t # time cp bigfile /dev/null
real	0m 44.52s
user	0m 0.49s
sys	0m 32.91s
/mnt/t/t # time cp bigfile /dev/null
real	0m 46.81s
user	0m 0.55s
sys	0m 35.63s
/mnt/t/t # time cp bigfile /dev/null
real	1m 13.63s
user	0m 0.55s
sys	1m 1.89s
/mnt/t/t # time cp bigfile /dev/null
real	1m 13.44s
user	0m 0.53s
sys	1m 2.08s


For comparison, here's how it was on the 6.0-rc2 your branch is based on:
/mnt/t/t # time cp atde-test /dev/null
real	0m 46.17s
user	0m 0.60s
sys	0m 33.21s
/mnt/t/t # time cp atde-test /dev/null
real	5m 35.92s
user	0m 0.57s
sys	5m 24.20s



If you're curious, the report blames set_extent_bit and clear_state_bit
as follows; get_extent_skip_holes is completely gone, but I wouldn't
necessarily say this needs much more time spent on it.
(A way to poke this path directly, without cp, is sketched after the
call graph.)

45.20%--extent_fiemap
|
|--31.02%--lock_extent_bits
|          |          
|           --30.78%--set_extent_bit
|                     |          
|                     |--6.93%--insert_state
|                     |          |          
|                     |           --0.70%--set_state_bits
|                     |          
|                     |--4.25%--alloc_extent_state
|                     |          |          
|                     |           --3.86%--kmem_cache_alloc
|                     |          
|                     |--2.77%--_raw_spin_lock
|                     |          |          
|                     |           --1.23%--preempt_count_add
|                     |          
|                     |--2.48%--rb_next
|                     |          
|                     |--1.13%--_raw_spin_unlock
|                     |          |          
|                     |           --0.55%--preempt_count_sub
|                     |          
|                      --0.92%--set_state_bits
|          
 --13.80%--__clear_extent_bit
           |          
            --13.30%--clear_state_bit
                      |          
                      |           --3.48%--_raw_spin_unlock_irqrestore
                      |          
                      |--2.45%--merge_state.part.0
                      |          |          
                      |           --1.57%--rb_next
                      |          
                      |--2.14%--__slab_free
                      |          |          
                      |           --1.26%--cmpxchg_double_slab.constprop.0.isra.0
                      |          
                      |--0.74%--free_extent_state
                      |          
                      |--0.70%--kmem_cache_free
                      |          
                      |--0.69%--btrfs_clear_delalloc_extent
                      |          
                       --0.52%--rb_next

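As mentioned above, to hit the fiemap path directly without cp in the
middle, filefrag or xfs_io should do (both end up issuing the FIEMAP
ioctl); I haven't re-run these here, so take them as a sketch:

# walk all extents of the file through FS_IOC_FIEMAP
filefrag -v bigfile > /dev/null
# or, with xfs_io (read-only open):
xfs_io -r -c "fiemap -v" bigfile > /dev/null

Either should go through extent_fiemap and the same
lock_extent_bits/set_extent_bit path shown in the profile above.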


Thanks!
-- 
Dominique
