Message-ID: <CAL3q7H5cL+4W6SQApq=ZhkzffvZAR2cEWK0bduNun+OkFevk=g@mail.gmail.com>
Date: Wed, 21 Sep 2022 10:00:37 +0100
From: Filipe Manana <fdmanana@...nel.org>
To: Dominique MARTINET <dominique.martinet@...ark-techno.com>
Cc: Pavel Tikhomirov <ptikhomirov@...tuozzo.com>,
Josef Bacik <josef@...icpanda.com>, Chris Mason <clm@...com>,
David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org,
lkml <linux-kernel@...r.kernel.org>,
Chen Liang-Chun <featherclc@...il.com>,
Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com>,
kernel@...nvz.org, Yu Kuai <yukuai3@...wei.com>,
"Theodore Ts'o" <tytso@....edu>
Subject: Re: fiemap is slow on btrfs on files with multiple extents

On Wed, Sep 21, 2022 at 8:30 AM Dominique MARTINET
<dominique.martinet@...ark-techno.com> wrote:
>
> Filipe Manana wrote on Thu, Sep 01, 2022 at 02:25:12PM +0100:
> > It took me a bit more than I expected, but here is the patchset to make fiemap
> > (and lseek) much more efficient on btrfs:
> >
> > https://lore.kernel.org/linux-btrfs/cover.1662022922.git.fdmanana@suse.com/
> >
> > And also available in this git branch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=lseek_fiemap_scalability
>
> Thanks a lot!
> Sorry for the slow reply, it took me a while to find time to get back to
> my test setup.
>
> There's still this weird behaviour where later calls to cp are slower
> than the first, but the improvement is so good that it doesn't matter
> quite as much -- I haven't been able to reproduce the rcu stalls in qemu,
> so I can't say for sure, but they probably won't be a problem anymore.
>
> From a quick look with perf record/report, the difference still seems to
> stem from fiemap (time spent there goes from 4.13% to 45.20%), so there
> is still more processing once the file is (at least partially) in cache,
> but it has gotten much better.
>
>
> (tests run on a laptop so assume some inconsistency with thermal
> throttling etc)
>
> /mnt/t/t # compsize bigfile
> Processed 1 file, 194955 regular extents (199583 refs), 0 inline.
> Type Perc Disk Usage Uncompressed Referenced
> TOTAL 15% 3.7G 23G 23G
> none 100% 477M 477M 514M
> zstd 14% 3.2G 23G 23G
> /mnt/t/t # time cp bigfile /dev/null
> real 0m 44.52s
> user 0m 0.49s
> sys 0m 32.91s
> /mnt/t/t # time cp bigfile /dev/null
> real 0m 46.81s
> user 0m 0.55s
> sys 0m 35.63s
> /mnt/t/t # time cp bigfile /dev/null
> real 1m 13.63s
> user 0m 0.55s
> sys 1m 1.89s
> /mnt/t/t # time cp bigfile /dev/null
> real 1m 13.44s
> user 0m 0.53s
> sys 1m 2.08s
>
>
> For comparison, here's how it was on the 6.0-rc2 your branch is based on:
> /mnt/t/t # time cp atde-test /dev/null
> real 0m 46.17s
> user 0m 0.60s
> sys 0m 33.21s
> /mnt/t/t # time cp atde-test /dev/null
> real 5m 35.92s
> user 0m 0.57s
> sys 5m 24.20s
>
>
>
> If you're curious, the report blames set_extent_bit and
> clear_state_bit as follows; get_extent_skip_holes is completely gone, but
> I wouldn't necessarily say this needs much more time spent on it.

get_extent_skip_holes() no longer exists, so 0% of the time is spent there :)

Yes, I know. The reason you see so much time spent on
lock_extent_bits() is basically that cp does too many fiemap calls,
each with a very small extent buffer size. I pointed that out here:

https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/

Making it use a larger buffer (say 500 or 1000 extents) would make it
a lot better.
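
To illustrate, here's a rough sketch (just an example I'm writing here,
not cp's actual code) of a single FS_IOC_FIEMAP call with room for 1000
extents; the batch size is only the number mentioned above:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

#define EXTENT_BATCH 1000	/* example batch size, pick what fits */

int main(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* One allocation with space for EXTENT_BATCH fiemap_extent records. */
	struct fiemap *fm = calloc(1, sizeof(*fm) +
			EXTENT_BATCH * sizeof(struct fiemap_extent));
	if (!fm)
		return 1;

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm->fm_extent_count = EXTENT_BATCH;	/* extents we have room for */
	fm->fm_flags = FIEMAP_FLAG_SYNC;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FS_IOC_FIEMAP");
		return 1;
	}

	printf("%u extents returned by one call\n", fm->fm_mapped_extents);

	free(fm);
	close(fd);
	return 0;
}

A real caller would loop, advancing fm_start past fe_logical + fe_length
of the last returned extent until one comes back with FIEMAP_EXTENT_LAST
set. The point is just that each ioctl covers many extents, so the extent
range locking that dominates your profile runs far fewer times.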

But as I pointed out there, cp was changed last year to no longer use
fiemap to detect holes; it now uses lseek with SEEK_HOLE. So with time,
everyone will get a cp version that does not use fiemap anymore.
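
For reference, a minimal sketch (again my own example, not coreutils
code) of that SEEK_DATA/SEEK_HOLE style of hole detection looks like
this:

#define _GNU_SOURCE 1	/* for SEEK_DATA / SEEK_HOLE */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	off_t end = lseek(fd, 0, SEEK_END);
	off_t off = 0;

	while (off < end) {
		/* First byte of data at or after off; fails in a trailing hole. */
		off_t data = lseek(fd, off, SEEK_DATA);
		if (data < 0)
			break;
		/* The hole that ends this data region (EOF counts as a hole). */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			break;
		printf("data: %lld..%lld\n", (long long)data, (long long)hole);
		off = hole;
	}

	close(fd);
	return 0;
}

Each data segment is found with SEEK_DATA and terminated with SEEK_HOLE,
so the filesystem only answers per-region queries instead of having to
lock and walk every extent in the range the way fiemap does.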

Also, for the cp case, since it does many read and fiemap calls on the
source file, the following patch probably helps too:

https://lore.kernel.org/linux-btrfs/20220819024408.9714-1-ethanlien@synology.com/

because it will make the io tree smaller. That should land in 6.1 too.

Thanks for testing and for the report.
>
> 45.20%--extent_fiemap
> |
> |--31.02%--lock_extent_bits
> | |
> | --30.78%--set_extent_bit
> | |
> | |--6.93%--insert_state
> | | |
> | | --0.70%--set_state_bits
> | |
> | |--4.25%--alloc_extent_state
> | | |
> | | --3.86%--kmem_cache_alloc
> | |
> | |--2.77%--_raw_spin_lock
> | | |
> | | --1.23%--preempt_count_add
> | |
> | |--2.48%--rb_next
> | |
> | |--1.13%--_raw_spin_unlock
> | | |
> | | --0.55%--preempt_count_sub
> | |
> | --0.92%--set_state_bits
> |
> --13.80%--__clear_extent_bit
> |
> --13.30%--clear_state_bit
> |
> | --3.48%--_raw_spin_unlock_irqrestore
> |
> |--2.45%--merge_state.part.0
> | |
> | --1.57%--rb_next
> |
> |--2.14%--__slab_free
> | |
> | --1.26%--cmpxchg_double_slab.constprop.0.isra.0
> |
> |--0.74%--free_extent_state
> |
> |--0.70%--kmem_cache_free
> |
> |--0.69%--btrfs_clear_delalloc_extent
> |
> --0.52%--rb_next
>
>
>
> Thanks!
> --
> Dominique