lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 5 Aug 2014 00:55:14 +0200
From:	Andreas Dilger <adilger@...ger.ca>
To:	Mason <mpeg.blue@...e.fr>
Cc:	Lukáš Czerner <lczerner@...hat.com>,
	Theodore Ts'o <tytso@....edu>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: After unlinking a large file on ext4, the process stalls for a long time

It would be possible to optimize mb_free_blocks() by having it
clear a whole word at a time instead of a series if bits. 

I thought that was done already, but it doesn't appear to be the case.
Also, it isn't clear that the bit "normalization" is needed anymore.
This was done back in the aniceint times when the buddy bitmaps were stored on disk instead of being regenerated only at mount time. 

Cheers, Andreas

> On Aug 4, 2014, at 16:30, Mason <mpeg.blue@...e.fr> wrote:
> 
>> On 18/07/2014 11:29, Lukáš Czerner wrote:
>> 
>> Mason wrote:
>> 
>>> Theodore Ts'o wrote:
>>> 
>>>> Mason wrote:
>>>> 
>>>>> unlink("/mnt/hdd/xxx")                  = 0 <111.479283>
>>>>> 
>>>>> 0.01user 111.48system 1:51.99elapsed 99%CPU (0avgtext+0avgdata 772maxresident)k
>>>>> 0inputs+0outputs (0major+434minor)pagefaults 0swaps
>>>> 
>>>> ... and we're CPU bound inside the kernel.
>>>> 
>>>> Can you run perf so we can see exactly where we're spending the CPU?
>>>> You're not using a journal, so I'm pretty sure what you will find is
>>>> that we're spending all of our time in mb_free_blocks(), when it is
>>>> updating the internal mballoc buddy bitmaps.
>>>> 
>>>> With a journal, this work done by mb_free_blocks() is hidden in the
>>>> kjournal thread, and happens after the commit is completed, so it
>>>> won't block other file system operations (other than burning some
>>>> extra CPU on one of the multiple cores available on a typical x86
>>>> CPU).
>>>> 
>>>> Also, I suspect the CPU overhead is *much* less on an x86 CPU, which
>>>> has native bit test/set/clear instructions, whereas the MIPS
>>>> architecture was designed by Prof. Hennessy at Stanford, who was a
>>>> doctrinaire RISC fanatic, so there would be no bitop instructions.
> 
> I've attached the output of "mips-linux-gnu-objdump -xd mballoc.o"
> in case someone wants to peek at the generated code.
> 
>>>> Even though I'm pretty sure what we'll find, knowing exactly *where*
>>>> in mb_free_blocks() or the function it calls would be helpful in
>>>> knowing what we need to optimize.  So if you could try using perf
>>>> (assuming that the perf is supported MIPS; not sure if it does) that
>>>> would be really helpful.
> 
> How do you get perf to tell you where in mb_free_blocks we are spending
> the most time?
> 
>>> What command-line do you suggest I run to get the output you expect?
>> 
>> If perf works on your system you can record data with
>> 
>> perf record -g ./test file <size>
>> 
>> and then report with
>> 
>> perf report --stdio
>> 
>> That should yield some interesting information about where we spend
>> the most time in kernel.
> 
> I've no idea why, but the unlink operation, which used to take
> 111 seconds to run, now only takes 53...
> 
> Anyway, here is the requested output.
> 
> # time perf record -g foo /mnt/hdd/xxx 300
> [ perf record: Woken up 8 times to write data ]
> [ perf record: Captured and wrote 1.909 MB perf.data (~83406 samples) ]
> 0.04user 0.08system 0:53.54elapsed 0%CPU (0avgtext+0avgdata 3616maxresident)k
> 0inputs+0outputs (0major+984minor)pagefaults 0swaps
> 
> # perf report --stdio > report.txt
> (Complete report attached as report.txt.xz)
> 
> What can I do to improve the latency of unlinking large files?
> Would sparse_super2 help at all?
> 
> 
> # Events: 14K cycles
> #
> # Overhead  Command      Shared Object                        Symbol
> # ........  .......  .................  ............................
> #
>    33.94%      foo  [kernel.kallsyms]  [k] mb_free_blocks
>               |
>               --- mb_free_blocks
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>    21.11%      foo  [kernel.kallsyms]  [k] __find_get_block
>               |
>               --- __find_get_block
>                  |          
>                  |--99.94%-- ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.06%-- [...]
> 
>     8.33%      foo  [kernel.kallsyms]  [k] radix_tree_lookup_slot
>               |
>               --- radix_tree_lookup_slot
>                   find_get_page
>                   __find_get_block_slow
>                   __find_get_block
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>     6.99%      foo  [kernel.kallsyms]  [k] mb_find_buddy
>               |
>               --- mb_find_buddy
>                   mb_free_blocks
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>     4.21%      foo  [kernel.kallsyms]  [k] trace_preempt_off
>               |
>               --- trace_preempt_off
>                  |          
>                  |--99.99%-- __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.01%-- [...]
> 
>     4.19%      foo  [kernel.kallsyms]  [k] ext4_free_blocks
>               |
>               --- ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>     4.14%      foo  [kernel.kallsyms]  [k] sub_preempt_count
>               |
>               --- sub_preempt_count
>                  |          
>                  |--99.69%-- __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.31%-- [...]
> 
>     3.97%      foo  [kernel.kallsyms]  [k] __find_get_block_slow
>               |
>               --- __find_get_block_slow
>                   __find_get_block
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
>     3.53%      foo  [kernel.kallsyms]  [k] __rcu_read_unlock
>               |
>               --- __rcu_read_unlock
>                  |          
>                  |--100.00%-- find_get_page
>                  |          __find_get_block_slow
>                  |          __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.00%-- [...]
> 
>     3.26%      foo  [kernel.kallsyms]  [k] trace_preempt_on
>               |
>               --- trace_preempt_on
>                   sub_preempt_count
>                  |          
>                  |--100.00%-- __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.00%-- [...]
> 
>     2.06%      foo  [kernel.kallsyms]  [k] find_get_page
>               |
>               --- find_get_page
>                  |          
>                  |--100.00%-- __find_get_block_slow
>                  |          __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.00%-- [...]
> 
>     1.39%      foo  [kernel.kallsyms]  [k] add_preempt_count
>               |
>               --- add_preempt_count
>                  |          
>                  |--99.99%-- __find_get_block
>                  |          ext4_free_blocks
>                  |          ext4_ext_rm_leaf
>                  |          ext4_ext_truncate
>                  |          ext4_truncate
>                  |          ext4_evict_inode
>                  |          evict
>                  |          do_unlinkat
>                  |          stack_done
>                   --0.01%-- [...]
> 
>     1.26%      foo  [kernel.kallsyms]  [k] __rcu_read_lock
>               |
>               --- __rcu_read_lock
>                   find_get_page
>                   __find_get_block_slow
>                   __find_get_block
>                   ext4_free_blocks
>                   ext4_ext_rm_leaf
>                   ext4_ext_truncate
>                   ext4_truncate
>                   ext4_evict_inode
>                   evict
>                   do_unlinkat
>                   stack_done
> 
> -- 
> Regards.
> 
> <report.txt.xz>
> <mballoc.dump.xz>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ