linux-ext4 - Re: [PATCH 0/2] ext4: Fix performance regression with mballoc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220826095208.a3xjakfsfhjwo2wa@quack3>
Date:   Fri, 26 Aug 2022 11:52:08 +0200
From:   Jan Kara <jack@...e.cz>
To:     Stefan Wahren <stefan.wahren@...e.com>
Cc:     Jan Kara <jack@...e.cz>, Ted Tso <tytso@....edu>,
        linux-ext4@...r.kernel.org,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        Ojaswin Mujoo <ojaswin@...ux.ibm.com>,
        Harshad Shirwadkar <harshadshirwadkar@...il.com>
Subject: Re: [PATCH 0/2] ext4: Fix performance regression with mballoc

Hi Stefan!

On Thu 25-08-22 17:48:32, Stefan Wahren wrote:
> Am 25.08.22 um 11:18 schrieb Jan Kara:
> > On Wed 24-08-22 23:24:43, Stefan Wahren wrote:
> > > Am 24.08.22 um 12:40 schrieb Jan Kara:
> > > > On Wed 24-08-22 12:17:14, Stefan Wahren wrote:
> > > > > Am 23.08.22 um 22:15 schrieb Jan Kara:
> > > > > > So I have implemented mballoc improvements to avoid spreading allocations
> > > > > > even with mb_optimize_scan=1. It fixes the performance regression I was able
> > > > > > to reproduce with reaim on my test machine:
> > > > > > 
> > > > > >                         mb_optimize_scan=0     mb_optimize_scan=1     patched
> > > > > > Hmean     disk-1       2076.12 (   0.00%)     2099.37 (   1.12%)     2032.52 (  -2.10%)
> > > > > > Hmean     disk-41     92481.20 (   0.00%)    83787.47 *  -9.40%*    90308.37 (  -2.35%)
> > > > > > Hmean     disk-81    155073.39 (   0.00%)   135527.05 * -12.60%*   154285.71 (  -0.51%)
> > > > > > Hmean     disk-121   185109.64 (   0.00%)   166284.93 * -10.17%*   185298.62 (   0.10%)
> > > > > > Hmean     disk-161   229890.53 (   0.00%)   207563.39 *  -9.71%*   232883.32 *   1.30%*
> > > > > > Hmean     disk-201   223333.33 (   0.00%)   203235.59 *  -9.00%*   221446.93 (  -0.84%)
> > > > > > Hmean     disk-241   235735.25 (   0.00%)   217705.51 *  -7.65%*   239483.27 *   1.59%*
> > > > > > Hmean     disk-281   266772.15 (   0.00%)   241132.72 *  -9.61%*   263108.62 (  -1.37%)
> > > > > > Hmean     disk-321   265435.50 (   0.00%)   245412.84 *  -7.54%*   267277.27 (   0.69%)
> > > > > > 
> > > > > > Stefan, can you please test whether these patches fix the problem for you as
> > > > > > well? Comments & review welcome.
> > > > > i tested the whole series against 5.19 and 6.0.0-rc2. In both cases the
> > > > > update process succeed which is a improvement, but the download + unpack
> > > > > duration ( ~ 7 minutes ) is not as good as with mb_optimize_scan=0 ( ~ 1
> > > > > minute ).
> > > > OK, thanks for testing! I'll try to check specifically untar whether I can
> > > > still see some differences in the IO pattern on my test machine.
> > > i made two iostat output logs during the complete download phase with 5.19
> > > and your series applied. iostat was running via ssh connection and
> > > rpi-update via serial console.
> > > 
> > > First with mb_optimize_scan=0
> > > 
> > > https://github.com/lategoodbye/mb_optimize_scan_regress/blob/main/5.19_SDCIT_patch_nooptimize_download_success.iostat.log
> > > 
> > > Second with mb_optimize_scan=1
> > > 
> > > https://github.com/lategoodbye/mb_optimize_scan_regress/blob/main/5.19_SDCIT_patch_optimize_download_success.iostat.log
> > > 
> > > Maybe this helps
> > Thanks for the data! So this is interesting. In both iostat logs, there is
> > initial phase where no IO happens. I guess that's expected. It is
> > significantly longer in the mb_optimize_scan=0 but I suppose that is just
> > caused by a difference in when iostat was actually started. Then in
> > mb_optimize_scan=0 there is 155 seconds where the eMMC card is 100%
> > utilized and then iostat ends. During this time ~63MB is written
> > altogether. Request sizes vary a lot, average is 60KB.
> > 
> > In mb_optimize_scan=1 case there is 715 seconds recorded where eMMC card is
> > 100% utilized. During this time ~133MB is written, average request size is
> > 40KB. If I look just at first 155 seconds of the trace (assuming iostat was
> > in both cases terminated before writing was fully done), we have written
> > ~53MB and average request size is 56KB.
> > 
> > So with mb_optimize_scan=1 we are indeed still somewhat slower but based on
> > the trace it is not clear why the download+unpack should take 7 minutes
> > instead of 1 minute. There must be some other effect we are missing.
> > 
> > Perhaps if you just download the archive manually, call sync(1), and measure
> > how long it takes to (untar the archive + sync) in mb_optimize_scan=0/1 we
> > can see whether plain untar is indeed making the difference or there's
> > something else influencing the result as well (I have checked and
> > rpi-update does a lot of other deleting & copying as the part of the
> > update)? Thanks.
> 
> I will provide those iostats.
> 
> Btw i untar the firmware archive (mb_optimized_scan=1 and your patch) and
> got following:
> 
> cat /proc/fs/ext4/mmcblk1p2/mb_structs_summary
> 
> 
> optimize_scan: 1
> max_free_order_lists:
>         list_order_0_groups: 5
>         list_order_1_groups: 0
>         list_order_2_groups: 0
>         list_order_3_groups: 0
>         list_order_4_groups: 1
>         list_order_5_groups: 0
>         list_order_6_groups: 1
>         list_order_7_groups: 1
>         list_order_8_groups: 10
>         list_order_9_groups: 1
>         list_order_10_groups: 2
>         list_order_11_groups: 0
>         list_order_12_groups: 2
>         list_order_13_groups: 55
> fragment_size_tree:
>         tree_min: 1
>         tree_max: 31249
> 
>         tree_nodes: 79
> 
> Is this expected?

Yes, I don't see anything out of ordinary in this for a used filesystem. It
tells us there are 55 groups with big chunks of free space, there are some
groups which have only small chunks of free space but that's expected when
the filesystem is reasonably filled...

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR