Message-ID: <Ywor0BFVnLYj2bxH@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com>
Date: Sat, 27 Aug 2022 20:06:00 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Jan Kara <jack@...e.cz>
Cc: Stefan Wahren <stefan.wahren@...e.com>, Ted Tso <tytso@....edu>,
linux-ext4@...r.kernel.org,
Thorsten Leemhuis <regressions@...mhuis.info>,
Harshad Shirwadkar <harshadshirwadkar@...il.com>
Subject: Re: [PATCH 0/2] ext4: Fix performance regression with mballoc
On Fri, Aug 26, 2022 at 12:15:22PM +0200, Jan Kara wrote:
> Hi Stefan,
>
> On Thu 25-08-22 18:57:08, Stefan Wahren wrote:
> > > Perhaps if you just download the archive manually, call sync(1), and measure
> > > how long it takes to (untar the archive + sync) in mb_optimize_scan=0/1 we
> > > can see whether plain untar is indeed making the difference or there's
> > > something else influencing the result as well (I have checked and
> > > rpi-update does a lot of other deleting & copying as the part of the
> > > update)? Thanks.
> >
> > mb_optimize_scan=0 -> almost 5 minutes
> >
> > mb_optimize_scan=1 -> almost 18 minutes
> >
> > https://github.com/lategoodbye/mb_optimize_scan_regress/commit/3f3fe8f87881687bb654051942923a6b78f16dec
>
> Thanks! So now the iostat data indeed looks substantially different.
>
>                          nooptimize   optimize
> Total written            183.6 MB     190.5 MB
> Time (recorded)          283 s        1040 s
> Avg write request size   79 KB        41 KB
>
> So indeed with mb_optimize_scan=1 we do submit substantially smaller
> requests on average. So far I'm not sure why that is. Since Ojaswin can
> reproduce as well, let's see what he can see from block location info.
> Thanks again for help with debugging this and enjoy your vacation!
>
Hi Jan and Stefan,
Apologies for the delay, I was on leave yesterday and couldn't find time to get to this.
So I was able to collect the block numbers using the method you suggested. I converted the
block numbers to BG numbers and plotted that data to visualize the allocation spread. You can
find the plots here:
mb-opt=0, patched kernel: https://github.com/OjaswinM/mbopt-bug/blob/master/grpahs/mbopt-0-patched.png
mb-opt=1, patched kernel: https://github.com/OjaswinM/mbopt-bug/blob/master/grpahs/mbopt-1-patched.png
mb-opt=1, unpatched kernel: https://github.com/OjaswinM/mbopt-bug/blob/master/grpahs/mbopt-1-unpatched.png
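
For reference, the block number to BG number conversion can be sketched roughly
as below. This is only a minimal illustration, not the exact script used for the
plots: the constants assume a 4 KiB block size with the usual 32768 blocks per
group and a first data block of 0, and the function names are made up for this
example. The real values for a given filesystem should be read from dumpe2fs
("Blocks per group" and "First block").

```python
from collections import Counter

# Assumed geometry (typical for ext4 with 4 KiB blocks); verify with dumpe2fs.
BLOCKS_PER_GROUP = 32768
FIRST_DATA_BLOCK = 0


def block_to_group(block, blocks_per_group=BLOCKS_PER_GROUP,
                   first_data_block=FIRST_DATA_BLOCK):
    """Map a physical block number to the index of its block group."""
    return (block - first_data_block) // blocks_per_group


def group_histogram(blocks, blocks_per_group=BLOCKS_PER_GROUP,
                    first_data_block=FIRST_DATA_BLOCK):
    """Count how many of the given blocks fall into each block group."""
    return Counter(
        block_to_group(b, blocks_per_group, first_data_block) for b in blocks
    )


if __name__ == "__main__":
    # Hypothetical block numbers, only to show the shape of the output.
    sample = [100, 32767, 32768, 70000, 70001]
    hist = group_histogram(sample)
    print(len(hist), "unique BGs")
    for bg, count in sorted(hist.items()):
        print("BG", bg, "->", count, "allocations")
```

The number of keys in the histogram is the "unique BGs" figure, and the
per-group counts are what gets plotted.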
Observations:
* Before the patch, mb_optimize_scan=1 allocations were far more spread out,
across 40 different BGs.
* With the patch, we still allocate in 36 different BGs, but the majority of
allocations happen in just 1 or 2 BGs.
* With mb_optimize_scan=0, we allocate in just 7 unique BGs, which could
explain why this is faster.
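
To put a number on "majority happen in just 1 or 2 BGs", one option is to
compute the share of allocations that land in the N busiest groups. The sketch
below is just one way to do that, assuming the input is a list of BG numbers
with one entry per allocation; top_n_share is a hypothetical helper name and
the sample data is invented, not taken from the traces.

```python
from collections import Counter


def top_n_share(group_ids, n=2):
    """Return the fraction of allocations that fall in the n busiest BGs."""
    counts = Counter(group_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    busiest = sum(count for _, count in counts.most_common(n))
    return busiest / total


if __name__ == "__main__":
    # Invented BG numbers, one per allocation, only for illustration.
    groups = [5, 5, 5, 5, 9, 9, 12, 40]
    print("unique BGs:", len(set(groups)))
    print("share in top 2 BGs:", top_n_share(groups, n=2))
```

A high unique-BG count together with a high top-2 share would match the
patched mb_optimize_scan=1 pattern described above.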
Also, one strange thing I'm seeing is that the perf profiles don't really show
any particular function causing the regression, which is surprising considering
mb_optimize_scan=1 takes almost 10 times as long.
All the perf data can be found here (raw files and perf report/diff --stdio):
https://github.com/OjaswinM/mbopt-bug/tree/master/perfs
Lastly, FWIW I'm not able to replicate the regression when using loop devices:
there, mb_optimize_scan=1 performs similarly to mb_optimize_scan=0 (without the
patches as well). Not sure if this is related to the issue or just some side
effect of using loop devices.
Will post here if I have any updates on this.
Regards,
Ojaswin