Message-ID: <4fcc28c9-c191-1d47-7d3d-c7dd82697ae0@i2se.com>
Date:   Tue, 16 Aug 2022 13:25:35 +0200
From:   Stefan Wahren <stefan.wahren@...e.com>
To:     Jan Kara <jack@...e.cz>
Cc:     linux-ext4@...r.kernel.org, Ojaswin Mujoo <ojaswin@...ux.ibm.com>,
        Harshad Shirwadkar <harshadshirwadkar@...il.com>,
        Theodore Ts'o <tytso@....edu>,
        Ritesh Harjani <riteshh@...ux.ibm.com>,
        linux-fsdevel@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Geetika.Moolchandani1@....com, regressions@...ts.linux.dev,
        Florian Fainelli <f.fainelli@...il.com>
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on
 Raspberry Pi

Hi Jan,

On 16.08.22 at 11:34, Jan Kara wrote:
> Hi Stefan!
>
> On Sat 06-08-22 11:50:28, Stefan Wahren wrote:
>> On 28.07.22 at 12:00, Jan Kara wrote:
>>> Hello!
>>>
>>> On Mon 18-07-22 15:29:47, Stefan Wahren wrote:
>>>> I noticed that since Linux 5.18 (Linux 5.19-rc6 is still affected) I'm
>>>> unable to run "rpi-update" without a massive performance regression on my
>>>> Raspberry Pi 4 (multi_v7_defconfig + CONFIG_ARM_LPAE). Using Linux 5.17 this
>>>> tool successfully downloads the latest firmware (> 100 MB) to my development
>>>> micro SD card (Kingston 16 GB Industrial) with an ext4 filesystem within ~ 1
>>>> min. The same scenario on Linux 5.18 shows the following symptoms:
>>> Thanks for the report and the bisection!
>>>> - the download takes extremely long and in most cases is aborted by
>>>> userspace because of the poor performance
>>>> - massive system load during download even after download has been aborted
>>>> (heartbeat LED goes wild)
>>> OK, is it that the CPU is busy or are we waiting on the storage card?
>>> Observing top(1) for a while should be enough to get the idea.  (sorry, I'm
>>> not very familiar with RPi so I'm not sure what heartbeat LED shows).
>> My description wasn't precise. I mean the green ACT LED, which uses the LED
>> heartbeat trigger:
>>
>> "This allows LEDs to be controlled by a CPU load average. The flash
>> frequency is a hyperbolic function of the 1-minute load average."
>>
>> I'm not sure if it's CPU- or IO-driven load. Here is the top output in the
>> bad case:
>>
>> top - 08:44:17 up 43 min,  2 users,  load average: 5,02, 5,45, 5,17
>> Tasks: 142 total,   1 running, 141 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  0,4 us,  0,4 sy,  0,0 ni, 49,0 id, 50,2 wa,  0,0 hi,  0,0 si,  0,0 st
>> MiB Mem :   7941,7 total,   4563,1 free,    312,7 used,   3066,0 buff/cache
>> MiB Swap:    100,0 total,    100,0 free,      0,0 used.   7359,6 avail Mem
> OK, there's plenty of memory available, CPUs are mostly idle, the load is
> likely created by tasks waiting for IO (which also contribute to load
> despite not consuming CPU). Nothing very surprising here.
>
>>> Can you run "iostat -x 1" while the download is running so that we can see
>>> roughly how the IO pattern looks?
>>>
>> Here is the output during the download:
>>
>> Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await  w_await aqu-sz rareq-sz wareq-sz  svctm  %util
>> mmcblk1          0,00    2,00      0,00     36,00     0,00     0,00   0,00   0,00    0,00 23189,50  46,38     0,00    18,00 500,00 100,00
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>             0,25    0,00    0,00   49,62    0,00   50,13
>>
>> Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await  w_await aqu-sz rareq-sz wareq-sz  svctm  %util
>> mmcblk1          0,00    2,00      0,00     76,00     0,00     0,00   0,00   0,00    0,00 46208,50  92,42     0,00    38,00 500,00 100,00
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>             0,25    0,00    0,00   49,62    0,00   50,13
>>
>> Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await  w_await aqu-sz rareq-sz wareq-sz  svctm  %util
>> mmcblk1          0,00    3,00      0,00     76,00     0,00     0,00   0,00   0,00    0,00 48521,67 145,56     0,00    25,33 333,33 100,00
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>             0,25    0,00    0,00   49,62    0,00   50,13
> So this is interesting. We can see the card is 100% busy. The IO submitted
> to the card is formed by small requests - 18-38 KB per request - and each
> request takes 0.3-0.5s to complete. So the resulting throughput is horrible
> - only tens of KB/s. Also we can see there are many IOs queued for the
> device in parallel (aqu-sz column). This does not look like the load I would
> expect to be generated by the download of a large file from the web.
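
The numbers in this reading can be cross-checked against the iostat samples with a little arithmetic. The following Python sketch is an illustration added for clarity, not something posted in the thread. The `Sample` type and the `summarize` helper are hypothetical names, and the sketch assumes the usual `iostat -x` units (await in milliseconds, request sizes in KB, utilisation in percent). It derives write throughput as w/s times wareq-sz, the queue depth via Little's law as w/s times w_await, and the per-request service time from %util divided by w/s.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Write-side fields copied from one `iostat -x 1` sample above."""
    w_per_s: float       # w/s
    wareq_sz_kb: float   # wareq-sz, KB per write request
    w_await_ms: float    # w_await, ms spent queued plus being serviced
    util_pct: float      # %util


def summarize(s: Sample):
    """Return (throughput KB/s, queue depth, service time in ms)."""
    throughput_kb_s = s.w_per_s * s.wareq_sz_kb
    # Little's law: requests in flight = arrival rate * time in system.
    queue_depth = s.w_per_s * s.w_await_ms / 1000.0
    # Busy time per second (util% / 100 * 1000 ms) spread over the requests.
    service_ms = 10.0 * s.util_pct / s.w_per_s
    return throughput_kb_s, queue_depth, service_ms


# Values taken from the three samples quoted in this mail.
SAMPLES = [
    Sample(2.00, 18.00, 23189.50, 100.00),
    Sample(2.00, 38.00, 46208.50, 100.00),
    Sample(3.00, 25.33, 48521.67, 100.00),
]

if __name__ == "__main__":
    for s in SAMPLES:
        kb_s, depth, svc = summarize(s)
        print(f"{kb_s:6.1f} KB/s  queue ~{depth:6.1f}  service ~{svc:5.1f} ms")
```

The derived values agree with the wkB/s, aqu-sz and svctm columns to within rounding. The 0.3-0.5 s per request corresponds to svctm, while w_await (23 to 48 seconds) additionally includes the time each request spends waiting in the queue.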
>
> You have mentioned in previous emails that with dd(1) you can do a couple
> of MB/s writing to this card, which is far more than these tens of KB/s. So
> the file download must be doing something which really destroys the IO
> pattern (and with mb_optimize_scan=0 ext4 happened to deal with it better
> and generate a better IO pattern). Can you perhaps strace the process doing
> the download (or perhaps strace -f the whole rpi-update process) so that we
> can see what the load generated on the filesystem looks like? Thanks!
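
Once such a trace exists, a small script can give a first overview of the write pattern. The Python sketch below is only an illustration and was not part of the thread. It assumes the trace was saved with something like `strace -f -o trace.log ...` so that each completed call is one line of the form `PID name(args) = ret`. The helper name `summarize_trace` and the sample lines are made up for the example, and calls split into "unfinished"/"resumed" pairs are simply skipped.

```python
import re
from collections import Counter

# Assumed line shape: optional "PID " or "[pid N] " prefix, the syscall name,
# its arguments in parentheses, then "= <return value>".
LINE_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*\)\s+=\s+(-?\d+)")


def summarize_trace(lines):
    """Count syscalls by name and collect the sizes of successful writes."""
    calls = Counter()
    write_sizes = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # unfinished/resumed calls, signals, exit messages
        name, ret = m.group(1), int(m.group(2))
        calls[name] += 1
        if name in ("write", "pwrite64") and ret > 0:
            write_sizes.append(ret)
    return calls, write_sizes


# Made-up sample lines, only to show the expected input shape.
SAMPLE_LINES = [
    '1234 openat(AT_FDCWD, "/tmp/example", O_WRONLY|O_CREAT, 0644) = 3',
    '1234 write(3, "abc"..., 4096) = 4096',
    '1234 write(3, "def"..., 512) = 512',
    '1234 write(3, "x", 1) = -1 ENOSPC (No space left on device)',
    '1234 fsync(3) = 0',
]

if __name__ == "__main__":
    calls, sizes = summarize_trace(SAMPLE_LINES)
    print("calls by name:", dict(calls))
    if sizes:
        print("successful writes:", len(sizes),
              "average size:", sum(sizes) / len(sizes), "bytes")
```

Many small writes interleaved with frequent fsync or sync calls would be one pattern worth looking for, but that is a guess about what the trace might show, not a finding.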

I can do that. But maybe the source of rpi-update is more helpful?

https://github.com/raspberrypi/rpi-update/blob/master/rpi-update

>
> 								Honza