[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4fcc28c9-c191-1d47-7d3d-c7dd82697ae0@i2se.com>
Date: Tue, 16 Aug 2022 13:25:35 +0200
From: Stefan Wahren <stefan.wahren@...e.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, Ojaswin Mujoo <ojaswin@...ux.ibm.com>,
Harshad Shirwadkar <harshadshirwadkar@...il.com>,
Theodore Ts'o <tytso@....edu>,
Ritesh Harjani <riteshh@...ux.ibm.com>,
linux-fsdevel@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Geetika.Moolchandani1@....com, regressions@...ts.linux.dev,
Florian Fainelli <f.fainelli@...il.com>
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on
Raspberry Pi
Hi Jan,
Am 16.08.22 um 11:34 schrieb Jan Kara:
> Hi Stefan!
>
> On Sat 06-08-22 11:50:28, Stefan Wahren wrote:
>> Am 28.07.22 um 12:00 schrieb Jan Kara:
>>> Hello!
>>>
>>> On Mon 18-07-22 15:29:47, Stefan Wahren wrote:
>>>> i noticed that since Linux 5.18 (Linux 5.19-rc6 is still affected) i'm
>>>> unable to run "rpi-update" without massive performance regression on my
>>>> Raspberry Pi 4 (multi_v7_defconfig + CONFIG_ARM_LPAE). Using Linux 5.17 this
>>>> tool successfully downloads the latest firmware (> 100 MB) on my development
>>>> micro SD card (Kingston 16 GB Industrial) with a ext4 filesystem within ~ 1
>>>> min. The same scenario on Linux 5.18 shows the following symptoms:
>>> Thanks for report and the bisection!
>>>> - download takes endlessly much time and leads to an abort by userspace in
>>>> most cases because of the poor performance
>>>> - massive system load during download even after download has been aborted
>>>> (heartbeat LED goes wild)
>>> OK, is it that the CPU is busy or are we waiting on the storage card?
>>> Observing top(1) for a while should be enough to get the idea. (sorry, I'm
>>> not very familiar with RPi so I'm not sure what heartbeat LED shows).
>> My description wasn't precise. I mean the green ACT LED, which uses the LED
>> heartbeat trigger:
>>
>> "This allows LEDs to be controlled by a CPU load average. The flash
>> frequency is a hyperbolic function of the 1-minute load average."
>>
>> I'm not sure if it's CPU or IO driven load, here the top output in bad case:
>>
>> top - 08:44:17 up 43 min, 2 users, load average: 5,02, 5,45, 5,17
>> Tasks: 142 total, 1 running, 141 sleeping, 0 stopped, 0 zombie
>> %Cpu(s): 0,4 us, 0,4 sy, 0,0 ni, 49,0 id, 50,2 wa, 0,0 hi, 0,0 si, 0,0
>> st
>> MiB Mem : 7941,7 total, 4563,1 free, 312,7 used, 3066,0 buff/cache
>> MiB Swap: 100,0 total, 100,0 free, 0,0 used. 7359,6 avail Mem
> OK, there's plenty of memory available, CPUs are mostly idle, the load is
> likely created by tasks waiting for IO (which also contribute to load
> despite not consuming CPU). Not much surprising here.
>
>>> Can you run "iostat -x 1" while the download is running so that we can see
>>> roughly how the IO pattern looks?
>>>
>> Here the output during download:
>>
>> Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm
>> %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
>> mmcblk1 0,00 2,00 0,00 36,00 0,00 0,00 0,00
>> 0,00 0,00 23189,50 46,38 0,00 18,00 500,00 100,00
>>
>> avg-cpu: %user %nice %system %iowait %steal %idle
>> 0,25 0,00 0,00 49,62 0,00 50,13
>>
>> Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm
>> %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
>> mmcblk1 0,00 2,00 0,00 76,00 0,00 0,00 0,00
>> 0,00 0,00 46208,50 92,42 0,00 38,00 500,00 100,00
>>
>> avg-cpu: %user %nice %system %iowait %steal %idle
>> 0,25 0,00 0,00 49,62 0,00 50,13
>>
>> Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm
>> %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
>> mmcblk1 0,00 3,00 0,00 76,00 0,00 0,00 0,00
>> 0,00 0,00 48521,67 145,56 0,00 25,33 333,33 100,00
>>
>> avg-cpu: %user %nice %system %iowait %steal %idle
>> 0,25 0,00 0,00 49,62 0,00 50,13
> So this is interesting. We can see the card is 100% busy. The IO submitted
> to the card is formed by small requests - 18-38 KB per request - and each
> request takes 0.3-0.5s to complete. So the resulting throughput is horrible
> - only tens of KB/s. Also we can see there are many IOs queued for the
> device in parallel (aqu-sz columnt). This does not look like load I would
> expect to be generated by download of a large file from the web.
>
> You have mentioned in previous emails that with dd(1) you can do couple
> MB/s writing to this card which is far more than these tens of KB/s. So the
> file download must be doing something which really destroys the IO pattern
> (and with mb_optimize_scan=0 ext4 happened to be better dealing with it and
> generating better IO pattern). Can you perhaps strace the process doing the
> download (or perhaps strace -f the whole rpi-update process) so that we can
> see how does the load generated on the filesystem look like? Thanks!
i can do that. But may be the sources of rpi-update is more helpful?
https://github.com/raspberrypi/rpi-update/blob/master/rpi-update
>
> Honza
Powered by blists - more mailing lists