Message-ID: <64b7899f-d84d-93de-f9c5-49538bd080d0@i2se.com>
Date: Sat, 6 Aug 2022 11:50:28 +0200
From: Stefan Wahren <stefan.wahren@...e.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, Ojaswin Mujoo <ojaswin@...ux.ibm.com>,
Harshad Shirwadkar <harshadshirwadkar@...il.com>,
Theodore Ts'o <tytso@....edu>,
Ritesh Harjani <riteshh@...ux.ibm.com>,
linux-fsdevel@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Geetika.Moolchandani1@....com, regressions@...ts.linux.dev,
Florian Fainelli <f.fainelli@...il.com>
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on
Raspberry Pi
Hi Jan,
+add Florian
On 28.07.22 at 12:00, Jan Kara wrote:
> Hello!
>
> On Mon 18-07-22 15:29:47, Stefan Wahren wrote:
>> I noticed that since Linux 5.18 (Linux 5.19-rc6 is still affected) I'm
>> unable to run "rpi-update" without a massive performance regression on my
>> Raspberry Pi 4 (multi_v7_defconfig + CONFIG_ARM_LPAE). Using Linux 5.17 this
>> tool successfully downloads the latest firmware (> 100 MB) onto my development
>> micro SD card (Kingston 16 GB Industrial) with an ext4 filesystem within ~1
>> min. The same scenario on Linux 5.18 shows the following symptoms:
> Thanks for the report and the bisection!
>
>> - the download takes an extremely long time and in most cases is aborted by
>> userspace because of the poor performance
>> - massive system load during the download, even after the download has been
>> aborted (heartbeat LED goes wild)
> OK, is it that the CPU is busy, or are we waiting on the storage card?
> Observing top(1) for a while should be enough to get the idea. (Sorry, I'm
> not very familiar with the RPi, so I'm not sure what the heartbeat LED shows.)
My description wasn't precise. I meant the green ACT LED, which uses the
LED heartbeat trigger:
"This allows LEDs to be controlled by a CPU load average. The flash
frequency is a hyperbolic function of the 1-minute load average."
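For reference, the trigger can be inspected and switched through the LED sysfs
interface. A minimal sketch (the LED name "ACT" is what my Pi 4 uses; it may be
called differently, e.g. "led0", on other boards/kernels):

  # show the available triggers; the active one is shown in brackets
  cat /sys/class/leds/ACT/trigger
  # select the heartbeat trigger (as root)
  echo heartbeat > /sys/class/leds/ACT/trigger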
I'm not sure whether it's CPU- or IO-driven load; here is the top output in the bad case:
top - 08:44:17 up 43 min,  2 users,  load average: 5,02, 5,45, 5,17
Tasks: 142 total,   1 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,4 us,  0,4 sy,  0,0 ni, 49,0 id, 50,2 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :   7941,7 total,   4563,1 free,    312,7 used,   3066,0 buff/cache
MiB Swap:    100,0 total,    100,0 free,      0,0 used.   7359,6 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  879 pi        20   0  237712  50880  38824 S   1,6   0,6   0:07.69 lxterminal
  539 root      20   0  203424  46356  25228 S   1,3   0,6   0:08.66 Xorg
 1046 pi        20   0   10296   2936   2556 R   1,3   0,0   0:03.13 top
  680 pi        20   0   83820  22172  17648 S   0,7   0,3   0:01.22 pcmanfm
   11 root      20   0       0      0      0 I   0,3   0,0   0:01.13 rcu_sched
  234 root       0 -20       0      0      0 I   0,3   0,0   0:00.20 kworker/u+
    1 root      20   0   33200   8580   6468 S   0,0   0,1   0:14.17 systemd
    2 root      20   0       0      0      0 S   0,0   0,0   0:00.02 kthreadd
    3 root       0 -20       0      0      0 I   0,0   0,0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0,0   0,0   0:00.00 rcu_par_gp
    8 root       0 -20       0      0      0 I   0,0   0,0   0:00.00 mm_percpu+
    9 root      20   0       0      0      0 I   0,0   0,0   0:00.00 rcu_tasks+
   10 root      20   0       0      0      0 S   0,0   0,0   0:03.77 ksoftirqd+
   12 root      rt   0       0      0      0 S   0,0   0,0   0:00.01 migration+
   14 root      20   0       0      0      0 S   0,0   0,0   0:00.00 cpuhp/0
   15 root      20   0       0      0      0 S   0,0   0,0   0:00.00 cpuhp/1
   16 root      rt   0       0      0      0 S   0,0   0,0   0:00.01 migration+
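The high %wa (iowait) and the nearly idle us/sy columns above suggest the load
is IO-bound rather than CPU-bound. To double-check I can also sample the
blocked tasks; a quick sketch of what I'd run (standard procps tools, nothing
Pi-specific assumed):

  # "b" column = number of tasks blocked waiting for IO
  vmstat 1 5
  # tasks currently in uninterruptible (usually IO) sleep
  ps -eo state,pid,comm | awk '$1 == "D"'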
>
>> - whole system becomes nearly unresponsive
>> - system load goes back to normal after > 10 min
> So what likely happens is that the downloaded data is in the page cache, and
> what is causing the stall is that we are writing it back to the SD card,
> which somehow is much less efficient with mb_optimize_scan=1 for your setup.
> Even if you stop the download, we still have dirty data in the page cache
> which we need to write out, so that is the reason why the system takes so
> long to return to normal.
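That matches what I see. If it helps, I can also watch the dirty/writeback
numbers while this happens; a rough sketch of what I'd run (plain
/proc/meminfo fields, nothing else assumed):

  # sample dirty data and pages under writeback once per second
  while true; do grep -E '^(Dirty|Writeback):' /proc/meminfo; sleep 1; done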
>
>> - dmesg doesn't show anything suspicious
>>
>> I was able to bisect this issue:
>>
>> ff042f4a9b050895a42cae893cc01fa2ca81b95c good
>> 4b0986a3613c92f4ec1bdc7f60ec66fea135991f bad
>> 25fd2d41b505d0640bdfe67aa77c549de2d3c18a bad
>> b4bc93bd76d4da32600795cd323c971f00a2e788 bad
>> 3fe2f7446f1e029b220f7f650df6d138f91651f2 bad
>> b080cee72ef355669cbc52ff55dc513d37433600 good
>> ad9c6ee642a61adae93dfa35582b5af16dc5173a good
>> 9b03992f0c88baef524842e411fbdc147780dd5d bad
>> aab4ed5816acc0af8cce2680880419cd64982b1d good
>> 14705fda8f6273501930dfe1d679ad4bec209f52 good
>> 5c93e8ecd5bd3bfdee013b6da0850357eb6ca4d8 good
>> 8cb5a30372ef5cf2b1d258fce1711d80f834740a bad
>> 077d0c2c78df6f7260cdd015a991327efa44d8ad bad
>> cc5095747edfb054ca2068d01af20be3fcc3634f good
>> 27b38686a3bb601db48901dbc4e2fc5d77ffa2c1 good
>>
>> commit 077d0c2c78df6f7260cdd015a991327efa44d8ad
>> Author: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
>> Date: Tue Mar 8 15:22:01 2022 +0530
>>
>> ext4: make mb_optimize_scan performance mount option work with extents
>>
>> If I revert this commit on top of Linux 5.19-rc6, the performance regression
>> disappears.
>>
>> Please ask if you need more information.
> Can you run "iostat -x 1" while the download is running so that we can see
> roughly how the IO pattern looks?
>
Here is the output during the download:
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  2,00  0,00  36,00  0,00  0,00  0,00  0,00  0,00  23189,50  46,38  0,00  18,00  500,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  2,00  0,00  76,00  0,00  0,00  0,00  0,00  0,00  46208,50  92,42  0,00  38,00  500,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  3,00  0,00  76,00  0,00  0,00  0,00  0,00  0,00  48521,67  145,56  0,00  25,33  333,33  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  1,00  0,00  32,00  0,00  0,00  0,00  0,00  0,00  57416,00  57,42  0,00  32,00  1000,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,75    0,00  50,00
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  3,00  0,00  112,00  0,00  0,00  0,00  0,00  0,00  49107,67  147,32  0,00  37,33  333,33  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  1,00  0,00  12,00  0,00  0,00  0,00  0,00  0,00  59402,00  59,40  0,00  12,00  1000,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  3,00  0,00  148,00  0,00  0,00  0,00  0,00  0,00  51140,33  153,42  0,00  49,33  333,33  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  3,00  0,00  68,00  0,00  0,00  0,00  0,00  0,00  53751,00  161,25  0,00  22,67  333,33  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  2,00  0,00  32,00  0,00  0,00  0,00  0,00  0,00  53363,50  106,73  0,00  16,00  500,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  2,00  0,00  24,00  0,00  0,00  0,00  0,00  0,00  39266,00  78,53  0,00  12,00  500,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,75    0,00  50,00
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  2,00  0,00  80,00  0,00  0,00  0,00  0,00  0,00  40135,00  80,27  0,00  40,00  500,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  5,00  0,00  184,00  0,00  0,00  0,00  0,00  0,00  51459,80  257,30  0,00  36,80  200,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  2,00  0,00  56,00  0,00  0,00  0,00  0,00  0,00  52412,50  104,83  0,00  28,00  500,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  3,00  0,00  80,00  0,00  0,00  0,00  0,00  0,00  56386,00  169,16  0,00  26,67  333,33  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  5,00  0,00  84,00  0,00  0,00  0,00  0,00  0,00  53314,20  266,57  0,00  16,80  200,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,50   0,00     0,00    49,50    0,00  50,00
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  4,00  0,00  76,00  0,00  0,00  0,00  0,00  0,00  58021,00  232,08  0,00  19,00  250,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,25    49,50    0,00  50,00
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  4,00  0,00  72,00  0,00  0,00  0,00  0,00  0,00  62048,50  248,19  0,00  18,00  250,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,62    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  2,00  0,00  116,00  0,00  0,00  0,00  0,00  0,00  69769,00  139,54  0,00  58,00  500,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           6,47   0,00     2,49    59,20    0,00  31,84
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  1,00  4,00  132,00  624,00  32,00  243,00  96,97  98,38  530,00  30246,50  121,52  132,00  156,00  178,00  89,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,00   0,00     0,00    49,87    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  2,00  0,00  296,00  0,00  0,00  0,00  0,00  0,00  1358,50  2,72  0,00  148,00  500,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,00    49,75    0,00  50,00
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  3,00  0,00  124,00  0,00  1,00  0,00  25,00  0,00  2043,67  6,13  0,00  41,33  333,33  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,25   0,00     0,25    49,37    0,00  50,13
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  6,00  0,00  412,00  0,00  0,00  0,00  0,00  0,00  3315,17  19,89  0,00  68,67  166,67  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,51   0,00     0,00    48,47    0,00  51,02
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  1,00  0,00  56,00  0,00  0,00  0,00  0,00  0,00  4249,00  4,25  0,00  56,00  1000,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,50   0,00     0,75    48,87    0,00  49,87
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  2,00  0,00  248,00  0,00  0,00  0,00  0,00  0,00  5190,50  10,38  0,00  124,00  500,00  100,00
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,75   0,00     0,00    49,38    0,00  49,88
Device  r/s  w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
mmcblk1  0,00  1,00  0,00  24,00  0,00  0,00  0,00  0,00  0,00  6017,00  6,02  0,00  24,00  1000,00  100,00
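If useful, I could also retest by just toggling the mount option instead of
reverting the commit; a sketch, assuming the root filesystem accepts the
option on a remount:

  # disable the optimized scan on the running system
  mount -o remount,mb_optimize_scan=0 /
  # re-enable it afterwards
  mount -o remount,mb_optimize_scan=1 /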
Additionally, I tried to compare the interrupt counts between the good and the
bad case. I booted the RPi 4, ran rpi-update and dumped /proc/interrupts
after 60 sec. In the bad case the download got stuck somewhere around the
halfway point (not really deterministic); in the good case it had finished or
almost finished. Here is the diff:
--- intr_bad.log 2022-08-06 11:12:23.445661581 +0200
+++ intr_good.log 2022-08-06 11:27:32.104188861 +0200
@@ -1,19 +1,19 @@
             CPU0       CPU1       CPU2       CPU3
   25:          1          0          0          0     GICv2  99 Level     timer
   26:          0          0          0          0     GICv2  29 Level     arch_timer
 - 27:       4551       3011       3326       2569     GICv2  30 Level     arch_timer
 + 27:       4532       3367       3381       3439     GICv2  30 Level     arch_timer
   35:          0          0          0          0     GICv2 175 Level     PCIe PME
   36:        345          0          0          0     GICv2 112 Level     DMA IRQ
   37:          0          0          0          0     GICv2 114 Level     DMA IRQ
   43:         10          0          0          0     GICv2 125 Level     ttyS1
   44:          0          0          0          0     GICv2 149 Level     fe205000.i2c, fe804000.i2c
 - 45:      34529          0          0          0     GICv2 158 Level     mmc0, mmc1
 - 46:        853          0          0          0     GICv2  65 Level     fe00b880.mailbox
 + 45:      36427          0          0          0     GICv2 158 Level     mmc0, mmc1
 + 46:        941          0          0          0     GICv2  65 Level     fe00b880.mailbox
   47:       6864          0          0          0     GICv2 153 Level     uart-pl011
   48:          0          0          0          0     GICv2 105 Level     fe980000.usb, fe980000.usb
 - 49:        925          0          0          0     BRCM STB PCIe MSI 524288 Edge      xhci_hcd
 - 50:      46821          0          0          0     GICv2 189 Level     eth0
 - 51:      15199          0          0          0     GICv2 190 Level     eth0
 + 49:        836          0          0          0     BRCM STB PCIe MSI 524288 Edge      xhci_hcd
 + 50:      86424          0          0          0     GICv2 189 Level     eth0
 + 51:      23329          0          0          0     GICv2 190 Level     eth0
   52:          0          0          0          0     GICv2 129 Level     vc4 hvs
   53:          0          0          0          0     interrupt-controller@...00100   4 Edge      vc4 hdmi hpd connected
   54:          0          0          0          0     interrupt-controller@...00100   5 Edge      vc4 hdmi hpd disconnected
 @@ -22,12 +22,12 @@
   57:          0          0          0          0     GICv2 107 Level     fe004000.txp
   58:          0          0          0          0     GICv2 141 Level     vc4 crtc
   59:          0          0          0          0     GICv2 142 Level     vc4 crtc, vc4 crtc
 - 60:         10          0          0          0     GICv2 133 Level     vc4 crtc
 + 60:         11          0          0          0     GICv2 133 Level     vc4 crtc
  IPI0:          0          0          0          0     CPU wakeup interrupts
  IPI1:          0          0          0          0     Timer broadcast interrupts
 -IPI2:        216        252        218        269     Rescheduling interrupts
 -IPI3:      16955      16920      16352      16607     Function call interrupts
 +IPI2:        265        264        229        248     Rescheduling interrupts
 +IPI3:       9865      37517      13414      24022     Function call interrupts
  IPI4:          0          0          0          0     CPU stop interrupts
 -IPI5:        826        190        290        204     IRQ work interrupts
 +IPI5:        532        245        238        210     IRQ work interrupts
  IPI6:          0          0          0          0     completion interrupts
  Err:          0
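For reproducibility, the two snapshots were taken roughly like this (a sketch;
the log file names are just the ones used in the diff above):

  # on each kernel: boot, start the download, wait a minute, then snapshot
  sudo rpi-update &
  sleep 60
  cat /proc/interrupts > intr_bad.log   # intr_good.log on the good kernel
  # then compare the two snapshots
  diff -u intr_bad.log intr_good.log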