Message-ID: <Yt+M+JgW6KuZFMvc@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com>
Date:   Tue, 26 Jul 2022 12:13:04 +0530
From:   Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To:     Stefan Wahren <stefan.wahren@...e.com>
Cc:     linux-ext4@...r.kernel.org,
        Harshad Shirwadkar <harshadshirwadkar@...il.com>,
        "Theodore Ts'o" <tytso@....edu>,
        Ritesh Harjani <riteshh@...ux.ibm.com>,
        linux-fsdevel@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Geetika.Moolchandani1@....com, regressions@...ts.linux.dev
Subject: Re: [Regression] ext4: changes to mb_optimize_scan cause issues on
 Raspberry Pi

On Mon, Jul 25, 2022 at 09:09:32PM +0200, Stefan Wahren wrote:
> Hi Ojaswin,
> 
> Am 25.07.22 um 17:07 schrieb Ojaswin Mujoo:
> > On Mon, Jul 18, 2022 at 03:29:47PM +0200, Stefan Wahren wrote:
> > > Hi,
> > > 
> > > i noticed that since Linux 5.18 (Linux 5.19-rc6 is still affected) i'm
> > > unable to run "rpi-update" without massive performance regression on my
> > > Raspberry Pi 4 (multi_v7_defconfig + CONFIG_ARM_LPAE). Using Linux 5.17 this
> > > tool successfully downloads the latest firmware (> 100 MB) on my development
> > > micro SD card (Kingston 16 GB Industrial) with a ext4 filesystem within ~ 1
> > > min. The same scenario on Linux 5.18 shows the following symptoms:
> > > 
> > > - download takes endlessly much time and leads to an abort by userspace in
> > > most cases because of the poor performance
> > > - massive system load during download even after download has been aborted
> > > (heartbeat LED goes wild)
> > > - whole system becomes nearly unresponsive
> > > - system load goes back to normal after > 10 min
> > > - dmesg doesn't show anything suspicious
> > > 
> > > I was able to bisect this issue:
> > > 
> > > ff042f4a9b050895a42cae893cc01fa2ca81b95c good
> > > 4b0986a3613c92f4ec1bdc7f60ec66fea135991f bad
> > > 25fd2d41b505d0640bdfe67aa77c549de2d3c18a bad
> > > b4bc93bd76d4da32600795cd323c971f00a2e788 bad
> > > 3fe2f7446f1e029b220f7f650df6d138f91651f2 bad
> > > b080cee72ef355669cbc52ff55dc513d37433600 good
> > > ad9c6ee642a61adae93dfa35582b5af16dc5173a good
> > > 9b03992f0c88baef524842e411fbdc147780dd5d bad
> > > aab4ed5816acc0af8cce2680880419cd64982b1d good
> > > 14705fda8f6273501930dfe1d679ad4bec209f52 good
> > > 5c93e8ecd5bd3bfdee013b6da0850357eb6ca4d8 good
> > > 8cb5a30372ef5cf2b1d258fce1711d80f834740a bad
> > > 077d0c2c78df6f7260cdd015a991327efa44d8ad bad
> > > cc5095747edfb054ca2068d01af20be3fcc3634f good
> > > 27b38686a3bb601db48901dbc4e2fc5d77ffa2c1 good
> > > 
> > > commit 077d0c2c78df6f7260cdd015a991327efa44d8ad
> > > Author: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
> > > Date:   Tue Mar 8 15:22:01 2022 +0530
> > > 
> > > ext4: make mb_optimize_scan performance mount option work with extents
> > > 
> > > If i revert this commit with Linux 5.19-rc6 the performance regression
> > > disappears.
> > > 
> > > Please ask if you need more information.
> > Hi Stefan,
> > 
> > Apologies, I had missed this email initially. So this particular patch
> > simply changed a typo in an if condition which was preventing the
> > mb_optimize_scan option to be enabled correctly (This feature was
> > introduced in the following commit [1]). I think with the
> > mb_optimize_scan now working, it is somehow causing the firmware
> > download/update to take a longer time.
> > 
> > I'll try to investigate this and get back with my findings.
> 
> thanks. I wasn't able to reproduce this heavy load symptoms with every SD
> card. Maybe this depends on the write performance of the SD card to trigger
> the situation (used command to measure write performance: dd if=/dev/zero
> of=/boot/test bs=1M count=30 oflag=dsync,direct ).
> 
> I tested a Kingston consumer 32 GB which had nearly constant write
> performance of 13 MB/s and didn't had the heavy load symptoms. The firmware
> update was done in a few seconds, so hard to say that at least the
> performance regression is reproducible.
> 
> I also tested 2x Kingston industrial 16 GB which had a floating write
> performance between 5 and 10 MB/s (wear leveling?) and both had the heavy
> load symptoms.
> 
> All SD cards has been detected as ultra high speed DDR50 by the emmc2
> interface.
> 
> Best regards
> 
> > 
> > Regard,
> > Ojaswin
> > 
> > [1]
> > 	commit 196e402adf2e4cd66f101923409f1970ec5f1af3
> > 	From: Harshad Shirwadkar <harshadshirwadkar@...il.com>
> > 	Date: Thu, 1 Apr 2021 10:21:27 -0700
> > 	
> > 	ext4: improve cr 0 / cr 1 group scanning
> > 
> > > Regards
> > > 

Thanks for the info Stefan, I'm still trying to reproduce the issue but
it's slightly challenging since I don't have my RPi handy at the moment. 

In the meantime, would you please try the mb_optimize_scan=0 mount
option to see if that helps bypass the issue? This will help confirm
whether the issue lies in mb_optimize_scan itself or if it's something
else.

You can perhaps mount the root file system with this option using
the following kernel command line argument:

rootflags="mb_optimize_scan=0"

You can also confirm whether mb_optimize_scan was turned off by checking
the first line of the output of:

cat /proc/fs/ext4/<dev>/mb_structs_summary
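
If it is easier than looking up the device name, something along these
lines should print that first line for every mounted ext4 device. This is
only a rough sketch: show_optimize_scan is a made-up helper name, the
optional directory argument exists only so it can be tried against a copy
of the files, and I'd expect the line to look like "optimize_scan: 0",
but please treat that exact format as an assumption.

```shell
# Hypothetical helper: print "<dev>: <first line of mb_structs_summary>"
# for each ext4 device found under the given directory.
# The directory defaults to /proc/fs/ext4; the override is an assumption
# added here for convenience, not something the kernel provides.
show_optimize_scan() {
    base="${1:-/proc/fs/ext4}"
    for f in "$base"/*/mb_structs_summary; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        dev=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$dev" "$(head -n 1 "$f")"
    done
}
```

Running show_optimize_scan with no arguments after booting with the
rootflags setting above should then show one line per ext4 device.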

Regards,
Ojaswin
