Message-ID: <3b3262d9-5383-494e-a19b-698a9e289c2c@leemhuis.info>
Date: Sat, 10 Aug 2024 07:36:34 +0200
From: "Linux regression tracking (Thorsten Leemhuis)"
 <regressions@...mhuis.info>
To: Filipe Manana <fdmanana@...e.com>,
 Abhinav Praveen <abhinav@...veen.org.uk>
Cc: regressions@...ts.linux.dev, linux-btrfs <linux-btrfs@...r.kernel.org>,
 LKML <linux-kernel@...r.kernel.org>, David Sterba <dsterba@...e.com>,
 Josef Bacik <josef@...icpanda.com>, Chris Mason <clm@...com>
Subject: Re: Bisected Regression: Cache filling up causing drastic performance
 degradation on Linux 6.10.3

On 10.08.24 02:28, Abhinav Praveen wrote:
> I recently ran into an IO/Memory Management issue and posted about it on
> the linux-mm mailing list here:
> https://marc.info/?l=linux-mm&m=172306192530745&w=2
> 
> I have since bisected with mainline and found that:
> 956a17d9d050761e34ae6f2624e9c1ce456de204 is the first bad commit

TWIMC, that is 956a17d9d05076 ("btrfs: add a shrinker for extent maps")
[v6.10-rc1]

Adding Filipe and the Btrfs folks to the list of recipients.

Abhinav: thx for the report. There are at least two other discussions
ongoing about what to my untrained eyes look like similar problems that
remain after the fixes that went into 6.10 right before the release. You
might want to consult them:

https://lore.kernel.org/all/CAHPNGSSt-a4ZZWrtJdVyYnJFscFjP9S7rMcvEMaNSpR556DdLA@mail.gmail.com/
https://bugzilla.kernel.org/show_bug.cgi?id=219121

Ciao, Thorsten

> The issue is present on mainline commit:
> 58d40f5f8131479a1e688828e2fa0a7836cf5358 (Fri Aug 9 10:23:18 2024)
> 
> The bisect log is below:
> git bisect start
> # status: waiting for both good and bad commits
> # bad: [58d40f5f8131479a1e688828e2fa0a7836cf5358] Merge tag 'asm-generic-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
> git bisect bad 58d40f5f8131479a1e688828e2fa0a7836cf5358
> # status: waiting for good commit(s), bad commit known
> # good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
> git bisect good a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
> # bad: [3e334486ec5cc6e79e7b0c4f58757fe8e05fbe5a] Merge tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> git bisect bad 3e334486ec5cc6e79e7b0c4f58757fe8e05fbe5a
> # bad: [d34672777da3ea919e8adb0670ab91ddadf7dea0] Merge tag 'fbdev-for-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
> git bisect bad d34672777da3ea919e8adb0670ab91ddadf7dea0
> # bad: [b850dc206a57ae272c639e31ac202ec0c2f46960] Merge tag 'firewire-updates-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
> git bisect bad b850dc206a57ae272c639e31ac202ec0c2f46960
> # good: [59729c8a76544d9d7651287a5d28c5bf7fc9fccc] Merge tag 'tag-chrome-platform-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
> git bisect good 59729c8a76544d9d7651287a5d28c5bf7fc9fccc
> # good: [101b7a97143a018b38b1f7516920a7d7d23d1745] Merge tag 'acpi-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> git bisect good 101b7a97143a018b38b1f7516920a7d7d23d1745
> # good: [47e9bff7fc042b28eb4cf375f0cf249ab708fdfa] Merge tag 'erofs-for-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
> git bisect good 47e9bff7fc042b28eb4cf375f0cf249ab708fdfa
> # bad: [b2665fe61d8a51ef70b27e1a830635a72dcc6ad8] Merge tag 'ata-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
> git bisect bad b2665fe61d8a51ef70b27e1a830635a72dcc6ad8
> # bad: [aa5ccf29173acfaa8aa2fdd1421aa6aca1a50cf2] btrfs: handle errors in btrfs_reloc_clone_csums properly
> git bisect bad aa5ccf29173acfaa8aa2fdd1421aa6aca1a50cf2
> # good: [d3fbb00f5e21c6dfaa6e820a21df0c9a3455a028] btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node
> git bisect good d3fbb00f5e21c6dfaa6e820a21df0c9a3455a028
> # good: [5fa8a6baff817c1b427aa7a8bfc1482043be6d58] btrfs: pass the extent map tree's inode to try_merge_map()
> git bisect good 5fa8a6baff817c1b427aa7a8bfc1482043be6d58
> # bad: [9a7b68d32afc4e92909c21e166ad993801236be3] btrfs: report filemap_fdata<write|wait>_range() error
> git bisect bad 9a7b68d32afc4e92909c21e166ad993801236be3
> # bad: [85d288309ab5463140a2d00b3827262fb14e7db4] btrfs: use btrfs_get_fs_generation() at try_release_extent_mapping()
> git bisect bad 85d288309ab5463140a2d00b3827262fb14e7db4
> # bad: [65bb9fb00b7012a78b2f5d1cd042bf098900c5d3] btrfs: update comment for btrfs_set_inode_full_sync() about locking
> git bisect bad 65bb9fb00b7012a78b2f5d1cd042bf098900c5d3
> # bad: [956a17d9d050761e34ae6f2624e9c1ce456de204] btrfs: add a shrinker for extent maps
> git bisect bad 956a17d9d050761e34ae6f2624e9c1ce456de204
> # good: [f1d97e76915285013037c487d9513ab763005286] btrfs: add a global per cpu counter to track number of used extent maps
> git bisect good f1d97e76915285013037c487d9513ab763005286
> # first bad commit: [956a17d9d050761e34ae6f2624e9c1ce456de204] btrfs: add a shrinker for extent maps
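For anyone who wants to pull the result out of a log like the one above mechanically, here is a minimal sketch. The helper name `first_bad_commit` is hypothetical, and the sketch assumes the log ends with the usual `# first bad commit:` summary line; it also tolerates a leading `>` quote marker, since logs are often pasted from mail.

```python
import re

# Summary line written at the end of a finished bisection, optionally
# preceded by a mail quote marker ("> ").
FIRST_BAD = re.compile(r"^(?:>\s*)?# first bad commit: \[([0-9a-f]{40})\] (.*)$")


def first_bad_commit(log_text):
    """Return (sha, subject) from a bisect log, or None if no summary line exists."""
    for line in log_text.splitlines():
        match = FIRST_BAD.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None


# Last steps of the log quoted in this thread.
sample = """\
> # bad: [956a17d9d050761e34ae6f2624e9c1ce456de204] btrfs: add a shrinker for extent maps
> git bisect bad 956a17d9d050761e34ae6f2624e9c1ce456de204
> # good: [f1d97e76915285013037c487d9513ab763005286] btrfs: add a global per cpu counter to track number of used extent maps
> git bisect good f1d97e76915285013037c487d9513ab763005286
> # first bad commit: [956a17d9d050761e34ae6f2624e9c1ce456de204] btrfs: add a shrinker for extent maps
"""

print(first_bad_commit(sample))
```

The returned pair matches the commit named at the top of this mail, 956a17d9d05076 ("btrfs: add a shrinker for extent maps").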
> 
> The original issue (from my previous post) is as follows:
> 
> If I read from my Steam Library (this has about 430GiB of data), stored on an
> ext4 formatted NVMe drive like this:
> 
> find /mnt/SteamLibrary/steamapps/common -type f -exec cat {} + -type f | pv > /dev/null
> 
> I see that it initially starts reading at 800MiB/s (6.4 Gbps) then, once my
> cache fills up (as shown by buff/cache in free), the read speed drops to as low
> as 6MiB/s (48 Mbps) but periodically returns to 800MiB/s as the cache gets
> freed.
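To watch the same figure without eyeballing free, something along these lines might help. It is only a sketch: the function name `buff_cache_kib` is made up, the numbers in `sample` are invented for illustration and are not from the reporter's machine, and it assumes the procps-ng definition of buff/cache (Buffers + Cached + SReclaimable from /proc/meminfo).

```python
def buff_cache_kib(meminfo_text):
    """Sum the /proc/meminfo fields that free reports as buff/cache, in KiB."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the value; the unit, if any, is kB.
            fields[name.strip()] = int(parts[0])
    return sum(fields.get(key, 0) for key in ("Buffers", "Cached", "SReclaimable"))


# Invented values, in the same layout as /proc/meminfo.
sample = """\
MemTotal:       98304 kB
Buffers:         1024 kB
Cached:          2048 kB
SReclaimable:     512 kB
HugePages_Total:    0
"""

print(buff_cache_kib(sample))
```

On a live system the input would be the contents of /proc/meminfo, read once per sampling interval while the find/cat/pv pipeline runs.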
> 
> When the cache fills, other tasks are also affected (e.g. video playback
> stutters or stops). I also see high CPU usage from kswapd0 and btrfs-cleaner
> (which is strange because, again, it's an ext4 filesystem that I'm reading
> from) using top.
> 
> Running echo 1 > /proc/sys/vm/drop_caches immediately improves performance.
> 
> But, instead, if I run the same read command in a Memory cgroup with memory.max
> set to 500M, I get a solid 800MiB/s read speed without filling up the cache or
> affecting other tasks.
> 
> TL;DR simply reading files seems to be enough to cause major system-wide
> performance degradation. This also applies when updating games on Steam or
> moving them between Library locations.
> 
> Anyone know if this is a bug or regression in Linux 6.10? Or whether there are
> any tunables or Sysctls that could improve performance without manually running
> things in CGroups?
> 
> This happens on an AMD 7950X3D with 96GB of RAM.
> 
> I describe the same thing in my post at:
> https://www.reddit.com/r/linuxquestions/comments/1emetro/cache_filling_up_causing_drastic_performance/
> 
> It also seems that someone else has experienced something similar here*:
> https://www.reddit.com/r/linuxquestions/comments/1e83ltj/610_disk_caching_vs_memory_exhaustion_issues/
> 
> *Their issue seems to have been resolved by 6.10.2 however.
