Message-ID: <70943eab978e4482b9fa4f68119bc8ea@amazon.com>
Date: Thu, 24 Aug 2023 03:52:04 +0000
From: "Lu, Davina" <davinalu@...zon.com>
To: Theodore Ts'o <tytso@....edu>
CC: "Bhatnagar, Rishabh" <risbhat@...zon.com>, Jan Kara <jack@...e.cz>,
"jack@...e.com" <jack@...e.com>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Park, SeongJae" <sjpark@...zon.com>
Subject: RE: Tasks stuck jbd2 for a long time
> Thanks for the details. This is something that I am interested in potentially merging, since for a sufficiently conversion-heavy workload (assuming the conversion is happening
> across multiple inodes, and not just a huge number of random writes into a single fallocated file), limiting the number of kernel threads to one CPU isn't always going to be the right thing.
> The reason why we had done it this way was because at the time, the only choices that we had were between a single kernel thread, or spawning a kernel thread for every single CPU --
> which for a very high-core-count system, consumed a huge amount of system resources. This is no longer the case with the new Concurrency Managed Workqueue (cmwq), but we never
> did the experiment to make sure cmwq didn't have surprising gotchas.
Thank you for the detailed explanation.
> I won't have time to look at this before the next merge window, but what I'm hoping to look at is your patch at [2], with two changes:
> a) Drop the _WQ_ORDERED flag, since it is an internal flag.
> b) Just pass in 0 for max_active instead of "num_active_cpus() > 1 ?
> num_active_cpus() : 1", for two reasons. Num_active_cpus() doesn't
> take into account CPU hotplugs (for example, if you have a
>    dynamically adjustable VM shape where the number of active CPU's
> might change over time). Is there a reason why we need to set that
> limit?
> Do you see any potential problem with these changes?
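[For reference, a minimal sketch of what the suggested change (a) and (b) would look like at the workqueue allocation site -- the exact call site and flags here are assumptions, not the actual patch:]

```c
/* Hypothetical sketch, not the actual patch. Dropping the internal
 * _WQ_ORDERED flag and passing 0 for max_active lets cmwq use its
 * default concurrency limit, which also adapts to CPU hotplug
 * (unlike a value computed once from num_active_cpus()). */
sbi->rsv_conversion_wq =
	alloc_workqueue("ext4-rsv-conversion",
			WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
```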
Sorry for the late response. After internal discussion, I can continue with this patch. These two points are easy to change; I will also run xfstests for ext4 and run BMS in the RDS environment as a quick verification. We can change num_active_cpus() to 0. The reason I added it was that during fio testing, with the max active number going to ~50, we didn't see this issue -- but it is not necessary. I will get Oleg's opinion later offline.
Thanks,
Davina