Message-ID: <CA+FbhJN9-rbPSpHbgLmjcgV3=1QmqUVsHS70KpnL5i_DxMp4bg@mail.gmail.com>
Date:   Fri, 29 Sep 2023 12:36:56 +0200
From:   Marcus Seyfarth <m.seyfarth@...il.com>
To:     Tor Vic <torvic9@...lbox.org>
Cc:     Bagas Sanjaya <bagasdotme@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Josh Triplett <josh@...htriplett.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Zqiang <qiang.zhang1211@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux RCU <rcu@...r.kernel.org>
Subject: Re: Fwd: [6.5.5] System slowdown during compilation workload and RIP: lazy_rcu_shrink_scan

> This seems to be a heavily patched kernel.
> Does this problem also appear with a vanilla 6.5 kernel?

Indeed, CachyOS ships additional patches. I haven't yet found an easy
way to try a vanilla kernel (there is
https://aur.archlinux.org/packages/linux-mainline - but that is
already on 6.6-rc3). As CachyOS also makes use of ananicy and uksmd, I
am not sure it is a good idea, for the stability of the system, to
test with a vanilla kernel that doesn't support these extra
features.

However, I can present a new data point that looks quite
different: I've attached a new trace from today of an OOM that
recovered successfully (that is, without the freezes afterwards)
with the same kernel in use. The relevant part is:

[29. Sep 12:14] Qt bearer threa invoked oom-killer:
gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0,
oom_score_adj=200
[  +0,000007] CPU: 25 PID: 1126 Comm: Qt bearer threa Tainted: G
    O       6.5.5-2.1-cachyos-lto #1
ae9643c86e4447bdd5b0d7da31c14411335d3e8d
[  +0,000003] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS
CX99DE26 10/10/2020
[  +0,000001] Call Trace:
[  +0,000002]  <TASK>
[  +0,000003]  dump_header+0x51/0x260
[  +0,000005]  oom_kill_process+0x92/0x1a0
[  +0,000003]  out_of_memory+0x227/0x320
[  +0,000002]  __folio_alloc+0x2e46/0x6ee0
[  +0,000005]  ? blk_mq_flush_plug_list+0xaa/0xa00
[  +0,000005]  __filemap_get_folio+0x1e2/0x460
[  +0,000002]  filemap_fault+0x56c/0x1260
[  +0,000004]  do_pte_missing+0x194/0x2da0
[  +0,000004]  ? ____fput+0x550/0x2d60
[  +0,000002]  ? rtnl_dump_all+0xff/0x120
[  +0,000004]  ? free_unref_page+0x237/0xc20
[  +0,000003]  ? __wake_up+0xe4/0x1c0
[  +0,000004]  handle_mm_fault+0x976/0xe00
[  +0,000003]  do_user_addr_fault+0x8ca/0x2f80
[  +0,000002]  ? do_syscall_64+0x68/0x80
[  +0,000005]  exc_page_fault+0x66/0x160
[  +0,000003]  asm_exc_page_fault+0x22/0x30
[  +0,000005] RIP: 0033:0x7f5243289003
[  +0,000014] Code: Unable to access opcode bytes at 0x7f5243288fd9.
[  +0,000001] RSP: 002b:00007f52227fae98 EFLAGS: 00010206
[  +0,000002] RAX: 00007f5272534d40 RBX: 00007f520c012930 RCX: 0000000000000055
[  +0,000002] RDX: 0000000000000005 RSI: 00007f520c00e270 RDI: 00007f520c012950
[  +0,000001] RBP: 0000557650c0c690 R08: 00007f520c011460 R09: 00000007f520c00e
[  +0,000001] R10: 00007f520c000058 R11: 0000000000000003 R12: 00007f52227faf28
[  +0,000001] R13: 00007f520c00f4e0 R14: 0000000000000000 R15: 00007f520c00f4f8
[  +0,000001]  </TASK>
[  +0,000001] Mem-Info:
[  +0,000001] active_anon:1474442 inactive_anon:9717963 isolated_anon:0
               active_file:10365 inactive_file:6709 isolated_file:0
               unevictable:0 dirty:0 writeback:0
               slab_reclaimable:100405 slab_unreclaimable:112907
               mapped:2811 shmem:1465 pagetables:58902
               sec_pagetables:0 bounce:0
               kernel_misc_reclaimable:0
               free:57924 free_pcp:4870 free_cma:0
[  +0,000004] Node 0 active_anon:5897768kB inactive_anon:38871852kB
active_file:41460kB inactive_file:26836kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB mapped:11244kB dirty:0>
[  +0,000003] DMA free:15360kB boost:0kB min:8kB low:200kB high:392kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepen>
[  +0,000003] lowmem_reserve[]: 0 1762 47954 47954
[  +0,000003] DMA32 free:185848kB boost:0kB min:1056kB low:24244kB
high:47432kB reserved_highatomic:64KB active_anon:204464kB
inactive_anon:1429212kB active_file:0kB inactive_file:0kB un>
[  +0,000002] lowmem_reserve[]: 0 0 46191 46191
[  +0,000002] Normal free:30488kB boost:0kB min:26964kB low:618252kB
high:1209540kB reserved_highatomic:4352KB active_anon:5693304kB
inactive_anon:37442640kB active_file:41652kB inactive>
[  +0,000003] lowmem_reserve[]: 0 0 0 0
[  +0,000002] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
0*512kB 1*1024kB (M) 1*2048kB (M) 3*4096kB (M) = 15360kB
[  +0,000006] DMA32: 176*4kB (UM) 99*8kB (UM) 486*16kB (UME) 174*32kB
(UME) 134*64kB (UME) 65*128kB (UME) 8*256kB (UME) 5*512kB (M)
40*1024kB (ME) 11*2048kB (UM) 21*4096kB (M) = 185848kB
[  +0,000008] Normal: 308*4kB (UME) 301*8kB (UME) 344*16kB (UME)
409*32kB (UME) 35*64kB (UME) 7*128kB (UM) 3*256kB (U) 1*512kB (U)
0*1024kB 0*2048kB 0*4096kB = 26648kB
[  +0,000007] 20391 total pagecache pages
[  +0,000001] 104 pages in swap cache
[  +0,000001] Free swap  = 64kB
[  +0,000000] Total swap = 12293372kB
[  +0,000001] 12542844 pages RAM
[  +0,000000] 0 pages HighMem/MovableOnly
[  +0,000001] 249387 pages reserved
[  +0,000000] 0 pages hwpoisoned
[  +0,000001] Tasks state (memory values in pages):
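
As a sanity check on the Mem-Info figures above, here is a minimal Python
sketch that converts the page counts to kB and re-adds the per-order buddy
free lists. It assumes a 4 kB page size (the usual x86-64 default; the trace
does not state it), and the helper names pages_to_kb and buddy_free_kb are
made up for illustration. They are not part of any kernel tooling.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, not stated in the trace


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count from the Mem-Info summary to kB."""
    return pages * page_kb


def buddy_free_kb(line):
    """Sum the 'count*sizekB' terms of a per-zone buddy free-list line."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)


# Free-list lines copied from the trace, with the mail line wrapping undone.
dma = ("DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
       "0*512kB 1*1024kB (M) 1*2048kB (M) 3*4096kB (M) = 15360kB")
dma32 = ("DMA32: 176*4kB (UM) 99*8kB (UM) 486*16kB (UME) 174*32kB "
         "(UME) 134*64kB (UME) 65*128kB (UME) 8*256kB (UME) 5*512kB (M) "
         "40*1024kB (ME) 11*2048kB (UM) 21*4096kB (M) = 185848kB")

print(buddy_free_kb(dma))    # 15360, matches "= 15360kB"
print(buddy_free_kb(dma32))  # 185848, matches "= 185848kB"

# Page counts from the summary versus the "Node 0" line in kB.
print(pages_to_kb(1474442))  # 5897768, matches active_anon:5897768kB
print(pages_to_kb(9717963))  # 38871852, matches inactive_anon:38871852kB
```

Under that assumption the numbers are self-consistent: nearly all of the RAM
is anonymous memory, and with "Free swap = 64kB" there is nowhere left to
swap it out to.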


> What if you disable RCU_LAZY?
I will try that out over the coming days; it is enabled by default
on CachyOS.
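
To confirm whether a rebuilt kernel really has the option off, something like
the following Python sketch could be used. It assumes the running kernel
exposes its configuration at /proc/config.gz (that requires
CONFIG_IKCONFIG_PROC, which I have not verified for CachyOS); the helper
names config_option and running_kernel_config are hypothetical.

```python
import gzip
from pathlib import Path


def config_option(config_text, name):
    """Return the value of a kernel config option, or None if it is unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return None
        if line.startswith(f"{name}="):
            return line.split("=", 1)[1]
    return None


def running_kernel_config(path="/proc/config.gz"):
    """Read the running kernel's config, or return None if the file is missing."""
    config_path = Path(path)
    if not config_path.exists():
        return None
    with gzip.open(config_path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    text = running_kernel_config()
    if text is None:
        print("/proc/config.gz not available on this kernel")
    else:
        print("CONFIG_RCU_LAZY =", config_option(text, "CONFIG_RCU_LAZY"))
```

A result of "y" means lazy RCU is built in; None means the option is unset or
not present in that kernel's configuration.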

View attachment "dmesg2.txt" of type "text/plain" (111506 bytes)
