linux-kernel - Re: [PATCH 1/1] mm: protect xa split stuff under lruvec->lru

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5f989315-e380-46aa-80d1-ce8608889e5f@marcinwanat.pl>
Date: Mon, 27 May 2024 10:22:02 +0200
From: Marcin Wanat <private@...cinwanat.pl>
To: Zhaoyang Huang <huangzhaoyang@...il.com>
Cc: Dave Chinner <david@...morbit.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 "zhaoyang.huang" <zhaoyang.huang@...soc.com>, Alex Shi <alexs@...nel.org>,
 "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
 Hugh Dickins <hughd@...gle.com>, Baolin Wang
 <baolin.wang@...ux.alibaba.com>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, steve.kang@...soc.com
Subject: Re: [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock
 during migration

On 22.05.2024 12:13, Marcin Wanat wrote:
> On 22.05.2024 07:37, Zhaoyang Huang wrote:
>> On Tue, May 21, 2024 at 11:47 PM Marcin Wanat <private@...cinwanat.pl> 
>> wrote:
>>>
>>> On 21.05.2024 03:00, Zhaoyang Huang wrote:
>>>> On Tue, May 21, 2024 at 8:58 AM Zhaoyang Huang 
>>>> <huangzhaoyang@...il.com> wrote:
>>>>>
>>>>> On Tue, May 21, 2024 at 3:42 AM Marcin Wanat 
>>>>> <private@...cinwanat.pl> wrote:
>>>>>>
>>>>>> On 15.04.2024 03:50, Zhaoyang Huang wrote:
>>>>>> I have around 50 hosts handling high I/O (each with 20Gbps+ uplinks
>>>>>> and multiple NVMe drives), running RockyLinux 8/9. The stock RHEL
>>>>>> kernel 8/9 is NOT affected, and the long-term kernel 5.15.X is NOT 
>>>>>> affected.
>>>>>> However, with long-term kernels 6.1.XX and 6.6.XX,
>>>>>> (tested at least 10 different versions), this lockup always appears
>>>>>> after 2-30 days, similar to the report in the original thread.
>>>>>> The more load (for example, copying a lot of local files while
>>>>>> serving 20Gbps traffic), the higher the chance that the bug will 
>>>>>> appear.
>>>>>>
>>>>>> I haven't been able to reproduce this during synthetic tests,
>>>>>> but it always occurs in production on 6.1.X and 6.6.X within 2-30 
>>>>>> days.
>>>>>> If anyone can provide a patch, I can test it on multiple machines
>>>>>> over the next few days.
>>>>> Could you please try this one which could be applied on 6.6 
>>>>> directly. Thank you!
>>>> URL: https://lore.kernel.org/linux-mm/20240412064353.133497-1- 
>>>> zhaoyang.huang@...soc.com/
>>>>
>>>
>>> Unfortunately, I am unable to cleanly apply this patch against the
>>> latest 6.6.31
>> Please try below one which works on my v6.6 based android. Thank you
>> for your test in advance :D
>>
>> mm/huge_memory.c | 22 ++++++++++++++--------
>>   1 file changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> 
> I have compiled 6.6.31 with this patch and will test it on multiple 
> machines over the next 30 days. I will provide an update after 30 days 
> if everything is fine or sooner if any of the hosts experience the same 
> soft lockup again.
> 

First server with 6.6.31 and this patch hang today. Soft lockup changed 
to hard lockup:

[26887.389623] watchdog: Watchdog detected hard LOCKUP on cpu 21
[26887.389626] Modules linked in: nft_limit xt_limit xt_hashlimit 
ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_connlimit 
nf_conncount tls xt_set ip_set_hash_net ip_set xt_CT xt_conntrack 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables 
nfnetlink rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency 
intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
rapl intel_cstate ipmi_ssif irdma ext4 mbcache ice iTCO_wdt jbd2 mgag200 
intel_pmc_bxt iTCO_vendor_support ib_uverbs i2c_algo_bit acpi_ipmi 
intel_uncore mei_me drm_shmem_helper pcspkr ib_core i2c_i801 ipmi_si 
drm_kms_helper mei lpc_ich i2c_smbus ioatdma intel_pch_thermal 
ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter joydev tcp_bbr 
drm fuse xfs libcrc32c sd_mod t10_pi sg crct10dif_pclmul crc32_pclmul 
crc32c_intel ixgbe polyval_clmulni ahci polyval_generic libahci mdio 
i40e libata megaraid_sas dca ghash_clmulni_intel wmi
[26887.389682] CPU: 21 PID: 264 Comm: kswapd0 Kdump: loaded Tainted: G 
      W          6.6.31.el9 #3
[26887.389685] Hardware name: FUJITSU PRIMERGY RX2540 M4/D3384-A1, BIOS 
V5.0.0.12 R1.22.0 for D3384-A1x                    06/04/2018
[26887.389687] RIP: 0010:native_queued_spin_lock_slowpath+0x6e/0x2c0
[26887.389696] Code: 08 0f 92 c2 8b 45 00 0f b6 d2 c1 e2 08 30 e4 09 d0 
a9 00 01 ff ff 0f 85 ea 01 00 00 85 c0 74 12 0f b6 45 00 84 c0 74 0a f3 
90 <0f> b6 45 00 84 c0 75 f6 b8 01 00 00 00 66 89 45 00 5b 5d 41 5c 41
[26887.389698] RSP: 0018:ffffb3e587a87a20 EFLAGS: 00000002
[26887.389700] RAX: 0000000000000001 RBX: ffff9ad6c6f67050 RCX: 
0000000000000000
[26887.389701] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
ffff9ad6c6f67050
[26887.389703] RBP: ffff9ad6c6f67050 R08: 0000000000000000 R09: 
0000000000000067
[26887.389704] R10: 0000000000000000 R11: 0000000000000000 R12: 
0000000000000046
[26887.389705] R13: 0000000000000200 R14: 0000000000000000 R15: 
ffffe1138aa98000
[26887.389707] FS:  0000000000000000(0000) GS:ffff9ade20340000(0000) 
knlGS:0000000000000000
[26887.389708] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26887.389710] CR2: 000000002912809b CR3: 000000064401e003 CR4: 
00000000007706e0
[26887.389711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[26887.389712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[26887.389713] PKRU: 55555554
[26887.389714] Call Trace:
[26887.389717]  <NMI>
[26887.389720]  ? watchdog_hardlockup_check+0xac/0x150
[26887.389725]  ? __perf_event_overflow+0x102/0x1d0
[26887.389729]  ? handle_pmi_common+0x189/0x3e0
[26887.389735]  ? set_pte_vaddr_p4d+0x4a/0x60
[26887.389738]  ? flush_tlb_one_kernel+0xa/0x20
[26887.389742]  ? native_set_fixmap+0x65/0x80
[26887.389745]  ? ghes_copy_tofrom_phys+0x75/0x110
[26887.389751]  ? __ghes_peek_estatus.isra.0+0x49/0xb0
[26887.389755]  ? intel_pmu_handle_irq+0x10b/0x230
[26887.389756]  ? perf_event_nmi_handler+0x28/0x50
[26887.389759]  ? nmi_handle+0x58/0x150
[26887.389764]  ? native_queued_spin_lock_slowpath+0x6e/0x2c0
[26887.389768]  ? default_do_nmi+0x6b/0x170
[26887.389770]  ? exc_nmi+0x12c/0x1a0
[26887.389772]  ? end_repeat_nmi+0x16/0x1f
[26887.389777]  ? native_queued_spin_lock_slowpath+0x6e/0x2c0
[26887.389780]  ? native_queued_spin_lock_slowpath+0x6e/0x2c0
[26887.389784]  ? native_queued_spin_lock_slowpath+0x6e/0x2c0
[26887.389787]  </NMI>
[26887.389788]  <TASK>
[26887.389789]  __raw_spin_lock_irqsave+0x3d/0x50
[26887.389793]  folio_lruvec_lock_irqsave+0x5e/0x90
[26887.389798]  __page_cache_release+0x68/0x230
[26887.389801]  ? remove_migration_ptes+0x5c/0x80
[26887.389807]  __folio_put+0x24/0x60
[26887.389808]  __split_huge_page+0x368/0x520
[26887.389812]  split_huge_page_to_list+0x4b3/0x570
[26887.389816]  deferred_split_scan+0x1c8/0x290
[26887.389819]  do_shrink_slab+0x12f/0x2d0
[26887.389824]  shrink_slab_memcg+0x133/0x1d0
[26887.389829]  shrink_node_memcgs+0x18e/0x1d0
[26887.389832]  shrink_node+0xa7/0x370
[26887.389836]  balance_pgdat+0x332/0x6f0
[26887.389842]  kswapd+0xf0/0x190
[26887.389845]  ? balance_pgdat+0x6f0/0x6f0
[26887.389848]  kthread+0xee/0x120
[26887.389851]  ? kthread_complete_and_exit+0x20/0x20
[26887.389853]  ret_from_fork+0x2d/0x50
[26887.389857]  ? kthread_complete_and_exit+0x20/0x20
[26887.389859]  ret_from_fork_asm+0x11/0x20
[26887.389864]  </TASK>
[26887.389865] Kernel panic - not syncing: Hard LOCKUP
[26887.389867] CPU: 21 PID: 264 Comm: kswapd0 Kdump: loaded Tainted: G 
      W          6.6.31.el9 #3
[26887.389869] Hardware name: FUJITSU PRIMERGY RX2540 M4/D3384-A1, BIOS 
V5.0.0.12 R1.22.0 for D3384-A1x                    06/04/2018
[26887.389870] Call Trace:
[26887.389871]  <NMI>
[26887.389872]  dump_stack_lvl+0x44/0x60
[26887.389877]  panic+0x241/0x330
[26887.389881]  nmi_panic+0x2f/0x40
[26887.389883]  watchdog_hardlockup_check+0x119/0x150
[26887.389886]  __perf_event_overflow+0x102/0x1d0
[26887.389889]  handle_pmi_common+0x189/0x3e0
[26887.389893]  ? set_pte_vaddr_p4d+0x4a/0x60
[26887.389896]  ? flush_tlb_one_kernel+0xa/0x20
[26887.389899]  ? native_set_fixmap+0x65/0x80
[26887.389902]  ? ghes_copy_tofrom_phys+0x75/0x110
[26887.389906]  ? __ghes_peek_estatus.isra.0+0x49/0xb0
[26887.389909]  intel_pmu_handle_irq+0x10b/0x230
[26887.389911]  perf_event_nmi_handler+0x28/0x50
[26887.389913]  nmi_handle+0x58/0x150
[26887.389916]  ? native_queued_spin_lock_slowpath+0x6e/0x2c0
[26887.389920]  default_do_nmi+0x6b/0x170
[26887.389922]  exc_nmi+0x12c/0x1a0
[26887.389923]  end_repeat_nmi+0x16/0x1f
[26887.389926] RIP: 0010:native_queued_spin_lock_slowpath+0x6e/0x2c0
[26887.389930] Code: 08 0f 92 c2 8b 45 00 0f b6 d2 c1 e2 08 30 e4 09 d0 
a9 00 01 ff ff 0f 85 ea 01 00 00 85 c0 74 12 0f b6 45 00 84 c0 74 0a f3 
90 <0f> b6 45 00 84 c0 75 f6 b8 01 00 00 00 66 89 45 00 5b 5d 41 5c 41
[26887.389931] RSP: 0018:ffffb3e587a87a20 EFLAGS: 00000002
[26887.389933] RAX: 0000000000000001 RBX: ffff9ad6c6f67050 RCX: 
0000000000000000
[26887.389934] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
ffff9ad6c6f67050
[26887.389935] RBP: ffff9ad6c6f67050 R08: 0000000000000000 R09: 
0000000000000067
[26887.389936] R10: 0000000000000000 R11: 0000000000000000 R12: 
0000000000000046
[26887.389937] R13: 0000000000000200 R14: 0000000000000000 R15: 
ffffe1138aa98000
[26887.389940]  ? native_queued_spin_lock_slowpath+0x6e/0x2c0
[26887.389943]  ? native_queued_spin_lock_slowpath+0x6e/0x2c0
[26887.389946]  </NMI>
[26887.389947]  <TASK>
[26887.389947]  __raw_spin_lock_irqsave+0x3d/0x50
[26887.389950]  folio_lruvec_lock_irqsave+0x5e/0x90
[26887.389953]  __page_cache_release+0x68/0x230
[26887.389955]  ? remove_migration_ptes+0x5c/0x80
[26887.389958]  __folio_put+0x24/0x60
[26887.389960]  __split_huge_page+0x368/0x520
[26887.389963]  split_huge_page_to_list+0x4b3/0x570
[26887.389967]  deferred_split_scan+0x1c8/0x290
[26887.389971]  do_shrink_slab+0x12f/0x2d0
[26887.389974]  shrink_slab_memcg+0x133/0x1d0
[26887.389978]  shrink_node_memcgs+0x18e/0x1d0
[26887.389982]  shrink_node+0xa7/0x370
[26887.389985]  balance_pgdat+0x332/0x6f0
[26887.389991]  kswapd+0xf0/0x190
[26887.389994]  ? balance_pgdat+0x6f0/0x6f0
[26887.389997]  kthread+0xee/0x120
[26887.389998]  ? kthread_complete_and_exit+0x20/0x20
[26887.390000]  ret_from_fork+0x2d/0x50
[26887.390003]  ? kthread_complete_and_exit+0x20/0x20
[26887.390004]  ret_from_fork_asm+0x11/0x20
[26887.390009]  </TASK>