Message-Id: <20240815025226.8973-1-liuye@kylinos.cn>
Date: Thu, 15 Aug 2024 10:52:26 +0800
From: liuye <liuye@...inos.cn>
To: akpm@...ux-foundation.org
Cc: linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	liuye@...inos.cn
Subject: Re: Re: [PATCH] mm/vmscan: Fix hard LOCKUP in function isolate_lru_folios

> > Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
> 
> Merged in 2016.
> 
> Under what circumstances does it occur?  

User processes request a large amount of memory and keep the pages
active. Then a module continuously requests memory from the ZONE_DMA32
area. Memory reclaim is triggered because the ZONE_DMA32 watermark is
reached. However, the pages on the LRU (active_anon) list are mostly
from the ZONE_NORMAL area.
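
To show why that combination is so expensive, here is a heavily
simplified sketch of the skip path in isolate_lru_folios() (mm/vmscan.c).
It is paraphrased for illustration only and differs from the real
function in detail: folios from zones above sc->reclaim_idx are parked
on a local skipped list and do not advance the scan count that
terminates the loop, and there is no scheduling point on that path.

static unsigned long isolate_lru_folios_sketch(unsigned long nr_to_scan,
					       struct lruvec *lruvec,
					       struct list_head *dst,
					       struct scan_control *sc,
					       enum lru_list lru)
{
	struct list_head *src = &lruvec->lists[lru];
	LIST_HEAD(folios_skipped);
	unsigned long nr_taken = 0;
	unsigned long scan = 0;

	while (scan < nr_to_scan && !list_empty(src)) {
		struct folio *folio = lru_to_folio(src);
		unsigned long nr_pages = folio_nr_pages(folio);

		if (folio_zonenum(folio) > sc->reclaim_idx) {
			/*
			 * ZONE_NORMAL folios are skipped while reclaiming
			 * for ZONE_DMA32: they do NOT advance 'scan', so
			 * with ~100 GB of active anon in ZONE_NORMAL this
			 * branch runs tens of millions of times in a
			 * single call, with no scheduling point, and the
			 * hard-lockup detector fires.
			 */
			list_move(&folio->lru, &folios_skipped);
			continue;
		}

		scan += nr_pages;
		list_move(&folio->lru, dst);
		nr_taken += nr_pages;
	}

	/* the real code splices folios_skipped back and accounts skips */
	return nr_taken;
}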

> Can you please describe how to reproduce this?  

Terminal 1: continuously increase active(anon) pages.
mkdir /tmp/memory
mount -t tmpfs -o size=1024000M tmpfs /tmp/memory
dd if=/dev/zero of=/tmp/memory/block bs=4M
tail /tmp/memory/block

Terminal 2:
vmstat -a 1
The active column keeps increasing:
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 1445623076 45898836 83646008    0    0     0     0 1807 1682  0  0 100  0  0  0
 1  0      0 1445623076 43450228 86094616    0    0     0     0 1677 1468  0  0 100  0  0  0
 1  0      0 1445623076 41003480 88541364    0    0     0     0 1985 2022  0  0 100  0  0  0
 1  0      0 1445623076 38557088 90987756    0    0     0     4 1731 1544  0  0 100  0  0  0
 1  0      0 1445623076 36109688 93435156    0    0     0     0 1755 1501  0  0 100  0  0  0
 1  0      0 1445619552 33663256 95881632    0    0     0     0 2015 1678  0  0 100  0  0  0
 1  0      0 1445619804 31217140 98327792    0    0     0     0 2058 2212  0  0 100  0  0  0
 1  0      0 1445619804 28769988 100774944    0    0     0     0 1729 1585  0  0 100  0  0  0
 1  0      0 1445619804 26322348 103222584    0    0     0     0 1774 1575  0  0 100  0  0  0
 1  0      0 1445619804 23875592 105669340    0    0     0     4 1738 1604  0  0 100  0  0  0

cat /proc/meminfo | head
Active(anon) keeps increasing:
MemTotal:       1579941036 kB
MemFree:        1445618500 kB
MemAvailable:   1453013224 kB
Buffers:            6516 kB
Cached:         128653956 kB
SwapCached:            0 kB
Active:         118110812 kB
Inactive:       11436620 kB
Active(anon):   115345744 kB   
Inactive(anon):   945292 kB

When Active(anon) is 115345744 kB, insmod'ing the module triggers the ZONE_DMA32 watermark.
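
The module itself is not included in this thread. Purely as an
illustration of the shape of such a pressure generator (the name, the
1 GB cap and the allocation flags below are my assumptions, not the
module that was actually used), it can simply pin order-0 pages from
ZONE_DMA32 until the zone approaches its watermark:

#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>

static LIST_HEAD(dma32_pages);
static unsigned long nr_allocated;

static int __init dma32_pressure_init(void)
{
	struct page *page;

	/*
	 * Pin pages from ZONE_DMA32. Once free memory in the zone falls
	 * below the low watermark, allocations enter direct reclaim and
	 * walk the (mostly ZONE_NORMAL) active_anon LRU described above.
	 * The 1 GB cap is arbitrary.
	 */
	while (nr_allocated < (1UL << 18)) {	/* 2^18 * 4 kB = 1 GB */
		page = alloc_pages(GFP_KERNEL | GFP_DMA32, 0);
		if (!page)
			break;
		list_add(&page->lru, &dma32_pages);
		nr_allocated++;
	}
	pr_info("dma32_pressure: pinned %lu pages\n", nr_allocated);
	return 0;
}

static void __exit dma32_pressure_exit(void)
{
	struct page *page, *tmp;

	list_for_each_entry_safe(page, tmp, &dma32_pages, lru) {
		list_del(&page->lru);
		__free_pages(page, 0);
	}
}

module_init(dma32_pressure_init);
module_exit(dma32_pressure_exit);
MODULE_LICENSE("GPL");

Build it against the running kernel and insmod it while Terminal 1
keeps Active(anon) large.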

perf shows nr_scanned=28835844.
28835844 * 4 kB = 115343376 kB, approximately equal to 115345744 kB.

perf record -e vmscan:mm_vmscan_lru_isolate -aR
perf script
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2 nr_skipped=2 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29 nr_skipped=29 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon

If Active(anon) is increased to 1000 GB and insmod'ing the module then triggers the ZONE_DMA32 watermark, a hard lockup occurs.

On my device nr_scanned = 0x0000000003e3e937 at the time of the hard lockup. Converted to a memory size: 0x0000000003e3e937 * 4 kB = 261072092 kB.

#5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
    ffffc90006fb7c30: 0000000000000020 0000000000000000 
    ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000 
    ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8 
    ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48 
    ffffc90006fb7c70: 0000000000000000 0000000000000000 
    ffffc90006fb7c80: 0000000000000000 0000000000000000 
    ffffc90006fb7c90: 0000000000000000 0000000000000000 
    ffffc90006fb7ca0: 0000000000000000 0000000003e3e937 
    ffffc90006fb7cb0: 0000000000000000 0000000000000000 
    ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000 

> Why do you think it took eight years to be discovered?

The problem requires the following conditions to occur:
1. The device memory should be large enough.
2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area.
3. The memory in ZONE_DMA32 needs to reach the watermark.

If the memory is not large enough, or if the use of ZONE_DMA32 memory is reasonably designed, the problem is difficult to detect.

Note:
The problem is most likely to occur with ZONE_DMA32 and ZONE_NORMAL, but other suitable scenarios may also trigger it.

> It looks like that will fix, but perhaps something more fundamental
> needs to be done - we're doing a tremendous amount of pretty pointless
> work here.  Answers to my above questions will help us resolve this.
> 
> Thanks.

Please refer to the above explanation for details.

Thanks.
