Message-ID: <20250321135524.GA1888695@cmpxchg.org>
Date: Fri, 21 Mar 2025 09:55:24 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
Mel Gorman <mgorman@...hsingularity.net>, Zi Yan <ziy@...dia.com>
Subject: Re: [PATCH 1/5] mm: compaction: push watermark into
compaction_suitable() callers
On Fri, Mar 21, 2025 at 02:21:20PM +0800, kernel test robot wrote:
> commit: 6304be90cf5460f33b031e1e19cbe7ffdcbc9f66 ("[PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers")
> url: https://github.com/intel-lab-lkp/linux/commits/Johannes-Weiner/mm-compaction-push-watermark-into-compaction_suitable-callers/20250314-050839
> base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/all/20250313210647.1314586-2-hannes@cmpxchg.org/
> patch subject: [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> [ 24.321289][ T36] BUG: unable to handle page fault for address: ffff88844000c5f8
> [ 24.322631][ T36] #PF: supervisor read access in kernel mode
> [ 24.323577][ T36] #PF: error_code(0x0000) - not-present page
> [ 24.324482][ T36] PGD 3a01067 P4D 3a01067 PUD 0
> [ 24.325301][ T36] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> [ 24.326157][ T36] CPU: 1 UID: 0 PID: 36 Comm: kcompactd0 Not tainted 6.14.0-rc6-00559-g6304be90cf54 #1
> [ 24.327631][ T36] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 24.329194][ T36] RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256)
> [ 24.330125][ T36] Code: 84 c0 78 14 4c 8b 97 48 06 00 00 45 31 db 4d 85 d2 4d 0f 4f da 4c 01 de 49 29 f1 41 f7 c0 38 02 00 00 0f 85 92 00 00 00 48 98 <48> 03 54 c7 38 49 39 d1 7e 7e b0 01 85 c9 74 7a 83 f9 0a 7f 73 48
> All code
> ========
> 0: 84 c0 test %al,%al
> 2: 78 14 js 0x18
> 4: 4c 8b 97 48 06 00 00 mov 0x648(%rdi),%r10
> b: 45 31 db xor %r11d,%r11d
> e: 4d 85 d2 test %r10,%r10
> 11: 4d 0f 4f da cmovg %r10,%r11
> 15: 4c 01 de add %r11,%rsi
> 18: 49 29 f1 sub %rsi,%r9
> 1b: 41 f7 c0 38 02 00 00 test $0x238,%r8d
> 22: 0f 85 92 00 00 00 jne 0xba
> 28: 48 98 cltq
> 2a:* 48 03 54 c7 38 add 0x38(%rdi,%rax,8),%rdx <-- trapping instruction
That would be the zone->lowmem_reserve[highest_zoneidx] deref:
long int lowmem_reserve[4]; /* 0x38 0x20 */
> 2f: 49 39 d1 cmp %rdx,%r9
> 32: 7e 7e jle 0xb2
> 34: b0 01 mov $0x1,%al
> 36: 85 c9 test %ecx,%ecx
> 38: 74 7a je 0xb4
> 3a: 83 f9 0a cmp $0xa,%ecx
> 3d: 7f 73 jg 0xb2
> 3f: 48 rex.W
>
> Code starting with the faulting instruction
> ===========================================
> 0: 48 03 54 c7 38 add 0x38(%rdi,%rax,8),%rdx
> 5: 49 39 d1 cmp %rdx,%r9
> 8: 7e 7e jle 0x88
> a: b0 01 mov $0x1,%al
> c: 85 c9 test %ecx,%ecx
> e: 74 7a je 0x8a
> 10: 83 f9 0a cmp $0xa,%ecx
> 13: 7f 73 jg 0x88
> 15: 48 rex.W
> [ 24.333001][ T36] RSP: 0018:ffffc90000137cd0 EFLAGS: 00010246
> [ 24.334003][ T36] RAX: 00000000000036a8 RBX: 0000000000000001 RCX: 0000000000000000
> [ 24.335270][ T36] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88843fff1080
and %rax and %rdx look like the swapped watermark and zoneidx (36a8 is
14k pages, or 54M, which matches a min watermark on a 16G system).
So this is the bug that Hugh fixed here:
https://lore.kernel.org/all/005ace8b-07fa-01d4-b54b-394a3e029c07@google.com/
It's resolved in the latest version of the patch in -mm.