Message-ID: <202412051551.690e9656-lkp@intel.com>
Date: Thu, 5 Dec 2024 16:43:24 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Rik van Riel <riel@...riel.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<x86@...nel.org>, Ingo Molnar <mingo@...nel.org>, Dave Hansen
<dave.hansen@...el.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>, Mel Gorman <mgorman@...e.de>,
<oliver.sang@...el.com>
Subject: [tip:x86/mm] [x86/mm/tlb] 209954cbc7:
WARNING:at_arch/x86/mm/tlb.c:#flush_tlb_func
Hello,
besides the performance report
"[tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression"
in
https://lore.kernel.org/all/202411282207.6bd28eae-lkp@intel.com/
we have now also observed a WARNING from another test. The issue does not always
happen, so we ran the test more times to make sure the parent stays clean.
We also tested the patch in
https://lore.kernel.org/all/20241202202213.26a79ed6@fangorn/
shown below as [1]. The issue is not fixed by this patch; it still happens at a
similar rate.
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/priority/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled:
gcc-12/performance/x86_64-rhel-9.4/1/32/1/debian-12-x86_64-20240206.cgz/300/lkp-icl-2sp4/swap-w-seq/vm-scalability/always/always
commit:
7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
40036730a9 ("x86,mm: only trim the mm_cpumask once a second") [1]
7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 40036730a9566a8abe36ffe2bf4
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
           :50          58%          29:50          56%          28:50  dmesg.RIP:flush_tlb_func
           :50          58%          29:50          56%          28:50  dmesg.WARNING:at_arch/x86/mm/tlb.c:#flush_tlb_func
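
As a rough cross-check of the %reproduction column in the table above, here is a minimal sketch. It assumes the percentage is simply failures divided by runs, and that an empty fail count (":50") means zero failures. The helper name reproduction_rate is hypothetical; this is not the LKP implementation.

```python
def reproduction_rate(fail_runs: str) -> float:
    """Parse a 'fail:runs' cell and return the failure rate in percent.

    Assumption: an empty fail count, as in ':50', means 0 failures.
    """
    fail, runs = fail_runs.split(":")
    return 100.0 * int(fail or 0) / int(runs)


# Cells taken from the table: parent, 209954cbc7, and 40036730a9 [1].
for cell in (":50", "29:50", "28:50"):
    print(f"{cell:>6} -> {reproduction_rate(cell):.0f}%")
```

Under these assumptions, 29:50 and 28:50 give 58% and 56%, matching the table, while the parent stays at 0%.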
The full report is below, FYI.
kernel test robot noticed "WARNING:at_arch/x86/mm/tlb.c:#flush_tlb_func" on:
commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
[test failed on linux-next/master f486c8aa16b8172f63bddc70116a0c897a7f3f02]
in testcase: vm-scalability
version: vm-scalability-x86_64-6f4ef16-0_20241103
with the following parameters:
runtime: 300
thp_enabled: always
thp_defrag: always
nr_task: 32
nr_ssd: 1
priority: 1
test: swap-w-seq
cpufreq_governor: performance
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202412051551.690e9656-lkp@intel.com
[ 210.338271][ T4668] ------------[ cut here ]------------
[ 210.343902][ T4668] WARNING: CPU: 38 PID: 4668 at arch/x86/mm/tlb.c:815 flush_tlb_func (arch/x86/mm/tlb.c:815)
[ 210.353137][ T4668] Modules linked in: xfs intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit libnvdimm btrfs x86_pkg_temp_thermal intel_powerclamp blake2b_generic xor coretemp raid6_pq libcrc32c sd_mod kvm_intel sg kvm crct10dif_pclmul crc32_pclmul crc32c_intel snd_pcm binfmt_misc ipmi_ssif ghash_clmulni_intel snd_timer dax_hmem ahci cxl_acpi rapl snd ast libahci cxl_port intel_th_gth intel_cstate mei_me drm_shmem_helper cxl_core acpi_power_meter ioatdma isst_if_mbox_pci i2c_i801 isst_if_mmio soundcore intel_th_pci intel_uncore einj libata drm_kms_helper mei pcspkr isst_if_common intel_th intel_pch_thermal i2c_smbus intel_vsec dca wmi ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler acpi_pad joydev drm fuse dm_mod loop ip_tables
[ 210.425152][ T4668] CPU: 38 UID: 0 PID: 4668 Comm: usemem Not tainted 6.12.0-rc7-00004-g209954cbc7d0 #1
[ 210.434817][ T4668] RIP: 0010:flush_tlb_func (arch/x86/mm/tlb.c:815)
[ 210.440408][ T4668] Code: 4b 24 b8 01 00 00 00 48 d3 e0 49 01 c7 4c 3b 7b 10 72 e2 4c 89 e2 45 89 ec 44 8b 6c 24 04 e9 1b ff ff ff 0f 0b e9 e2 fe ff ff <0f> 0b e9 00 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
All code
========
0: 4b 24 b8 rex.WXB and $0xb8,%al
3: 01 00 add %eax,(%rax)
5: 00 00 add %al,(%rax)
7: 48 d3 e0 shl %cl,%rax
a: 49 01 c7 add %rax,%r15
d: 4c 3b 7b 10 cmp 0x10(%rbx),%r15
11: 72 e2 jb 0xfffffffffffffff5
13: 4c 89 e2 mov %r12,%rdx
16: 45 89 ec mov %r13d,%r12d
19: 44 8b 6c 24 04 mov 0x4(%rsp),%r13d
1e: e9 1b ff ff ff jmp 0xffffffffffffff3e
23: 0f 0b ud2
25: e9 e2 fe ff ff jmp 0xffffffffffffff0c
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 00 ff ff ff jmp 0xffffffffffffff31
31: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
38: 00 00 00
3b: 90 nop
3c: 90 nop
3d: 90 nop
3e: 90 nop
3f: 90 nop
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 00 ff ff ff jmp 0xffffffffffffff07
7: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
e: 00 00 00
11: 90 nop
12: 90 nop
13: 90 nop
14: 90 nop
15: 90 nop
[ 210.460431][ T4668] RSP: 0000:ffa000000b2a33c8 EFLAGS: 00010087
[ 210.466646][ T4668] RAX: 0000000000098246 RBX: ff11001ffb534e40 RCX: 000000000009e7c5
[ 210.474768][ T4668] RDX: ff110001ea4acd00 RSI: 000000000009e7c4 RDI: ff11001ffb534e40
[ 210.482888][ T4668] RBP: 0000000000098446 R08: 0000000000000002 R09: 0000000000000000
[ 210.491007][ T4668] R10: ff1100108017daf0 R11: 0000000000000002 R12: 0000000000000026
[ 210.499123][ T4668] R13: 0000000000000026 R14: 0000000000000068 R15: 0000000000000001
[ 210.507231][ T4668] FS: 00007f4d1dbe1740(0000) GS:ff11001ffb500000(0000) knlGS:0000000000000000
[ 210.516296][ T4668] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 210.523016][ T4668] CR2: 00007f4ccba00000 CR3: 0000000122dd4005 CR4: 0000000000773ef0
[ 210.531122][ T4668] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 210.539219][ T4668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 210.547317][ T4668] PKRU: 55555554
[ 210.550982][ T4668] Call Trace:
[ 210.554389][ T4668] <TASK>
[ 210.557446][ T4668] ? __warn (kernel/panic.c:748)
[ 210.561631][ T4668] ? flush_tlb_func (arch/x86/mm/tlb.c:815)
[ 210.566597][ T4668] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 210.571215][ T4668] ? handle_bug (arch/x86/kernel/traps.c:285)
[ 210.575658][ T4668] ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
[ 210.580439][ T4668] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 210.585573][ T4668] ? flush_tlb_func (arch/x86/mm/tlb.c:815)
[ 210.590529][ T4668] ? setup_object (mm/slub.c:2395)
[ 210.595137][ T4668] smp_call_function_many_cond (kernel/smp.c:134 kernel/smp.c:875)
[ 210.600963][ T4668] ? __pfx_flush_tlb_func (arch/x86/mm/tlb.c:735)
[ 210.606263][ T4668] on_each_cpu_cond_mask (kernel/smp.c:1053)
[ 210.611474][ T4668] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:91 arch/x86/mm/tlb.c:937 arch/x86/mm/tlb.c:1023)
[ 210.616593][ T4668] pmdp_invalidate (mm/pgtable-generic.c:205 (discriminator 4))
[ 210.621280][ T4668] __split_huge_pmd_locked (arch/x86/include/asm/pgtable-invert.h:18 arch/x86/include/asm/pgtable-invert.h:24 arch/x86/include/asm/pgtable.h:273 mm/huge_memory.c:2750)
[ 210.626839][ T4668] ? page_vma_mapped_walk (mm/page_vma_mapped.c:241)
[ 210.632309][ T4668] try_to_unmap_one (mm/rmap.c:1705 (discriminator 1))
[ 210.637250][ T4668] rmap_walk_anon (mm/rmap.c:2635)
[ 210.641932][ T4668] try_to_unmap (mm/rmap.c:1992)
[ 210.646351][ T4668] ? __pfx_try_to_unmap_one (mm/rmap.c:1637)
[ 210.651812][ T4668] ? __pfx_folio_not_mapped (mm/rmap.c:1964)
[ 210.657273][ T4668] ? __pfx_folio_lock_anon_vma_read (mm/rmap.c:546)
[ 210.663421][ T4668] shrink_folio_list (arch/x86/include/asm/bitops.h:206 arch/x86/include/asm/bitops.h:238 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/page-flags.h:822 include/linux/page-flags.h:843 include/linux/mm.h:1243 include/linux/mm.h:1260 mm/vmscan.c:1305)
[ 210.668443][ T4668] ? __count_memcg_events (mm/memcontrol.c:569 mm/memcontrol.c:832)
[ 210.673716][ T4668] ? scan_folios (include/linux/vmstat.h:80 mm/vmscan.c:4452)
[ 210.678379][ T4668] ? isolate_folios (mm/vmscan.c:4547)
[ 210.683210][ T4668] evict_folios (mm/vmscan.c:4592)
[ 210.687771][ T4668] try_to_shrink_lruvec (mm/vmscan.c:4785)
[ 210.693024][ T4668] shrink_one (mm/vmscan.c:4824)
[ 210.697315][ T4668] shrink_many (mm/vmscan.c:4885)
[ 210.701770][ T4668] shrink_node (mm/vmscan.c:4965 mm/vmscan.c:5943)
[ 210.706219][ T4668] do_try_to_free_pages (mm/vmscan.c:6143 mm/vmscan.c:6263)
[ 210.711356][ T4668] try_to_free_pages (mm/vmscan.c:6513)
[ 210.716227][ T4668] __alloc_pages_slowpath+0x303/0xb70
[ 210.722654][ T4668] ? get_page_from_freelist (mm/page_alloc.c:3417)
[ 210.728214][ T4668] __alloc_pages_noprof (mm/page_alloc.c:4748)
[ 210.733419][ T4668] alloc_pages_mpol_noprof (mm/mempolicy.c:2267)
[ 210.738802][ T4668] pte_alloc_one (include/linux/mm.h:2886 include/asm-generic/pgalloc.h:70 arch/x86/mm/pgtable.c:33)
[ 210.743228][ T4668] __pte_alloc (mm/memory.c:448)
[ 210.747484][ T4668] do_anonymous_page (mm/memory.c:4752 (discriminator 1))
[ 210.752432][ T4668] ? update_load_avg (kernel/sched/fair.c:4500 kernel/sched/fair.c:4837)
[ 210.757290][ T4668] __handle_mm_fault (mm/memory.c:5909)
[ 210.762227][ T4668] handle_mm_fault (mm/memory.c:6077)
[ 210.766990][ T4668] do_user_addr_fault (arch/x86/mm/fault.c:1339)
[ 210.772008][ T4668] exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:92 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
[ 210.776595][ T4668] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623)
[ 210.781445][ T4668] RIP: 0033:0x5612c90f0ad4
[ 210.785855][ T4668] Code: 01 00 00 00 e8 0d f9 ff ff 89 c7 e8 6c ff ff ff bf 00 00 00 00 e8 fc f8 ff ff 85 d2 74 08 48 8d 04 f7 48 8b 00 c3 48 8d 04 f7 <48> 89 30 b8 00 00 00 00 c3 41 54 55 53 48 85 ff 0f 84 23 01 00 00
All code
========
0: 01 00 add %eax,(%rax)
2: 00 00 add %al,(%rax)
4: e8 0d f9 ff ff call 0xfffffffffffff916
9: 89 c7 mov %eax,%edi
b: e8 6c ff ff ff call 0xffffffffffffff7c
10: bf 00 00 00 00 mov $0x0,%edi
15: e8 fc f8 ff ff call 0xfffffffffffff916
1a: 85 d2 test %edx,%edx
1c: 74 08 je 0x26
1e: 48 8d 04 f7 lea (%rdi,%rsi,8),%rax
22: 48 8b 00 mov (%rax),%rax
25: c3 ret
26: 48 8d 04 f7 lea (%rdi,%rsi,8),%rax
2a:* 48 89 30 mov %rsi,(%rax) <-- trapping instruction
2d: b8 00 00 00 00 mov $0x0,%eax
32: c3 ret
33: 41 54 push %r12
35: 55 push %rbp
36: 53 push %rbx
37: 48 85 ff test %rdi,%rdi
3a: 0f 84 23 01 00 00 je 0x163
Code starting with the faulting instruction
===========================================
0: 48 89 30 mov %rsi,(%rax)
3: b8 00 00 00 00 mov $0x0,%eax
8: c3 ret
9: 41 54 push %r12
b: 55 push %rbp
c: 53 push %rbx
d: 48 85 ff test %rdi,%rdi
10: 0f 84 23 01 00 00 je 0x139
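
For readers matching the kernel "Code:" line in the log above against its decoded listing, here is a minimal sketch of how the trapping offset can be recovered. It assumes the byte wrapped in angle brackets marks the instruction at RIP. The helper name trap_offset is hypothetical and is not part of any kernel tool.

```python
# "Code:" bytes copied from the flush_tlb_func WARNING in the log above.
CODE = (
    "4b 24 b8 01 00 00 00 48 d3 e0 49 01 c7 4c 3b 7b 10 72 e2 4c 89 e2 "
    "45 89 ec 44 8b 6c 24 04 e9 1b ff ff ff 0f 0b e9 e2 fe ff ff <0f> 0b "
    "e9 00 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90"
)


def trap_offset(code_line: str) -> int:
    """Return the byte offset of the token marked as <xx>."""
    for offset, token in enumerate(code_line.split()):
        if token.startswith("<") and token.endswith(">"):
            return offset
    raise ValueError("no marked byte in Code: line")


tokens = CODE.split()
off = trap_offset(CODE)
# Marked byte plus the byte that follows it.
opcode = tokens[off].strip("<>") + " " + tokens[off + 1]
print(hex(off), opcode)
```

The offset comes out as 0x2a with bytes 0f 0b, which is the ud2 flagged as the trapping instruction in the listing. This is consistent with the exc_invalid_op, handle_bug, report_bug and __warn frames in the call trace.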
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241205/202412051551.690e9656-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki