lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <202412051551.690e9656-lkp@intel.com>
Date: Thu, 5 Dec 2024 16:43:24 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Rik van Riel <riel@...riel.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	<x86@...nel.org>, Ingo Molnar <mingo@...nel.org>, Dave Hansen
	<dave.hansen@...el.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>, Mel Gorman <mgorman@...e.de>,
	<oliver.sang@...el.com>
Subject: [tip:x86/mm] [x86/mm/tlb]  209954cbc7:
 WARNING:at_arch/x86/mm/tlb.c:#flush_tlb_func



Hello,

besides the performance report
"[tip:x86/mm] [x86/mm/tlb]  209954cbc7:  will-it-scale.per_thread_ops 13.2% regression"
in
https://lore.kernel.org/all/202411282207.6bd28eae-lkp@intel.com/

we now also observed a WARNING from another test. the issue doesn't always
happen, so we run it more to make sure the parent keep clean.

we also tested the patch in
https://lore.kernel.org/all/20241202202213.26a79ed6@fangorn/
as below [1], the issue is not fixed by this patch, still happen with similar
rate.

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/priority/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled:
  gcc-12/performance/x86_64-rhel-9.4/1/32/1/debian-12-x86_64-20240206.cgz/300/lkp-icl-2sp4/swap-w-seq/vm-scalability/always/always

commit: 
  7e33001b8b ("x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM")
  209954cbc7 ("x86/mm/tlb: Update mm_cpumask lazily")
  40036730a9 ("x86,mm: only trim the mm_cpumask once a second")    [1]


7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 40036730a9566a8abe36ffe2bf4
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
           :50          58%          29:50          56%          28:50    dmesg.RIP:flush_tlb_func
           :50          58%          29:50          56%          28:50    dmesg.WARNING:at_arch/x86/mm/tlb.c:#flush_tlb_func


below full report FYI.

kernel test robot noticed "WARNING:at_arch/x86/mm/tlb.c:#flush_tlb_func" on:

commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm

[test failed on linux-next/master f486c8aa16b8172f63bddc70116a0c897a7f3f02]

in testcase: vm-scalability
version: vm-scalability-x86_64-6f4ef16-0_20241103
with following parameters:

	runtime: 300
	thp_enabled: always
	thp_defrag: always
	nr_task: 32
	nr_ssd: 1
	priority: 1
	test: swap-w-seq
	cpufreq_governor: performance



config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202412051551.690e9656-lkp@intel.com


[  210.338271][ T4668] ------------[ cut here ]------------
[ 210.343902][ T4668] WARNING: CPU: 38 PID: 4668 at arch/x86/mm/tlb.c:815 flush_tlb_func (arch/x86/mm/tlb.c:815)
[  210.353137][ T4668] Modules linked in: xfs intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit libnvdimm btrfs x86_pkg_temp_thermal intel_powerclamp blake2b_generic xor coretemp raid6_pq libcrc32c sd_mod kvm_intel sg kvm crct10dif_pclmul crc32_pclmul crc32c_intel snd_pcm binfmt_misc ipmi_ssif ghash_clmulni_intel snd_timer dax_hmem ahci cxl_acpi rapl snd ast libahci cxl_port intel_th_gth intel_cstate mei_me drm_shmem_helper cxl_core acpi_power_meter ioatdma isst_if_mbox_pci i2c_i801 isst_if_mmio soundcore intel_th_pci intel_uncore einj libata drm_kms_helper mei pcspkr isst_if_common intel_th intel_pch_thermal i2c_smbus intel_vsec dca wmi ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler acpi_pad joydev drm fuse dm_mod loop ip_tables
[  210.425152][ T4668] CPU: 38 UID: 0 PID: 4668 Comm: usemem Not tainted 6.12.0-rc7-00004-g209954cbc7d0 #1
[ 210.434817][ T4668] RIP: 0010:flush_tlb_func (arch/x86/mm/tlb.c:815)
[ 210.440408][ T4668] Code: 4b 24 b8 01 00 00 00 48 d3 e0 49 01 c7 4c 3b 7b 10 72 e2 4c 89 e2 45 89 ec 44 8b 6c 24 04 e9 1b ff ff ff 0f 0b e9 e2 fe ff ff <0f> 0b e9 00 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
All code
========
   0:	4b 24 b8             	rex.WXB and $0xb8,%al
   3:	01 00                	add    %eax,(%rax)
   5:	00 00                	add    %al,(%rax)
   7:	48 d3 e0             	shl    %cl,%rax
   a:	49 01 c7             	add    %rax,%r15
   d:	4c 3b 7b 10          	cmp    0x10(%rbx),%r15
  11:	72 e2                	jb     0xfffffffffffffff5
  13:	4c 89 e2             	mov    %r12,%rdx
  16:	45 89 ec             	mov    %r13d,%r12d
  19:	44 8b 6c 24 04       	mov    0x4(%rsp),%r13d
  1e:	e9 1b ff ff ff       	jmp    0xffffffffffffff3e
  23:	0f 0b                	ud2
  25:	e9 e2 fe ff ff       	jmp    0xffffffffffffff0c
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	e9 00 ff ff ff       	jmp    0xffffffffffffff31
  31:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
  38:	00 00 00 
  3b:	90                   	nop
  3c:	90                   	nop
  3d:	90                   	nop
  3e:	90                   	nop
  3f:	90                   	nop

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	e9 00 ff ff ff       	jmp    0xffffffffffffff07
   7:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
   e:	00 00 00 
  11:	90                   	nop
  12:	90                   	nop
  13:	90                   	nop
  14:	90                   	nop
  15:	90                   	nop
[  210.460431][ T4668] RSP: 0000:ffa000000b2a33c8 EFLAGS: 00010087
[  210.466646][ T4668] RAX: 0000000000098246 RBX: ff11001ffb534e40 RCX: 000000000009e7c5
[  210.474768][ T4668] RDX: ff110001ea4acd00 RSI: 000000000009e7c4 RDI: ff11001ffb534e40
[  210.482888][ T4668] RBP: 0000000000098446 R08: 0000000000000002 R09: 0000000000000000
[  210.491007][ T4668] R10: ff1100108017daf0 R11: 0000000000000002 R12: 0000000000000026
[  210.499123][ T4668] R13: 0000000000000026 R14: 0000000000000068 R15: 0000000000000001
[  210.507231][ T4668] FS:  00007f4d1dbe1740(0000) GS:ff11001ffb500000(0000) knlGS:0000000000000000
[  210.516296][ T4668] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  210.523016][ T4668] CR2: 00007f4ccba00000 CR3: 0000000122dd4005 CR4: 0000000000773ef0
[  210.531122][ T4668] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  210.539219][ T4668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  210.547317][ T4668] PKRU: 55555554
[  210.550982][ T4668] Call Trace:
[  210.554389][ T4668]  <TASK>
[ 210.557446][ T4668] ? __warn (kernel/panic.c:748)
[ 210.561631][ T4668] ? flush_tlb_func (arch/x86/mm/tlb.c:815)
[ 210.566597][ T4668] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 210.571215][ T4668] ? handle_bug (arch/x86/kernel/traps.c:285)
[ 210.575658][ T4668] ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
[ 210.580439][ T4668] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 210.585573][ T4668] ? flush_tlb_func (arch/x86/mm/tlb.c:815)
[ 210.590529][ T4668] ? setup_object (mm/slub.c:2395)
[ 210.595137][ T4668] smp_call_function_many_cond (kernel/smp.c:134 kernel/smp.c:875)
[ 210.600963][ T4668] ? __pfx_flush_tlb_func (arch/x86/mm/tlb.c:735)
[ 210.606263][ T4668] on_each_cpu_cond_mask (kernel/smp.c:1053)
[ 210.611474][ T4668] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:91 arch/x86/mm/tlb.c:937 arch/x86/mm/tlb.c:1023)
[ 210.616593][ T4668] pmdp_invalidate (mm/pgtable-generic.c:205 (discriminator 4))
[ 210.621280][ T4668] __split_huge_pmd_locked (arch/x86/include/asm/pgtable-invert.h:18 arch/x86/include/asm/pgtable-invert.h:24 arch/x86/include/asm/pgtable.h:273 mm/huge_memory.c:2750)
[ 210.626839][ T4668] ? page_vma_mapped_walk (mm/page_vma_mapped.c:241)
[ 210.632309][ T4668] try_to_unmap_one (mm/rmap.c:1705 (discriminator 1))
[ 210.637250][ T4668] rmap_walk_anon (mm/rmap.c:2635)
[ 210.641932][ T4668] try_to_unmap (mm/rmap.c:1992)
[ 210.646351][ T4668] ? __pfx_try_to_unmap_one (mm/rmap.c:1637)
[ 210.651812][ T4668] ? __pfx_folio_not_mapped (mm/rmap.c:1964)
[ 210.657273][ T4668] ? __pfx_folio_lock_anon_vma_read (mm/rmap.c:546)
[ 210.663421][ T4668] shrink_folio_list (arch/x86/include/asm/bitops.h:206 arch/x86/include/asm/bitops.h:238 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/page-flags.h:822 include/linux/page-flags.h:843 include/linux/mm.h:1243 include/linux/mm.h:1260 mm/vmscan.c:1305)
[ 210.668443][ T4668] ? __count_memcg_events (mm/memcontrol.c:569 mm/memcontrol.c:832)
[ 210.673716][ T4668] ? scan_folios (include/linux/vmstat.h:80 mm/vmscan.c:4452)
[ 210.678379][ T4668] ? isolate_folios (mm/vmscan.c:4547)
[ 210.683210][ T4668] evict_folios (mm/vmscan.c:4592)
[ 210.687771][ T4668] try_to_shrink_lruvec (mm/vmscan.c:4785)
[ 210.693024][ T4668] shrink_one (mm/vmscan.c:4824)
[ 210.697315][ T4668] shrink_many (mm/vmscan.c:4885)
[ 210.701770][ T4668] shrink_node (mm/vmscan.c:4965 mm/vmscan.c:5943)
[ 210.706219][ T4668] do_try_to_free_pages (mm/vmscan.c:6143 mm/vmscan.c:6263)
[ 210.711356][ T4668] try_to_free_pages (mm/vmscan.c:6513)
[ 210.716227][ T4668] __alloc_pages_slowpath+0x303/0xb70
[ 210.722654][ T4668] ? get_page_from_freelist (mm/page_alloc.c:3417)
[ 210.728214][ T4668] __alloc_pages_noprof (mm/page_alloc.c:4748)
[ 210.733419][ T4668] alloc_pages_mpol_noprof (mm/mempolicy.c:2267)
[ 210.738802][ T4668] pte_alloc_one (include/linux/mm.h:2886 include/asm-generic/pgalloc.h:70 arch/x86/mm/pgtable.c:33)
[ 210.743228][ T4668] __pte_alloc (mm/memory.c:448)
[ 210.747484][ T4668] do_anonymous_page (mm/memory.c:4752 (discriminator 1))
[ 210.752432][ T4668] ? update_load_avg (kernel/sched/fair.c:4500 kernel/sched/fair.c:4837)
[ 210.757290][ T4668] __handle_mm_fault (mm/memory.c:5909)
[ 210.762227][ T4668] handle_mm_fault (mm/memory.c:6077)
[ 210.766990][ T4668] do_user_addr_fault (arch/x86/mm/fault.c:1339)
[ 210.772008][ T4668] exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:92 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
[ 210.776595][ T4668] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623)
[  210.781445][ T4668] RIP: 0033:0x5612c90f0ad4
[ 210.785855][ T4668] Code: 01 00 00 00 e8 0d f9 ff ff 89 c7 e8 6c ff ff ff bf 00 00 00 00 e8 fc f8 ff ff 85 d2 74 08 48 8d 04 f7 48 8b 00 c3 48 8d 04 f7 <48> 89 30 b8 00 00 00 00 c3 41 54 55 53 48 85 ff 0f 84 23 01 00 00
All code
========
   0:	01 00                	add    %eax,(%rax)
   2:	00 00                	add    %al,(%rax)
   4:	e8 0d f9 ff ff       	call   0xfffffffffffff916
   9:	89 c7                	mov    %eax,%edi
   b:	e8 6c ff ff ff       	call   0xffffffffffffff7c
  10:	bf 00 00 00 00       	mov    $0x0,%edi
  15:	e8 fc f8 ff ff       	call   0xfffffffffffff916
  1a:	85 d2                	test   %edx,%edx
  1c:	74 08                	je     0x26
  1e:	48 8d 04 f7          	lea    (%rdi,%rsi,8),%rax
  22:	48 8b 00             	mov    (%rax),%rax
  25:	c3                   	ret
  26:	48 8d 04 f7          	lea    (%rdi,%rsi,8),%rax
  2a:*	48 89 30             	mov    %rsi,(%rax)		<-- trapping instruction
  2d:	b8 00 00 00 00       	mov    $0x0,%eax
  32:	c3                   	ret
  33:	41 54                	push   %r12
  35:	55                   	push   %rbp
  36:	53                   	push   %rbx
  37:	48 85 ff             	test   %rdi,%rdi
  3a:	0f 84 23 01 00 00    	je     0x163

Code starting with the faulting instruction
===========================================
   0:	48 89 30             	mov    %rsi,(%rax)
   3:	b8 00 00 00 00       	mov    $0x0,%eax
   8:	c3                   	ret
   9:	41 54                	push   %r12
   b:	55                   	push   %rbp
   c:	53                   	push   %rbx
   d:	48 85 ff             	test   %rdi,%rdi
  10:	0f 84 23 01 00 00    	je     0x139


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241205/202412051551.690e9656-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ