Message-ID: <202410231429.b91daa36-oliver.sang@intel.com>
Date: Wed, 23 Oct 2024 14:54:39 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Qi Zheng <zhengqi.arch@...edance.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	<david@...hat.com>, <hughd@...gle.com>, <willy@...radead.org>,
	<mgorman@...e.de>, <muchun.song@...ux.dev>, <vbabka@...nel.org>,
	<akpm@...ux-foundation.org>, <zokeefe@...gle.com>, <rientjes@...gle.com>,
	<jannh@...gle.com>, <peterx@...hat.com>, <linux-mm@...ck.org>,
	<x86@...nel.org>, Qi Zheng <zhengqi.arch@...edance.com>,
	<oliver.sang@...el.com>
Subject: Re: [PATCH v1 7/7] x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64


Hello,

With this commit, the following two config options are newly enabled:

--- /pkg/linux/x86_64-rhel-8.3/gcc-12/c9f9931196ccc64ec25268538edc327c3add08de/.config  2024-10-20 21:40:11.559320920 +0800
+++ /pkg/linux/x86_64-rhel-8.3/gcc-12/2e22ca3c1f2a6d64740f7b875d869d1f80f78ce8/.config  2024-10-20 06:02:46.008212911 +0800
@@ -1207,6 +1207,8 @@ CONFIG_IOMMU_MM_DATA=y
 CONFIG_EXECMEM=y
 CONFIG_NUMA_MEMBLKS=y
 CONFIG_NUMA_EMU=y
+CONFIG_ARCH_SUPPORTS_PT_RECLAIM=y
+CONFIG_PT_RECLAIM=y
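
For anyone who wants to repeat the comparison above on their own pair of .config files, here is a minimal sketch. It assumes the usual `CONFIG_NAME=value` line format; the helper names `parse_config` and `newly_set` are hypothetical and are not part of lkp-tests or the kernel tree.

```python
import re

# Matches set options such as "CONFIG_PT_RECLAIM=y".
# Lines like "# CONFIG_FOO is not set" do not match and are ignored.
_OPTION = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_config(text):
    """Return {option: value} for every option that is set in a .config body."""
    options = {}
    for line in text.splitlines():
        match = _OPTION.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def newly_set(parent_text, patched_text):
    """Return the options set in the patched config but not in the parent config."""
    parent = parse_config(parent_text)
    patched = parse_config(patched_text)
    return sorted(name for name in patched if name not in parent)


# Sample input built from the diff hunk quoted in this report.
parent = "CONFIG_EXECMEM=y\nCONFIG_NUMA_MEMBLKS=y\nCONFIG_NUMA_EMU=y\n"
patched = parent + "CONFIG_ARCH_SUPPORTS_PT_RECLAIM=y\nCONFIG_PT_RECLAIM=y\n"
print(newly_set(parent, patched))
```

In practice the two strings would be read from the parent and patched .config files rather than written inline.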


We then noticed various issues that we do not observe on the parent commit.


kernel test robot noticed "BUG:Bad_rss-counter_state_mm:#type:MM_FILEPAGES_val" on:

commit: 2e22ca3c1f2a6d64740f7b875d869d1f80f78ce8 ("[PATCH v1 7/7] x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64")
url: https://github.com/intel-lab-lkp/linux/commits/Qi-Zheng/mm-khugepaged-retract_page_tables-use-pte_offset_map_lock/20241017-174953
patch link: https://lore.kernel.org/all/0f6e7fb7fb21431710f28df60738f8be98fe9dd9.1729157502.git.zhengqi.arch@bytedance.com/
patch subject: [PATCH v1 7/7] x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64

in testcase: boot

config: x86_64-rhel-8.3
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-----------------------------------------------------------+------------+------------+
|                                                           | c9f9931196 | 2e22ca3c1f |
+-----------------------------------------------------------+------------+------------+
| boot_failures                                             | 0          | 6          |
| BUG:Bad_rss-counter_state_mm:#type:MM_FILEPAGES_val       | 0          | 5          |
| BUG:Bad_rss-counter_state_mm:#type:MM_ANONPAGES_val       | 0          | 6          |
| BUG:Bad_page_cache_in_process                             | 0          | 3          |
| segfault_at_ip_sp_error                                   | 0          | 5          |
| Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode= | 0          | 4          |
+-----------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202410231429.b91daa36-oliver.sang@intel.com


[    9.153217][    T1] BUG: Bad rss-counter state mm:000000006dcf9cdd type:MM_FILEPAGES val:40
[    9.153929][    T1] BUG: Bad rss-counter state mm:000000006dcf9cdd type:MM_ANONPAGES val:1

...

[    9.444419][  T214] systemd[214]: segfault at 0 ip 0000000000000000 sp 00000000f6b1c2ec error 14 likely on CPU 1 (core 1, socket 0)
[    9.445388][  T214] Code: Unable to access opcode bytes at 0xffffffffffffffd6.

Code starting with the faulting instruction
===========================================
[  OK  ] Started LKP bootstrap.
[    9.453331][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    9.454023][    T1] CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc3-next-20241016-00007-g2e22ca3c1f2a #1
[    9.454818][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[    9.455601][    T1] Call Trace:
[    9.455906][    T1]  <TASK>
[    9.456184][    T1] panic (kernel/panic.c:354)
[    9.456522][    T1] do_exit (include/linux/audit.h:327 kernel/exit.c:920)
[    9.456884][    T1] do_group_exit (kernel/exit.c:1069)
[    9.457252][    T1] get_signal (kernel/signal.c:2917)
[    9.457615][    T1] arch_do_signal_or_restart (arch/x86/kernel/signal.c:337)
[    9.458053][    T1] syscall_exit_to_user_mode (kernel/entry/common.c:113 include/linux/entry-common.h:328 kernel/entry/common.c:207 kernel/entry/common.c:218)
[    9.458495][    T1] __do_fast_syscall_32 (arch/x86/entry/common.c:391)
[    9.458904][    T1] do_fast_syscall_32 (arch/x86/entry/common.c:411)
[    9.459299][    T1] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:127)
[    9.459788][    T1] RIP: 0023:0xf7fbf589
[    9.460130][    T1] Code: 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
All code
========
   0:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
	...
  20:	00 51 52             	add    %dl,0x52(%rcx)
  23:	55                   	push   %rbp
  24:	89 e5                	mov    %esp,%ebp
  26:	0f 34                	sysenter 
  28:	cd 80                	int    $0x80
  2a:*	5d                   	pop    %rbp		<-- trapping instruction
  2b:	5a                   	pop    %rdx
  2c:	59                   	pop    %rcx
  2d:	c3                   	retq   
  2e:	90                   	nop
  2f:	90                   	nop
  30:	90                   	nop
  31:	90                   	nop
  32:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
  39:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi

Code starting with the faulting instruction
===========================================
   0:	5d                   	pop    %rbp
   1:	5a                   	pop    %rdx
   2:	59                   	pop    %rcx
   3:	c3                   	retq   
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
   f:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
[    9.461528][    T1] RSP: 002b:00000000ff837680 EFLAGS: 00200206 ORIG_RAX: 0000000000000006
[    9.462212][    T1] RAX: 0000000000000000 RBX: 0000000000000038 RCX: 0000000000000002
[    9.462834][    T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000f731e6cc
[    9.463462][    T1] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[    9.464083][    T1] R10: 0000000000000000 R11: 0000000000200206 R12: 0000000000000000
[    9.464714][    T1] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    9.465343][    T1]  </TASK>
[    9.465677][    T1] Kernel Offset: 0x4600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
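
When triaging a longer dmesg, the "Bad rss-counter state" lines near the top of the log above can be pulled out mechanically. The following is only a sketch: the regular expression is derived from the two lines quoted in this report, and the function name `bad_rss_counters` is hypothetical (it is not part of lkp-tests).

```python
import re

# Line format assumed from the two messages quoted in this report:
#   BUG: Bad rss-counter state mm:<hashed pointer> type:<counter name> val:<count>
_BAD_RSS = re.compile(
    r"BUG: Bad rss-counter state mm:(?P<mm>[0-9a-f]+) "
    r"type:(?P<type>MM_[A-Z]+) val:(?P<val>-?\d+)"
)


def bad_rss_counters(log_text):
    """Return a list of (mm, counter_type, value) tuples found in a dmesg dump."""
    found = []
    for line in log_text.splitlines():
        match = _BAD_RSS.search(line)
        if match:
            found.append(
                (match.group("mm"), match.group("type"), int(match.group("val")))
            )
    return found


# The two lines from this report, used as sample input.
sample = (
    "[    9.153217][    T1] BUG: Bad rss-counter state "
    "mm:000000006dcf9cdd type:MM_FILEPAGES val:40\n"
    "[    9.153929][    T1] BUG: Bad rss-counter state "
    "mm:000000006dcf9cdd type:MM_ANONPAGES val:1\n"
)
for mm, counter_type, value in bad_rss_counters(sample):
    print(mm, counter_type, value)
```

Grouping the tuples by the `mm` field shows which counters were left non-zero for each address space; in this report both lines share the same `mm` value.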



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241023/202410231429.b91daa36-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

