lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b21d911286a620b16ea1a30c704491876962812a.camel@linux.intel.com>
Date: Wed, 09 Oct 2024 11:13:46 +0200
From: Thomas Hellström <thomas.hellstrom@...ux.intel.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, Peter Zijlstra
 <peterz@...radead.org>,  Ingo Molnar <mingo@...hat.com>, Will Deacon
 <will@...nel.org>, Waiman Long <longman@...hat.com>, Boqun Feng
 <boqun.feng@...il.com>, Maarten Lankhorst <maarten@...khorst.se>, Christian
 König <christian.koenig@....com>,
 linux-kernel@...r.kernel.org,  intel-xe@...ts.freedesktop.org,
 dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH RESEND] locking/ww_mutex: Adjust to lockdep nest_lock
 requirements

On Wed, 2024-10-09 at 15:42 +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed
> "WARNING:at_kernel/locking/lockdep.c:#__lock_acquire" on:
> 
> commit: d417c66b8b12b5706c9df4ddf5367af540f195c6 ("[PATCH RESEND]
> locking/ww_mutex: Adjust to lockdep nest_lock requirements")
> url:

This is weird. This is an overflow warning indicating that the number
of locks held overflows the 12-bit unsigned hlock->references.

I don't see it on local testing where hlock->references exactly maxes
out with 2048 which is the expected value from 2047 locks in the test
plus one extra dummy lock from the patch.

OTOH there might be a reason why the number of locks originally was set
to 2047, and why we don't see more instances of this error. Could it be
gcc-12 packing problem perhaps.

Anyway if this is indeed the problem, it should suffice to lower the
number of locks in the selftest to 2046...

I'll send out an additional version of the patch to do just that.

/Thomas


> https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/locking-ww_mutex-Adjust-to-lockdep-nest_lock-requirements/20241002-205818
> base:
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git d00b83d416e7
> 3bc3fa4d21b14bec920e88b70ce6
> patch link:
> https://lore.kernel.org/all/20241002125611.361001-1-thomas.hellstrom@linux.intel.com/
> patch subject: [PATCH RESEND] locking/ww_mutex: Adjust to lockdep
> nest_lock requirements
> 
> in testcase: kernel-selftests
> version: kernel-selftests-x86_64-977d51cf-1_20240508
> with following parameters:
> 
> 	group: locking
> 
> 
> 
> compiler: gcc-12
> test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz
> (Skylake) with 16G memory
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new
> version of
> the same patch/commit), kindly add following tags
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> > Closes:
> > https://lore.kernel.org/oe-lkp/202410091542.f6c4a438-oliver.sang@intel.com
> 
> 
> [   63.327071][  T246] ------------[ cut here ]------------
> [   63.332388][  T246] DEBUG_LOCKS_WARN_ON(hlock->references <
> references)
> [ 63.332410][ T246] WARNING: CPU: 2 PID: 246 at
> kernel/locking/lockdep.c:5058 __lock_acquire
> (kernel/locking/lockdep.c:5058 (discriminator 9)) 
> [   63.348622][  T246] Modules linked in: test_ww_mutex(+)
> openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 psample btrfs blake2b_generic xor zstd_compress
> raid6_pq libcrc32c intel_rapl_msr intel_rapl_common
> x86_pkg_temp_thermal intel_powerclamp sd_mod coretemp sg kvm_intel
> ipmi_devintf ipmi_msghandler i915 kvm binfmt_misc drm_buddy intel_gtt
> drm_display_helper crct10dif_pclmul crc32_pclmul crc32c_intel mei_wdt
> ghash_clmulni_intel ttm wmi_bmof sha512_ssse3 ahci rapl
> drm_kms_helper libahci video intel_cstate intel_uncore serio_raw
> libata mei_me i2c_i801 mei i2c_smbus intel_pch_thermal ie31200_edac
> wmi acpi_pad tpm_infineon loop fuse drm dm_mod ip_tables sch_fq_codel
> [   63.409553][  T246] CPU: 2 PID: 246 Comm: kworker/u16:5 Tainted: G
> S                 6.10.0-04481-gd417c66b8b12 #1
> [   63.419907][  T246] Hardware name: HP HP Z238 Microtower
> Workstation/8183, BIOS N51 Ver. 01.63 10/05/2017
> [   63.429471][  T246] Workqueue: test-ww_mutex stress_inorder_work
> [test_ww_mutex]
> [ 63.436889][ T246] RIP: 0010:__lock_acquire
> (kernel/locking/lockdep.c:5058 (discriminator 9)) 
> [ 63.442284][ T246] Code: d2 0f 85 15 0c 00 00 44 8b 0d 7d df c7 04
> 45 85 c9 0f 85 d0 fe ff ff 48 c7 c6 c0 c1 2a 84 48 c7 c7 00 91 2a 84
> e8 8d 39 e5 ff <0f> 0b e9 b6 fe ff ff 41 be 02 00 00 00 e9 11 f7 ff
> ff 31 db e9 bb
> All code
> ========
>    0:	d2 0f                	rorb   %cl,(%rdi)
>    2:	85 15 0c 00 00 44    	test   %edx,0x4400000c(%rip)       
> # 0x44000014
>    8:	8b 0d 7d df c7 04    	mov    0x4c7df7d(%rip),%ecx        #
> 0x4c7df8b
>    e:	45 85 c9             	test   %r9d,%r9d
>   11:	0f 85 d0 fe ff ff    	jne    0xfffffffffffffee7
>   17:	48 c7 c6 c0 c1 2a 84 	mov    $0xffffffff842ac1c0,%rsi
>   1e:	48 c7 c7 00 91 2a 84 	mov    $0xffffffff842a9100,%rdi
>   25:	e8 8d 39 e5 ff       	callq  0xffffffffffe539b7
>   2a:*	0f 0b                	ud2    		<-- trapping
> instruction
>   2c:	e9 b6 fe ff ff       	jmpq   0xfffffffffffffee7
>   31:	41 be 02 00 00 00    	mov    $0x2,%r14d
>   37:	e9 11 f7 ff ff       	jmpq   0xfffffffffffff74d
>   3c:	31 db                	xor    %ebx,%ebx
>   3e:	e9                   	.byte 0xe9
>   3f:	bb                   	.byte 0xbb
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	0f 0b                	ud2    
>    2:	e9 b6 fe ff ff       	jmpq   0xfffffffffffffebd
>    7:	41 be 02 00 00 00    	mov    $0x2,%r14d
>    d:	e9 11 f7 ff ff       	jmpq   0xfffffffffffff723
>   12:	31 db                	xor    %ebx,%ebx
>   14:	e9                   	.byte 0xe9
>   15:	bb                   	.byte 0xbb
> [   63.461748][  T246] RSP: 0018:ffffc9000141f7a8 EFLAGS: 00010086
> [   63.467683][  T246] RAX: 0000000000000000 RBX: 0000000000000001
> RCX: 0000000000000027
> [   63.475525][  T246] RDX: 0000000000000027 RSI: 0000000000000004
> RDI: ffff8883b5330c48
> [   63.483354][  T246] RBP: ffff888430688000 R08: 0000000000000001
> R09: ffffed1076a66189
> [   63.491172][  T246] R10: ffff8883b5330c4b R11: 0000000000000001
> R12: ffff888430688f58
> [   63.499015][  T246] R13: 00000000000000a0 R14: ffff88841d5b1f28
> R15: 0000000000000000
> [   63.506860][  T246] FS:  0000000000000000(0000)
> GS:ffff8883b5300000(0000) knlGS:0000000000000000
> [   63.515659][  T246] CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> [   63.522116][  T246] CR2: 00005566d67bd842 CR3: 000000043b67e002
> CR4: 00000000003706f0
> [   63.529942][  T246] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [   63.537768][  T246] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [   63.545591][  T246] Call Trace:
> [   63.548744][  T246]  <TASK>
> [ 63.551537][ T246] ? __warn (kernel/panic.c:693) 
> [ 63.555470][ T246] ? __lock_acquire (kernel/locking/lockdep.c:5058
> (discriminator 9)) 
> [ 63.560262][ T246] ? report_bug (lib/bug.c:180 lib/bug.c:219) 
> [ 63.564637][ T246] ? handle_bug (arch/x86/kernel/traps.c:239) 
> [ 63.568838][ T246] ? exc_invalid_op (arch/x86/kernel/traps.c:260
> (discriminator 1)) 
> [ 63.573389][ T246] ? asm_exc_invalid_op
> (arch/x86/include/asm/idtentry.h:621) 
> [ 63.578283][ T246] ? __lock_acquire (kernel/locking/lockdep.c:5058
> (discriminator 9)) 
> [ 63.583090][ T246] ? __lock_acquire (kernel/locking/lockdep.c:5058
> (discriminator 9)) 
> [ 63.587884][ T246] lock_acquire (kernel/locking/lockdep.c:466
> kernel/locking/lockdep.c:5758 kernel/locking/lockdep.c:5721) 
> [ 63.592243][ T246] ? stress_inorder_work (kernel/locking/test-
> ww_mutex.c:456) test_ww_mutex
> [ 63.598787][ T246] ? __pfx_lock_acquire
> (kernel/locking/lockdep.c:5724) 
> [ 63.603676][ T246] ? __pfx_do_raw_spin_lock
> (kernel/locking/spinlock_debug.c:114) 
> [ 63.608899][ T246] ? __pfx___might_resched
> (kernel/sched/core.c:8392) 
> [ 63.614036][ T246] ? __ww_mutex_lock+0x94c/0x2b50 
> [ 63.619954][ T246] __ww_mutex_lock+0x1f9/0x2b50 
> [ 63.625696][ T246] ? stress_inorder_work (kernel/locking/test-
> ww_mutex.c:456) test_ww_mutex
> [ 63.632218][ T246] ? stress_inorder_work (kernel/locking/test-
> ww_mutex.c:456) test_ww_mutex
> [ 63.638750][ T246] ? __pfx___ww_mutex_lock+0x10/0x10 
> [ 63.644940][ T246] ? __mutex_unlock_slowpath
> (arch/x86/include/asm/atomic64_64.h:101 include/linux/atomic/atomic-
> arch-fallback.h:4329 include/linux/atomic/atomic-long.h:1506
> include/linux/atomic/atomic-instrumented.h:4481
> kernel/locking/mutex.c:929) 
> [ 63.650454][ T246] ? lock_is_held_type
> (kernel/locking/lockdep.c:5497 kernel/locking/lockdep.c:5827) 
> [ 63.655348][ T246] ? __pfx___might_resched
> (kernel/sched/core.c:8392) 
> [ 63.660502][ T246] ? ww_mutex_lock (kernel/locking/mutex.c:878) 
> [ 63.665047][ T246] ww_mutex_lock (kernel/locking/mutex.c:878) 
> [ 63.669432][ T246] stress_inorder_work (kernel/locking/test-
> ww_mutex.c:456) test_ww_mutex
> [ 63.675813][ T246] ? __pfx_stress_inorder_work (kernel/locking/test-
> ww_mutex.c:434) test_ww_mutex
> [ 63.682709][ T246] ? lock_is_held_type
> (kernel/locking/lockdep.c:5497 kernel/locking/lockdep.c:5827) 
> [ 63.687600][ T246] process_one_work (kernel/workqueue.c:3236) 
> [ 63.692398][ T246] ? __pfx_lock_acquire
> (kernel/locking/lockdep.c:5724) 
> [ 63.697307][ T246] ? __pfx_process_one_work
> (kernel/workqueue.c:3133) 
> [ 63.702545][ T246] ? assign_work (kernel/workqueue.c:1202) 
> [ 63.707003][ T246] ? lock_is_held_type
> (kernel/locking/lockdep.c:5497 kernel/locking/lockdep.c:5827) 
> [ 63.711899][ T246] worker_thread (kernel/workqueue.c:3306
> kernel/workqueue.c:3390) 
> [ 63.716353][ T246] ? __pfx_worker_thread (kernel/workqueue.c:3339) 
> [ 63.721334][ T246] kthread (kernel/kthread.c:389) 
> [ 63.725270][ T246] ? __pfx_kthread (kernel/kthread.c:342) 
> [ 63.729732][ T246] ret_from_fork (arch/x86/kernel/process.c:153) 
> [ 63.734019][ T246] ? __pfx_kthread (kernel/kthread.c:342) 
> [ 63.738485][ T246] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
> [   63.743119][  T246]  </TASK>
> [   63.746012][  T246] irq event stamp: 10527
> [ 63.750118][ T246] hardirqs last enabled at (10527):
> finish_task_switch+0x1b6/0x950 
> [ 63.760389][ T246] hardirqs last disabled at (10526): __schedule
> (kernel/sched/core.c:6416 (discriminator 1)) 
> [ 63.769458][ T246] softirqs last enabled at (10478): handle_softirqs
> (arch/x86/include/asm/preempt.h:26 kernel/softirq.c:401
> kernel/softirq.c:582) 
> [ 63.778864][ T246] softirqs last disabled at (10473): __irq_exit_rcu
> (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637) 
> [   63.788185][  T246] ---[ end trace 0000000000000000 ]---
> 
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20241009/202410091542.f6c4a438-oliver.sang@intel.com
> 
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ