Message-ID: <202304292108.f44daa45-oliver.sang@intel.com>
Date: Sat, 29 Apr 2023 22:07:15 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Khalid Aziz <khalid.aziz@...cle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
Matthew Wilcox <willy@...radead.org>, <linux-mm@...ck.org>,
<akpm@...ux-foundation.org>, <markhemm@...glemail.com>,
<viro@...iv.linux.org.uk>, <david@...hat.com>,
<mike.kravetz@...cle.com>, Khalid Aziz <khalid.aziz@...cle.com>,
<andreyknvl@...il.com>, <dave.hansen@...el.com>, <luto@...nel.org>,
<brauner@...nel.org>, <arnd@...db.de>, <ebiederm@...ssion.com>,
<catalin.marinas@....com>, <linux-arch@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <mhiramat@...nel.org>,
<rostedt@...dmis.org>, <vasily.averin@...ux.dev>,
<xhao@...ux.alibaba.com>, <pcc@...gle.com>, <neilb@...e.de>,
<maz@...nel.org>, <oliver.sang@...el.com>
Subject: Re: [PATCH RFC v2 4/4] mm/ptshare: Add page fault handling for page table shared regions
Hello,
kernel test robot noticed "WARNING:bad_unlock_balance_detected" on:
commit: a2eef9e49f572b7b2dfa23fc32567e83da9573d5 ("[PATCH RFC v2 4/4] mm/ptshare: Add page fault handling for page table shared regions")
url: https://github.com/intel-lab-lkp/linux/commits/Khalid-Aziz/mm-ptshare-Add-vm-flag-for-shared-PTE/20230427-005143
base: https://git.kernel.org/cgit/linux/kernel/git/arnd/asm-generic.git master
patch link: https://lore.kernel.org/all/9edffd2a12a049a42d9a2c216e3f999aae7e65a4.1682453344.git.khalid.aziz@oracle.com/
patch subject: [PATCH RFC v2 4/4] mm/ptshare: Add page fault handling for page table shared regions
in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with the following parameters:
sc_nr_hugepages: 2
group: mm
test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
compiler: gcc-11
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
(please refer to the attached dmesg/kmsg for the entire log/backtrace)
If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202304292108.f44daa45-oliver.sang@intel.com
[ 183.878671][ T3695] WARNING: bad unlock balance detected!
[ 183.884040][ T3695] 6.3.0-rc1-00015-ga2eef9e49f57 #1 Not tainted
[ 183.890014][ T3695] -------------------------------------
[ 183.895382][ T3695] userfaultfd/3695 is trying to release lock (&mm->mmap_lock) at:
[ 183.903000][ T3695] handle_mm_fault (mm/memory.c:5276)
[ 183.909324][ T3695] but there are no more locks to release!
[ 183.914866][ T3695]
[ 183.914866][ T3695] other info that might help us debug this:
[ 183.922738][ T3695] no locks held by userfaultfd/3695.
[ 183.927847][ T3695]
[ 183.927847][ T3695] stack backtrace:
[ 183.933560][ T3695] CPU: 7 PID: 3695 Comm: userfaultfd Not tainted 6.3.0-rc1-00015-ga2eef9e49f57 #1
[ 183.942558][ T3695] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[ 183.950604][ T3695] Call Trace:
[ 183.953730][ T3695] <TASK>
[ 183.956510][ T3695] dump_stack_lvl (lib/dump_stack.c:108)
[ 183.960845][ T3695] __lock_release (kernel/locking/lockdep.c:5346)
[ 183.965354][ T3695] ? lock_downgrade (kernel/locking/lockdep.c:5321)
[ 183.970032][ T3695] ? dump_stack_print_info (lib/dump_stack.c:70)
[ 183.975231][ T3695] ? handle_mm_fault (mm/memory.c:5276)
[ 183.979998][ T3695] lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5691)
[ 183.984248][ T3695] up_read (kernel/locking/rwsem.c:1616)
[ 183.987978][ T3695] handle_mm_fault (mm/memory.c:5276)
[ 183.992571][ T3695] ? lock_is_held_type (kernel/locking/lockdep.c:5410 kernel/locking/lockdep.c:5712)
[ 183.997424][ T3695] ? __handle_mm_fault (mm/memory.c:5201)
[ 184.002364][ T3695] ? find_vma (mm/mmap.c:1854)
[ 184.006459][ T3695] ? can_vma_merge_before+0x330/0x330
[ 184.012283][ T3695] do_user_addr_fault (arch/x86/mm/fault.c:1407)
[ 184.017146][ T3695] exc_page_fault (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 arch/x86/mm/fault.c:1506 arch/x86/mm/fault.c:1554)
[ 184.021496][ T3695] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
[ 184.026186][ T3695] RIP: 0033:0x403b2e
[ 184.029923][ T3695] Code: 89 75 e0 48 89 55 d8 48 c7 45 f8 00 00 00 00 eb 2c 48 8b 55 e8 48 8b 45 f8 48 01 d0 0f b6 10 48 8b 4d e0 48 8b 45 f8 48 01 c8 <0f> b6 00 38 c2 74 07 b8 01 00 00 00 eb 14 48 83 45 f8 01 48 8b 45
All code
========
0: 89 75 e0 mov %esi,-0x20(%rbp)
3: 48 89 55 d8 mov %rdx,-0x28(%rbp)
7: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
e: 00
f: eb 2c jmp 0x3d
11: 48 8b 55 e8 mov -0x18(%rbp),%rdx
15: 48 8b 45 f8 mov -0x8(%rbp),%rax
19: 48 01 d0 add %rdx,%rax
1c: 0f b6 10 movzbl (%rax),%edx
1f: 48 8b 4d e0 mov -0x20(%rbp),%rcx
23: 48 8b 45 f8 mov -0x8(%rbp),%rax
27: 48 01 c8 add %rcx,%rax
2a:* 0f b6 00 movzbl (%rax),%eax <-- trapping instruction
2d: 38 c2 cmp %al,%dl
2f: 74 07 je 0x38
31: b8 01 00 00 00 mov $0x1,%eax
36: eb 14 jmp 0x4c
38: 48 83 45 f8 01 addq $0x1,-0x8(%rbp)
3d: 48 rex.W
3e: 8b .byte 0x8b
3f: 45 rex.RB
Code starting with the faulting instruction
===========================================
0: 0f b6 00 movzbl (%rax),%eax
3: 38 c2 cmp %al,%dl
5: 74 07 je 0xe
7: b8 01 00 00 00 mov $0x1,%eax
c: eb 14 jmp 0x22
e: 48 83 45 f8 01 addq $0x1,-0x8(%rbp)
13: 48 rex.W
14: 8b .byte 0x8b
15: 45 rex.RB
[ 184.049280][ T3695] RSP: 002b:00007ffe05336170 EFLAGS: 00010206
[ 184.055169][ T3695] RAX: 00007f57bde00000 RBX: 00007ffe05336340 RCX: 00007f57bde00000
[ 184.062955][ T3695] RDX: 00000000000000ff RSI: 00007f57bde00000 RDI: 00007f57ce000000
[ 184.070742][ T3695] RBP: 00007ffe05336170 R08: 00000000ffffffff R09: 0000000000000000
[ 184.078528][ T3695] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000
[ 184.086315][ T3695] R13: 00007ffe05336560 R14: 000000000040be00 R15: 00007f57e6668020
[ 184.094107][ T3695] </TASK>
[ 184.096984][ T3695] ------------[ cut here ]------------
[ 184.102287][ T3695] DEBUG_RWSEMS_WARN_ON(tmp < 0): count = 0xffffffffffffff00, magic = 0xffff888160af9d28, owner = 0x1, curr 0xffff8881b2dc8040, list empty
[ 184.116154][ T3695] WARNING: CPU: 7 PID: 3695 at kernel/locking/rwsem.c:1348 __up_read (kernel/locking/rwsem.c:1348 (discriminator 15))
[ 184.125073][ T3695] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common btrfs x86_pkg_temp_thermal intel_powerclamp blake2b_generic coretemp xor raid6_pq zstd_compress kvm_intel libcrc32c kvm i915 sd_mod t10_pi irqbypass crct10dif_pclmul crc64_rocksoft_generic crc32_pclmul drm_buddy crc64_rocksoft crc32c_intel intel_gtt crc64 ghash_clmulni_intel drm_display_helper sha512_ssse3 sg rapl drm_kms_helper ipmi_devintf ipmi_msghandler mei_wdt wmi_bmof intel_cstate syscopyarea ahci libahci i2c_i801 i2c_designware_platform sysfillrect mei_me intel_uncore i2c_smbus idma64 i2c_designware_core sysimgblt libata ttm mei video wmi intel_pmc_core acpi_pad binfmt_misc fuse drm ip_tables
[ 184.190695][ T3695] CPU: 7 PID: 3695 Comm: userfaultfd Not tainted 6.3.0-rc1-00015-ga2eef9e49f57 #1
[ 184.199697][ T3695] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[ 184.207763][ T3695] RIP: 0010:__up_read (kernel/locking/rwsem.c:1348 (discriminator 15))
[ 184.212637][ T3695] Code: 3c 02 00 0f 85 61 03 00 00 53 48 8b 55 00 4d 89 e9 4d 89 f8 4c 89 f1 48 c7 c6 40 70 c9 83 48 c7 c7 a0 6e c9 83 e8 11 15 e9 ff <0f> 0b 5a e9 0e ff ff ff 48 89 44 24 38 e9 db fd ff ff be 08 00 00
All code
========
0: 3c 02 cmp $0x2,%al
2: 00 0f add %cl,(%rdi)
4: 85 61 03 test %esp,0x3(%rcx)
7: 00 00 add %al,(%rax)
9: 53 push %rbx
a: 48 8b 55 00 mov 0x0(%rbp),%rdx
e: 4d 89 e9 mov %r13,%r9
11: 4d 89 f8 mov %r15,%r8
14: 4c 89 f1 mov %r14,%rcx
17: 48 c7 c6 40 70 c9 83 mov $0xffffffff83c97040,%rsi
1e: 48 c7 c7 a0 6e c9 83 mov $0xffffffff83c96ea0,%rdi
25: e8 11 15 e9 ff callq 0xffffffffffe9153b
2a:* 0f 0b ud2 <-- trapping instruction
2c: 5a pop %rdx
2d: e9 0e ff ff ff jmpq 0xffffffffffffff40
32: 48 89 44 24 38 mov %rax,0x38(%rsp)
37: e9 db fd ff ff jmpq 0xfffffffffffffe17
3c: be .byte 0xbe
3d: 08 00 or %al,(%rax)
...
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 5a pop %rdx
3: e9 0e ff ff ff jmpq 0xffffffffffffff16
8: 48 89 44 24 38 mov %rax,0x38(%rsp)
d: e9 db fd ff ff jmpq 0xfffffffffffffded
12: be .byte 0xbe
13: 08 00 or %al,(%rax)
...
[ 184.232027][ T3695] RSP: 0000:ffffc9000b6ffd48 EFLAGS: 00010286
[ 184.237919][ T3695] RAX: 0000000000000000 RBX: ffffffff83c96de0 RCX: fffff520016dff6e
[ 184.245734][ T3695] RDX: 0000000000000004 RSI: 0000000000000008 RDI: ffff88878c7ac38c
[ 184.253611][ T3695] RBP: ffff888160af9d28 R08: ffffffff821fca01 R09: ffffc9000b6ffb6f
[ 184.261422][ T3695] R10: fffff520016dff6d R11: 0000000000000001 R12: 1ffff920016dffad
[ 184.269254][ T3695] R13: ffff8881b2dc8040 R14: ffff888160af9d28 R15: 0000000000000001
[ 184.277051][ T3695] FS: 00007f57e6448740(0000) GS:ffff88878c780000(0000) knlGS:0000000000000000
[ 184.285796][ T3695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 184.292208][ T3695] CR2: 00007f57bde00000 CR3: 0000000100e30004 CR4: 00000000003706e0
[ 184.300016][ T3695] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 184.307809][ T3695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 184.315616][ T3695] Call Trace:
[ 184.318762][ T3695] <TASK>
[ 184.321563][ T3695] ? up_write (kernel/locking/rwsem.c:1339)
[ 184.325575][ T3695] ? lock_release (kernel/locking/lockdep.c:5693)
[ 184.330105][ T3695] handle_mm_fault (mm/memory.c:5276)
[ 184.334721][ T3695] ? lock_is_held_type (kernel/locking/lockdep.c:5410 kernel/locking/lockdep.c:5712)
[ 184.339595][ T3695] ? __handle_mm_fault (mm/memory.c:5201)
[ 184.344556][ T3695] ? find_vma (mm/mmap.c:1854)
[ 184.348651][ T3695] ? can_vma_merge_before+0x330/0x330
[ 184.354479][ T3695] do_user_addr_fault (arch/x86/mm/fault.c:1407)
[ 184.359351][ T3695] exc_page_fault (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 arch/x86/mm/fault.c:1506 arch/x86/mm/fault.c:1554)
[ 184.363707][ T3695] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
[ 184.368408][ T3695] RIP: 0033:0x403b2e
[ 184.372145][ T3695] Code: 89 75 e0 48 89 55 d8 48 c7 45 f8 00 00 00 00 eb 2c 48 8b 55 e8 48 8b 45 f8 48 01 d0 0f b6 10 48 8b 4d e0 48 8b 45 f8 48 01 c8 <0f> b6 00 38 c2 74 07 b8 01 00 00 00 eb 14 48 83 45 f8 01 48 8b 45
All code
========
0: 89 75 e0 mov %esi,-0x20(%rbp)
3: 48 89 55 d8 mov %rdx,-0x28(%rbp)
7: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
e: 00
f: eb 2c jmp 0x3d
11: 48 8b 55 e8 mov -0x18(%rbp),%rdx
15: 48 8b 45 f8 mov -0x8(%rbp),%rax
19: 48 01 d0 add %rdx,%rax
1c: 0f b6 10 movzbl (%rax),%edx
1f: 48 8b 4d e0 mov -0x20(%rbp),%rcx
23: 48 8b 45 f8 mov -0x8(%rbp),%rax
27: 48 01 c8 add %rcx,%rax
2a:* 0f b6 00 movzbl (%rax),%eax <-- trapping instruction
2d: 38 c2 cmp %al,%dl
2f: 74 07 je 0x38
31: b8 01 00 00 00 mov $0x1,%eax
36: eb 14 jmp 0x4c
38: 48 83 45 f8 01 addq $0x1,-0x8(%rbp)
3d: 48 rex.W
3e: 8b .byte 0x8b
3f: 45 rex.RB
Code starting with the faulting instruction
===========================================
0: 0f b6 00 movzbl (%rax),%eax
3: 38 c2 cmp %al,%dl
5: 74 07 je 0xe
7: b8 01 00 00 00 mov $0x1,%eax
c: eb 14 jmp 0x22
e: 48 83 45 f8 01 addq $0x1,-0x8(%rbp)
13: 48 rex.W
14: 8b .byte 0x8b
15: 45 rex.RB
[ 184.391546][ T3695] RSP: 002b:00007ffe05336170 EFLAGS: 00010206
[ 184.397460][ T3695] RAX: 00007f57bde00000 RBX: 00007ffe05336340 RCX: 00007f57bde00000
[ 184.405251][ T3695] RDX: 00000000000000ff RSI: 00007f57bde00000 RDI: 00007f57ce000000
[ 184.413057][ T3695] RBP: 00007ffe05336170 R08: 00000000ffffffff R09: 0000000000000000
[ 184.420863][ T3695] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000
[ 184.428670][ T3695] R13: 00007ffe05336560 R14: 000000000040be00 R15: 00007f57e6668020
[ 184.436482][ T3695] </TASK>
[ 184.439365][ T3695] irq event stamp: 368859
[ 184.443530][ T3695] hardirqs last enabled at (368859): finish_task_switch+0x21c/0x910
[ 184.453841][ T3695] hardirqs last disabled at (368858): __schedule (kernel/sched/core.c:6521 (discriminator 1))
[ 184.462946][ T3695] softirqs last enabled at (368830): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:415 kernel/softirq.c:600)
[ 184.472118][ T3695] softirqs last disabled at (368825): __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650)
[ 184.481495][ T3695] ---[ end trace 0000000000000000 ]---
[ 184.486819][ T3695] VM_SHARED_PT vma with NULL ptshare_data
[ 184.486823][ T3695] CPU: 3 PID: 3695 Comm: userfaultfd Tainted: G W 6.3.0-rc1-00015-ga2eef9e49f57 #1
[ 184.502865][ T3695] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
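
The `count = 0xffffffffffffff00` value in the DEBUG_RWSEMS_WARN_ON line above can be read as a signed 64-bit number, which is what the `tmp < 0` check tests. The sketch below models only the counter arithmetic of a read unlock. It assumes a reader bias of 1 << 8, which is my reading of kernel/locking/rwsem.c and is not stated in this report; the helper names `to_signed64` and `up_read_count` are hypothetical and exist only for illustration.

```python
# Assumption: RWSEM_READER_SHIFT is 8 (from kernel/locking/rwsem.c, not from the report).
RWSEM_READER_SHIFT = 8
RWSEM_READER_BIAS = 1 << RWSEM_READER_SHIFT
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit value as a signed two's-complement long."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def up_read_count(count):
    """Model only the counter update of a read unlock: subtract one reader bias."""
    return (count - RWSEM_READER_BIAS) & MASK64


reported = 0xffffffffffffff00  # count value printed in the warning

# The reported count is negative when viewed as a signed long.
print(to_signed64(reported))

# Releasing a read lock on a semaphore whose count is already 0
# produces exactly the reported bit pattern.
print(hex(up_read_count(0)))
```

Under that assumption, the reported count is one reader bias below zero, which is consistent with lockdep's "no locks held by userfaultfd/3695": the read unlock at mm/memory.c:5276 ran with no reader accounted for on mmap_lock.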
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
View attachment "config-6.3.0-rc1-00015-ga2eef9e49f57" of type "text/plain" (161585 bytes)
View attachment "job-script" of type "text/plain" (6093 bytes)
Download attachment "dmesg.xz" of type "application/x-xz" (53480 bytes)
View attachment "kernel-selftests" of type "text/plain" (37134 bytes)
View attachment "job.yaml" of type "text/plain" (5131 bytes)
View attachment "reproduce" of type "text/plain" (277 bytes)