Message-ID: <202211061521.28931f7-oliver.sang@intel.com>
Date: Sun, 6 Nov 2022 16:14:10 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Peter Xu <peterx@...hat.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
James Houghton <jthoughton@...gle.com>,
Miaohe Lin <linmiaohe@...wei.com>,
David Hildenbrand <david@...hat.com>,
Muchun Song <songmuchun@...edance.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Nadav Amit <nadav.amit@...il.com>,
Mike Kravetz <mike.kravetz@...cle.com>, <peterx@...hat.com>,
Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH RFC 05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe
Greetings,
FYI, we noticed WARNING:suspicious_RCU_usage due to commit (built with gcc-11):
commit: 8b7e3b7ca3897ebc4cb7b23c65a4618d64056e3b ("[PATCH RFC 05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe")
url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-hugetlb-Make-huge_pte_offset-thread-safe-for-pmd-unshare/20221031-053221
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/lkml/20221030212929.335473-6-peterx@redhat.com
patch subject: [PATCH RFC 05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe
in testcase: kernel-selftests
version: kernel-selftests-x86_64-9313ba54-1_20221017
with the following parameters:
sc_nr_hugepages: 2
group: vm
test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
on test machine: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202211061521.28931f7-oliver.sang@intel.com
kern :warn : [ 181.942648] WARNING: suspicious RCU usage
kern :warn : [ 181.943175] 6.1.0-rc1-00309-g8b7e3b7ca389 #1 Tainted: G S
kern :warn : [ 181.943972] -----------------------------
kern :warn : [ 181.944526] include/linux/rcupdate.h:364 Illegal context switch in RCU read-side critical section!
kern :warn : [ 181.945559]
other info that might help us debug this:
kern :warn : [ 181.946625]
rcu_scheduler_active = 2, debug_locks = 1
kern :warn : [ 181.947473] 2 locks held by hmm-tests/9934:
kern :warn : [ 181.948016] #0: ffff8884325b2d18 (&mm->mmap_lock#2){++++}-{3:3}, at: dmirror_fault (test_hmm.c:?) test_hmm
kern :warn : [ 181.949129] #1: ffffffff858a7860 (rcu_read_lock){....}-{1:2}, at: walk_hugetlb_range (pagewalk.c:?)
kern :warn : [ 181.950161]
stack backtrace:
kern :warn : [ 181.950780] CPU: 9 PID: 9934 Comm: hmm-tests Tainted: G S 6.1.0-rc1-00309-g8b7e3b7ca389 #1
kern :warn : [ 181.951863] Hardware name: Dell Inc. Vostro 3670/0HVPDY, BIOS 1.5.11 12/24/2018
kern :warn : [ 181.952709] Call Trace:
kern :warn : [ 181.953070] <TASK>
kern :warn : [ 181.953403] dump_stack_lvl (??:?)
kern :warn : [ 181.953890] __might_resched (??:?)
kern :warn : [ 181.954403] __mutex_lock (mutex.c:?)
kern :warn : [ 181.954886] ? validate_chain (lockdep.c:?)
kern :warn : [ 181.955405] ? hugetlb_fault (??:?)
kern :warn : [ 181.955926] ? mark_lock+0xca/0xac0
kern :warn : [ 181.956450] ? mutex_lock_io_nested (mutex.c:?)
kern :warn : [ 181.957039] ? check_prev_add (lockdep.c:?)
kern :warn : [ 181.957580] ? hugetlb_vm_op_pagesize (hugetlb.c:?)
kern :warn : [ 181.958177] ? hugetlb_fault (??:?)
kern :warn : [ 181.958690] hugetlb_fault (??:?)
kern :warn : [ 181.959199] ? find_held_lock (lockdep.c:?)
kern :warn : [ 181.959709] ? hugetlb_no_page (??:?)
kern :warn : [ 181.960255] ? __lock_release (lockdep.c:?)
kern :warn : [ 181.960772] ? lock_downgrade (lockdep.c:?)
kern :warn : [ 181.961292] ? lock_is_held_type (??:?)
kern :warn : [ 181.961830] ? handle_mm_fault (??:?)
kern :warn : [ 181.962363] handle_mm_fault (??:?)
kern :warn : [ 181.962870] ? hmm_vma_walk_hugetlb_entry (hmm.c:?)
kern :warn : [ 181.963501] hmm_vma_fault (hmm.c:?)
kern :warn : [ 181.964096] walk_hugetlb_range (pagewalk.c:?)
kern :warn : [ 181.964639] __walk_page_range (pagewalk.c:?)
kern :warn : [ 181.965160] walk_page_range (??:?)
kern :warn : [ 181.965670] ? __walk_page_range (??:?)
kern :warn : [ 181.966213] ? rcu_read_unlock (main.c:?)
kern :warn : [ 181.966718] ? lock_is_held_type (??:?)
kern :warn : [ 181.967259] ? mmu_interval_read_begin (??:?)
kern :warn : [ 181.967855] ? lock_is_held_type (??:?)
kern :warn : [ 181.968400] hmm_range_fault (??:?)
kern :warn : [ 181.968911] ? down_read (??:?)
kern :warn : [ 181.969383] ? hmm_vma_fault (??:?)
kern :warn : [ 181.969891] ? __lock_release (lockdep.c:?)
kern :warn : [ 181.970416] dmirror_fault (test_hmm.c:?) test_hmm
kern :warn : [ 181.971012] ? dmirror_migrate_to_system+0x590/0x590 test_hmm
kern :warn : [ 181.971847] ? find_held_lock (lockdep.c:?)
kern :warn : [ 181.972355] ? dmirror_write+0x202/0x310 test_hmm
kern :warn : [ 181.973069] ? __lock_release (lockdep.c:?)
kern :warn : [ 181.973586] ? lock_downgrade (lockdep.c:?)
kern :warn : [ 181.974107] ? lock_is_held_type (??:?)
kern :warn : [ 181.974641] ? dmirror_write+0x202/0x310 test_hmm
kern :warn : [ 181.975355] ? lock_release (??:?)
kern :warn : [ 181.975845] ? __mutex_unlock_slowpath (mutex.c:?)
kern :warn : [ 181.976444] ? bit_wait_io_timeout (mutex.c:?)
kern :warn : [ 181.977008] ? lock_is_held_type (??:?)
kern :warn : [ 181.977547] ? dmirror_do_write (test_hmm.c:?) test_hmm
kern :warn : [ 181.978185] dmirror_write+0x1bf/0x310 test_hmm
kern :warn : [ 181.978881] ? dmirror_fault (test_hmm.c:?) test_hmm
kern :warn : [ 181.979484] ? lock_is_held_type (??:?)
kern :warn : [ 181.980021] ? __might_fault (??:?)
kern :warn : [ 181.980523] ? lock_release (??:?)
kern :warn : [ 181.981019] dmirror_fops_unlocked_ioctl (test_hmm.c:?) test_hmm
kern :warn : [ 181.981732] ? dmirror_exclusive+0x780/0x780 test_hmm
kern :warn : [ 181.982485] ? do_user_addr_fault (fault.c:?)
kern :warn : [ 181.983042] ? __lock_release (lockdep.c:?)
kern :warn : [ 181.983562] __x64_sys_ioctl (??:?)
kern :warn : [ 181.984074] do_syscall_64 (??:?)
kern :warn : [ 181.984545] ? do_user_addr_fault (fault.c:?)
kern :warn : [ 181.985103] ? do_user_addr_fault (fault.c:?)
kern :warn : [ 181.985654] ? irqentry_exit_to_user_mode (??:?)
kern :warn : [ 181.986256] ? lockdep_hardirqs_on_prepare (lockdep.c:?)
kern :warn : [ 181.986945] entry_SYSCALL_64_after_hwframe (??:?)
kern :warn : [ 181.987569] RIP: 0033:0x7fac2f598e9b
kern :warn : [ 181.988047] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1b 48 8b 44 24 18 64 48 2b 04 25 28 00
All code
========
0: 00 48 89 add %cl,-0x77(%rax)
3: 44 24 18 rex.R and $0x18,%al
6: 31 c0 xor %eax,%eax
8: 48 8d 44 24 60 lea 0x60(%rsp),%rax
d: c7 04 24 10 00 00 00 movl $0x10,(%rsp)
14: 48 89 44 24 08 mov %rax,0x8(%rsp)
19: 48 8d 44 24 20 lea 0x20(%rsp),%rax
1e: 48 89 44 24 10 mov %rax,0x10(%rsp)
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 41 89 c0 mov %eax,%r8d <-- trapping instruction
2d: 3d 00 f0 ff ff cmp $0xfffff000,%eax
32: 77 1b ja 0x4f
34: 48 8b 44 24 18 mov 0x18(%rsp),%rax
39: 64 fs
3a: 48 rex.W
3b: 2b .byte 0x2b
3c: 04 25 add $0x25,%al
3e: 28 00 sub %al,(%rax)
Code starting with the faulting instruction
===========================================
0: 41 89 c0 mov %eax,%r8d
3: 3d 00 f0 ff ff cmp $0xfffff000,%eax
8: 77 1b ja 0x25
a: 48 8b 44 24 18 mov 0x18(%rsp),%rax
f: 64 fs
10: 48 rex.W
11: 2b .byte 0x2b
12: 04 25 add $0x25,%al
14: 28 00 sub %al,(%rax)
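For readers following the backtrace above: walk_hugetlb_range holds rcu_read_lock (lock #1 in the held-locks list) while its callback hmm_vma_fault reaches hugetlb_fault, which calls __mutex_lock, and the sleep check reached through __might_resched then fires. The pattern can be reduced to a small userspace model. This is only a sketch under stated assumptions: every toy_* name is hypothetical, the counter merely imitates a read-side nesting depth, and none of this is kernel code.

```c
#include <stdio.h>

/* Toy stand-ins for kernel primitives; hypothetical, not the real ones. */
static int rcu_nesting;   /* imitates the RCU read-side nesting depth */
static int warnings;      /* counts "illegal context switch" style hits */

static void toy_rcu_read_lock(void)   { rcu_nesting++; }
static void toy_rcu_read_unlock(void) { rcu_nesting--; }

/* Imitates the check a sleeping lock performs before it may block. */
static void toy_might_sleep(const char *who)
{
    if (rcu_nesting > 0) {
        warnings++;
        printf("WARNING: %s may sleep inside a read-side section\n", who);
    }
}

/* A sleeping lock: it runs the might-sleep check on entry. */
static void toy_mutex_lock(void) { toy_might_sleep("toy_mutex_lock"); }

/* Imitates a fault handler that takes a sleeping lock. */
static void toy_fault(void) { toy_mutex_lock(); }

typedef void (*toy_entry_cb)(void);

/* Pattern seen in the trace: the callback runs with the read lock held. */
static void toy_walk_locked(toy_entry_cb cb)
{
    toy_rcu_read_lock();
    cb();
    toy_rcu_read_unlock();
}

/*
 * Contrast case for the toy model only: the callback runs after the
 * read lock is dropped, so the check stays quiet. Whether anything
 * like this is acceptable for the real walker is not established here.
 */
static void toy_walk_then_call(toy_entry_cb cb)
{
    toy_rcu_read_lock();
    /* a lookup that needs the read lock would go here */
    toy_rcu_read_unlock();
    cb();
}
```

Calling toy_walk_locked(toy_fault) from a main() trips the check once, while toy_walk_then_call(toy_fault) does not. The model only shows that the warning depends on a sleeping call being made while the read-side section is still open.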
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
View attachment "config-6.1.0-rc1-00309-g8b7e3b7ca389" of type "text/plain" (171287 bytes)
View attachment "job-script" of type "text/plain" (6219 bytes)
Download attachment "kmsg.xz" of type "application/x-xz" (49020 bytes)
View attachment "kernel-selftests" of type "text/plain" (224193 bytes)
View attachment "job.yaml" of type "text/plain" (4855 bytes)
View attachment "reproduce" of type "text/plain" (273 bytes)