lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Date:   Sun, 5 Dec 2021 21:16:02 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Ming Lei <ming.lei@...hat.com>
Cc:     0day robot <lkp@...el.com>, LKML <linux-kernel@...r.kernel.org>,
        lkp@...ts.01.org, Christoph Hellwig <hch@....de>,
        Jens Axboe <axboe@...nel.dk>,
        "Martin K . Petersen" <martin.petersen@...cle.com>,
        Sagi Grimberg <sagi@...mberg.me>, linux-block@...r.kernel.org,
        linux-nvme@...ts.infradead.org, linux-scsi@...r.kernel.org,
        Keith Busch <kbusch@...nel.org>, Ming Lei <ming.lei@...hat.com>
Subject: [blk]  4575a8b36e: WARNING:at_kernel/sched/core.c:#__might_sleep



Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: 4575a8b36e3a56fa87b1f77e0064fc2ec36ebb7c ("[PATCH V2 1/5] blk-mq: remove hctx_lock and hctx_unlock")
url: https://github.com/0day-ci/linux/commits/Ming-Lei/blk-mq-quiesce-improvement/20211130-154015
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/linux-block/20211130073752.3005936-2-ming.lei@redhat.com

in testcase: blktests
version: blktests-x86_64-f51ee53-1_20211202
with following parameters:

	test: nbd-group-01
	ucode: 0xe2



on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang@...el.com>


[ 36.018984][ T964] WARNING: CPU: 1 PID: 964 at kernel/sched/core.c:9461 __might_sleep (kernel/sched/core.c:9461 (discriminator 9)) 
[   36.028135][  T964] Modules linked in: nbd loop xfs libcrc32c dm_multipath dm_mod sd_mod t10_pi sg ipmi_devintf ipmi_msghandler intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp i915 kvm_intel mei_wdt kvm intel_gtt ttm wmi_bmof drm_kms_helper irqbypass crct10dif_pclmul syscopyarea sysfillrect crc32_pclmul crc32c_intel ghash_clmulni_intel sysimgblt rapl fb_sys_fops intel_cstate ahci libahci drm intel_uncore mei_me intel_pch_thermal libata mei wmi video intel_pmc_core acpi_pad ip_tables
[   36.048867][  T303] RESULT_ROOT=/result/blktests/nbd-group-01-ucode=0xe2/lkp-skl-d03/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3-func/gcc-9/4575a8b36e3a56fa87b1f77e0064fc2ec36ebb7c/6
[   36.073878][  T303]
[   36.073994][  T964] CPU: 1 PID: 964 Comm: parted Tainted: G          I       5.16.0-rc3-00091-g4575a8b36e3a #1
[   36.102953][  T964] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015
[ 36.111136][ T964] RIP: 0010:__might_sleep (kernel/sched/core.c:9461 (discriminator 9)) 
[ 36.116481][ T964] Code: 48 8d bb 00 23 00 00 48 89 fa 48 c1 ea 03 80 3c 02 00 75 31 48 8b 93 00 23 00 00 44 89 f6 48 c7 c7 a0 f6 a7 83 e8 96 40 26 02 <0f> 0b e9 6e ff ff ff e8 1a eb 65 00 e9 1d ff ff ff e8 50 eb 65 00
All code
========
   0:	48 8d bb 00 23 00 00 	lea    0x2300(%rbx),%rdi
   7:	48 89 fa             	mov    %rdi,%rdx
   a:	48 c1 ea 03          	shr    $0x3,%rdx
   e:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
  12:	75 31                	jne    0x45
  14:	48 8b 93 00 23 00 00 	mov    0x2300(%rbx),%rdx
  1b:	44 89 f6             	mov    %r14d,%esi
  1e:	48 c7 c7 a0 f6 a7 83 	mov    $0xffffffff83a7f6a0,%rdi
  25:	e8 96 40 26 02       	callq  0x22640c0
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	e9 6e ff ff ff       	jmpq   0xffffffffffffff9f
  31:	e8 1a eb 65 00       	callq  0x65eb50
  36:	e9 1d ff ff ff       	jmpq   0xffffffffffffff58
  3b:	e8 50 eb 65 00       	callq  0x65eb90

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	e9 6e ff ff ff       	jmpq   0xffffffffffffff75
   7:	e8 1a eb 65 00       	callq  0x65eb26
   c:	e9 1d ff ff ff       	jmpq   0xffffffffffffff2e
  11:	e8 50 eb 65 00       	callq  0x65eb66
[   36.136101][  T964] RSP: 0018:ffffc90000f87580 EFLAGS: 00010286
[   36.142134][  T964] RAX: 0000000000000000 RBX: ffff888859690000 RCX: 0000000000000000
[   36.150086][  T964] RDX: 0000000000000004 RSI: 0000000000000008 RDI: fffff520001f0ea2
[   36.157941][  T964] RBP: ffffc90000f875a8 R08: 0000000000000001 R09: ffffed10fae96741
[   36.165885][  T964] R10: ffff8887d74b3a07 R11: ffffed10fae96740 R12: ffffffff83c07c00
[   36.173851][  T964] R13: 000000000000081b R14: 0000000000000002 R15: ffffffff84c9cfc0
[   36.181821][  T964] FS:  00007fbfd9ece5c0(0000) GS:ffff8887d7480000(0000) knlGS:0000000000000000
[   36.190633][  T964] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.197185][  T964] CR2: 000056330e10b000 CR3: 0000000857e7e002 CR4: 00000000003706e0
[   36.205042][  T964] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   36.213012][  T964] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   36.220965][  T964] Call Trace:
[   36.224218][  T964]  <TASK>
[ 36.227024][ T964] blk_mq_run_hw_queue (block/blk-mq.c:2075 (discriminator 2)) 
[ 36.232000][ T964] blk_mq_sched_insert_requests (arch/x86/include/asm/preempt.h:80 include/linux/rcupdate.h:68 include/linux/rcupdate.h:686 include/linux/percpu-refcount.h:330 include/linux/percpu-refcount.h:351 block/blk-mq-sched.c:487) 
[ 36.237871][ T964] blk_mq_flush_plug_list (block/blk-mq.c:2546) 
[ 36.243227][ T964] ? update_io_ticks (block/blk-core.c:1040) 
[ 36.248080][ T964] ? blk_mq_insert_requests (block/blk-mq.c:2546) 
[ 36.253517][ T964] ? blk_mq_submit_bio (block/blk-mq.c:866 block/blk-mq.c:2386 block/blk-mq.c:2789) 
[ 36.258603][ T964] ? blk_mq_try_issue_list_directly (block/blk-mq.c:2760) 
[ 36.264718][ T964] blk_flush_plug (block/blk-core.c:1262) 
[ 36.269269][ T964] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:513 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) 
[ 36.274603][ T964] ? blk_start_plug_nr_ios (block/blk-core.c:1251) 
[ 36.279943][ T964] ? __filemap_add_folio (mm/filemap.c:953) 
[ 36.285106][ T964] io_schedule (kernel/sched/core.c:8371) 
[   36.287859][  T303] job=/lkp/jobs/scheduled/lkp-skl-d03/blktests-nbd-group-01-ucode=0xe2-debian-10.4-x86_64-20200603.cgz-4575a8b36e3a56fa87b1f77e0064fc2ec36ebb7c-20211203-44701-13xrrcj-6.yaml
[ 36.289315][ T964] bit_wait_io (include/linux/sched/signal.h:398 kernel/sched/wait_bit.c:210) 
[   36.289654][  T303]
[ 36.306422][ T964] __wait_on_bit (kernel/sched/wait_bit.c:49) 
[ 36.317251][ T964] ? bit_wait_timeout (kernel/sched/wait_bit.c:208) 
[ 36.322168][ T964] out_of_line_wait_on_bit (kernel/sched/wait_bit.c:60) 
[ 36.327427][ T964] ? __wait_on_bit (kernel/sched/wait_bit.c:60) 
[ 36.332063][ T964] ? var_wake_function (kernel/sched/wait_bit.c:22) 
[ 36.337048][ T964] __block_write_begin_int (include/linux/wait_bit.h:101 fs/buffer.c:122 include/linux/buffer_head.h:356 fs/buffer.c:2048) 
[ 36.342483][ T964] ? blkdev_direct_IO (block/fops.c:27) 
[ 36.347382][ T964] ? invalidate_bh_lrus_cpu (fs/buffer.c:1974) 
[ 36.352624][ T964] ? inode_owner_or_capable (fs/inode.c:2289) 
[ 36.358042][ T964] ? blkdev_direct_IO (block/fops.c:27) 
[ 36.362952][ T964] block_write_begin (fs/buffer.c:2060 fs/buffer.c:2120) 
[ 36.367674][ T964] generic_perform_write (mm/filemap.c:3758) 
[ 36.372832][ T964] ? trace_event_raw_event_file_check_and_advance_wb_err (mm/filemap.c:3721) 
[ 36.380789][ T964] ? inode_update_time (fs/inode.c:1985) 
[ 36.385599][ T964] ? mem_cgroup_oom_trylock (mm/memcontrol.c:2585) 
[ 36.391016][ T964] __generic_file_write_iter (mm/filemap.c:3884) 
[ 36.396524][ T964] blkdev_write_iter (block/fops.c:558) 
[ 36.401358][ T964] ? blkdev_bio_end_io (block/fops.c:527) 
[ 36.406415][ T964] ? folio_add_lru (mm/swap.c:468) 
[ 36.410875][ T964] ? do_anonymous_page (mm/memory.c:3800) 
[ 36.415947][ T964] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[ 36.420496][ T964] ? __handle_mm_fault (mm/memory.c:4549 mm/memory.c:4686) 
[ 36.425670][ T964] ? new_sync_read (fs/read_write.c:493) 
[ 36.430317][ T964] ? enqueue_hrtimer (kernel/time/hrtimer.c:1092 (discriminator 3)) 
[ 36.435041][ T964] vfs_write (fs/read_write.c:590) 
[ 36.439168][ T964] ksys_write (fs/read_write.c:643) 
[ 36.443280][ T964] ? __ia32_sys_read (fs/read_write.c:633) 
[ 36.447914][ T964] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[ 36.452202][ T964] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[   36.457985][  T964] RIP: 0033:0x7fbfda4fc504
[ 36.462271][ T964] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
All code
========
   0:	00 f7                	add    %dh,%bh
   2:	d8 64 89 02          	fsubs  0x2(%rcx,%rcx,4)
   6:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
   d:	eb b3                	jmp    0xffffffffffffffc2
   f:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  16:	48 8d 05 f9 61 0d 00 	lea    0xd61f9(%rip),%rax        # 0xd6216
  1d:	8b 00                	mov    (%rax),%eax
  1f:	85 c0                	test   %eax,%eax
  21:	75 13                	jne    0x36
  23:	b8 01 00 00 00       	mov    $0x1,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 54                	ja     0x86
  32:	c3                   	retq   
  33:	0f 1f 00             	nopl   (%rax)
  36:	41 54                	push   %r12
  38:	49 89 d4             	mov    %rdx,%r12
  3b:	55                   	push   %rbp
  3c:	48 89 f5             	mov    %rsi,%rbp
  3f:	53                   	push   %rbx

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 54                	ja     0x5c
   8:	c3                   	retq   
   9:	0f 1f 00             	nopl   (%rax)
   c:	41 54                	push   %r12
   e:	49 89 d4             	mov    %rdx,%r12
  11:	55                   	push   %rbp
  12:	48 89 f5             	mov    %rsi,%rbp
  15:	53                   	push   %rbx


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.



---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.16.0-rc3-00091-g4575a8b36e3a" of type "text/plain" (177757 bytes)

View attachment "job-script" of type "text/plain" (5623 bytes)

Download attachment "dmesg.xz" of type "application/x-xz" (42200 bytes)

View attachment "blktests" of type "text/plain" (2894 bytes)

View attachment "job.yaml" of type "text/plain" (4594 bytes)

View attachment "reproduce" of type "text/plain" (85 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ