Message-ID: <20220509011511.GA21666@xsang-OptiPlex-9020>
Date:   Mon, 9 May 2022 09:15:11 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Logan Gunthorpe <logang@...tatee.com>
Cc:     Stephen Bates <sbates@...thlin.com>, lkp@...ts.01.org,
        lkp@...el.com, LKML <linux-kernel@...r.kernel.org>
Subject: [md/raid5] e55e386993: BUG:sleeping_function_called_from_invalid_context_at_block/blk-core.c



Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: e55e386993704ea537d58cf0ded57678f70b1a77 ("md/raid5-cache: Add RCU protection to conf->log accesses")
https://github.com/sbates130272/linux-p2pmem.git md-bug

in testcase: mdadm-selftests
version: mdadm-selftests-x86_64-5f41845-1_20220325
with the following parameters:

	disk: 1HDD
	test_prefix: 20
	ucode: 0x28



on test machine: 8 threads, 1 socket, Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz, 16G memory

it caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>
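
For context: the splat below fires because submit_bio_noacct() may
sleep (it contains a might_sleep() check at block/blk-core.c:781), yet
it is reached from r5l_flush_stripe_to_raid() with preempt_count == 1,
i.e. from atomic context. A minimal sketch of the suspected pattern,
assuming the commit above wraps the conf->log access in
rcu_read_lock() (the names mirror drivers/md/raid5-cache.c, but the
body is illustrative, not the actual patch):

        /* Sketch only: a sleeping call inside an RCU read-side section. */
        static void flush_sketch(struct r5conf *conf)
        {
                struct r5l_log *log;

                /* On kernels where RCU readers disable preemption, this
                 * bumps preempt_count and makes in_atomic() return true.
                 */
                rcu_read_lock();
                log = rcu_dereference(conf->log); /* RCU-protected access */
                if (log)
                        submit_bio(&log->flush_bio); /* may sleep -> BUG */
                rcu_read_unlock();
        }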


[   89.177263][ T1008] BUG: sleeping function called from invalid context at block/blk-core.c:781
[   89.185915][ T1008] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1008, name: md0_raid5
[   89.194741][ T1008] preempt_count: 1, expected: 0
[   89.199480][ T1008] CPU: 3 PID: 1008 Comm: md0_raid5 Not tainted 5.18.0-rc3-00094-ge55e38699370 #1
[   89.208460][ T1008] Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F9 04/21/2015
[   89.217939][ T1008] Call Trace:
[   89.221092][ T1008]  <TASK>
[ 89.223893][ T1008] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) 
[ 89.228269][ T1008] __might_resched.cold (kernel/sched/core.c:9740) 
[ 89.233322][ T1008] submit_bio_noacct (block/blk-core.c:781) 
[ 89.238113][ T1008] ? bio_init (arch/x86/include/asm/atomic.h:41 include/linux/atomic/atomic-instrumented.h:42 block/bio.c:274) 
[ 89.242327][ T1008] r5l_flush_stripe_to_raid (drivers/md/raid5-cache.c:1341) raid456
[ 89.248636][ T1008] ? load_balance (kernel/sched/fair.c:10902) 
[ 89.253356][ T1008] handle_active_stripes+0x359/0x580 raid456
[ 89.260422][ T1008] ? retry_aligned_read (drivers/md/raid5.c:6403) raid456
[ 89.266369][ T1008] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170) 
[ 89.271246][ T1008] ? _raw_spin_lock_bh (kernel/locking/spinlock.c:169) 
[ 89.276211][ T1008] ? release_stripe_list (drivers/md/raid5.c:370) raid456
[ 89.282151][ T1008] raid5d (drivers/md/raid5.c:6551) raid456
[ 89.286881][ T1008] ? raid5_do_work (drivers/md/raid5.c:6505) raid456
[ 89.292367][ T1008] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) 
[ 89.297680][ T1008] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161) 
[ 89.303356][ T1008] ? finish_wait (include/linux/list.h:134 include/linux/list.h:206 kernel/sched/wait.c:407) 
[ 89.307716][ T1008] md_thread (drivers/md/md.c:7902) 
[ 89.311833][ T1008] ? super_90_load (drivers/md/md.c:909) 
[ 89.317067][ T1008] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161) 
[ 89.322769][ T1008] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414) 
[ 89.328543][ T1008] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4)) 
[ 89.333352][ T1008] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2153 (discriminator 1) kernel/sched/core.c:6462 (discriminator 1)) 
[ 89.337460][ T1008] ? super_90_load (drivers/md/md.c:909) 
[ 89.342719][ T1008] kthread (kernel/kthread.c:376) 
[ 89.346671][ T1008] ? kthread_complete_and_exit (kernel/kthread.c:331) 
[ 89.352157][ T1008] ret_from_fork (arch/x86/entry/entry_64.S:304) 
[   89.356449][ T1008]  </TASK>
[   89.658538][ T1054] md0: detected capacity change from 107520 to 0
[   89.664741][ T1054] md: md0 stopped.
[   90.016999][ T1066] md/raid:md0: device loop2 operational as raid disk 2
[   90.023748][ T1066] md/raid:md0: device loop1 operational as raid disk 1
[   90.030558][ T1066] md/raid:md0: device loop0 operational as raid disk 0
[   90.038551][ T1066] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
[   90.047912][ T1066] md/raid:md0: starting from clean shutdown
[   90.070777][ T1066] md0: detected capacity change from 0 to 107520
[   90.104394][ T1076] md: recovery of RAID array md0
[   92.038319][ T1076] md: md0: recovery done.
[   92.466320][ T1073] BUG: sleeping function called from invalid context at block/blk-core.c:781
[   92.474957][ T1073] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1073, name: md0_raid5
[   92.483819][ T1073] preempt_count: 1, expected: 0
[   92.488565][ T1073] CPU: 5 PID: 1073 Comm: md0_raid5 Tainted: G        W         5.18.0-rc3-00094-ge55e38699370 #1
[   92.498942][ T1073] Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F9 04/21/2015
[   92.508452][ T1073] Call Trace:
[   92.511620][ T1073]  <TASK>
[ 92.514421][ T1073] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) 
[ 92.518805][ T1073] __might_resched.cold (kernel/sched/core.c:9740) 
[ 92.523875][ T1073] submit_bio_noacct (block/blk-core.c:781) 
[ 92.528677][ T1073] ? bio_init (arch/x86/include/asm/atomic.h:41 include/linux/atomic/atomic-instrumented.h:42 block/bio.c:274) 
[ 92.532879][ T1073] r5l_flush_stripe_to_raid (drivers/md/raid5-cache.c:1341) raid456
[ 92.539176][ T1073] handle_active_stripes+0x359/0x580 raid456
[ 92.546245][ T1073] ? retry_aligned_read (drivers/md/raid5.c:6403) raid456
[ 92.552198][ T1073] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170) 
[ 92.557113][ T1073] ? blk_start_plug_nr_ios (block/blk-core.c:1240) 
[ 92.562443][ T1073] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170) 
[ 92.567342][ T1073] ? _raw_spin_lock_bh (kernel/locking/spinlock.c:169) 
[ 92.572339][ T1073] ? release_stripe_list (drivers/md/raid5.c:370) raid456
[ 92.578285][ T1073] raid5d (drivers/md/raid5.c:6551) raid456
[ 92.583002][ T1073] ? raid5_do_work (drivers/md/raid5.c:6505) raid456
[ 92.588503][ T1073] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) 
[ 92.593832][ T1073] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161) 
[ 92.599500][ T1073] ? finish_wait (include/linux/list.h:134 include/linux/list.h:206 kernel/sched/wait.c:407) 
[ 92.603877][ T1073] md_thread (drivers/md/md.c:7902) 
[ 92.607985][ T1073] ? super_90_load (drivers/md/md.c:909) 
[ 92.613219][ T1073] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161) 
[ 92.618895][ T1073] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414) 
[ 92.624658][ T1073] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4)) 
[ 92.629462][ T1073] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2153 (discriminator 1) kernel/sched/core.c:6462 (discriminator 1)) 
[ 92.633568][ T1073] ? super_90_load (drivers/md/md.c:909) 
[ 92.638803][ T1073] kthread (kernel/kthread.c:376) 
[ 92.642747][ T1073] ? kthread_complete_and_exit (kernel/kthread.c:331) 
[ 92.648245][ T1073] ret_from_fork (arch/x86/entry/entry_64.S:304) 
[   92.652559][ T1073]  </TASK>
[   92.948929][ T1113] md0: detected capacity change from 107520 to 0
[   92.955175][ T1113] md: md0 stopped.
[   93.026649][ T1120] debugfs: Directory 'md0' with parent 'block' already present!
[   93.034818][ T1120] md: md0 stopped.
[   93.038880][ T1120] md: md0 stopped.
[   93.072111][ T1130] md: md0 stopped.
[   93.130654][ T1135] debugfs: Directory 'md0' with parent 'block' already present!
[   93.139203][ T1135] md: md0 stopped.
[   93.147556][ T1135] md/raid:md0: device loop0 operational as raid disk 0
[   93.154322][ T1135] md/raid:md0: device loop3 operational as raid disk 3
[   93.161738][ T1135] md/raid:md0: device loop2 operational as raid disk 2
[   93.168536][ T1135] md/raid:md0: device loop1 operational as raid disk 1
[   93.169095][  T372] /lkp/benchmarks/mdadm-selftests/tests/20raid5journal... IPMI BMC is not supported on this machine, skip bmc-watchdog setup!
[   93.175316][  T372]
[   93.177818][ T1135] md/raid:md0: journal disk is missing, force array readonly
[   93.197775][ T1135] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[   93.206200][ T1135] md0: detected capacity change from 0 to 107520
[   93.280028][ T1155] md0: detected capacity change from 107520 to 0
[   93.286281][ T1155] md: md0 stopped.
[   93.655138][ T1164] md/raid:md0: device loop2 operational as raid disk 2
[   93.661910][ T1164] md/raid:md0: device loop1 operational as raid disk 1
[   93.668660][ T1164] md/raid:md0: device loop0 operational as raid disk 0
[   93.676532][ T1164] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
[   93.685650][ T1164] md/raid:md0: starting from clean shutdown
[   93.717900][ T1164] md0: detected capacity change from 0 to 107520
[   93.784831][ T1174] md: recovery of RAID array md0
[   96.004310][ T1174] md: md0: recovery done.
[   96.655923][ T1171] BUG: sleeping function called from invalid context at block/blk-core.c:781
[   96.664595][ T1171] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1171, name: md0_raid5
[   96.673412][ T1171] preempt_count: 1, expected: 0
[   96.678177][ T1171] CPU: 0 PID: 1171 Comm: md0_raid5 Tainted: G        W         5.18.0-rc3-00094-ge55e38699370 #1
[   96.688614][ T1171] Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F9 04/21/2015
[   96.698108][ T1171] Call Trace:
[   96.701262][ T1171]  <TASK>
[ 96.704070][ T1171] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) 
[ 96.708439][ T1171] __might_resched.cold (kernel/sched/core.c:9740) 
[ 96.713508][ T1171] submit_bio_noacct (block/blk-core.c:781) 
[ 96.718319][ T1171] ? bio_init (arch/x86/include/asm/atomic.h:41 include/linux/atomic/atomic-instrumented.h:42 block/bio.c:274) 
[ 96.722548][ T1171] r5l_flush_stripe_to_raid (drivers/md/raid5-cache.c:1341) raid456
[ 96.728849][ T1171] ? load_balance (kernel/sched/fair.c:10902) 
[ 96.733562][ T1171] handle_active_stripes+0x359/0x580 raid456
[ 96.740637][ T1171] ? retry_aligned_read (drivers/md/raid5.c:6403) raid456
[ 96.746606][ T1171] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170) 
[ 96.751503][ T1171] ? _raw_spin_lock_bh (kernel/locking/spinlock.c:169) 
[ 96.756487][ T1171] ? release_stripe_list (drivers/md/raid5.c:370) raid456
[ 96.762441][ T1171] raid5d (drivers/md/raid5.c:6551) raid456
[ 96.767174][ T1171] ? raid5_do_work (drivers/md/raid5.c:6505) raid456
[ 96.772676][ T1171] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) 
[ 96.778005][ T1171] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161) 
[ 96.783683][ T1171] ? finish_wait (include/linux/list.h:134 include/linux/list.h:206 kernel/sched/wait.c:407) 
[ 96.788068][ T1171] md_thread (drivers/md/md.c:7902) 
[ 96.792176][ T1171] ? super_90_load (drivers/md/md.c:909) 
[ 96.797419][ T1171] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161) 
[ 96.803087][ T1171] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414) 
[ 96.808859][ T1171] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4)) 
[ 96.813678][ T1171] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2153 (discriminator 1) kernel/sched/core.c:6462 (discriminator 1)) 
[ 96.817786][ T1171] ? super_90_load (drivers/md/md.c:909) 
[ 96.823021][ T1171] kthread (kernel/kthread.c:376) 
[ 96.826955][ T1171] ? kthread_complete_and_exit (kernel/kthread.c:331) 
[ 96.832459][ T1171] ret_from_fork (arch/x86/entry/entry_64.S:304) 
[   96.836740][ T1171]  </TASK>
[   97.142371][ T1219] md0: detected capacity change from 107520 to 0
[   97.148616][ T1219] md: md0 stopped.
[   97.303158][ T1236] md/raid:md0: device loop3 operational as raid disk 3
[   97.309921][ T1236] md/raid:md0: device loop2 operational as raid disk 2
[   97.316662][ T1236] md/raid:md0: device loop1 operational as raid disk 1
[   97.323396][ T1236] md/raid:md0: device loop0 operational as raid disk 0
[   97.331323][ T1236] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[   97.341004][ T1236] md/raid:md0: recovering 0 data-only stripes and 1 data-parity stripes
[   97.364607][ T1236] md0: detected capacity change from 0 to 107520
[   97.595426][ T1246] md0: detected capacity change from 107520 to 0
[   97.601652][ T1246] md: md0 stopped.
[   97.681071][ T1250] debugfs: Directory 'md0' with parent 'block' already present!
[   97.735167][ T1262] md/raid:md0: device loop3 operational as raid disk 3
[   97.741921][ T1262] md/raid:md0: device loop2 operational as raid disk 2
[   97.748656][ T1262] md/raid:md0: device loop1 operational as raid disk 1
[   97.755383][ T1262] md/raid:md0: device loop0 operational as raid disk 0
[   97.763326][ T1262] md/raid:md0: journal disk is missing, force array readonly
[   97.770602][ T1262] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[   97.778987][ T1262] md0: detected capacity change from 0 to 107520
[   97.847467][ T1271] md0: detected capacity change from 107520 to 0
[   97.853688][ T1271] md: md0 stopped.
[   97.907870][  T372]
[   97.907881][  T372]
[   97.913480][  T372] 	ERROR: dmesg prints errors when testing 20raid5journal!
[   97.913486][  T372]
[   97.922918][  T372]
[   97.922923][  T372]
[   97.933286][  T372] FAILED - see /var/tmp/20raid5journal.log and /var/tmp/fail20raid5journal.log for details
[   97.933317][  T372]
[   99.026901][  T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/lkp-hsw-d05/mdadm-selftests-1HDD-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-e55e386993704ea537d58cf0ded57678f70b1a77-20220508-24454-1hrpobn-5.yaml&job_state=post_run -O /dev/null
[   99.026912][  T372]
[  100.754729][  T372] kill 580 vmstat --timestamp -n 10
[  100.754738][  T372]
[  100.763240][  T372] kill 578 dmesg --follow --decode
[  100.763247][  T372]
[  100.774962][  T372] wait for background processes: 587 582 oom-killer meminfo
[  100.774969][  T372]
[  106.728036][  T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/lkp-hsw-d05/mdadm-selftests-1HDD-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-e55e386993704ea537d58cf0ded57678f70b1a77-20220508-24454-1hrpobn-5.yaml&loadavg=1.00%200.40%200.15%201/190%201421&start_time=1651964881&end_time=1651964947&version=/lkp/lkp/.src-20220507-195649:72233616:be1045ec3& -O /dev/null
[  106.728046][  T372]
[  106.779860][  T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/lkp-hsw-d05/mdadm-selftests-1HDD-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-e55e386993704ea537d58cf0ded57678f70b1a77-20220508-24454-1hrpobn-5.yaml&job_state=finished -O /dev/null
[  106.779867][  T372]
[  106.820412][  T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-post-run?job_file=/lkp/jobs/scheduled/lkp-hsw-d05/mdadm-selftests-1HDD-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-e55e386993704ea537d58cf0ded57678f70b1a77-20220508-24454-1hrpobn-5.yaml -O /dev/null
[  106.820419][  T372]
[  107.521530][  T372] LKP: stdout: 358: LKP: rebooting forcely
[  107.521538][  T372]
[  107.564739][  T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-wtmp?tbox_name=lkp-hsw-d05&tbox_state=rebooting -O /dev/null
[  107.564749][  T372]
[  109.025846][    T1] kvm: exiting hardware virtualization
[  109.032774][    T1] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[  109.038801][    T1] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  109.062244][    T1] e1000e: EEE TX LPI TIMER: 00000011
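
All three splats above share the same shape: the sleeping bio
submission sits inside the new RCU read-side critical section. A
generic remedy for this class of bug (a sketch only, assuming some
outer mechanism keeps the log alive; not necessarily the fix adopted
for this branch) is to confine the read-side section to the pointer
load and do the sleeping work outside it:

        /* Sketch only: keep the read-side section free of sleeping calls. */
        static void flush_sketch_fixed(struct r5conf *conf)
        {
                struct r5l_log *log;

                rcu_read_lock();
                log = rcu_dereference(conf->log);
                rcu_read_unlock();

                /* Safe only if the caller guarantees the log cannot be
                 * freed at this point (e.g. via a reference count or a
                 * quiescence guarantee); otherwise the lifetime must be
                 * handled differently.
                 */
                if (log)
                        submit_bio(&log->flush_bio); /* sleepable context */
        }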


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and the /lkp dir to run from a clean state.



-- 
0-DAY CI Kernel Test Service
https://01.org/lkp



View attachment "config-5.18.0-rc3-00094-ge55e38699370" of type "text/plain" (166090 bytes)

View attachment "job-script" of type "text/plain" (5603 bytes)

Download attachment "dmesg.xz" of type "application/x-xz" (19008 bytes)

View attachment "mdadm-selftests" of type "text/plain" (1274 bytes)

View attachment "job.yaml" of type "text/plain" (4714 bytes)

View attachment "reproduce" of type "text/plain" (98 bytes)
