linux-kernel - Re: [PATCH v3 06/11] scsi: scsi_debug: Dynamically allocate sdebug_queued

Message-ID: <5bdbfbbc-bac1-84a1-5f50-33a443e3292a@oracle.com>
Date:   Wed, 12 Apr 2023 11:00:53 +0100
From:   John Garry <john.g.garry@...cle.com>
To:     kernel test robot <yujie.liu@...el.com>
Cc:     oe-lkp@...ts.linux.dev, lkp@...el.com, linux-scsi@...r.kernel.org,
        jejb@...ux.ibm.com, martin.petersen@...cle.com,
        dgilbert@...erlog.com, linux-kernel@...r.kernel.org,
        bvanassche@....org
Subject: Re: [PATCH v3 06/11] scsi: scsi_debug: Dynamically allocate
 sdebug_queued_cmd

On 07/04/2023 05:18, kernel test robot wrote:
> Hello,
> 
> kernel test robot noticed "BUG_sdebug_queued_cmd(Tainted:G_S):Objects_remaining_in_sdebug_queued_cmd_on__kmem_cache_shutdown()" on:
> 
> commit: f28c8a7d0f7a705395439889a52b09e2b61ea422 ("[PATCH v3 06/11] scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
> url: https://github.com/intel-lab-lkp/linux/commits/John-Garry/scsi-scsi_debug-Fix-check-for-sdev-queue-full/20230327-154448
> base: https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git for-next
> patch link: https://lore.kernel.org/all/20230327074310.1862889-7-john.g.garry@oracle.com/
> patch subject: [PATCH v3 06/11] scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd
> 
> in testcase: blktests
> version: blktests-x86_64-676d42c-1_20230323
> with following parameters:
> 
> 	disk: 1HDD
> 	test: scsi-group-00
> 
> compiler: gcc-11
> test machine: 16 threads 1 sockets Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz (Broadwell-DE) with 48G memory
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 

I don't know how I missed this. Maybe it's because running blktests with
a buildroot initrd is not streamlined.

Anyway, the issue is that we don't properly abort the scsi cmnd in
scsi_debug_device_reset() after the scsi cmnd times out for the 2nd time.

We got away with this in the previous code because all active IOs were
terminated in scsi_debug_exit() -> stop_all_queued(), which was not the
right thing to do.

I suppose scsi_debug_device_reset() should abort all IO for that sdev 
(which it doesn't do) - I'll look to make that change.
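
To illustrate the failure mode, here is a rough userspace sketch, not the
scsi_debug code. It assumes a toy "cache" that only counts live objects and
a reset helper that frees every command queued for one device. All names
(toy_cache, toy_queue_cmd, toy_device_reset, toy_cache_destroy_ok) are
hypothetical and exist only for this example. The point is that a command
left queued at teardown is what a check like the one in
__kmem_cache_shutdown() would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued command; not the driver's struct. */
struct queued_cmd {
	int dev_id;               /* device this command was queued for */
	struct queued_cmd *next;  /* singly linked list of live commands */
};

/* Toy cache: tracks live objects, loosely like a slab's "used" count. */
struct toy_cache {
	size_t live;
	struct queued_cmd *head;
};

/* Allocate a command for dev_id and account for it in the cache. */
static struct queued_cmd *toy_queue_cmd(struct toy_cache *c, int dev_id)
{
	struct queued_cmd *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	q->dev_id = dev_id;
	q->next = c->head;
	c->head = q;
	c->live++;
	return q;
}

/* Abort (free) every command queued for dev_id; returns how many. */
static size_t toy_device_reset(struct toy_cache *c, int dev_id)
{
	struct queued_cmd **pp = &c->head;
	size_t freed = 0;

	while (*pp) {
		struct queued_cmd *q = *pp;

		if (q->dev_id == dev_id) {
			*pp = q->next;  /* unlink before freeing */
			free(q);
			c->live--;
			freed++;
		} else {
			pp = &q->next;
		}
	}
	return freed;
}

/* Teardown check: true only if no objects remain in the cache. */
static bool toy_cache_destroy_ok(const struct toy_cache *c)
{
	return c->live == 0;
}
```

In this sketch, queuing commands and then tearing down without a reset
makes toy_cache_destroy_ok() return false; resetting each device first
makes it return true.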

Thanks,
John

> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <yujie.liu@...el.com>
> | Link: https://lore.kernel.org/oe-lkp/202304071111.e762fcbd-yujie.liu@intel.com
> 
> 
> [  101.910746][ T7924] scsi host6: waking up host to restart
> [  101.910751][ T7924] scsi host6: scsi_eh_6: sleeping
> [  101.976012][  T203] Buffer I/O error on dev sdc, logical block 2032, async page read
> [  102.135530][ T8020] sd 6:0:0:0: [sdc] Synchronizing SCSI cache
> [  102.312331][ T8020] =============================================================================
> [  102.322321][ T8020] BUG sdebug_queued_cmd (Tainted: G S                ): Objects remaining in sdebug_queued_cmd on __kmem_cache_shutdown()
> [  102.336810][ T8020] -----------------------------------------------------------------------------
> [  102.336810][ T8020]
> [  102.349880][ T8020] Slab 0x0000000013ac9b84 objects=32 used=1 fp=0x00000000a6dc3cb1 flags=0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> [  102.365549][ T8020] CPU: 4 PID: 8020 Comm: modprobe Tainted: G S                 6.3.0-rc1-00188-gf28c8a7d0f7a #1
> [  102.376919][ T8020] Hardware name: Supermicro SYS-5018D-FN4T/X10SDV-8C-TLN4F, BIOS 1.1 03/02/2016
> [  102.386904][ T8020] Call Trace:
> [  102.391151][ T8020]  <TASK>
> [ 102.395042][ T8020] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
> [ 102.400503][ T8020] slab_err (mm/slub.c:995)
> [ 102.405432][ T8020] ? _raw_spin_lock_bh (kernel/locking/spinlock.c:169)
> [ 102.411316][ T8020] ? start_poll_synchronize_srcu (kernel/rcu/srcutree.c:1306)
> [ 102.418070][ T8020] __kmem_cache_shutdown (include/linux/spinlock.h:350 mm/slub.c:4555 mm/slub.c:4586 mm/slub.c:4618)
> [ 102.424308][ T8020] kmem_cache_destroy (mm/slab_common.c:457 mm/slab_common.c:497 mm/slab_common.c:480)
> [ 102.430196][ T8020] scsi_debug_exit (drivers/scsi/scsi_debug.c:7807) scsi_debug
> [ 102.436885][ T8020] __do_sys_delete_module+0x2ea/0x530
> [ 102.444259][ T8020] ? module_flags (kernel/module/main.c:694)
> [ 102.449892][ T8020] ? __fget_light (include/linux/atomic/atomic-arch-fallback.h:227 include/linux/atomic/atomic-instrumented.h:35 fs/file.c:1015)
> [ 102.455439][ T8020] ? __blkcg_punt_bio_submit (block/blk-cgroup.c:1840)
> [ 102.462034][ T8020] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:186 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
> [ 102.467667][ T8020] ? exit_to_user_mode_loop (include/linux/sched.h:2326 include/linux/resume_user_mode.h:61 kernel/entry/common.c:171)
> [ 102.474080][ T8020] ? exit_to_user_mode_prepare (kernel/entry/common.c:203)
> [ 102.480660][ T8020] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> [ 102.486014][ T8020] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> [  102.492844][ T8020] RIP: 0033:0x7f4dddaaa417
> [ 102.498191][ T8020] Code: 73 01 c3 48 8b 0d 79 1a 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 1a 0d 00 f7 d8 64 89 01 48
> All code