linux-kernel - Re: [Linux kernel bug] INFO: task hung in blk_mq_get

Message-ID: <cfe6b902-5e2d-415d-afeb-672cafd8d0b7@I-love.SAKURA.ne.jp>
Date: Tue, 14 May 2024 23:45:57 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Sam Sun <samsun1006219@...il.com>, Hillf Danton <hdanton@...a.com>
Cc: linux-kernel@...r.kernel.org, linux-block@...r.kernel.org, axboe@...nel.dk,
        syzkaller-bugs@...glegroups.com, xrivendell7@...il.com
Subject: Re: [Linux kernel bug] INFO: task hung in blk_mq_get_tag

On 2024/05/14 21:07, Sam Sun wrote:
> I tried to run
> 
> # echo 0 > /proc/sys/kernel/hung_task_all_cpu_backtrace
> 
> before running the reproducer, and the kernel stops panicking. But even
> if I terminate the reproducer, the kernel continues
> dumping task hung logs. After setting hung_task_all_cpu_backtrace back
> to 1, it panics immediately during the next dump. So I guess it is still a
> task hang rather than a general protection fault.

All that execute_one() in the reproducer does is

  int fd1 = open("/dev/sg0", O_RDONLY);
  int fd2 = open("/sys/module/sg/parameters/allow_dio", O_RDWR);
  write(fd2, "100\0", 4); // returns 4
  ioctl(fd1, FIBMAP, 0x20000140); // returns 2

But your hung task report includes a device rescan sequence:

 schedule+0x147/0x310 kernel/sched/core.c:6838
 io_schedule+0x87/0x100 kernel/sched/core.c:9044
 blk_mq_get_tag+0x509/0xba0 block/blk-mq-tag.c:187
 __blk_mq_alloc_requests+0xbc1/0x1710 block/blk-mq.c:499
 blk_mq_alloc_request+0x513/0xbc0 block/blk-mq.c:599
 scsi_alloc_request drivers/scsi/scsi_lib.c:1229 [inline]
 scsi_execute_cmd+0x17a/0x1140 drivers/scsi/scsi_lib.c:304
 scsi_vpd_inquiry drivers/scsi/scsi.c:312 [inline]
 scsi_get_vpd_size+0x2e3/0x6b0 drivers/scsi/scsi.c:363
 scsi_get_vpd_buf+0x89/0x460 drivers/scsi/scsi.c:433
 scsi_attach_vpd+0xdc/0x5e0 drivers/scsi/scsi.c:501
 scsi_rescan_device+0xd8/0x290 drivers/scsi/scsi_scan.c:1698
 ata_scsi_dev_rescan+0x1fe/0x3c0 drivers/ata/libata-scsi.c:4764
 process_one_work kernel/workqueue.c:3254 [inline]

Something is triggering this sequence. Could it be that writing to the
allow_dio interface confuses the "if (sg_allow_dio && ...)" path in
sg_start_req() in drivers/scsi/sg.c?

What happens if you disable the

  sysfd = write(sysfd, input, hash - input + 1);

line (i.e. stop updating the sg_allow_dio value) in the reproducer?
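
For reference, the experiment could be sketched as a standalone program along
the following lines. This is only a rough sketch, not the actual syzkaller
reproducer: write_param() and run_once() are hypothetical helper names, the
paths are passed in as parameters, and a local int stands in for the raw
address 0x20000140, which is only meaningful inside the reproducer's own
memory mapping.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FIBMAP */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to the file at path.
 * Returns what write() returned, or -1 if the file could not be opened.
 */
static ssize_t write_param(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    close(fd);
    return n;
}

/*
 * Hypothetical stand-in for execute_one(): open the sg device, optionally
 * update allow_dio, then issue FIBMAP. Pass update_allow_dio == 0 to skip
 * the write. Returns -1 if the device cannot be opened, 0 otherwise.
 */
static int run_once(const char *sg_path, const char *param_path,
                    int update_allow_dio)
{
    int fd1 = open(sg_path, O_RDONLY);
    if (fd1 < 0) {
        perror(sg_path);
        return -1;
    }

    if (update_allow_dio) {
        /* Same 4 bytes as in the reproducer: "100" plus a NUL. */
        ssize_t n = write_param(param_path, "100\0", 4);
        fprintf(stderr, "write(allow_dio) returned %zd\n", n);
    } else {
        fprintf(stderr, "skipping allow_dio update\n");
    }

    /* Assumption: a local int replaces the reproducer's 0x20000140. */
    int block = 0;
    int ret = ioctl(fd1, FIBMAP, &block);
    fprintf(stderr, "ioctl(FIBMAP) returned %d\n", ret);

    close(fd1);
    return 0;
}
```

Calling run_once("/dev/sg0", "/sys/module/sg/parameters/allow_dio", 0) from
main() would correspond to the run with the write disabled, and passing 1 as
the last argument would correspond to the original sequence. Comparing the
hung task reports from the two runs should show whether the allow_dio update
matters.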