linux-kernel - Re: 4.15.14 crash with iscsi target and dvd


Message-ID: <20180406014644.GA16112@animx.eu.org>
Date:   Thu, 5 Apr 2018 21:46:45 -0400
From:   Wakko Warner <wakko@...mx.eu.org>
To:     Bart Van Assche <Bart.VanAssche@....com>
Cc:     "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "richard.weinberger@...il.com" <richard.weinberger@...il.com>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>
Subject: Re: 4.15.14 crash with iscsi target and dvd

Bart Van Assche wrote:
> On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I mount
> > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > crashes.  I'm using the builtin iscsi target with pscsi.  I can burn from
> > > > the initiator without problems.  I'll test other kernels between 4.9 and
> > > > 4.14.
> > > 
> > > So I've tested 4.x.y where x is one of 10, 11, 12, 14, 15 and y is the latest
> > > patch (except for 4.15, which was one behind).
> > > Each of these kernels crashes within seconds of, or immediately upon, doing
> > > find -type f | xargs cat > /dev/null from the initiator.
> > 
> > I tried 4.10.0.  It doesn't completely lock up the system, but the device
> > that was used hangs.  So from the initiator, it's /dev/sr1 and from the
> > target it's /dev/sr0.  Attempting to read /dev/sr0 after the oops causes the
> > process to hang in D state.
> 
> Hello Wakko,
> 
> Thank you for having narrowed down this further. I think that you encountered
> a regression either in the block layer core or in the SCSI core. Unfortunately
> the number of changes between kernel versions v4.9 and v4.10 in these two
> subsystems is huge. I see two possible ways forward:
> - Either that you perform a bisect to identify the patch that introduced this
>   regression. However, I'm not sure whether you are familiar with the bisect
>   process.
> - Or that you identify the command that triggers this crash such that others
>   can reproduce this issue without needing access to your setup.
> 
> How about reproducing this crash with the below patch applied on top of
> kernel v4.15.x? The additional output sent by this patch to the system log
> should allow us to reproduce this issue by submitting the same SCSI command
> with sg_raw.

OK, I tried this, but scsi_print_command doesn't print anything.  Thinking it
might have crashed during WARN_ON_ONCE, I added an if statement above it that
checks for !rq and does the same test that blk_rq_nr_phys_segments does.
scsi_print_command still didn't print anything.  My printk shows this:
[  36.263193] sr 3:0:0:0: cmd->request->nr_phys_segments is 0

I also had scsi_print_command in the same if block, which again didn't print
anything.  Is there some debug option I need to turn on to make it print?  I
tried looking through the code for this and following some of the function
calls but didn't see any config options.
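
The extra check described above can be sketched roughly as follows.  This is a
user-space mock-up, not the actual kernel change: struct request and struct
scsi_cmnd are cut-down stand-ins that model only the fields the check reads,
cmd_has_no_segments is a hypothetical name, printf stands in for printk, and
any special cases the real blk_rq_nr_phys_segments handles are ignored.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in for the kernel's struct request; only the field the check reads. */
struct request {
	unsigned short nr_phys_segments;
};

/* Stand-in for struct scsi_cmnd; only the request pointer is modelled. */
struct scsi_cmnd {
	struct request *request;
};

/*
 * Hypothetical helper mirroring the extra check: returns 1 if the command
 * has no request or no physical segments, 0 otherwise.
 */
static int cmd_has_no_segments(const struct scsi_cmnd *cmd)
{
	const struct request *rq = cmd->request;

	if (!rq) {
		printf("cmd->request is NULL\n");
		return 1;
	}
	if (rq->nr_phys_segments == 0) {
		printf("cmd->request->nr_phys_segments is 0\n");
		return 1;
	}
	return 0;
}
```

In the sketch, the second branch is the one that corresponds to the printk
output shown above.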

> Subject: [PATCH] Report commands with no physical segments in the system log
> 
> ---
>  drivers/scsi/scsi_lib.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6b6a6705f6e5..74a39db57d49 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1093,8 +1093,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
>  	bool is_mq = (rq->mq_ctx != NULL);
>  	int error = BLKPREP_KILL;
>  
> -	if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
> +	if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq))) {
> +		scsi_print_command(cmd);
>  		goto err_exit;
> +	}
>  
>  	error = scsi_init_sgtable(rq, &cmd->sdb);
>  	if (error)
-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.