Date:   Tue, 24 Apr 2018 18:27:00 +0000
From:   Bart Van Assche <Bart.VanAssche@....com>
To:     "pmenzel+linux-block@...gen.mpg.de" 
        <pmenzel+linux-block@...gen.mpg.de>,
        "axboe@...nel.dk" <axboe@...nel.dk>
CC:     "jejb@...ux.vnet.ibm.com" <jejb@...ux.vnet.ibm.com>,
        "regressions@...mhuis.info" <regressions@...mhuis.info>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "martin.petersen@...cle.com" <martin.petersen@...cle.com>,
        "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>
Subject: Re: Regression 4.17-rc1: SSD doesn't properly resume causing system
 hang (NULL pointer dereference)

On Tue, 2018-04-24 at 19:37 +0200, Paul Menzel wrote:
> On 04/24/18 19:31, Bart Van Assche wrote:
> Here it is, pasted as citation, as otherwise Thunderbird would wrap the 
> line.
> 
> > (gdb) disas blk_set_runtime_active
> > Dump of assembler code for function blk_set_runtime_active:
> >    0xc1518610 <+0>:	call   0xc106ac9c <__fentry__>
> >    0xc1518615 <+5>:	push   %ebp
> >    0xc1518616 <+6>:	mov    %esp,%ebp
> >    0xc1518618 <+8>:	sub    $0x14,%esp
> >    0xc151861b <+11>:	mov    %ebx,-0xc(%ebp)
> >    0xc151861e <+14>:	mov    %eax,%ebx
> >    0xc1518620 <+16>:	mov    %gs:0x14,%eax
> >    0xc1518626 <+22>:	mov    %eax,-0x10(%ebp)
> >    0xc1518629 <+25>:	xor    %eax,%eax
> >    0xc151862b <+27>:	test   %ebx,%ebx
> >    0xc151862d <+29>:	mov    %esi,-0x8(%ebp)
> >    0xc1518630 <+32>:	mov    %edi,-0x4(%ebp)
> >    0xc1518633 <+35>:	je     0xc15186b3 <blk_set_runtime_active+163>
> >    0xc1518635 <+37>:	mov    0xfc(%ebx),%eax
> >    0xc151863b <+43>:	call   0xc1a4b920 <_raw_spin_lock_irq>
> >    0xc1518640 <+48>:	mov    0x150(%ebx),%esi
> >    0xc1518646 <+54>:	xor    %eax,%eax
> >    0xc1518648 <+56>:	mov    0xc1ca7d20,%edi
> >    0xc151864e <+62>:	mov    %eax,0x154(%ebx)
> >    0xc1518654 <+68>:	cmp    $0xffffff0c,%esi
> >    0xc151865a <+74>:	mov    %edi,-0x14(%ebp)
> >    0xc151865d <+77>:	je     0xc15186a5 <blk_set_runtime_active+149>
> >    0xc151865f <+79>:	mov    %edi,0xf4(%esi)

The e-mail at the start of this thread shows that %esi == NULL at the time
of the crash and that the crash occurred at offset 79 (0x4f) in this
function, i.e. at the "mov %edi,0xf4(%esi)" instruction. I think that means
the crash occurred in the following code: pm_request_autosuspend(q->dev),
and hence that q->dev == NULL. Can you test the (untested) patch below?

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 57cae47ab1c2..b029a94a1e66 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3272,7 +3272,6 @@ static void sd_probe_async(struct work_struct *work)
 		gd->events |= DISK_EVENT_MEDIA_CHANGE;
 	}
 
-	blk_pm_runtime_init(sdp->request_queue, dev);
 	device_add_disk(dev, gd);
 	if (sdkp->capacity)
 		sd_dif_config_host(sdkp);
@@ -3390,6 +3389,8 @@ static int sd_probe(struct device *dev)
 	get_device(dev);
 	dev_set_drvdata(dev, sdkp);
 
+	blk_pm_runtime_init(sdp->request_queue, dev);
+
 	get_device(&sdkp->dev);	/* prevent release before sd_probe_async() */
 	WARN_ON_ONCE(!queue_work(system_unbound_wq, &sdkp->probe_work));
 

Thanks,

Bart.
