lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1384409722.2062.4.camel@dabdike.int.hansenpartnership.com>
Date:	Wed, 13 Nov 2013 22:15:22 -0800
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	David Decotigny <decot@...glers.com>
Cc:	Bart Van Assche <bvanassche@....org>, linux-scsi@...r.kernel.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] scsi: avoid use of reclaimed reference

On Wed, 2013-11-13 at 18:50 -0800, David Decotigny wrote:
> Hello,
> 
> Thank you for looking into this. I could reproduce the oops on some
> Dell Poweredge R720 with the following config flags, otherwise the
> problem goes un-noticed:
> 
> CONFIG_DEBUG_PAGEALLOC=y
> CONFIG_DEBUG_SLAB=y
> 
> [    4.924033] BUG: unable to handle kernel paging request at ffff88000004dd10
> [    4.931823] IP: [<ffffffff8139797f>] __scsi_scan_target+0x3ef/0x6f0
> [    4.938846] PGD 1ba1067 PUD 1ba2067 PMD 1ba3067 PTE 800000000004d060
> [    4.945985] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> [    4.951074] Modules linked in:
> [    4.954492] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.0-smp-scsi01 #1
> 
> This points to this line on the return path of scsi_report_lun_scan:
> if (scsi_device_created(sdev))
> 
> Kernel is jejb/scsi/for-next at 2aee240c68ed32 and I could reproduce
> the bug with other 3.x kernels on same hardware. For me, it is 100%
> reproducible.
> 
> The ref counter values I indicated in my previous email are the result
> of a basic instrumentation. It shows that ref count drops from 3 to 1
> as a result of scsi_probe_and_add_lun(). I believe this is because the
> latter calls __scsi_remove_device(sdev).
> 
> Now, if sdev reclaiming is not allowed to happen at the end of
> scsi_report_lun_scan by design because someone else is expected to
> hold a reference to it, then I'd be happy to add a BUG_ON() on the
> return path and explicit the post-condition in the function
> documentation, and also try to find out where a ref is killed by
> mistake. However, if sdev relcaiming at the end of
> scsi_report_lun_scan is allowed, then I'd argue that the "if
> (scsi_device_created(sdev))" on the potentially reclaimed sdev is not
> right, that's why I was proposing this patch.

Heh, perhaps this is why bug reports are so useful.  Your patch told me
where you thought the bug was but this report actually tells me where
the bug is: it's in scsi_probe_and_add_luns().  There's no way that
routine should reduce the refcount of a struct scsi_device.  It should
either leave it as it is, or if an sdevp is specified, increment the
reference and return the sdev in sdevp.  This should be enough
information for you to come up with the fix.  Please document it with
the actual bug details like you have above and I'll apply it.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ