Date:   Mon, 6 Nov 2017 15:12:31 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Fengguang Wu <fengguang.wu@...el.com>
Cc:     IDE-ML <linux-ide@...r.kernel.org>, Christoph Hellwig <hch@....de>,
        Tejun Heo <tj@...nel.org>, Hannes Reinecke <hare@...e.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Johannes Thumshirn <jthumshirn@...e.de>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        linux-scsi <linux-scsi@...r.kernel.org>,
        James Bottomley <James.Bottomley@...senpartnership.com>
Subject: Re: [ata_scsi_offline_dev] BUG: sleeping function called from invalid
 context at kernel/locking/mutex.c:238

On Mon, Nov 6, 2017 at 2:53 PM, Fengguang Wu <fengguang.wu@...el.com> wrote:
>
> The same dmesg happens to contain another libata related bug. Attached again.
> It's rare and in the error handling path, so unlikely to be a new regression.
>
> [   49.608280] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
> [   49.647821]  mutex_lock+0x20/0x50
> [   49.651443]  kernfs_find_and_get_ns+0x23/0x60
> [   49.656104]  sysfs_notify+0x77/0x90
> [   49.659900]  scsi_device_set_state+0x63/0x150
> [   49.664559]  ata_scsi_offline_dev+0x1c/0x30 [libata]
> [   49.669817]  ata_eh_detach_dev+0x3b/0xb0 [libata]

ata_eh_detach_dev() does

        spin_lock_irqsave(ap->lock, flags);

and then does

        if (ata_scsi_offline_dev(dev)) {
                dev->flags |= ATA_DFLAG_DETACHED;
                ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
        }

inside that spinlock. And this code is not new - it has done it since
2006 or so.

But it does seem to be a new regression in 4.14, caused by commit
8a97712e5314 ("scsi: make 'state' device attribute pollable"), because
that's what added the sysfs_notify() call to scsi_device_set_state().
sysfs_notify() ends up taking a mutex (see the kernfs_find_and_get_ns()
frame in the trace above) and so can sleep, which is what made that
spinlock a problem.
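
To make the failure mode concrete, here is a minimal userspace model in C.
It is only a sketch: every model_* name, the counters, and both detach_*
helpers are made up for illustration and are not the kernel's code. The
first helper mirrors the call chain in the trace (a possibly-sleeping
notify reached while a spinlock is held). The second shows one
hypothetical way to avoid it, by changing the state under the lock and
notifying after unlock. That second variant is not what this thread
proposes, which is a revert.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: none of this is the kernel's real implementation. */
static int atomic_depth;      /* > 0 while a modeled spinlock is held */
static int sleep_violations;  /* modeled "sleeping function called from
                                 invalid context" reports */

static void model_spin_lock(void)   { atomic_depth++; }
static void model_spin_unlock(void) { atomic_depth--; }

/* Stand-in for a might_sleep()-style check: complain if we are atomic. */
static void model_might_sleep(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

/* Stand-in for sysfs_notify(), which takes a mutex and so may sleep. */
static void model_sysfs_notify(void)
{
	model_might_sleep();
}

/* Stand-in for a set-state helper that also notifies pollers. */
static void model_set_state_with_notify(int *state, int new_state)
{
	*state = new_state;
	model_sysfs_notify();
}

/* Pattern from the trace: the notify runs inside the locked region. */
static void detach_notify_under_lock(int *state)
{
	model_spin_lock();
	model_set_state_with_notify(state, 1);
	model_spin_unlock();
}

/* Hypothetical alternative: update under the lock, notify after unlock. */
static void detach_notify_deferred(int *state)
{
	bool changed;

	model_spin_lock();
	changed = (*state != 1);
	*state = 1;
	model_spin_unlock();

	if (changed)
		model_sysfs_notify();
}
```

Calling detach_notify_under_lock() on a fresh state bumps
sleep_violations, while detach_notify_deferred() leaves it at zero. In
both cases the state ends up changed and the modeled lock is released.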

That commit came in through the SCSI merge this merge window, and it
seems to still revert cleanly.

So I do suspect that by now we should just revert that commit. It's
not clear why that state attribute should be pollable, and the new
code is clearly very much buggy.

Hannes, Martin?

                Linus
