linux-kernel - Re: [PATCH] nvme: fix doulbe irq free

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20200917152656.GA3766534@dhcp-10-100-145-180.wdl.wdc.com>
Date:   Thu, 17 Sep 2020 08:26:56 -0700
From:   Keith Busch <kbusch@...nel.org>
To:     Tong Zhang <ztong0001@...il.com>
Cc:     Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...com>,
        Sagi Grimberg <sagi@...mberg.me>,
        linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvme: fix doulbe irq free

On Thu, Sep 17, 2020 at 11:22:54AM -0400, Tong Zhang wrote:
> On Thu, Sep 17, 2020 at 4:30 AM Christoph Hellwig <hch@....de> wrote:
> >
> > On Wed, Sep 16, 2020 at 11:37:00AM -0400, Tong Zhang wrote:
> > > the irq might already been released before reset work can run
> >
> > If it is we have a problem with the state machine, as we shouldn't
> > have two resets running at the same time.  Can you share your reproducer?
> >
> 
> Hi Christoph,
> Thanks for replying.
> 
> It is possible that pci_free_irq() can be called twice for the same IRQ,
> the first call can be
> 
> [  124.413113] nvme nvme0: I/O 6 QID 0 timeout, disable controller
> [  124.414317] Workqueue: kblockd blk_mq_timeout_work
> [  124.414317] Call Trace:
> [  124.414317]  dump_stack+0x7d/0xb0
> [  124.414317]  nvme_suspend_queue.cold+0x11/0x58
> [  124.414317]  nvme_dev_disable+0x116/0x7b0
> [  124.414317]  nvme_timeout+0x309/0x320
> [  124.414317]  ? nvme_shutdown+0x40/0x40
> [  124.414317]  ? load_balance+0xe76/0x1450
> [  124.414317]  ? deref_stack_reg+0x40/0x40
> [  124.414317]  ? sched_clock_local+0x99/0xc0
> [  124.414317]  blk_mq_check_expired+0x313/0x340
> [  124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
> [  124.414317]  ? _find_next_bit.constprop.0+0x3e/0xf0
> [  124.414317]  blk_mq_queue_tag_busy_iter+0x2e9/0x4a0
> [  124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
> [  124.414317]  ? blk_mq_all_tag_iter+0x10/0x10
> [  124.414317]  ? _raw_write_lock_irqsave+0xd0/0xd0
> [  124.414317]  ? run_rebalance_domains+0x80/0x80
> [  124.414317]  ? blk_mq_freeze_queue_wait+0x180/0x180
> [  124.414317]  ? debug_object_deactivate.part.0+0x1c4/0x210
> [  124.414317]  blk_mq_timeout_work+0xaa/0x1f0
> [  124.414317]  ? __blk_mq_end_request+0x1f0/0x1f0
> [  124.414317]  ? __schedule+0x581/0xc40
> [  124.414317]  ? read_word_at_a_time+0xe/0x20
> [  124.414317]  ? strscpy+0xbf/0x190
> [  124.414317]  process_one_work+0x4ad/0x7e0
> [  124.414317]  worker_thread+0x73/0x690
> [  124.414317]  ? process_one_work+0x7e0/0x7e0
> [  124.414317]  kthread+0x199/0x1f0
> [  124.414317]  ? kthread_create_on_node+0xd0/0xd0
> [  124.414317]  ret_from_fork+0x22/0x30
> 
> And the second call can be
> [  149.763974] nvme nvme0: Failed to read smart log (error -4)
> [  149.774307] Trying to free already-free IRQ 10
> [  149.774385] Call Trace:
> [  149.774385]  pci_free_irq+0x13/0x20
> [  149.774385]  nvme_reset_work.cold+0x1be/0xa78
> [  149.774385]  ? nvme_probe+0x8c0/0x8c0
> [  149.774385]  ? check_preempt_curr+0x9d/0xf0
> [  149.774385]  ? ttwu_do_wakeup.isra.0+0x176/0x190
> [  149.774385]  ? try_to_wake_up+0x37c/0x900
> [  149.774385]  ? migration_cpu_stop+0x1e0/0x1e0
> [  149.774385]  ? __schedule+0x581/0xc40
> [  149.774385]  ? read_word_at_a_time+0xe/0x20
> [  149.774385]  ? strscpy+0xbf/0x190
> [  149.774385]  process_one_work+0x4ad/0x7e0
> [  149.774385]  worker_thread+0x73/0x690
> [  149.774385]  ? process_one_work+0x7e0/0x7e0
> [  149.774385]  kthread+0x199/0x1f0
> [  149.774385]  ? kthread_create_on_node+0xd0/0xd0
> [  149.774385]  ret_from_fork+0x22/0x30
> 
> and clearly something is not working as expected.

And the problem is that nvme_hwmon_init() doesn't return an error, so it
returns to continue initialization on a disabled controller. That's the
bug that needs to be fixed.