lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 29 Jan 2016 12:14:42 -0600
From:	David Milburn <dmilburn@...hat.com>
To:	Dmitry Vyukov <dvyukov@...gle.com>
Cc:	syzkaller <syzkaller@...glegroups.com>, linux-ide@...r.kernel.org,
	LKML <linux-kernel@...r.kernel.org>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Jeff Garzik <jgarzik@...hat.com>,
	Sergei Shtylyov <sshtylyov@...mvista.com>,
	Kostya Serebryany <kcc@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>
Subject: Re: ata: BUG in ata_sff_hsm_move


Hi Tejun,

Can ata_sff_hsm_move grab the lock and save off the task_state,
like this patch?

Thanks,
David

On Fri, Jan 29, 2016 at 02:40:33PM +0100, Dmitry Vyukov wrote:
> On Fri, Jan 29, 2016 at 2:18 PM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > On Fri, Jan 29, 2016 at 1:23 PM, Tejun Heo <tj@...nel.org> wrote:
> >> Hello, Dmitry.
> >>
> >> On Fri, Jan 29, 2016 at 12:59:49PM +0100, Dmitry Vyukov wrote:
> >>> > Hmmm... the port interrupt handler checks for IDLE before calling into
> >>> > hsm_move, so the only explanation would be that something is resetting
> >>> > it to IDLE inbetween.  ce7514526742 ("libata: prevent HSM state change
> >>> > race between ISR and PIO") describes and fixes the same problem.  The
> >>> > fix seems correct and I can't find anywhere else where this can
> >>> > happen.  :(
> >>> >
> >>> > Can you please post the kernel log leading to the BUG?  Also, I don't
> >>> > think that condition needs to be BUG.  I'll change it to WARN.
> >>>
> >>> Here are two logs, in both cases no kernel messages prior to the bug:
> >>> https://gist.githubusercontent.com/dvyukov/5087d633e3620280b6c7/raw/31c9ab1ced92ac5f85cfb15eaf48ec5793c2c3a1/gistfile1.txt
> >>> https://gist.githubusercontent.com/dvyukov/825b2e3d5fb80ae08a9a/raw/03c5a4f4c4bd9d0a304a71cda2da4c92f4b7f1ba/gistfile1.txt
> >>
> >> lol, this is kinda embarrassing.  It looks like the poll path wasn't
> >> doing any locking.  Can you please verify the following patch at least
> >> doesn't crash the machine immediately and if so keep it applied to the
> >> test kernel so that we can verify that the problem actually goes away?
> >
> >
> > Great that you managed to debug it without a repro!
> > I've applied this patch to my tree and will rerun fuzzer. I will
> > notify you if I see this warning again.
> > Thanks
> 
> It now deadlocks at this stack. It is probably not OK to call
> ata_sff_hsm_move holding the spinlock.
> 
> [    8.168809] NMI backtrace for cpu 2
> [    8.168809] CPU: 2 PID: 17 Comm: kworker/2:0 Not tainted 4.5.0-rc1+ #302
> [    8.168809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Bochs 01/01/2011
> [    8.168809] Workqueue: ata_sff ata_sff_pio_task
> [    8.168809] task: ffff88006d24c740 ti: ffff88006d090000 task.ti:
> ffff88006d090000
> [    8.168809] RIP: 0010:[<ffffffff82c0e388>]  [<ffffffff82c0e388>]
> delay_tsc+0x18/0x70
> [    8.168809] RSP: 0000:ffff88006d097a60  EFLAGS: 00000086
> [    8.168809] RAX: 000000008edaeede RBX: ffff880036e88028 RCX: 000000000000001e
> [    8.168809] RDX: 0000000000000033 RSI: 000000338edaee9c RDI: 0000000000000001
> [    8.168809] RBP: ffff88006d097a60 R08: 0000000000000002 R09: 0000000000000000
> [    8.168809] R10: fffffbfff11baf82 R11: 1ffffffff134edc5 R12: ffff880036e88038
> [    8.168809] R13: 000000009a9d2d40 R14: ffff880036e88030 R15: 0000000099383d98
> [    8.168809] FS:  0000000000000000(0000) GS:ffff88006d600000(0000)
> knlGS:0000000000000000
> [    8.168809] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    8.168809] CR2: 00000000ffffffff CR3: 0000000007a4b000 CR4: 00000000000006e0
> [    8.168809] Stack:
> [    8.168809]  ffff88006d097a70 ffffffff82c0e2da ffff88006d097aa8
> ffffffff814661a9
> [    8.168809]  ffff880036e88028 0000000000000082 ffff880036e90008
> ffff880036e90010
> [    8.168809]  0000000000000001 ffff88006d097ad0 ffffffff86652c47
> ffffffff83a80c4c
> [    8.168809] Call Trace:
> [    8.168809]  [<ffffffff82c0e2da>] __delay+0xa/0x10
> [    8.168809]  [<ffffffff814661a9>] do_raw_spin_lock+0x149/0x2b0
> [    8.168809]  [<ffffffff86652c47>] _raw_spin_lock_irqsave+0xa7/0xd0
> [    8.168809]  [<ffffffff83a80c4c>] ? ata_hsm_qc_complete+0x12c/0x420
> [    8.168809]  [<ffffffff83a80c4c>] ata_hsm_qc_complete+0x12c/0x420
> [    8.168809]  [<ffffffff83a82f42>] ata_sff_hsm_move+0x1e2/0x1c60
> [    8.168809]  [<ffffffff83a85f17>] ? ata_sff_pio_task+0x87/0x530
> [    8.168809]  [<ffffffff83a86296>] ata_sff_pio_task+0x406/0x530
> [    8.168809]  [<ffffffff813a2386>] process_one_work+0x796/0x1440
> [    8.168809]  [<ffffffff813a22ba>] ? process_one_work+0x6ca/0x1440
> [    8.168809]  [<ffffffff813d6190>] ? finish_task_switch+0x120/0x5f0
> [    8.168809]  [<ffffffff813a1bf0>] ? pwq_dec_nr_in_flight+0x2e0/0x2e0
> [    8.168809]  [<ffffffff813a310b>] worker_thread+0xdb/0xfc0
> [    8.168809]  [<ffffffff813b637f>] kthread+0x23f/0x2d0
> [    8.168809]  [<ffffffff813a3030>] ? process_one_work+0x1440/0x1440
> [    8.168809]  [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0
> [    8.168809]  [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0
> [    8.168809]  [<ffffffff8665362f>] ret_from_fork+0x3f/0x70
> [    8.168809]  [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0
> 
> 
> 
> 
> >> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> >> index 608677d..6991efc 100644
> >> --- a/drivers/ata/libata-sff.c
> >> +++ b/drivers/ata/libata-sff.c
> >> @@ -1362,12 +1362,14 @@ static void ata_sff_pio_task(struct work_struct *work)
> >>         u8 status;
> >>         int poll_next;
> >>
> >> +       spin_lock_irq(ap->lock);
> >> +
> >>         BUG_ON(ap->sff_pio_task_link == NULL);
> >>         /* qc can be NULL if timeout occurred */
> >>         qc = ata_qc_from_tag(ap, link->active_tag);
> >>         if (!qc) {
> >>                 ap->sff_pio_task_link = NULL;
> >> -               return;
> >> +               goto out_unlock;
> >>         }
> >>
> >>  fsm_start:
> >> @@ -1382,11 +1384,14 @@ static void ata_sff_pio_task(struct work_struct *work)
> >>          */
> >>         status = ata_sff_busy_wait(ap, ATA_BUSY, 5);
> >>         if (status & ATA_BUSY) {
> >> +               spin_unlock_irq(ap->lock);
> >>                 ata_msleep(ap, 2);
> >> +               spin_lock_irq(ap->lock);
> >> +
> >>                 status = ata_sff_busy_wait(ap, ATA_BUSY, 10);
> >>                 if (status & ATA_BUSY) {
> >>                         ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE);
> >> -                       return;
> >> +                       goto out_unlock;
> >>                 }
> >>         }
> >>
> >> @@ -1403,6 +1408,8 @@ static void ata_sff_pio_task(struct work_struct *work)
> >>          */
> >>         if (poll_next)
> >>                 goto fsm_start;
> >> +out_unlock:
> >> +       spin_unlock_irq(ap->lock);
> >>  }
> >>
> >>  /**
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@...glegroups.com.
> >> For more options, visit https://groups.google.com/d/optout.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

View attachment "libata-sff-check-task_state-hsm-patch" of type "text/plain" (894 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ