linux-kernel - RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

Message-ID: <MWHPR03MB26699D4657FB67A535972208BF430@MWHPR03MB2669.namprd03.prod.outlook.com>
Date:   Tue, 7 Feb 2017 02:23:06 +0000
From:   Dexuan Cui <decui@...rosoft.com>
To:     Hannes Reinecke <hare@...e.com>,
        Bart Van Assche <Bart.VanAssche@...disk.com>,
        "hare@...e.de" <hare@...e.de>, "axboe@...nel.dk" <axboe@...nel.dk>
CC:     "hch@....de" <hch@....de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "jth@...nel.org" <jth@...nel.org>
Subject: RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue
 elements

> From: linux-block-owner@...r.kernel.org [mailto:linux-block-owner@...r.kernel.org] On Behalf Of Dexuan Cui
> Sent: Friday, February 3, 2017 20:23
> To: Hannes Reinecke <hare@...e.com>; Bart Van Assche
> <Bart.VanAssche@...disk.com>; hare@...e.de; axboe@...nel.dk
> Cc: hch@....de; linux-kernel@...r.kernel.org; linux-block@...r.kernel.org;
> jth@...nel.org
> Subject: RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue
> elements
> 
> > From: linux-kernel-owner@...r.kernel.org [mailto:linux-kernel-owner@...r.kernel.org] On Behalf Of Hannes Reinecke
> > Sent: Wednesday, February 1, 2017 00:15
> > To: Bart Van Assche <Bart.VanAssche@...disk.com>; hare@...e.de;
> > axboe@...nel.dk
> > Cc: hch@....de; linux-kernel@...r.kernel.org; linux-block@...r.kernel.org;
> > jth@...nel.org
> > Subject: Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue
> > elements
> >
> > On 01/31/2017 01:31 AM, Bart Van Assche wrote:
> > > On Wed, 2017-01-18 at 10:48 +0100, Hannes Reinecke wrote:
> > >> @@ -1488,26 +1487,13 @@ static unsigned long disk_events_poll_jiffies(struct gendisk *disk)
> > >>  void disk_block_events(struct gendisk *disk)
> > >>  {
> > >>         struct disk_events *ev = disk->ev;
> > >> -       unsigned long flags;
> > >> -       bool cancel;
> > >>
> > >>         if (!ev)
> > >>                 return;
> > >>
> > >> -       /*
> > >> -        * Outer mutex ensures that the first blocker completes canceling
> > >> -        * the event work before further blockers are allowed to finish.
> > >> -        */
> > >> -       mutex_lock(&ev->block_mutex);
> > >> -
> > >> -       spin_lock_irqsave(&ev->lock, flags);
> > >> -       cancel = !ev->block++;
> > >> -       spin_unlock_irqrestore(&ev->lock, flags);
> > >> -
> > >> -       if (cancel)
> > >> +       if (atomic_inc_return(&ev->block) == 1)
> > >>                 cancel_delayed_work_sync(&disk->ev->dwork);
> > >>
> > >> -       mutex_unlock(&ev->block_mutex);
> > >>  }
> > >
> > > Hello Hannes,
> > >
> > > I have already encountered a few times a deadlock that was caused by the
> > > event checking code so I agree with you that it would be a big step forward
> > > if such deadlocks wouldn't occur anymore. However, this patch realizes a
> > > change that has not been described in the patch description, namely that
> > > disk_block_events() calls are no longer serialized. Are you sure it is safe
> > > to drop the serialization of disk_block_events() calls?
> > >
> > Well, this whole synchronization stuff is a bit weird; I so totally fail
> > to see the rationale for it.
> > But anyway, once we've converted ev->block to atomics I _think_ the
> > mutex_lock can remain; will be checking.
> >
> > Cheers,
> >
> > Hannes
> > --
> 
> Hi, I think I got the same calltrace with today's linux-next (next-20170203).
> 
> The issue happened every time when my Linux virtual machine booted and
> Hannes's patch could NOT help.
> 
> The calltrace is pasted below.
> 
> -- Dexuan
 
Any news on this thread?

The issue is still blocking Linux from booting up normally in my test. :-(

Have we identified the faulty patch?
If so, at least I can try to revert it to boot up.

Thanks,
-- Dexuan
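
For readers following the serialization question Bart raised in the quoted text, here is a minimal user-space sketch of the two shapes of disk_block_events() as I read them from the quoted diff. It is not kernel code: struct events, cancel_work_sync(), block_serialized() and block_atomic_only() are hypothetical stand-ins, with a pthread mutex in place of ev->block_mutex and a C11 atomic in place of ev->block. The point is only where a second caller is allowed to return relative to the first caller's cancel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct disk_events; not the kernel type. */
struct events {
    pthread_mutex_t block_mutex; /* serializes blockers, as in the original code */
    atomic_int block;            /* nesting count of blockers */
    int cancels;                 /* how many times the work was cancelled */
    bool work_idle;              /* set once the cancel has completed */
};

/* Stand-in for cancel_delayed_work_sync(); the real call may sleep. */
static void cancel_work_sync(struct events *ev)
{
    ev->cancels++;
    ev->work_idle = true;
}

/*
 * Serialized shape (pre-patch): the mutex is held across the cancel, so a
 * second blocker cannot return until the first blocker's cancel is done.
 */
static void block_serialized(struct events *ev)
{
    pthread_mutex_lock(&ev->block_mutex);
    if (atomic_fetch_add(&ev->block, 1) == 0)   /* first blocker only */
        cancel_work_sync(ev);
    pthread_mutex_unlock(&ev->block_mutex);
}

/*
 * Atomic-only shape (as in the quoted patch): the counter is still exact,
 * but a concurrent second blocker sees a non-zero count, skips the cancel
 * and may return while the first blocker is still inside cancel_work_sync().
 */
static void block_atomic_only(struct events *ev)
{
    if (atomic_fetch_add(&ev->block, 1) == 0)   /* first blocker only */
        cancel_work_sync(ev);
}
```

Called from a single thread the two variants behave the same: only the first call cancels. The difference shows up only with concurrent callers, which is the case the removed "Outer mutex" comment in the diff describes.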