linux-kernel - Re: [PATCH RESEND 2/3 v2.6.39-rc7] block: make disk_block_events() properly wait for work cancellation

Message-ID: <BANLkTimj2guK6kw7QSijYZyvoRY+JKgbTg@mail.gmail.com>
Date:	Tue, 17 May 2011 07:46:25 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Tejun Heo <tj@...nel.org>
Cc:	Jens Axboe <axboe@...nel.dk>, Sitsofe Wheeler <sitsofe@...oo.com>,
	Borislav Petkov <bp@...en8.de>, Meelis Roos <mroos@...ux.ee>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Kay Sievers <kay.sievers@...y.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH RESEND 2/3 v2.6.39-rc7] block: make disk_block_events()
 properly wait for work cancellation

This is pretty disgusting.

You're not using a real lock, and to compensate for that you use a
blocking bit-lock hack. And to make that hack extra ugly, you define
the bit as a bitmask, and use the ilog2() macro to turn it into a bit
pos.

Horrid. Horrid.

Is there some fundamental reason why you cannot just turn the ev->lock
into a real semaphore (allowing blocking), and then do the dwork
cancel under the semaphore, avoiding all the crazy bit-lock crud?

Or just _add_ a semaphore to the 'struct disk_events', for chrissake.
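A minimal userspace sketch of the "add a semaphore to struct disk_events" alternative, assuming ev->lock has to stay a spinlock (the patch takes it with spin_lock_irqsave()) and that a POSIX mutex is a fair stand-in for a kernel sleeping lock. All names are hypothetical: struct model_disk_events, model_block_events() and model_cancel_sync() only mimic struct disk_events, disk_block_events() and cancel_delayed_work_sync(), and run_blockers() starts several concurrent blockers and counts how many returned while the modelled work was still in flight.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of struct disk_events plus the proposed sleeping lock. */
struct model_disk_events {
	pthread_mutex_t block_mutex;	/* the added sleeping lock */
	pthread_mutex_t lock;		/* stands in for the ev->lock spinlock */
	unsigned int block;		/* event blocking depth */
	bool work_in_flight;		/* stands in for the delayed work */
	int cancellations;		/* how often cancellation actually ran */
};

/* Stand-in for cancel_delayed_work_sync(): slow on purpose. */
static void model_cancel_sync(struct model_disk_events *ev)
{
	struct timespec delay = { 0, 10 * 1000 * 1000 };	/* 10 ms */

	nanosleep(&delay, NULL);
	ev->work_in_flight = false;
	ev->cancellations++;
}

static void model_block_events(struct model_disk_events *ev)
{
	bool cancel;

	/* Every blocker serializes here, so none can return before the
	 * first blocker has finished cancelling the work. */
	pthread_mutex_lock(&ev->block_mutex);

	pthread_mutex_lock(&ev->lock);
	cancel = !ev->block++;
	pthread_mutex_unlock(&ev->lock);

	if (cancel)
		model_cancel_sync(ev);

	pthread_mutex_unlock(&ev->block_mutex);
}

struct blocker_arg {
	struct model_disk_events *ev;
	bool saw_work_in_flight;
};

static void *blocker_thread(void *p)
{
	struct blocker_arg *arg = p;

	model_block_events(arg->ev);
	pthread_mutex_lock(&arg->ev->block_mutex);
	arg->saw_work_in_flight = arg->ev->work_in_flight;
	pthread_mutex_unlock(&arg->ev->block_mutex);
	return NULL;
}

/* Start nthreads concurrent blockers; return how many came back while the
 * modelled work was still in flight (0 means the guarantee held). */
static int run_blockers(int nthreads, int *cancellations)
{
	struct model_disk_events ev = { .block = 0, .work_in_flight = true };
	pthread_t tids[16];
	struct blocker_arg args[16];
	int i, early = 0;

	assert(nthreads > 0 && nthreads <= 16);
	pthread_mutex_init(&ev.block_mutex, NULL);
	pthread_mutex_init(&ev.lock, NULL);

	for (i = 0; i < nthreads; i++) {
		args[i].ev = &ev;
		args[i].saw_work_in_flight = false;
		pthread_create(&tids[i], NULL, blocker_thread, &args[i]);
	}
	for (i = 0; i < nthreads; i++) {
		pthread_join(tids[i], NULL);
		early += args[i].saw_work_in_flight;
	}

	assert(ev.block == (unsigned int)nthreads);
	*cancellations = ev.cancellations;
	pthread_mutex_destroy(&ev.lock);
	pthread_mutex_destroy(&ev.block_mutex);
	return early;
}
```

Because every blocker takes block_mutex, a later blocker cannot return until the first one has finished cancelling, and only the first blocker ever cancels; no flag bit or custom wait action is needed. Called as run_blockers(4, &n) from a small driver, the sketch is meant to return 0 with n == 1.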

This is just too ugly to survive. And even if you fixed the ilog2()
(hint: just define the bit, and then use (1u<<BIT) to define the
mask), it would be too ugly.
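A small sketch of that hint, with hypothetical names: the bit number is the primary definition and the mask is derived from it, so code that needs a bit number (the patch's wake_up_bit() and wait_on_bit() calls take one) could pass DISK_EVENT_CANCELING_BIT directly instead of ilog2(DISK_EVENT_CANCELING). It assumes a 32-bit unsigned int, as the patch's 0x80000000U constant does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: define the bit number first ... */
#define DISK_EVENT_CANCELING_BIT	31
/* ... and derive the mask from it, rather than the other way round. */
#define DISK_EVENT_CANCELING		(1u << DISK_EVENT_CANCELING_BIT)

/* Same value as the 0x80000000U the patch hard-codes. */
_Static_assert(DISK_EVENT_CANCELING == 0x80000000u,
	       "CANCELING must be the top bit");

/* The low bits of the block word hold the blocking depth ... */
static unsigned int block_depth(unsigned int block)
{
	return block & ~DISK_EVENT_CANCELING;
}

/* ... and the top bit says whether cancellation is still running. */
static bool block_canceling(unsigned int block)
{
	return (block & DISK_EVENT_CANCELING) != 0;
}
```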

Don't do these kinds of ad-hoc locks. They are WRONG.

              Linus

On Tue, May 17, 2011 at 3:28 AM, Tejun Heo <tj@...nel.org> wrote:
> disk_block_events() should guarantee that the event work is not in
> flight on return and once blocked it shouldn't issue further
> cancellations.
>
> Because there was no synchronization between the first blocker doing
> cancel_delayed_work_sync() and the following blockers, the following
> blockers could finish before cancellation was complete, which broke
> both guarantees - event work could be in flight and cancellation could
> happen after return.
>
> This bug triggered WARN_ON_ONCE() in disk_clear_events() reported in
> bug#34662.
>
>  https://bugzilla.kernel.org/show_bug.cgi?id=34662
>
> Fix it by introducing DISK_EVENT_CANCELING bit which is set by the
> first blocker while cancellation is in progress.  Further blockers
> wait until the bit is cleared by the first blocker.
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Tested-by: Sitsofe Wheeler <sitsofe@...oo.com>
> Reported-by: Sitsofe Wheeler <sitsofe@...oo.com>
> Reported-by: Borislav Petkov <bp@...en8.de>
> Reported-by: Meelis Roos <mroos@...ux.ee>
> Reported-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Jens Axboe <axboe@...nel.dk>
> Cc: Kay Sievers <kay.sievers@...y.org>
> ---
>  block/genhd.c |   36 +++++++++++++++++++++++++++++++++---
>  1 file changed, 33 insertions(+), 3 deletions(-)
>
> Index: work/block/genhd.c
> ===================================================================
> --- work.orig/block/genhd.c
> +++ work/block/genhd.c
> @@ -1371,7 +1371,7 @@ struct disk_events {
>        struct gendisk          *disk;          /* the associated disk */
>        spinlock_t              lock;
>
> -       int                     block;          /* event blocking depth */
> +       unsigned int            block;          /* event blocking depth */
>        unsigned int            pending;        /* events already sent out */
>        unsigned int            clearing;       /* events being cleared */
>
> @@ -1379,6 +1379,8 @@ struct disk_events {
>        struct delayed_work     dwork;
>  };
>
> +#define DISK_EVENT_CANCELING                   0x80000000U
> +
>  static const char *disk_events_strs[] = {
>        [ilog2(DISK_EVENT_MEDIA_CHANGE)]        = "media_change",
>        [ilog2(DISK_EVENT_EJECT_REQUEST)]       = "eject_request",
> @@ -1414,6 +1416,12 @@ static unsigned long disk_events_poll_ji
>        return msecs_to_jiffies(intv_msecs);
>  }
>
> +static int disk_block_wait_canceling(void *word)
> +{
> +       schedule();
> +       return 0;
> +}
> +
>  /**
>  * disk_block_events - block and flush disk event checking
>  * @disk: disk to block events for
> @@ -1438,12 +1446,34 @@ void disk_block_events(struct gendisk *d
>        if (!ev)
>                return;
>
> +       /*
> +        * Bump block count and set CANCELLING if we're the first blocker
> +        * and have to cancel the event work.
> +        */
>        spin_lock_irqsave(&ev->lock, flags);
> -       cancel = !ev->block++;
> +       if ((cancel = !ev->block++))
> +               ev->block |= DISK_EVENT_CANCELING;
>        spin_unlock_irqrestore(&ev->lock, flags);
>
> -       if (cancel)
> +       if (cancel) {
> +               /*
> +                * Cancel the event work, clear CANCELING and wake up
> +                * waiters.
> +                */
>                cancel_delayed_work_sync(&disk->ev->dwork);
> +
> +               spin_lock_irqsave(&ev->lock, flags);
> +               ev->block &= ~DISK_EVENT_CANCELING;
> +               spin_unlock_irqrestore(&ev->lock, flags);
> +               wake_up_bit(&ev->block, ilog2(DISK_EVENT_CANCELING));
> +       } else {
> +               /*
> +                * The first blocker might not have finished canceling the
> +                * event work.  Wait for CANCELING to clear.
> +                */
> +               wait_on_bit(&ev->block, ilog2(DISK_EVENT_CANCELING),
> +                           disk_block_wait_canceling, TASK_UNINTERRUPTIBLE);
> +       }
>  }
>
>  static void __disk_unblock_events(struct gendisk *disk, bool check_now)
>
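The race in the changelog and the CANCELING-bit fix can be modelled in userspace, again as a hypothetical sketch: struct model_ev, model_block_events_bit() and run_bit_blockers() are invented names, a POSIX mutex stands in for ev->lock, and a condition variable stands in for the wait_on_bit()/wake_up_bit() pair. The first blocker sets the top bit while it cancels; later blockers sleep until the bit is cleared, so none of them returns while the modelled work is still in flight. Dropping the else branch would let later blockers return immediately, which is the bug the changelog describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define MODEL_CANCELING	0x80000000u	/* top bit, as in the patch */

struct model_ev {
	pthread_mutex_t lock;		/* stands in for ev->lock */
	pthread_cond_t cleared;		/* stands in for the bit wait queue */
	unsigned int block;		/* depth in low bits, flag in bit 31 */
	bool work_in_flight;		/* stands in for the delayed work */
	int cancellations;
};

static void model_block_events_bit(struct model_ev *ev)
{
	struct timespec delay = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	bool cancel;

	/* Bump the depth; the first blocker also sets CANCELING. */
	pthread_mutex_lock(&ev->lock);
	cancel = !ev->block++;
	if (cancel)
		ev->block |= MODEL_CANCELING;
	pthread_mutex_unlock(&ev->lock);

	if (cancel) {
		/* Slow stand-in for cancel_delayed_work_sync(). */
		nanosleep(&delay, NULL);

		pthread_mutex_lock(&ev->lock);
		ev->work_in_flight = false;
		ev->cancellations++;
		ev->block &= ~MODEL_CANCELING;
		pthread_cond_broadcast(&ev->cleared);	/* ~ wake_up_bit() */
		pthread_mutex_unlock(&ev->lock);
	} else {
		/* ~ wait_on_bit(): sleep until the first blocker is done. */
		pthread_mutex_lock(&ev->lock);
		while (ev->block & MODEL_CANCELING)
			pthread_cond_wait(&ev->cleared, &ev->lock);
		pthread_mutex_unlock(&ev->lock);
	}
}

struct bit_blocker_arg {
	struct model_ev *ev;
	bool saw_work_in_flight;
};

static void *bit_blocker_thread(void *p)
{
	struct bit_blocker_arg *arg = p;

	model_block_events_bit(arg->ev);
	pthread_mutex_lock(&arg->ev->lock);
	arg->saw_work_in_flight = arg->ev->work_in_flight;
	pthread_mutex_unlock(&arg->ev->lock);
	return NULL;
}

/* Run n concurrent blockers; return how many came back too early. */
static int run_bit_blockers(int n, int *cancellations, unsigned int *final_block)
{
	struct model_ev ev = { .block = 0, .work_in_flight = true };
	pthread_t tids[16];
	struct bit_blocker_arg args[16];
	int i, early = 0;

	assert(n > 0 && n <= 16);
	pthread_mutex_init(&ev.lock, NULL);
	pthread_cond_init(&ev.cleared, NULL);

	for (i = 0; i < n; i++) {
		args[i].ev = &ev;
		args[i].saw_work_in_flight = false;
		pthread_create(&tids[i], NULL, bit_blocker_thread, &args[i]);
	}
	for (i = 0; i < n; i++) {
		pthread_join(tids[i], NULL);
		early += args[i].saw_work_in_flight;
	}

	*cancellations = ev.cancellations;
	*final_block = ev.block;	/* depth only: CANCELING was cleared */
	pthread_cond_destroy(&ev.cleared);
	pthread_mutex_destroy(&ev.lock);
	return early;
}
```

Called as run_bit_blockers(4, &c, &b) from a small driver, the sketch is meant to return 0 with c == 1 and b == 4: one cancellation, four blockers recorded, nobody returning early. Unlike the patch, the sketch clears the bit and wakes waiters while holding the lock, which is the usual way to avoid a lost wakeup with a condition variable.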