lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMP5Xgfbgj__qCC6hC85py+TYVjDBjX+37UPZBnNb2xhKXKBmA@mail.gmail.com>
Date:	Fri, 22 Mar 2013 15:27:34 -0700
From:	Arve Hjønnevåg <arve@...roid.com>
To:	Eric Wong <normalperson@...t.net>
Cc:	linux-kernel@...r.kernel.org,
	Davide Libenzi <davidel@...ilserver.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	linux-fsdevel@...r.kernel.org
Subject: Re: [RFC v3 1/2] epoll: avoid spinlock contention with wfcqueue

On Fri, Mar 22, 2013 at 12:24 PM, Eric Wong <normalperson@...t.net> wrote:
...
> Perhaps just using epitem->ws and removing ep->ws can work.
>
> I think the following change to keep wakeup_source in sync with
> epi->state is sufficient to prevent suspend.
>
> But I'm not familiar with suspend.  Is it possible to suspend while
> a) spinning on a lock?
> b) holding a spinlock?
>

If you code cannot be preempted suspend will not complete. However, if
you drop the last wakeup_source that was active, you may trigger a new
suspend attempt, so it is best to avoid this.

> Since we avoid spinlocks in the main ep_poll_callback path, maybe the
> chance of entering suspend is reduced anyways since we may activate
> the ws sooner.
>
> What do you think?
>

I think you should make sure ep_poll_callback does not return without
an active wakeup_source if EPOLLWAKEUP is set.

> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 1e04175..531ad46 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -214,9 +214,6 @@ struct eventpoll {
>         /* RB tree root used to store monitored fd structs */
>         struct rb_root rbr;
>
> -       /* wakeup_source used when ep_send_events is running */
> -       struct wakeup_source *ws;
> -
>         /* The user that created the eventpoll descriptor */
>         struct user_struct *user;
>
> @@ -718,7 +715,6 @@ static void ep_free(struct eventpoll *ep)
>         mutex_unlock(&epmutex);
>         mutex_destroy(&ep->mtx);
>         free_uid(ep->user);
> -       wakeup_source_unregister(ep->ws);
>         kfree(ep);
>  }
>
> @@ -1137,12 +1133,6 @@ static int ep_create_wakeup_source(struct epitem *epi)
>         const char *name;
>         struct wakeup_source *ws;
>
> -       if (!epi->ep->ws) {
> -               epi->ep->ws = wakeup_source_register("eventpoll");
> -               if (!epi->ep->ws)
> -                       return -ENOMEM;
> -       }
> -
>         name = epi->ffd.file->f_path.dentry->d_name.name;
>         ws = wakeup_source_register(name);
>
> @@ -1390,22 +1380,6 @@ static int ep_send_events(struct eventpoll *ep, bool *eavail,
>                 WARN_ON(state != EP_STATE_READY);
>                 wfcq_node_init(&epi->rdllink);
>
> -               /*
> -                * Activate ep->ws before deactivating epi->ws to prevent
> -                * triggering auto-suspend here (in case we reactive epi->ws
> -                * below).
> -                *
> -                * This could be rearranged to delay the deactivation of epi->ws
> -                * instead, but then epi->ws would temporarily be out of sync
> -                * with epi->state.
> -                */
> -               ws = ep_wakeup_source(epi);
> -               if (ws) {
> -                       if (ws->active)
> -                               __pm_stay_awake(ep->ws);
> -                       __pm_relax(ws);
> -               }
> -
>                 revents = ep_item_poll(epi, &pt);
>
>                 /*
> @@ -1419,7 +1393,6 @@ static int ep_send_events(struct eventpoll *ep, bool *eavail,
>                             __put_user(epi->event.data, &uevent->data)) {
>                                 wfcq_enqueue_local(&ep->txlhead, &ep->txltail,
>                                                         &epi->rdllink);
> -                               ep_pm_stay_awake(epi);
>                                 if (!eventcnt)
>                                         eventcnt = -EFAULT;
>                                 break;
> @@ -1441,13 +1414,34 @@ static int ep_send_events(struct eventpoll *ep, bool *eavail,
>                                  */
>                                 wfcq_enqueue_local(&lthead, &lttail,
>                                                         &epi->rdllink);
> -                               ep_pm_stay_awake(epi);
>                                 continue;
>                         }
>                 }
>
>                 /*
> -                * reset item state for EPOLLONESHOT and EPOLLET
> +                * Deactivate the wakeup source before marking it idle.
> +                * The barrier implied by the spinlock in __pm_relax ensures
> +                * any ep_poll_callback callers running will see the
> +                * deactivated ws before epi->state == EP_STATE_IDLE.
> +                *
> +                * For EPOLLET, the event may still be merged into the one
> +                * that is currently on its way into userspace, but it has
> +                * always been the responsibility of userspace to trigger
> +                * EAGAIN on the file before it expects the item to appear
> +                * again in epoll_wait.
> +                *
> +                * Level Trigger never gets here, so the ws remains active.

I don't think this statement is true.

> +                *
> +                * EPOLLONESHOT will either be dropped by ep_poll_callback
> +                * or dropped the next time ep_send_events is called, so the
> +                * ws is irrelevant until it is hit by ep_modify
> +                */
> +               ws = ep_wakeup_source(epi);
> +               if (ws)
> +                       __pm_relax(ws);
> +
> +               /*
> +                * reset item state for EPOLLONESHOT and EPOLLET.
>                  * no barrier here, rely on ep->mtx release for write barrier
>                  */
>                 epi->state = EP_STATE_IDLE;
>
> --
> Eric Wong



-- 
Arve Hjønnevåg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ