Message-ID: <20250523143132.i_YkS3R3@linutronix.de>
Date: Fri, 23 May 2025 16:31:32 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Nam Cao <namcao@...utronix.de>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
John Ogness <john.ogness@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
linux-rt-users@...r.kernel.org, Joe Damato <jdamato@...tly.com>,
Martin Karsten <mkarsten@...terloo.ca>,
Jens Axboe <axboe@...nel.dk>,
Frederic Weisbecker <frederic@...nel.org>,
Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH v2] eventpoll: Fix priority inversion problem
On 2025-05-23 08:11:04 [+0200], Nam Cao wrote:
> @@ -867,10 +837,25 @@ static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force)
>
> rb_erase_cached(&epi->rbn, &ep->rbr);
>
> - write_lock_irq(&ep->lock);
> - if (ep_is_linked(epi))
> - list_del_init(&epi->rdllink);
> - write_unlock_irq(&ep->lock);
> + /*
> + * ep->mtx is held, which means no waiter is touching the ready list. This item is also no
> + * longer being added. Therefore, the ready flag can only mean one thing: this item is on
> + * the ready list.
> + */
> + if (smp_load_acquire(&epi->ready)) {
> + put_back_last = NULL;
> + while (true) {
> + struct llist_node *n = llist_del_first(&ep->rdllist);
> +
> + if (&epi->rdllink == n || WARN_ON(!n))
> + break;
> + if (!put_back_last)
> + put_back_last = n;
> + llist_add(n, &put_back);
put_back is local, so you can use __llist_add() here.
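i.e. (untested):

	__llist_add(n, &put_back);
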
You could also use llist_del_all() and then the non-atomic operations as a
replacement. But then you need to walk the whole list to know the first
and the last node for the batch re-add.
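Something along these lines (untested sketch, reusing the names from your
patch; it also drops the WARN_ON your loop has):

	struct llist_node *first, *n, *next, *prev = NULL;

	/* take the whole ready list in one atomic operation */
	first = llist_del_all(&ep->rdllist);
	for (n = first; n; n = next) {
		next = n->next;
		if (n == &epi->rdllink) {
			/* unlink the item from the private snapshot */
			if (prev)
				prev->next = next;
			else
				first = next;
			continue;
		}
		/* prev ends up pointing at the last remaining node */
		prev = n;
	}
	/* put everything else back with a single atomic batch add */
	if (first)
		llist_add_batch(first, prev, &ep->rdllist);
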
The "wait" perf bench throws in 16320 items. Avoiding llist_del_first()
there makes hardly any difference, anything from slightly worse to a
slight improvement. Since it is all random, it depends on whether the
saved atomic operations outweigh the full walk.
The ctl case gets worse with this approach. On average it has to iterate
over 45% of the items to find the right one, and it adds fewer than 200
items. So the saved atomic operation does not outweigh the longer iteration.
> + }
> + if (put_back_last)
> + llist_add_batch(put_back.first, put_back_last, &ep->rdllist);
> + }
>
> wakeup_source_unregister(ep_wakeup_source(epi));
> /*
> @@ -1867,19 +1767,20 @@ static int ep_send_events(struct eventpoll *ep,
> init_poll_funcptr(&pt, NULL);
>
> mutex_lock(&ep->mtx);
> - ep_start_scan(ep, &txlist);
>
> - /*
> - * We can loop without lock because we are passed a task private list.
> - * Items cannot vanish during the loop we are holding ep->mtx.
> - */
> - list_for_each_entry_safe(epi, tmp, &txlist, rdllink) {
> + while (res < maxevents) {
> struct wakeup_source *ws;
> + struct llist_node *n;
> __poll_t revents;
>
> - if (res >= maxevents)
> + n = llist_del_first(&ep->rdllist);
> + if (!n)
> break;
>
> + epi = llist_entry(n, struct epitem, rdllink);
> + llist_add(n, &txlist);
txlist is local, so you can use __llist_add() here.
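i.e. (untested):

	__llist_add(n, &txlist);
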
> + smp_store_release(&epi->ready, false);
> +
> /*
> * Activate ep->ws before deactivating epi->ws to prevent
> * triggering auto-suspend here (in case we reactive epi->ws
Sebastian