Message-ID: <20211116140252.GA348770@lothringen>
Date: Tue, 16 Nov 2021 15:02:52 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>,
Mike Galbraith <efault@....de>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
John Ogness <john.ogness@...utronix.de>,
Roman Penyaev <rpenyaev@...e.de>,
Davidlohr Bueso <dbueso@...e.de>,
Jason Baron <jbaron@...mai.com>,
Al Viro <viro@...iv.linux.org.uk>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: [RFC] How to fix eventpoll rwlock based priority inversion on
PREEMPT_RT?
Hi,
I'm iterating again on this topic, this time with the author of
the patch Cc'ed.
The following commit:
a218cc491420 (epoll: use rwlock in order to reduce ep_poll_callback()
contention)
has changed the ep->lock into an rwlock. This can cause priority inversion
on PREEMPT_RT. Here is an example:
1) High priority task A waits for events on epoll_wait(), nothing shows up so
it goes to sleep for new events in the ep_poll() loop.
2) Lower prio task B brings new events in ep_poll_callback(), waking up A
while still holding read_lock(ep->lock)
3) Task A wakes up immediately and tries to grab write_lock(ep->lock), but it
has to wait for task B to release read_lock(ep->lock). Unfortunately there is
no priority inheritance when write_lock() is called on an rwlock that is
already read_lock'ed. So we are back to task B, which may even be preempted
by yet another task before releasing read_lock(ep->lock).
Now how to solve this? Several possibilities:
== Delay the wake up until after releasing the read_lock()? ==
That solves only part of the problem. If another event comes in concurrently,
its producer may still hold the read_lock() when the waiter wakes up, and we
are back to the original issue.
== Make rwlock more fair ? ==
Currently read_lock() only acquires the rtmutex if the lock is already
write-held (or write_lock() is waiting to acquire). So if read_lock() happens
after write_lock(), fairness is observed but if write_lock() happens after
read_lock(), priority inheritance doesn't happen.
I think there have been attempts to solve this in the past, but some issues
arose (I don't know the exact details; the comments in rwbase_rt.c give some
clues).
== Convert the rwlock to RCU ? ==
Traditionally, we try to convert problematic rwlocks to RCU. I'm not sure the
situation fits here because the rwlock is used the other way around:
the epoll consumer does the write_lock() and the producers do read_lock(). Then
concurrent producers use ad-hoc concurrent list add (see list_add_tail_lockless)
to handle racy modifications.
There are also list modifications on both sides: entries are added by the
producers, and read and deleted (sometimes even re-added) on the consumer side.
Perhaps RCU could be used while keeping the locking on the consumer side...
== Convert to llist ? ==
It's a possibility but some operations like single element deletion may be
costly because only llist_add() and llist_del_all() are atomic on llist.
!CONFIG_PREEMPT_RT might not be happy about it.
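To make the cost concrete, here is a minimal userspace sketch using C11
atomics. It only models the idea behind llist_add() and llist_del_all(); the
model_llist_* names are hypothetical and this is not the kernel's
implementation. model_llist_del_one() shows what single element deletion
turns into when only add and del_all are atomic:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's llist code. */
struct model_llist_node {
	struct model_llist_node *next;
};

struct model_llist_head {
	_Atomic(struct model_llist_node *) first;
};

/* Lock-free push; returns true if the list was empty before the add. */
static bool model_llist_add(struct model_llist_node *node,
			    struct model_llist_head *head)
{
	struct model_llist_node *first = atomic_load(&head->first);

	/* On failure the CAS reloads 'first', so just retry with the new value. */
	do {
		node->next = first;
	} while (!atomic_compare_exchange_weak(&head->first, &first, node));

	return first == NULL;
}

/* Atomically detach the whole list; the caller then owns every node. */
static struct model_llist_node *
model_llist_del_all(struct model_llist_head *head)
{
	return atomic_exchange(&head->first, NULL);
}

/*
 * Removing one element means detaching everything, walking the private
 * list and pushing the survivors back. Re-adding reverses their order,
 * and the shared list looks empty to everyone else in the meantime.
 */
static bool model_llist_del_one(struct model_llist_head *head,
				struct model_llist_node *victim)
{
	struct model_llist_node *node = model_llist_del_all(head);
	bool found = false;

	while (node) {
		struct model_llist_node *next = node->next;

		if (node == victim)
			found = true;
		else
			model_llist_add(node, head);
		node = next;
	}
	return found;
}
```

The walk is O(n) per deletion and costs one atomic operation per surviving
node, compared with a constant-time list_del() under a lock.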
== Consider epoll not PREEMPT_RT friendly? ==
A last resort is to simply consider epoll not RT-friendly and suggest
using simpler alternatives like poll().
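For reference, the userspace side of that fallback could look roughly like
the sketch below. wait_readable() and demo_pipe_wakeup() are hypothetical
helper names used for illustration; only poll(), pipe(), write() and close()
are real interfaces here.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, and -1 on a poll() error or when
 * the fd only reports POLLERR/POLLHUP/POLLNVAL.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;
	return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Usage sketch: a pipe is quiet until a byte is written. Returns 0 on success. */
static int demo_pipe_wakeup(void)
{
	int fds[2];
	int ok;

	if (pipe(fds) < 0)
		return -1;

	ok = wait_readable(fds[0], 0) == 0;		/* nothing queued yet */
	ok = ok && write(fds[1], "x", 1) == 1;
	ok = ok && wait_readable(fds[0], 1000) == 1;	/* now readable */

	close(fds[0]);
	close(fds[1]);
	return ok ? 0 : -1;
}
```

A real caller would also want to retry when poll() fails with EINTR, and
poll() scans every fd on each call, so this only makes sense for small fd
sets.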
Any thoughts?