Message-Id: <20200224163835.08ab964483519052d7c2e39b@linux-foundation.org>
Date: Mon, 24 Feb 2020 16:38:35 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Jason Baron <jbaron@...mai.com>
Cc: dave@...olabs.net, rpenyaev@...e.de, linux-kernel@...r.kernel.org,
normalperson@...t.net, viro@...iv.linux.org.uk
Subject: Re: [PATCH] fs/epoll: make nesting accounting safe for -rt kernel
On Fri, 17 Jan 2020 14:16:47 -0500 Jason Baron <jbaron@...mai.com> wrote:
> Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set
> ep_poll_safewake() can take several non-raw spinlocks after disabling
> pre-emption which is no no for the -rt kernel. So let's re-work how we
> determine the nesting level such that it plays nicely with -rt kernel.
"no no" isn't terribly informative, and knowledge of -rt's requirements
isn't widespread. Can we please spell this requirement out in full
detail, if only to spread the -rt education a bit?
> Let's introduce a 'nests' field in struct eventpoll that records the
> current nesting level during ep_poll_callback(). Then, if we nest again we
> can find the previous struct eventpoll that we were called from and
> increase our count by 1. The 'nests' field is protected by
> ep->poll_wait.lock.
>
> I've also moved napi_id field into a hole in struct eventpoll as part of
> introducing the nests field. This change reduces the struct eventpoll from
> 184 bytes to 176 bytes on x86_64 for the !CONFIG_DEBUG_LOCK_ALLOC
> production config.
>
> ...
>
> @@ -551,30 +557,43 @@ static int ep_call_nested(struct nested_calls *ncalls,
> */
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>
> -static DEFINE_PER_CPU(int, wakeup_nest);
> -
> -static void ep_poll_safewake(wait_queue_head_t *wq)
> +static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
> {
> + struct eventpoll *ep_src;
> unsigned long flags;
> - int subclass;
> + u8 nests = 0;
>
> - local_irq_save(flags);
> - preempt_disable();
> - subclass = __this_cpu_read(wakeup_nest);
> - spin_lock_nested(&wq->lock, subclass + 1);
> - __this_cpu_inc(wakeup_nest);
> - wake_up_locked_poll(wq, POLLIN);
> - __this_cpu_dec(wakeup_nest);
> - spin_unlock(&wq->lock);
> - local_irq_restore(flags);
> - preempt_enable();
> + /*
> +	 * If we are not being called from ep_poll_callback(), epi is
> + * NULL and we are at the first level of nesting, 0. Otherwise,
> + * we are being called from ep_poll_callback() and if a previous
> + * wakeup source is not an epoll file itself, we are at depth
> + * 1 since the wakeup source is depth 0. If the wakeup source
> + * is a previous epoll file in the wakeup chain then we use its
> + * nests value and record ours as nests + 1. The previous epoll
> +	 * file nests value is stable since it's already holding its
> + * own poll_wait.lock.
> + */
Similarly, it would be helpful if this comment were to explain that
this code exists for -rt's requirements, and to briefly describe what
that requirement is.
> + if (epi) {
> + if ((is_file_epoll(epi->ffd.file))) {
> + ep_src = epi->ffd.file->private_data;
> + nests = ep_src->nests;
> + } else {
> + nests = 1;
> + }
> + }
> + spin_lock_irqsave_nested(&ep->poll_wait.lock, flags, nests);
> + ep->nests = nests + 1;
> + wake_up_locked_poll(&ep->poll_wait, EPOLLIN);
> + ep->nests = 0;
> + spin_unlock_irqrestore(&ep->poll_wait.lock, flags);
> }
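To make the subclass selection concrete, here is a userspace model of the depth calculation above (simplified: hypothetical struct definitions, no locking, no lockdep). A direct wakeup is depth 0; a wakeup chained through a non-epoll source is depth 1; a wakeup chained through another epoll file inherits that file's published nests value:

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel structures involved. */
struct eventpoll {
	unsigned char nests;	/* published nesting level, per the patch */
};

struct file {
	int is_epoll;		/* models is_file_epoll() */
	struct eventpoll *priv;	/* models file->private_data */
};

struct epitem {
	struct file *file;	/* models epi->ffd.file */
};

/* Mirrors the subclass computation in ep_poll_safewake(): the value
 * passed to spin_lock_irqsave_nested() and stored in ep->nests is
 * this result (or this result + 1 for the store). */
static unsigned char wake_depth(struct epitem *epi)
{
	unsigned char nests = 0;

	if (epi) {
		if (epi->file->is_epoll)
			nests = epi->file->priv->nests;
		else
			nests = 1;
	}
	return nests;
}
```

The point of the scheme is that the depth travels with the wakeup chain itself rather than living in a per-CPU counter, so it stays correct even if -rt preempts and migrates the task mid-chain.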