Date:   Mon, 24 Feb 2020 16:38:35 -0800
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Jason Baron <jbaron@...mai.com>
Cc:     dave@...olabs.net, rpenyaev@...e.de, linux-kernel@...r.kernel.org,
        normalperson@...t.net, viro@...iv.linux.org.uk
Subject: Re: [PATCH] fs/epoll: make nesting accounting safe for -rt kernel

On Fri, 17 Jan 2020 14:16:47 -0500 Jason Baron <jbaron@...mai.com> wrote:

> Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set
> ep_poll_safewake() can take several non-raw spinlocks after disabling
> pre-emption which is no no for the -rt kernel. So let's re-work how we
> determine the nesting level such that it plays nicely with the -rt kernel.

"no no" isn't terribly informative, and knowledge of -rt's requirements
isn't widespread.  Can we please spell this requirement out in full
detail, if only to spread the -rt education a bit?
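
[Editorial note: a rough sketch of the -rt constraint being asked about, as kernel-style pseudocode; the comments are the editor's gloss on the removed code, not text from the thread.]

```c
/* On CONFIG_PREEMPT_RT, spinlock_t is backed by a sleeping rt_mutex,
 * so it may only be taken in contexts where sleeping is legal.  The
 * removed code first entered atomic context, then took such a lock:
 */
local_irq_save(flags);            /* interrupts off: sleeping forbidden */
preempt_disable();                /* preemption off: still atomic */
spin_lock_nested(&wq->lock, n);   /* -rt: may sleep -> invalid context */
```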

> Let's introduce a 'nests' field in struct eventpoll that records the
> current nesting level during ep_poll_callback(). Then, if we nest again we
> can find the previous struct eventpoll that we were called from and
> increase our count by 1. The 'nests' field is protected by
> ep->poll_wait.lock.
> 
> I've also moved napi_id field into a hole in struct eventpoll as part of
> introducing the nests field. This change reduces the struct eventpoll from
> 184 bytes to 176 bytes on x86_64 for the !CONFIG_DEBUG_LOCK_ALLOC
> production config.
> 
> ...
>
> @@ -551,30 +557,43 @@ static int ep_call_nested(struct nested_calls *ncalls,
>   */
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  
> -static DEFINE_PER_CPU(int, wakeup_nest);
> -
> -static void ep_poll_safewake(wait_queue_head_t *wq)
> +static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
>  {
> +	struct eventpoll *ep_src;
>  	unsigned long flags;
> -	int subclass;
> +	u8 nests = 0;
>  
> -	local_irq_save(flags);
> -	preempt_disable();
> -	subclass = __this_cpu_read(wakeup_nest);
> -	spin_lock_nested(&wq->lock, subclass + 1);
> -	__this_cpu_inc(wakeup_nest);
> -	wake_up_locked_poll(wq, POLLIN);
> -	__this_cpu_dec(wakeup_nest);
> -	spin_unlock(&wq->lock);
> -	local_irq_restore(flags);
> -	preempt_enable();
> +	/*
> +	 * If we are not being called from ep_poll_callback(), epi is
> +	 * NULL and we are at the first level of nesting, 0. Otherwise,
> +	 * we are being called from ep_poll_callback() and if a previous
> +	 * wakeup source is not an epoll file itself, we are at depth
> +	 * 1 since the wakeup source is depth 0. If the wakeup source
> +	 * is a previous epoll file in the wakeup chain then we use its
> +	 * nests value and record ours as nests + 1. The previous epoll
> +	 * file nests value is stable since it's already holding its
> +	 * own poll_wait.lock.
> +	 */

Similarly, it would be helpful if this comment were to explain that
this code exists for -rt's requirements, and to briefly describe what
that requirement is.

> +	if (epi) {
> +		if ((is_file_epoll(epi->ffd.file))) {
> +			ep_src = epi->ffd.file->private_data;
> +			nests = ep_src->nests;
> +		} else {
> +			nests = 1;
> +		}
> +	}
> +	spin_lock_irqsave_nested(&ep->poll_wait.lock, flags, nests);
> +	ep->nests = nests + 1;
> +	wake_up_locked_poll(&ep->poll_wait, EPOLLIN);
> +	ep->nests = 0;
> +	spin_unlock_irqrestore(&ep->poll_wait.lock, flags);
>  }
