Message-ID: <e813814e-7094-4673-bc69-731af065a0eb@amd.com>
Date: Mon, 24 Feb 2025 14:56:09 +0530
From: "Sapkal, Swapnil" <swapnil.sapkal@....com>
To: Oleg Nesterov <oleg@...hat.com>, Manfred Spraul
	<manfred@...orfullife.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
	Christian Brauner <brauner@...nel.org>, David Howells <dhowells@...hat.com>
CC: WangYuli <wangyuli@...ontech.com>, <linux-fsdevel@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, K Prateek Nayak <kprateek.nayak@....com>,
	"Shenoy, Gautham Ranjal" <gautham.shenoy@....com>, <Neeraj.Upadhyay@....com>
Subject: Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still
 full

Hello Oleg,

On 1/2/2025 7:37 PM, Oleg Nesterov wrote:
> wake_up(pipe->wr_wait) makes no sense if pipe_full() is still true after
> the reading, the writer sleeping in wait_event(wr_wait, pipe_writable())
> will check the pipe_writable() == !pipe_full() condition and sleep again.
> 
> Only wake the writer if we actually released a pipe buf, and the pipe was
> full before we did so.
> 
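
The condition in the commit message can be illustrated with a minimal user-space sketch of the ring helpers. It is modelled on include/linux/pipe_fs_i.h of this period, which is an assumption: the real pipe_writable() takes a struct pipe_inode_info and reads its fields with READ_ONCE(), and both are dropped here. head and tail are free-running counters. A writer sleeping in wait_event(wr_wait, pipe_writable()) that is woken while pipe_full() is still true re-checks the predicate and goes back to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model of the pipe ring helpers (simplified, not kernel code).
 * head and tail are free-running unsigned counters; unsigned subtraction
 * yields the number of occupied slots even after the counters wrap.
 */
static unsigned int pipe_occupancy(unsigned int head, unsigned int tail)
{
	return head - tail;
}

static bool pipe_full(unsigned int head, unsigned int tail, unsigned int limit)
{
	return pipe_occupancy(head, tail) >= limit;
}

/*
 * The writer's wait predicate: room for a new slot, or no readers left.
 * The kernel version reads these fields from the pipe with READ_ONCE().
 */
static bool pipe_writable(unsigned int head, unsigned int tail,
			  unsigned int max_usage, unsigned int readers)
{
	return !pipe_full(head, tail, max_usage) || readers == 0;
}
```

For example, pipe_writable(16, 0, 16, 1) is false for a full 16-slot pipe with one reader. It only becomes true once a reader advances tail to 1.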

We saw a hang in hackbench in our weekly regression testing on the mainline
kernel. Bisection pointed to this commit.

This patch avoids the unnecessary writer wakeup, but I think there may be
a subtle race that causes the writer to never be woken up in certain cases.

On a Zen 5 system with 2 sockets (192C/384T each), I ran hackbench with
16 or 32 groups. In 1 out of 20 runs the race occurs: the writer is never
woken up and the benchmark hangs. After reverting this commit, it works
fine again.

I also tried with
https://lore.kernel.org/all/20250210114039.GA3588@redhat.com/ applied.
With that patch the hang frequency drops to 1 in 100 runs, but the hang
still occurs.

Whenever I look at the cases where was_full would have been set but
wake_writer was not, I see the following pattern:

ret = 100 (Read was successful)
pipe_full() = 1
total_len = 0
buf->len != 0
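
The two wakeup rules can be contrasted for the state above with a simplified, hypothetical model. The struct and function names are illustrative, not kernel code, and it assumes a single pass that touches only the tail buffer, so "full when reading started" and "full just before the release" coincide. Before the patch, the reader woke the writer whenever the pipe was full when it took the mutex. After the patch, it wakes the writer only when it releases a slot (buf->len drops to 0) while the pipe is still full. With 100 bytes read and a non-zero buf->len (the 3996 below is an arbitrary stand-in), the old rule wakes the writer and the new one does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one pipe_read() pass (illustrative only). */
struct read_snapshot {
	bool full_before;      /* pipe_full() before the tail slot is released */
	unsigned int buf_len;  /* bytes left in the tail buffer after copying */
};

/* Old rule: wake the writer if the pipe was full when reading started. */
static bool old_wakes_writer(const struct read_snapshot *s)
{
	return s->full_before;
}

/* New rule: wake only if a slot was released while the pipe was full. */
static bool new_wakes_writer(const struct read_snapshot *s)
{
	bool slot_released = (s->buf_len == 0);

	return slot_released && s->full_before;
}
```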

total_len is computed with iov_iter_count(), while buf->len is the
length of the buffer at the tail (pipe->bufs[tail & mask].len).
Looking at pipe_write(), there seems to be a case where the writer can
make progress when (chars && !was_empty), which only looks at
iov_iter_count(). Could it be that there is still room in the
buffer but we are not waking up the writer?
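
For reference, the (chars && !was_empty) check in pipe_write() can be modelled roughly as follows. This is an approximation based on my reading of fs/pipe.c of this period: locking, the copy itself and error handling are omitted, a 4 KiB page size is assumed, and model_buf and can_merge_into_last() are hypothetical names. The point relevant to the question above is that this path depends on iov_iter_count(), on whether the pipe is empty, and on the free space in the last (head - 1) buffer, but not on pipe_full().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the last (head - 1) pipe_buffer. */
struct model_buf {
	unsigned int offset;
	unsigned int len;
	bool can_merge;  /* models PIPE_BUF_FLAG_CAN_MERGE */
};

/*
 * Can the sub-page remainder of a write be appended to the last buffer?
 * total_len plays the role of iov_iter_count(from); pipe_full() is never
 * consulted on this path.
 */
static bool can_merge_into_last(size_t total_len, bool was_empty,
				const struct model_buf *last)
{
	size_t chars = total_len & (MODEL_PAGE_SIZE - 1);

	if (!chars || was_empty)
		return false;
	return last->can_merge &&
	       last->offset + last->len + chars <= MODEL_PAGE_SIZE;
}
```

For example, a 100-byte write can merge into a mergeable last buffer holding 100 bytes, while a 4096-byte write cannot (chars is 0) and neither can a 200-byte write into a buffer already holding 4000 bytes.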

> Signed-off-by: Oleg Nesterov <oleg@...hat.com>
> ---
>   fs/pipe.c | 19 ++++++++++---------
>   1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/pipe.c b/fs/pipe.c
> index 12b22c2723b7..82fede0f2111 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -253,7 +253,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>   	size_t total_len = iov_iter_count(to);
>   	struct file *filp = iocb->ki_filp;
>   	struct pipe_inode_info *pipe = filp->private_data;
> -	bool was_full, wake_next_reader = false;
> +	bool wake_writer = false, wake_next_reader = false;
>   	ssize_t ret;
>   
>   	/* Null read succeeds. */
> @@ -264,14 +264,13 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>   	mutex_lock(&pipe->mutex);
>   
>   	/*
> -	 * We only wake up writers if the pipe was full when we started
> -	 * reading in order to avoid unnecessary wakeups.
> +	 * We only wake up writers if the pipe was full when we started reading
> +	 * and it is no longer full after reading to avoid unnecessary wakeups.
>   	 *
>   	 * But when we do wake up writers, we do so using a sync wakeup
>   	 * (WF_SYNC), because we want them to get going and generate more
>   	 * data for us.
>   	 */
> -	was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
>   	for (;;) {
>   		/* Read ->head with a barrier vs post_one_notification() */
>   		unsigned int head = smp_load_acquire(&pipe->head);
> @@ -340,8 +339,10 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>   				buf->len = 0;
>   			}
>   
> -			if (!buf->len)
> +			if (!buf->len) {
> +				wake_writer |= pipe_full(head, tail, pipe->max_usage);
>   				tail = pipe_update_tail(pipe, buf, tail);
> +			}
>   			total_len -= chars;
>   			if (!total_len)
>   				break;	/* common path: read succeeded */
> @@ -377,7 +378,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>   		 * _very_ unlikely case that the pipe was full, but we got
>   		 * no data.
>   		 */
> -		if (unlikely(was_full))
> +		if (unlikely(wake_writer))
>   			wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
>   		kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
>   
> @@ -390,15 +391,15 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>   		if (wait_event_interruptible_exclusive(pipe->rd_wait, pipe_readable(pipe)) < 0)
>   			return -ERESTARTSYS;
>   
> -		mutex_lock(&pipe->mutex);
> -		was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
> +		wake_writer = false;
>   		wake_next_reader = true;
> +		mutex_lock(&pipe->mutex);
>   	}
>   	if (pipe_empty(pipe->head, pipe->tail))
>   		wake_next_reader = false;
>   	mutex_unlock(&pipe->mutex);
>   
> -	if (was_full)
> +	if (wake_writer)
>   		wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
>   	if (wake_next_reader)
>   		wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
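
The patched read loop can be exercised with a toy simulation. This sketch rests on several assumptions: a 16-slot ring with ring_size == max_usage, byte counts only, and no locking, notifications or sleeping. toy_pipe and toy_read() are hypothetical names. As in the wake_writer |= pipe_full(head, tail, pipe->max_usage) line of the diff above, fullness is sampled just before each slot is released and the results are OR-ed together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: default 16-slot pipe with ring_size == max_usage. */
#define RING_SLOTS 16u
_Static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0,
	       "ring size must be a power of two");

/* Hypothetical toy ring: only the byte count of each slot is modelled. */
struct toy_pipe {
	unsigned int head, tail;       /* free-running counters */
	unsigned int len[RING_SLOTS];  /* bytes left in each slot */
};

static bool toy_full(const struct toy_pipe *p)
{
	return p->head - p->tail >= RING_SLOTS;
}

/*
 * Consume up to 'want' bytes from the tail and report whether the patched
 * pipe_read() would wake the writer: fullness is sampled just before each
 * slot is released, and the results are OR-ed together.
 */
static bool toy_read(struct toy_pipe *p, size_t want)
{
	bool wake_writer = false;

	while (want && p->head != p->tail) {
		unsigned int *len = &p->len[p->tail & (RING_SLOTS - 1)];
		size_t chars = *len < want ? *len : want;

		*len -= chars;
		want -= chars;
		if (*len == 0) {
			wake_writer |= toy_full(p);
			p->tail++;  /* stands in for pipe_update_tail() */
		}
	}
	return wake_writer;
}
```

Starting from a full pipe of 16 slots holding 4096 bytes each, a 100-byte read returns false (no slot is released), a following 3996-byte read returns true (the tail slot is released while the pipe is full), and a further 4096-byte read returns false because the pipe was no longer full.
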
--
Thanks and Regards,
Swapnil
