linux-kernel - Re: wakeup_pipe_readers/writers() && pipe

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20250107172512.GB29771@redhat.com>
Date: Tue, 7 Jan 2025 18:25:12 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Manfred Spraul <manfred@...orfullife.com>,
	Christian Brauner <brauner@...nel.org>,
	David Howells <dhowells@...hat.com>,
	WangYuli <wangyuli@...ontech.com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: wakeup_pipe_readers/writers() && pipe_poll()

On 01/06, Linus Torvalds wrote:
>
> On Mon, 6 Jan 2025 at 11:34, Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > 1. pipe_read() says
> >
> >          * But when we do wake up writers, we do so using a sync wakeup
> >          * (WF_SYNC), because we want them to get going and generate more
> >          * data for us.
> >
> > OK, WF_SYNC makes sense if pipe_read() or pipe_write() is going to do wait_event()
> > after wake_up(). But wake_up_interruptible_sync_poll() looks at bit misleading if
> > we are going to wakeup the writer or next_reader before return.
>
> This heuristic has always been a bit iffy. And honestly, I think it's
> been driven by benchmarks that aren't necessarily always realistic (ie
> for ping-pong benchmarks, the best behavior is often to stay on the
> same CPU and just schedule between the reader/writer).

Agreed. But my question was not about performance, I just tried to
understand this logic. So in the case of

	wake_up_interruptible_sync_poll(wr_wait);
	wait_event_interruptible_exclusive(wr_read);

WF_SYNC is understandable, "stay on the same CPU" looks like the right
thing, and "_sync_" matches the comment above.

But if we are going to return, wake_up_interruptible_sync_poll() looks
a bit misleading to me.

> > 2. I can't understand this code in pipe_write()
> >
> >         if (ret > 0 && sb_start_write_trylock(file_inode(filp)->i_sb)) {
> >                 int err = file_update_time(filp);
> >                 if (err)
> >                         ret = err;
> >                 sb_end_write(file_inode(filp)->i_sb);
> >         }
> >
> >         - it only makes sense in the "fifo" case, right? When
> >           i_sb->s_magic != PIPEFS_MAGIC...
>
> I think we've done it for regular pipes too. You can see it with
> 'fstat()', after all.

Ah, indeed, thanks for correcting me...

And thanks for your other explanations. Again, it is not that I thought
this needs changes, just I was a bit confused. In particular by

	err = file_update_time();
	if (err)
		ret = err;

which doesn't match the usage of file_accessed() in pipe_read().

Oleg.