lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250202170131.GA6507@redhat.com>
Date: Sun, 2 Feb 2025 18:01:32 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: K Prateek Nayak <kprateek.nayak@....com>,
	Manfred Spraul <manfred@...orfullife.com>,
	Christian Brauner <brauner@...nel.org>,
	David Howells <dhowells@...hat.com>,
	WangYuli <wangyuli@...ontech.com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	"Gautham R. Shenoy" <gautham.shenoy@....com>,
	Swapnil Sapkal <swapnil.sapkal@....com>,
	Neeraj Upadhyay <Neeraj.Upadhyay@....com>
Subject: Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still
 full

On 01/31, Linus Torvalds wrote:
>
> On Fri, 31 Jan 2025 at 01:50, K Prateek Nayak <kprateek.nayak@....com> wrote:
> >
> > On my 3rd Generation EPYC system (2 x 64C/128T), I see that on reverting
> > the changes on the above mentioned commit, sched-messaging sees a
> > regression up until the 8 group case which contains 320 tasks, however
> > with 16 groups (640 tasks), the revert helps with performance.
>
> I suspect that the extra wakeups just end up perturbing timing, and
> then you just randomly get better performance on that particular
> test-case and machine.
>
> I'm not sure this is worth worrying about, unless there's a real load
> somewhere that shows this regression.

Well yes, but the problem is that people seem to believe that hackbench
is the "real" workload, even in the "overloaded" case...

And if we do care about performance... Could you look at the trivial patch
at the end? I don't think {a,c,m}time make any sense in the !fifo case, but
as you explained before they are visible to fstat() so we probably shouldn't
remove file_accessed/file_update_time unconditionally.

This patch does help if I change hackbench to uses pipe2(O_NOATIME) instead
of pipe(). And in fact it helps even in the simplest case:

	static char buf[17 * 4096];

	static struct timeval TW, TR;

	int wr(int fd, int size)
	{
		int c, r;
		struct timeval t0, t1;

		gettimeofday(&t0, NULL);
		for (c = 0; (r = write(fd, buf, size)) > 0; c += r);
		gettimeofday(&t1, NULL);
		timeradd(&TW, &t1, &TW);
		timersub(&TW, &t0, &TW);

		return c;
	}

	int rd(int fd, int size)
	{
		int c, r;
		struct timeval t0, t1;

		gettimeofday(&t0, NULL);
		for (c = 0; (r = read(fd, buf, size)) > 0; c += r);
		gettimeofday(&t1, NULL);
		timeradd(&TR, &t1, &TR);
		timersub(&TR, &t0, &TR);

		return c;
	}

	int main(int argc, const char *argv[])
	{
		int fd[2], nb = 1, noat, loop, size;

		assert(argc == 4);
		noat = atoi(argv[1]) ? O_NOATIME : 0;
		loop = atoi(argv[2]);
		size = atoi(argv[3]);

		assert(pipe2(fd, noat) == 0);
		assert(ioctl(fd[0], FIONBIO, &nb) == 0);
		assert(ioctl(fd[1], FIONBIO, &nb) == 0);

		assert(size <= sizeof(buf));
		while (loop--)
			assert(wr(fd[1], size) == rd(fd[0], size));

		printf("TW = %lu.%03lu\n", TW.tv_sec, TW.tv_usec/1000);
		printf("TR = %lu.%03lu\n", TR.tv_sec, TR.tv_usec/1000);

		return 0;
	}


Now,

	/# for i in 1 2 3; do /host/tmp/test 0 10000 100; done
	TW = 7.692
	TR = 5.704
	TW = 7.930
	TR = 5.858
	TW = 7.685
	TR = 5.697
	/#
	/# for i in 1 2 3; do /host/tmp/test 1 10000 100; done
	TW = 6.432
	TR = 4.533
	TW = 6.612
	TR = 4.638
	TW = 6.409
	TR = 4.523

Oleg.
---

diff --git a/fs/pipe.c b/fs/pipe.c
index a3f5fd7256e9..14b2c0f8b616 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1122,6 +1122,9 @@ int create_pipe_files(struct file **res, int flags)
 		}
 	}
 
+	if (flags & O_NOATIME)
+		inode->i_flags |= S_NOCMTIME;
+
 	f = alloc_file_pseudo(inode, pipe_mnt, "",
 				O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT)),
 				&pipefifo_fops);
@@ -1134,7 +1137,7 @@ int create_pipe_files(struct file **res, int flags)
 	f->private_data = inode->i_pipe;
 	f->f_pipe = 0;
 
-	res[0] = alloc_file_clone(f, O_RDONLY | (flags & O_NONBLOCK),
+	res[0] = alloc_file_clone(f, O_RDONLY | (flags & (O_NONBLOCK | O_NOATIME)),
 				  &pipefifo_fops);
 	if (IS_ERR(res[0])) {
 		put_pipe_info(inode, inode->i_pipe);
@@ -1154,7 +1157,7 @@ static int __do_pipe_flags(int *fd, struct file **files, int flags)
 	int error;
 	int fdw, fdr;
 
-	if (flags & ~(O_CLOEXEC | O_NONBLOCK | O_DIRECT | O_NOTIFICATION_PIPE))
+	if (flags & ~(O_CLOEXEC | O_NONBLOCK | O_DIRECT | O_NOATIME | O_NOTIFICATION_PIPE))
 		return -EINVAL;
 
 	error = create_pipe_files(files, flags);


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ