[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wj8V1v6QmJ_8X6zznautRq29tXMYzxorOniFx4NtxRE1A@mail.gmail.com>
Date: Mon, 10 Feb 2025 09:37:47 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: K Prateek Nayak <kprateek.nayak@....com>, Christian Brauner <brauner@...nel.org>,
Jeff Layton <jlayton@...nel.org>, David Howells <dhowells@...hat.com>,
"Gautham R. Shenoy" <gautham.shenoy@....com>, Mateusz Guzik <mjguzik@...il.com>,
Neeraj Upadhyay <Neeraj.Upadhyay@....com>, Oliver Sang <oliver.sang@...el.com>,
Swapnil Sapkal <swapnil.sapkal@....com>, WangYuli <wangyuli@...ontech.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] pipe: change pipe_write() to never add a zero-sized buffer
On Mon, 10 Feb 2025 at 09:22, Oleg Nesterov <oleg@...hat.com> wrote:
>
> + int avail = PAGE_SIZE - offset;
>
> - if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
> - offset + chars <= PAGE_SIZE) {
> + if (avail && (buf->flags & PIPE_BUF_FLAG_CAN_MERGE)) {
> ret = pipe_buf_confirm(pipe, buf);
> if (ret)
> goto out;
>
> + chars = min_t(ssize_t, chars, avail);
If I read this correctly, this patch is horribly broken.
You can't do partial writes. Pipes have one very core atomicity
guarantee: from the man-pages:
PIPE_BUF
POSIX.1 says that writes of less than PIPE_BUF bytes must be
atomic: the output data is written to the pipe as a contiguous
sequence. Writes of more than PIPE_BUF bytes may be nonatomic:
the kernel may interleave the data with data written by other
processes. POSIX.1 requires PIPE_BUF to be at least 512 bytes.
(On Linux, PIPE_BUF is 4096 bytes.)
IOW, that whole "try to write as many chars as there is room" is very
very broken. You have to write all or nothing.
So you can't (say) first write just 50 bytes of a 100-byte pipe write
because it fits in the last buffer, and then wait for another buffer
to become free to write the rest. Not just because another writer
might come in and start mixing in data, but because *readers* may well
expect to get 100 bytes or nothing.
And to make matters worse, you'll never notice the bug until something
breaks very subtly (unless we happen to have a good test-case for this
somewhere - there might be a test for this in LTP).
And yes, this is actually something I know for a fact that people
depend on. Lots of traditional UNIX "send packet commands over pipes"
programs around, which expect the packets to be atomic.
So things *will* break, but it might take a while before you hit just
the right race condition for things to go south, and the errors migth
end up being very non-obvious indeed.
Note that the initial
chars = total_len & (PAGE_SIZE-1);
before the whole test for "can we merge" is fine, because if total_len
is larger than a page, it's no longer a write we need to worry about
atomicity with.
Maybe we should add a comment somewhere about this.
Linus
Powered by blists - more mailing lists