[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <qsehsgqnti4csvsg2xrrsof4qm4smhdhv6s4v4twspf76bp3jo@2mpz5xtqhmgt>
Date: Wed, 26 Feb 2025 14:18:59 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: "Sapkal, Swapnil" <swapnil.sapkal@....com>,
Manfred Spraul <manfred@...orfullife.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
Christian Brauner <brauner@...nel.org>, David Howells <dhowells@...hat.com>,
WangYuli <wangyuli@...ontech.com>, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
K Prateek Nayak <kprateek.nayak@....com>, "Shenoy, Gautham Ranjal" <gautham.shenoy@....com>,
Neeraj.Upadhyay@....com
Subject: Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still
full
On Mon, Feb 24, 2025 at 03:24:32PM +0100, Oleg Nesterov wrote:
> On 02/24, Sapkal, Swapnil wrote:
> > Whenever I compare the case where was_full would have been set but
> > wake_writer was not set, I see the following pattern:
> >
> > ret = 100 (Read was successful)
> > pipe_full() = 1
> > total_len = 0
> > buf->len != 0
> >
> > total_len is computed using iov_iter_count() while the buf->len is the
> > length of the buffer corresponding to tail(pipe->bufs[tail & mask].len).
> > Looking at pipe_write(), there seems to be a case where the writer can make
> > progress when (chars && !was_empty) which only looks at iov_iter_count().
> > Could it be the case that there is still room in the buffer but we are not
> > waking up the writer?
>
> I don't think so, but perhaps I am totally confused.
>
> If the writer sleeps on pipe->wr_wait, it has already tried to write into
> the pipe->bufs[head - 1] buffer before the sleep.
>
> Yes, the reader can read from that buffer, but this won't make it more "writable"
> for this particular writer, "PAGE_SIZE - buf->offset + buf->len" won't be changed.
While I think the now-removed wakeup was indeed hiding a bug, I also
think the write thing pointed out above is a fair point (orthogonal
though).
The initial call to pipe_write allows for appending to an existing page.
However, should the pipe be full, the loop which follows it insists on
allocating a new one and waits for a slot, even if ultimately *there is*
space now.
The hackbench invocation used here passes around 100 bytes.
Both readers and writers do rounds over pipes issuing 100 byte-sized
ops.
Suppose the pipe does not have space to hold the extra 100 bytes. The
writer goes to sleep and waits for the tail to move. A reader shows up,
reads 100 bytes (now there is space!) but since the current buf was not
depleted it does not mess with the tail.
The bench spawns tons of threads, ensuring there is a lot of competition
for the cpu time. The reader might get just enough time to largely
deplete the pipe to a point where there is only one buf in there with
space in it. Should pipe_write() be invoked now it would succeed
appending to a page. But if the writer was already asleep, it is going
to insist on allocating a new page.
As for the bug, I don't see anything obvious myself.
However, I think there are 2 avenues which warrant checking.
Sapkal, if you have time, can you please boot up the kernel which is
more likely to run into the problem and then run hackbench as follows:
1. with 1 fd instead of 20:
/usr/bin/hackbench -g 16 -f 1 --threads --pipe -l 100000 -s 100
2. with a size which divides 4096 evenly (e.g., 128):
/usr/bin/hackbench -g 1 -f 20 --threads --pipe -l 100000 -s 128
Powered by blists - more mailing lists