[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aUjXxSAD2-c4Ivy1@casper.infradead.org>
Date: Mon, 22 Dec 2025 05:31:50 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Andrii Nakryiko <andrii@...nel.org>,
Shaurya Rane <ssrane_b23@...vjti.ac.in>,
"Darrick J . Wong" <djwong@...nel.org>,
Christoph Hellwig <hch@...radead.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Meta kernel team <kernel-team@...a.com>, bpf@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
syzbot+09b7d050e4806540153d@...kaller.appspotmail.com,
Christoph Hellwig <hch@....de>
Subject: Re: [PATCH bpf v2] lib/buildid: use __kernel_read() for sleepable
context
On Thu, Dec 18, 2025 at 09:58:43PM -0800, Shakeel Butt wrote:
> On Fri, Dec 19, 2025 at 04:07:51AM +0000, Matthew Wilcox wrote:
> > > I am assuming that __kernel_read() can return less data than the
> > > requested. Is that assumption incorrect?
> >
> > I think it can, but I don't think a second call will get any more data.
> > For example, it could hit EOF. What led you to think that calling it in
> > a loop was the right approach?
>
> I am kind of following the convention of a userspace application doing
> read() syscall i.e. repeatedly call read() until you hit an error or EOF
> in which case 0 will be returned or you successfully read the amount of
> data you want. I am handling negative error and 0 and for 0, I am
> returning -EIO as that would be unexpected end of an ELF file.
Oh, you sweet summer child. I hope Father Christmas leaves you an
extra special present in your stocking this week!
While it would be lovely to believe that userspace does that kind of loop,
it just doesn't. That's why mounting NFS filesystems with the "intr"
option (before I made it a no-op) was dangerous -- userspace just isn't
prepared for short reads. I mean, we're lucky if userspace even checks
for errors, let alone does this kind of advanced "oh, we got fewer bytes
than we wanted, keep trying" scheme.
A filesystem's ->read_iter() implementation can stop short of reading
all bytes requested if:
- We hit EIO. No amount of asking will return more bytes, the data's
just not there any more.
- We hit EOF. There's no more data to read.
- We're unable to access the buffer. That's only possible for user
buffers, not kernel buffers.
- We receive a fatal signal. I suppose there is the tiniest chance
that the I/O completes while we're processing the "we returned early"
loop, but in practice, the user has asked that we die now, and even
trying again is rude. Just die as quickly as we can.
I can't think of any other cases. It's just not allowed to return
short reads to userspace (other than EIO/EOF), and that drives all
the behaviour.
> Anyways the question is if __kernel_read() returns less amount of data
> than requested, should we return error instead of retrying? I looked
> couple of callers of __kernel_read() & kernel_read(). Some are erroring
> out if received data is less than requested (e.g. big_key_read()) and
> some are calling in the loop (e.g. kernel_read_file()).
kernel_read_file() is wrong. Thanks for reporting it; I'll send a patch.
Powered by blists - more mailing lists