[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7gyxkpozyno7hl2jz5k2v2k5yo6gpvr3i5whqrgqlc5eahxvjz@p7p2a2aezsbt>
Date: Mon, 22 Dec 2025 11:41:07 -0800
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Matthew Wilcox <willy@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Andrii Nakryiko <andrii@...nel.org>, Shaurya Rane <ssrane_b23@...vjti.ac.in>,
"Darrick J . Wong" <djwong@...nel.org>, Christoph Hellwig <hch@...radead.org>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Meta kernel team <kernel-team@...a.com>, bpf@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
syzbot+09b7d050e4806540153d@...kaller.appspotmail.com, Christoph Hellwig <hch@....de>
Subject: Re: [PATCH bpf v2] lib/buildid: use __kernel_read() for sleepable
context
On Mon, Dec 22, 2025 at 05:31:50AM +0000, Matthew Wilcox wrote:
> On Thu, Dec 18, 2025 at 09:58:43PM -0800, Shakeel Butt wrote:
> > On Fri, Dec 19, 2025 at 04:07:51AM +0000, Matthew Wilcox wrote:
> > > > I am assuming that __kernel_read() can return less data than the
> > > > requested. Is that assumption incorrect?
> > >
> > > I think it can, but I don't think a second call will get any more data.
> > > For example, it could hit EOF. What led you to think that calling it in
> > > a loop was the right approach?
> >
> > I am kind of following the convention of a userspace application doing
> > read() syscall i.e. repeatedly call read() until you hit an error or EOF
> > in which case 0 will be returned or you successfully read the amount of
> > data you want. I am handling negative error and 0 and for 0, I am
> > returning -EIO as that would be unexpected end of an ELF file.
>
> Oh, you sweet summer child. I hope Father Christmas leaves you an
> extra special present in your stocking this week!
>
> While it would be lovely to believe that userspace does that kind of loop,
> it just doesn't. That's why mounting NFS filesystems with the "intr"
> option (before I made it a no-op) was dangerous -- userspace just isn't
> prepared for short reads. I mean, we're lucky if userspace even checks
> for errors, let alone does this kind of advanced "oh, we got fewer bytes
> than we wanted, keep trying" scheme.
>
> A filesystem's ->read_iter() implementation can stop short of reading
> all bytes requested if:
>
> - We hit EIO. No amount of asking will return more bytes, the data's
> just not there any more.
> - We hit EOF. There's no more data to read.
> - We're unable to access the buffer. That's only possible for user
> buffers, not kernel buffers.
> - We receive a fatal signal. I suppose there is the tiniest chance
> that the I/O completes while we're processing the "we returned early"
> loop, but in practice, the user has asked that we die now, and even
> trying again is rude. Just die as quickly as we can.
>
> I can't think of any other cases. It's just not allowed to return
> short reads to userspace (other than EIO/EOF), and that drives all
> the behaviour.
Thanks for the explanation. Is calling kernel_read() again after it
returned less amount of data unsafe or unnecessary?
>
> > Anyways the question is if __kernel_read() returns less amount of data
> > than requested, should we return error instead of retrying? I looked
> > couple of callers of __kernel_read() & kernel_read(). Some are erroring
> > out if received data is less than requested (e.g. big_key_read()) and
> > some are calling in the loop (e.g. kernel_read_file()).
>
> kernel_read_file() is wrong. Thanks for reporting it; I'll send a patch.
There are couple more like do_read() and do_verify() in
drivers/usb/gadget/function/f_mass_storage.c. There might be more but I
have not looked into every caller.
Powered by blists - more mailing lists