[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wiJFsu70BqrgxtoAfMHeJVJMfsWzQ42PXFduGNhFSVGDA@mail.gmail.com>
Date: Fri, 24 Nov 2023 10:25:15 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Christian Brauner <brauner@...nel.org>,
Omar Sandoval <osandov@...com>,
David Howells <dhowells@...hat.com>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] vfs fixes
On Fri, 24 Nov 2023 at 02:28, Christian Brauner <brauner@...nel.org> wrote:
>
> * Fix a bug introduced with the iov_iter rework from last cycle.
>
> This broke /proc/kcore by copying too much and without the correct
> offset.
Ugh. I think the whole /proc/kcore vmalloc handling is just COMPLETELY broken.
It does this:
/*
* vmalloc uses spinlocks, so we optimistically try to
* read memory. If this fails, fault pages in and try
* again until we are done.
*/
while (true) {
read += vread_iter(iter, src, left);
if (read == tsz)
break;
src += read;
left -= read;
if (fault_in_iov_iter_writeable(iter, left)) {
ret = -EFAULT;
goto out;
}
}
and that is just broken beyond words for two totally independent reasons:
(a) vread_iter() looks like it can fail because of not having a
source, and return 0 (I dunno - it seems to try to avoid that, but it
all looks pretty dodgy)
At that point fault_in_iov_iter_writeable() will try to fault
in the destination, which may work just fine, but if the source was
the problem, you'd have an endless loop.
(b) That "read += X" is completely broken anyway. It should be just a
"=". So that whole loop is crazy broken, and only works for the case
where you get it all in one go. This code is crap.
Now, I think it all works in practice for one simple reason: I doubt
anybody uses this (and it looks like the callees in the while loop try
very hard to always fill the whole area - maybe people noticed the
first bug and tried to work around it that way).
I guess there is at least one test program, but it presumably doesn't
trigger or care about the bugs here.
But I think we can get rid of this all, and just deal with the
KCORE_VMALLOC case exactly the same way we already deal with VMEMMAP
and TEXT: by just doing copy_from_kernel_nofault() into a bounce
buffer, and then doing a regular _copy_to_iter() or whatever.
NOTE! I looked at the code, and threw up in my mouth a little, and
maybe I missed something. Maybe it all works fine. But Omar - since
you found the original problem, may I implore you to test this
attached patch?
I just like how the patch looks:
6 files changed, 1 insertion(+), 368 deletions(-)
and those 350+ deleted lines really looked disgusting to me.
This patch is on top of the pull I did, because obviously the fix in
that pull was correct, I just think we should go further and get rid
of this whole mess entirely.
Linus
View attachment "patch.diff" of type "text/x-patch" (11981 bytes)
Powered by blists - more mailing lists