[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whUWjLqe1=4O5B=PwfhwxUDqg7C7b0Yq50+bG-Jtvov6Q@mail.gmail.com>
Date: Fri, 29 Jan 2021 12:02:29 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Hans de Goede <hdegoede@...hat.com>,
Christoph Hellwig <hch@....de>,
Jiufei Xue <jiufei.xue@...ux.alibaba.com>,
Miklos Szeredi <mszeredi@...hat.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [5.11 regression] "tty: implement write_iter" breaks TIOCCONS
On Fri, Jan 29, 2021 at 11:17 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Fri, Jan 29, 2021 at 10:31 AM Hans de Goede <hdegoede@...hat.com> wrote:
> >
> > You are using Fedora now a days, right ? In that case you should be
> > able to reproduce this yourself (depending on how custom your kernel
> > setup is) if you are using the standard Fedora initrd generated by
> > dracut and have "rhgb" on your kernel cmdline, then you can check
> > for this problem by doing:
>
> Thanks, I can see it, that should make it much easier to figure out.
Ahh, interesting.
It turns out that the problem isn't actually really in the tty layer,
it's that vfs_iocb_iter_write() is very very subtly buggy.
So the tty layer "trivial" conversion from using "vfs_write()" - for
the old redirected tty_write() call - to using "vfs_iocb_iter_write()"
caused problems.
Why? Because both vfs_write() and vfs_iocb_iter_write() take the
target "struct file *file" as an argument, but vfs_iocb_iter_write()
doesn't actually *use* that target file!
Well, to be specific, it does actually use the target file pointer for
two things:
- the security checks
- to pick the actual ->write_iter function.
But once you actually call ->write_iter() to do the IO, the 'file'
pointer isn't actually passed down at all, and the write_iter()
function depends not on 'file', but on 'iocb->ki_filp".
In other words, vfs_iocb_iter_write() is completely broken, because it
will do the preliminary work using one 'struct file *', but then do
the actual IO using _another_ 'struct file *' entirely.
In the case of the console redirect code, that meant that the
"redirect" never actually redirected anything, it really just called
tty_write() with the original iocb, which used the original target
file pointer.
Let's just say that I stared at those tty changes for a while, saying
"there is no *POSSIBLE* way that introduces a bug". And yeah, the tty
changes themselves were actually not the real culprit.
Of course, there is only one other user of vfs_iocb_iter_write() -
ovlfs - and that one fills in the iocb with the same file pointer that
it uses as the first argument, so nobody has ever noticed this oddity
before.
The function has been buggy like this since the very first
implementation, and vfs_iocb_iter_read() has the exact same issue.
It's fairly easy to work around in this in the tty layer by just
avoiding that function entirely, so I'll cook up a patch to do that.
But I'm adding the appropriate people to the participants here because
this really is very subtle if you ever hit it.
It might be best to just remove the "struct file *file" argument from
vfs_iocb_iter_{read,write}(), because it really is very wrong to use
anything but iocb.ki_file, and it's really subtle.
Linus
Powered by blists - more mailing lists