lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170114012428.GX1555@ZenIV.linux.org.uk>
Date:   Sat, 14 Jan 2017 01:24:28 +0000
From:   Al Viro <viro@...IV.linux.org.uk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     "Alan J. Wylie" <alan@...ie.me.uk>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: 4.9.0 regression in pipe-backed iov_iter with systemd-nspawn

On Fri, Jan 13, 2017 at 04:59:37PM -0800, Linus Torvalds wrote:

> EXCEPT.
> 
> I don't think "i->iov_offset" is actually correct. If you truncated
> the whole thing, you should have cleared iov_offset too, and that
> never happened. So truncating everything will not empty the buffers
> for us, we'll still get to that "if (off)" case and have nrbufs=1.
> 
> So I'd expect something like
> 
>     if (!size)
>         i->iov_offset = 0;
> 
> in pipe_advance(), in order to really free all buffers for that case. No?

Why would advance by 0 change ->iov_offset here?

> Or is there some guarantee that iov_offset was already zero there? I
> don't see it, but I'm batting zero on this code...

It was zero initially (see iov_iter_pipe()).  It was not affected by
iov_iter_get_pages() and friends.  If there was copy_to_iter/zero_iter/
copy_page_to_iter for any non-zero amount of data, then it's _not_ zero and
should bloody well stay such.

Note that the normal case is something like O_DIRECT read grabbing pages,
doing a bunch of IO into such and, once it's done, iov_iter_advance()
for the actual amount read being done by
                retval = mapping->a_ops->direct_IO(iocb, &data);
                if (retval >= 0) {
                        iocb->ki_pos += retval;
                        iov_iter_advance(iter, retval);
                }
which advances iter past the actually read data.  If we proceed to
fall back to the do_generic_file_read(), it will do a bunch of
copy_page_to_iter() to the points past that.

default_file_splice_read() pretty much imitates O_DIRECT read in that
respect, with kernel_readv() being a counterpart of direct_IO.

generic_file_splice_read() is another PIPE_ITER user; that one really
calls ->read_iter().  On any non-error (including short reads) it will
not bother with iov_iter_advance() at all - ->read_iter() should've
called that already, if at all needed.

On error it does use iov_iter_advance(), pretty much as a way to
trigger pipe_truncate().  There we directly reset ->iov_offset to 0
and ->idx to its original value.  Strictly speaking, we could live
without calling it there at all - normally ->read_iter() instances follow
the usual "if you'd managed to read something, report a short read,
not an error" approach.  None of the exceptions are good candidates for
use of generic_file_splice_read() anyway.

However, theoretically it is possible that ->read_iter() instance does
successful copy_to_iter() and then decides to return an error.  This
        } else if (ret < 0) {
                to.idx = idx;
                to.iov_offset = 0;
                iov_iter_advance(&to, 0); /* to free what was emitted */
in generic_file_splice_read() catches any such cases.  Here the call
of iov_iter_advance(&to, 0) is guaranteed to be equivalent to
pipe_truncate(&to); we might make the latter non-static and call it
directly, but I'm not sure it's worth doing.
 
> Can you please explain when you are once more awake..

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ