linux-kernel - Re: [0/3] net: Kill skb_copy_datagram_const

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141104054513.GB7996@ZenIV.linux.org.uk>
Date:	Tue, 4 Nov 2014 05:45:13 +0000
From:	Al Viro <viro@...IV.linux.org.uk>
To:	David Miller <davem@...emloft.net>
Cc:	herbert@...dor.apana.org.au, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, bcrl@...ck.org
Subject: Re: [0/3] net: Kill skb_copy_datagram_const_iovec

On Mon, Nov 03, 2014 at 03:05:53PM -0500, David Miller wrote:

> I'll see if I can make some progress converting the networking over
> to iov_iter.  It can't be that difficult... albeit perhaps a little
> time consuming.

FWIW, I have a queue that got started back in April; basically, the plan
of attack was
	* separate kernel-side and userland msghdr.
	* localize ->msg_iov uses - most of that gets taken care of by
several new helpers, as in
    new helper: skb_copy_datagram_msg()
    
    Absolute majority of skb_copy_datagram_iovec() callers (49 out of 56)
    are passing it msg->msg_iov as iovec.  Provide a trivial wrapper that
    takes msg as argument instead of iovec.
and several like that (the numbers in the above are probably incorrect these
days - it was done more than half a year ago).
	* switch kernel-side msghdr to iov_iter.  That means diverging
layouts; it's really not hard, since we have copying of msghdr from
userland already localized.  Initially - just a mechanical conversion
(i.e. direct uses of iov_iter fields instead of ->msg_iov/->msg_iovlen;
note that after the introduction of wrappers the number of such places
is very much reduced).
	* start converting those relatively few places to iov_iter primitives.

And that's where it got stalled, since we have to deal with expectations
of callers.  Syscall ones are trivial; that's not a problem.  Unfortunately,
there are kernel_{send,recv}msg() users, and those do care about the state the
iovec is left in.  Strictly speaking, the state of iovec after e.g.
->sendmsg() is undefined.  And it's not just protocol-dependent - unless
I'm seriously misreading it, tcp_sendmsg() ends up modifying iovec in case
when it hits tcp_send_rcvq(), while in the normal case it leaves iovec
unmodified.  So in general you need to feed ->{send,recv}msg() a throwaway copy
of iovec.  Leads to wonders like
        /* NB we can't trust socket ops to either consume our iovs
         * or leave them alone. */
        LASSERT (niov > 0);

        for (nob = i = 0; i < niov; i++) {
                scratchiov[i] = iov[i];
                nob += scratchiov[i].iov_len;
        }
        LASSERT (nob <= conn->ksnc_rx_nob_wanted);

        rc = kernel_recvmsg(conn->ksnc_sock, &msg,
                (struct kvec *)scratchiov, niov, nob, MSG_DONTWAIT);
etc.  However, there are places that don't bother and do this:
        while (total_rx < data) {
                rx_loop = kernel_recvmsg(conn->sock, &msg, iov_p, iov_len,
                                        (data - total_rx), MSG_WAITALL);
                if (rx_loop <= 0) {
                        pr_debug("rx_loop: %d total_rx: %d\n",
                                rx_loop, total_rx); 
                        return rx_loop;
                }
                total_rx += rx_loop;
                pr_debug("rx_loop: %d, total_rx: %d, data: %d\n",
                                rx_loop, total_rx, data);
        }
(that's iscsit_do_rx_data()).  Maybe it's a bug; maybe it's relying on
specific behaviour of the protocol known to be used - this code clearly
expects recvmsg to advance iovec, which seems to depend only on the
protocol.  At the moment.  In any case, it's very brittle...

Hell knows; I hadn't finished digging through that zoo - got sidetracked back
then.  *IF* all such places either use a throwaway copy or assume that iovec
gets modified, we can do the following: switch the access to iovecs to
iov_iter primitives, with the first kind of callers creating a throwaway
iov_iter and the second just feeding the same iov_iter to e.g.
kernel_recvmsg().  iovec will remain constant, iov_iter will be advanced.
Moreover, in a lot of cases of first kind will be able to get rid of
throwaway iov_iter (and of manually advancing it), effectively converting
to the second one.

If we have places that currently rely on iovec remaining unchanged (i.e.
manually advancing it after kernel_{send,recv}msg()), the series will be
more painful ;-/  I very much hope that no such places exist...

FWIW, there is also a tactical question that needs to be dealt with.  We
can, of course, start with renaming the "kernel-side" (i.e. post
copy_msghdr_from_user()/get_compat_msghdr()) to struct kmsghdr.  OTOH,
that's a _lot_ of churn for very little reason - most of the instances
in the tree are of that kind.  So I did it the other way round - introduced
struct user_msghdr (only in linux/socket.h; note that we do *not* have
struct msghdr in uabi/linux/socket.h, or anywhere else in uabi/*),
made the syscalls take pointers to it and (initially) rely upon the identical
layouts in copy_msghdr_from_user(); once we put iov_iter into kernel-side
msghdr, we'll just do it like get_compat_msghdr() does.

Is that acceptable?  It would greatly reduce the amount of churn in net/* -
we don't need to pass iov_iter separately and most of the functions in
the middle of call chains are completely unchanged.  Only the originators
of ->sendmsg()/->recvmsg() and the places doing actual data copying
need to be touched.  OTOH, it makes for kernel struct msghdr looking
odd - instead of normal ->msg_iov and ->msg_iovlen it would have
->msg_iov_iter, with ->sendmsg()/->recvmsg() callers needing to set it
up...  OTTH, the things *are* odd from userland programmer POV - sendmsg(2)
and recvmsg(2) leave the iovec unchanged, and having it changed unpredicatably
in the kernel-side counterparts seems to make for a nasty trap.  Certainly
makes for a bunch of nasty comments in the code using those...

Comments?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/