Message-ID: <CAF=yD-K07q_ygjRrsau3fPWX4==WPjEtZN1y3eZUTABYaG0vWg@mail.gmail.com>
Date: Wed, 20 Sep 2023 21:41:38 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: David Howells <dhowells@...hat.com>, netdev@...r.kernel.org
Cc: syzbot+62cbf263225ae13ff153@...kaller.appspotmail.com,
Eric Dumazet <edumazet@...gle.com>, "David S. Miller" <davem@...emloft.net>,
David Ahern <dsahern@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>,
bpf@...r.kernel.org, syzkaller-bugs@...glegroups.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net v2] ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()
On Wed, Sep 20, 2023 at 9:54 AM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> David Howells wrote:
> > Including the transhdrlen in length is a problem when the packet is
> > partially filled (e.g. something like send(MSG_MORE) happened previously)
> > when appending to an IPv4 or IPv6 packet as we don't want to repeat the
> > transport header or account for it twice. This can happen under some
> > circumstances, such as splicing into an L2TP socket.
> >
> > The symptom observed is a warning in __ip6_append_data():
> >
> > WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800
> >
> > that occurs when MSG_SPLICE_PAGES is used to append more data to an already
> > partially occupied skbuff. The warning occurs when 'copy' is larger than
> > the amount of data in the message iterator. This is because the requested
> > length includes the transport header length when it shouldn't. This can be
> > triggered by, for example:
> >
> > sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP);
> > bind(sfd, ...); // ::1
> > connect(sfd, ...); // ::1 port 7
> > send(sfd, buffer, 4100, MSG_MORE);
> > sendfile(sfd, dfd, NULL, 1024);
> >
> > Fix this by deducting transhdrlen from length in ip{,6}_append_data() right
> > before we clear transhdrlen if there is already a packet that we're going
> > to try appending to.
> >
> > Reported-by: syzbot+62cbf263225ae13ff153@...kaller.appspotmail.com
> > Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@google.com/
> > Signed-off-by: David Howells <dhowells@...hat.com>
> > cc: Eric Dumazet <edumazet@...gle.com>
> > cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>
> > cc: "David S. Miller" <davem@...emloft.net>
> > cc: David Ahern <dsahern@...nel.org>
> > cc: Paolo Abeni <pabeni@...hat.com>
> > cc: Jakub Kicinski <kuba@...nel.org>
> > cc: netdev@...r.kernel.org
> > cc: bpf@...r.kernel.org
> > cc: syzkaller-bugs@...glegroups.com
> > Link: https://lore.kernel.org/r/75315.1695139973@warthog.procyon.org.uk/ # v1
> > ---
> > net/ipv4/ip_output.c | 1 +
> > net/ipv6/ip6_output.c | 1 +
> > 2 files changed, 2 insertions(+)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 4ab877cf6d35..9646f2d9afcf 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1354,6 +1354,7 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
> > if (err)
> > return err;
> > } else {
> > + length -= transhdrlen;
> > transhdrlen = 0;
> > }
> >
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index 54fc4c711f2c..6a4ce7f622e9 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -1888,6 +1888,7 @@ int ip6_append_data(struct sock *sk,
> > length += exthdrlen;
> > transhdrlen += exthdrlen;
> > } else {
> > + length -= transhdrlen;
> > transhdrlen = 0;
> > }
> >
>
> Definitely a much simpler patch, thanks.
>
> So the current model is that callers with non-zero transhdrlen always
> pass to __ip_append_data payload length + transhdrlen.
>
> I do see that udp does this: ulen += sizeof(struct udphdr); This calls
> ip_make_skb if not corked, but directly ip_append_data if corked.
>
> Then __ip_append_data will use transhdrlen in its packet calculations,
> and reset that to zero after allocating the first new skb.
>
> So if corked *and* fragmentation, which would cause a new skb to be
> allocated, the next skb would incorrectly reserve udp header space,
> because the second __ip_append_data call will again pass transhdrlen.
> If so, then this patch fixes that. But that has never been reported,
> so I'm most likely misreading some part.
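The length bookkeeping described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: `model_copy_len()` and `model_l2tp_ulen()` are hypothetical helpers, extension headers (`exthdrlen`) and fragmentation are ignored, and the 4-byte L2TP/IPv6 transport header size is an assumption made for illustration. `model_copy_len()` returns how many payload bytes the append path would try to copy from the message iterator. Its `v2_fix` flag mimics the patch's `length -= transhdrlen;` line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed L2TP/IPv6 transport header size, for illustration only. */
#define MODEL_L2TP_HDRLEN 4u

/*
 * Hypothetical model, not kernel code: payload bytes the append path
 * would try to copy from the message iterator.
 *   length      - what the caller passes (payload plus any header it added)
 *   transhdrlen - transport header length the caller passes
 *   queue_empty - true if no partially filled skb is pending
 *   v2_fix      - apply the patch's "length -= transhdrlen" when appending
 * Extension headers (exthdrlen) and fragmentation are ignored.
 */
static size_t model_copy_len(size_t length, size_t transhdrlen,
			     bool queue_empty, bool v2_fix)
{
	if (queue_empty) {
		/* New packet: transhdrlen bytes of length are the header. */
		assert(length >= transhdrlen);
		return length - transhdrlen;
	}
	/* Appending: the header already sits in the pending skb. */
	if (v2_fix) {
		assert(length >= transhdrlen);
		length -= transhdrlen;
	}
	/* transhdrlen is cleared here, so all of length counts as payload. */
	return length;
}

/* Hypothetical caller that always adds the header, like the
 * unconditional ulen = len + transhdrlen in l2tp_ip6_sendmsg(). */
static size_t model_l2tp_ulen(size_t len)
{
	return len + MODEL_L2TP_HDRLEN;
}
```

With the reproducer's numbers, `model_copy_len(model_l2tp_ulen(1024), MODEL_L2TP_HDRLEN, false, false)` evaluates to 1028. That is four bytes more than the 1024 the iterator holds, which is consistent with the reported warning. With `v2_fix` set, it evaluates to 1024.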
This works today because udp only includes transhdrlen if not corked.
In udpv6_sendmsg:
	if (up->pending) {
		...
		goto do_append_data;
	}
	ulen += sizeof(struct udphdr);
So once data is pending, ip6_append_data is called with ulen == len, and
subtracting transhdrlen (which is still sizeof(udphdr)) would undercount
the payload.
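The UDP caller convention can be checked with a short user-space sketch. `model_udp_ulen()` is a hypothetical helper mirroring the quoted `udpv6_sendmsg()` logic, with 8 standing in for `sizeof(struct udphdr)`. `model_append_copy()` is a hypothetical stand-in for the append-to-pending-skb branch. Its `subtract_hdr` flag selects the v2 patch's subtraction. Once data is pending, UDP passes `ulen == len` while transhdrlen stays `sizeof(udphdr)`, so the extra subtraction comes up eight bytes short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stands in for sizeof(struct udphdr). */
#define MODEL_UDPHDR_LEN 8u

/* Mirrors the quoted udpv6_sendmsg() logic: the UDP header is added to
 * ulen only when no data is pending (the socket is not corked). */
static size_t model_udp_ulen(size_t len, bool pending)
{
	return pending ? len : len + MODEL_UDPHDR_LEN;
}

/* Hypothetical model of appending to a pending skb: returns the payload
 * bytes that would be copied. subtract_hdr selects the v2 patch's
 * "length -= transhdrlen" behaviour. */
static size_t model_append_copy(size_t length, size_t transhdrlen,
				bool subtract_hdr)
{
	if (!subtract_hdr)
		return length;
	assert(length >= transhdrlen);
	return length - transhdrlen;
}
```

Consider a 100-byte send on a corked socket. `model_append_copy(model_udp_ulen(100, true), MODEL_UDPHDR_LEN, false)` gives 100. Passing `true` instead gives 92, which is eight bytes short.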
l2tp_ip6_sendmsg more or less follows udpv6_sendmsg, but it
unconditionally sets ulen = len + transhdrlen. So maybe the fix is in
L2TP:
+++ b/net/l2tp/l2tp_ip6.c
@@ -507,7 +507,6 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
*/
if (len > INT_MAX - transhdrlen)
return -EMSGSIZE;
- ulen = len + transhdrlen;
/* Mirror BSD error message compatibility */
if (msg->msg_flags & MSG_OOB)
@@ -628,6 +627,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
back_from_confirm:
lock_sock(sk);
+ ulen = len + (skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0);
As noted, only raw, udp and l2tp can possibly pass MSG_MORE and so cause
secondary invocations of ip6_append_data for the same send. Since raw
passes a transhdrlen of 0, and udp is handled as discussed above, we only
have to consider l2tp.
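When computing ulen conditionally, note that in C the conditional operator `?:` has lower precedence than `+`. Written without parentheses, `len + skb_queue_empty(...) ? transhdrlen : 0` therefore parses as `(len + skb_queue_empty(...)) ? transhdrlen : 0`, so the conditional has to be parenthesized. The user-space sketch below demonstrates this. A plain `bool queue_empty` stands in for `skb_queue_empty(&sk->sk_write_queue)`, and the helper names and the 4-byte header size are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed L2TP/IPv6 transhdrlen, for illustration only. */
#define MODEL_L2TP_HDRLEN 4u

/* Unparenthesized: parsed as (len + queue_empty) ? hdr : 0, so the
 * result is the header length (or 0), never a total packet length. */
static size_t ulen_unparenthesized(size_t len, bool queue_empty)
{
	return len + queue_empty ? MODEL_L2TP_HDRLEN : 0;
}

/* Intended: add the header only when starting a new packet. */
static size_t ulen_parenthesized(size_t len, bool queue_empty)
{
	return len + (queue_empty ? MODEL_L2TP_HDRLEN : 0);
}
```

For any nonzero `len`, the first helper returns 4 whether or not the queue is empty. The second returns `len + 4` for an empty queue and `len` otherwise. Clang, for example, can warn about the first form under `-Wparentheses`.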