[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1435794834.11970.6.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Thu, 02 Jul 2015 01:53:54 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Eric Dumazet <edumazet@...gle.com>,
Alex Gartrell <alexgartrell@...il.com>
Cc: David Miller <davem@...emloft.net>,
"agartrell@...com" <agartrell@...com>,
netdev <netdev@...r.kernel.org>, kernel-team <kernel-team@...com>
Subject: Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we
have a TCP_TIMEWAIT sk
On Thu, 2015-07-02 at 01:26 +0200, Eric Dumazet wrote:
> On Thu, Jul 2, 2015 at 1:18 AM, Alex Gartrell <alexgartrell@...il.com> wrote:
> > On Wednesday, July 1, 2015, Eric Dumazet <edumazet@...gle.com> wrote:
> >>
> >> On Wed, Jul 1, 2015 at 11:14 PM, David Miller <davem@...emloft.net> wrote:
> >> > From: Alex Gartrell <agartrell@...com>
> >> > Date: Wed, 1 Jul 2015 13:13:09 -0700
> >> >
> >> >> If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan
> >> >> it
> >> >> (as we need to do in the ipvs forwarding case), sock_wfree and
> >> >> sock_rfree
> >> >> are going to reach into the inet_timewait_sock and mess with fields
> >> >> that
> >> >> don't exist.
> >> >>
> >> >> Signed-off-by: Alex Gartrell <agartrell@...com>
> >> >
> >> > If we're forwarding, we should not find a local socket, period.
> >>
> >> A socket cannot change state to TCP_TIMEWAIT.
> >>
> >> A new object is allocated and old one is removed from ehash, then
> >> freed (rcu rules being applied)
> >>
> >> Also sock_wfree() has nothing to do with early demux. It is for output
> >> path skbs only.
> >
> >
> > Alright I kind of cheated and didn't include full context here. The problem
> > is that within ipvs we are getting packets that have been early demuxed and
> > associated with time wait sockets which we then wish to forward immediately
> > (ip_vs_xmit.c). Under normal circumstances it would never be associated
> > with any sk at all, but it is because of early demux, so we want to drop the
> > relationship by calling skb_orphan. This invokes the destructor which lands
> > us there.
> >
> > So that is how we reach this illegal "treating a twsk like an sk" state.
> >
> > If there is a better way to drop the association than skb_orphan I will use
> > it.
>
> I think you are mistaken Alex.
>
> socket early demux cannot possibly set skb->destructor to sock_rfree()
>
> If skb->destructor is set by early demux, it correctly points to sock_edemux()
>
> And this one correctly handles all socket variants.
If ipvs is the problem, could you try instead following patch ?
Shoot in the dark, as you do not give a lot of details :(
diff --git a/include/net/sock.h b/include/net/sock.h
index 05a8c1aea251..f77fe9acc7a4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1932,6 +1932,14 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
}
}
+/* This helper checks if a socket is a full socket,
+ * ie _not_ a timewait or request socket.
+ */
+static inline bool sk_fullsock(const struct sock *sk)
+{
+ return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
+}
+
/*
* Queue a received datagram if it will fit. Stream and sequenced
* protocols can't normally use this as they need to fit buffers in
@@ -1944,6 +1952,9 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
{
skb_orphan(skb);
+ if (unlikely(!sk_fullsock(sk))
+ return;
+
skb->sk = sk;
skb->destructor = sock_wfree;
skb_set_hash_from_sk(skb, sk);
@@ -2204,14 +2215,6 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb)
return NULL;
}
-/* This helper checks if a socket is a full socket,
- * ie _not_ a timewait or request socket.
- */
-static inline bool sk_fullsock(const struct sock *sk)
-{
- return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
-}
-
void sock_enable_timestamp(struct sock *sk, int flag);
int sock_get_timestamp(struct sock *, struct timeval __user *);
int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 5d2b806a862e..ff05ec5a9016 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1161,9 +1161,10 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT &&
af == AF_INET)) {
struct sock *sk = skb->sk;
- struct inet_sock *inet = inet_sk(skb->sk);
- if (inet && sk->sk_family == PF_INET && inet->nodefrag)
+ if (sk_fullsock(sk) &&
+ sk->sk_family == PF_INET &&
+ inet_sk(sk)->nodefrag)
return NF_ACCEPT;
}
@@ -1640,9 +1641,10 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT &&
af == AF_INET)) {
struct sock *sk = skb->sk;
- struct inet_sock *inet = inet_sk(skb->sk);
- if (inet && sk->sk_family == PF_INET && inet->nodefrag)
+ if (sk_fullsock(sk) &&
+ sk->sk_family == PF_INET &&
+ inet_sk(sk)->nodefrag)
return NF_ACCEPT;
}
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists