Message-Id: <1177539237.21594.3.camel@w-sridhar2.beaverton.ibm.com>
Date: Wed, 25 Apr 2007 15:13:57 -0700
From: Sridhar Samudrala <sri@...ibm.com>
To: Vlad Yasevich <vladislav.yasevich@...com>
Cc: netdev <netdev@...r.kernel.org>
Subject: Re: very strange inet_sock corruption with rpc
On Wed, 2007-04-25 at 17:03 -0400, Vlad Yasevich wrote:
> Hi All
>
> To support a piece of custom functionality, we needed to add
> 2 members to the struct inet_sock. During testing, we started
> seeing an interesting corruption. Following a hunch, we've
> completely ripped out all of our code with the exception of
> 5 lines that do this:
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index ce6da97..605f5c0 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -140,6 +140,8 @@ struct inet_sock {
> __be32 addr;
> struct flowi fl;
> } cork;
> + void *foo;
> + u32 bar;
> };
>
> #define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index cf358c8..98ad2c2 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -335,6 +335,9 @@ lookup_protocol:
>
> sk_refcnt_debug_inc(sk);
>
> + inet->foo = NULL;
> + inet->bar = 0;
> +
> if (inet->num) {
> /* It assumes that any protocol which allows
> * the user to assign a number at socket
>
> (Variables were really named something else, but I hacked this into
> net-2.6 to see if I could reproduce).
>
> With just the above patch, I can catch a corruption of the inet_sock
> in inet_csk_bind_conflict() with this:
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 43fb160..5cd5b6d 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -45,6 +45,18 @@ int inet_csk_bind_conflict(const struct sock *sk,
> int reuse = sk->sk_reuse;
>
> sk_for_each_bound(sk2, node, &tb->owners) {
> + if (inet_sk(sk2)->foo) {
> + printk(KERN_WARNING "sk2 might be corrupt. Info:\n");
> + printk(KERN_WARNING "\tsk2 = %p\n", sk2);
> + printk(KERN_WARNING "\ttb->port = %d\n", tb->port);
> + printk(KERN_WARNING "\tinet_sk(sk2)->num = %d\n",
> + inet_sk(sk2)->num);
> + printk(KERN_WARNING "\tinet_sk(sk2)->foo = %p\n",
> + inet_sk(sk2)->foo);
> + printk(KERN_WARNING "\tinet_sk(sk2)->bar = %x\n",
> + inet_sk(sk2)->bar);
> + WARN_ON(1);
> + }
>
> Nobody outside of inet_create() writes to the foo pointer so it should
> always be NULL. I've enabled SLAB debugging, stack overflow debugging, VM
> debugging and nothing triggers.
>
> The corruption is triggered after about 10 minutes of running the following
> script:
>
> nfspath=$1
> localpath=$2
> while true; do
>     mount "$nfspath" "$localpath"
>     sleep 5
>     cp /boot/vmlinuz "$localpath"
>     sleep 5
>     rm "$localpath/vmlinuz"
>     sleep 5
>     umount "$localpath"
> done
>
>
> And looks like this:
>
> sk2 might be corrupt. Info:
> sk2 = ffff8100f004d080
> tb->port = 844
> inet_sk(sk2)->num = 61695
> inet_sk(sk2)->foo = 24242424243f243f
> inet_sk(sk2)->bar = 3f24243f
> BUG: at net/ipv4/inet_connection_sock.c:58 inet_csk_bind_conflict()
>
> Call Trace:
> [<ffffffff803cc591>] inet_csk_bind_conflict+0xcb/0x178
> [<ffffffff803cc4c6>] inet_csk_bind_conflict+0x0/0x178
> [<ffffffff803cc2ff>] inet_csk_get_port+0x11a/0x1ef
> [<ffffffff803ddf51>] inet_bind+0x117/0x1f5
> [<ffffffff88184e13>] :sunrpc:xs_bindresvport+0x4e/0xbf
> [<ffffffff881853a4>] :sunrpc:xs_tcp_connect_worker+0x0/0x2a0
> [<ffffffff88185433>] :sunrpc:xs_tcp_connect_worker+0x8f/0x2a0

If you are using NFS over UDP, why is sunrpc calling a TCP
routine (xs_tcp_connect_worker)?

> [<ffffffff80248bd3>] run_workqueue+0x8f/0x137
> [<ffffffff80245687>] worker_thread+0x0/0x14a
> [<ffffffff8024579b>] worker_thread+0x114/0x14a
> [<ffffffff8027e544>] default_wake_function+0x0/0xe
> [<ffffffff8022ff49>] kthread+0xd1/0x100
> [<ffffffff80258f68>] child_rip+0xa/0x12
> [<ffffffff8022fe78>] kthread+0x0/0x100
> [<ffffffff80258f5e>] child_rip+0x0/0x12
>
>
> It looks like someone is stepping all over the inet_sock.
> We'll continue looking, but if anyone has any ideas of what might
> be going on, I'd appreciate it.
>
> It looks like a serious bug lurking somewhere.
>
> -vlad
>
> p.s. the mount is using NFSv3 over UDP (nothing fancy at all)
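
One way to read the values in the dump quoted above is to decode them byte
by byte. The sketch below is not from the original report. It assumes a
little-endian machine (the addresses in the trace suggest x86-64) and uses
a hypothetical userspace helper, dump_bytes_le(), to show the bytes of a
value in memory order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Render the low `nbytes` bytes of `value` in little-endian memory order
 * (least significant byte first). Printable ASCII is copied as-is and
 * anything else becomes '.'. `out` must have room for nbytes + 1 chars.
 */
void dump_bytes_le(uint64_t value, size_t nbytes, char *out)
{
    size_t i;

    assert(nbytes <= sizeof(value));
    for (i = 0; i < nbytes; i++) {
        unsigned char c = (unsigned char)(value >> (8 * i));

        out[i] = (c >= 0x20 && c <= 0x7e) ? (char)c : '.';
    }
    out[nbytes] = '\0';
}
```

With the values from the report, dump_bytes_le(0x24242424243f243fULL, 8, buf)
gives "?$?$$$$$" and dump_bytes_le(0x3f24243f, 4, buf) gives "?$$?", since
0x24 is '$' and 0x3f is '?'. Bytes like these look more like character data
than a stale pointer, which may help narrow down who is doing the overwrite,
though that is only a guess.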