Message-ID: <462FC238.4040305@hp.com>
Date:	Wed, 25 Apr 2007 17:03:52 -0400
From:	Vlad Yasevich <vladislav.yasevich@...com>
To:	netdev <netdev@...r.kernel.org>
Subject: very strange inet_sock corruption with rpc

Hi All

To support a piece of custom functionality, we needed to add
two members to struct inet_sock.  During testing, we started
seeing an interesting corruption.  Following a hunch, we
ripped out all of our code with the exception of the
5 lines that do this:

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index ce6da97..605f5c0 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -140,6 +140,8 @@ struct inet_sock {
                __be32                  addr;
                struct flowi            fl;
        } cork;
+       void *foo;
+       u32  bar;
 };
 
 #define IPCORK_OPT     1       /* ip-options has been held in ipcork.opt */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cf358c8..98ad2c2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -335,6 +335,9 @@ lookup_protocol:
 
        sk_refcnt_debug_inc(sk);
 
+       inet->foo = NULL;
+       inet->bar = 0;
+
        if (inet->num) {
                /* It assumes that any protocol which allows
                 * the user to assign a number at socket

(Variables were really named something else, but I hacked this into
 net-2.6 to see if I could reproduce).

With just the above patch, I can catch a corruption of the inet_sock
in inet_csk_bind_conflict() with this:

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 43fb160..5cd5b6d 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -45,6 +45,18 @@ int inet_csk_bind_conflict(const struct sock *sk,
        int reuse = sk->sk_reuse;
 
        sk_for_each_bound(sk2, node, &tb->owners) {
+               if (inet_sk(sk2)->foo) {
+                       printk(KERN_WARNING "sk2 might be corrupt.  Info:\n");
+                       printk(KERN_WARNING "\tsk2 = %p\n", sk2);
+                       printk(KERN_WARNING "\ttb->port = %d\n", tb->port);
+                       printk(KERN_WARNING "\tinet_sk(sk2)->num = %d\n",
+                                       inet_sk(sk2)->num);
+                       printk(KERN_WARNING "\tinet_sk(sk2)->foo = %p\n",
+                                       inet_sk(sk2)->foo);
+                       printk(KERN_WARNING "\tinet_sk(sk2)->bar = %x\n",
+                                       inet_sk(sk2)->bar);
+                       WARN_ON(1);
+               }

Nobody outside of inet_create() writes to the foo pointer, so it should
always be NULL.  I've enabled SLAB debugging, stack overflow debugging, and
VM debugging, and nothing triggers.

The corruption is triggered after about 10 minutes of running the following
script:

nfspath=$1
localpath=$2
while true; do
	mount "$nfspath" "$localpath"
	sleep 5
	cp /boot/vmlinuz "$localpath"
	sleep 5
	rm "$localpath/vmlinuz"
	sleep 5
	umount "$localpath"
done


And looks like this:

sk2 might be corrupt.  Info:
        sk2 = ffff8100f004d080
        tb->port = 844
        inet_sk(sk2)->num = 61695
        inet_sk(sk2)->foo = 24242424243f243f
        inet_sk(sk2)->bar = 3f24243f
BUG: at net/ipv4/inet_connection_sock.c:58 inet_csk_bind_conflict()

Call Trace:
 [<ffffffff803cc591>] inet_csk_bind_conflict+0xcb/0x178
 [<ffffffff803cc4c6>] inet_csk_bind_conflict+0x0/0x178
 [<ffffffff803cc2ff>] inet_csk_get_port+0x11a/0x1ef
 [<ffffffff803ddf51>] inet_bind+0x117/0x1f5
 [<ffffffff88184e13>] :sunrpc:xs_bindresvport+0x4e/0xbf
 [<ffffffff881853a4>] :sunrpc:xs_tcp_connect_worker+0x0/0x2a0
 [<ffffffff88185433>] :sunrpc:xs_tcp_connect_worker+0x8f/0x2a0
 [<ffffffff80248bd3>] run_workqueue+0x8f/0x137
 [<ffffffff80245687>] worker_thread+0x0/0x14a
 [<ffffffff8024579b>] worker_thread+0x114/0x14a
 [<ffffffff8027e544>] default_wake_function+0x0/0xe
 [<ffffffff8022ff49>] kthread+0xd1/0x100
 [<ffffffff80258f68>] child_rip+0xa/0x12
 [<ffffffff8022fe78>] kthread+0x0/0x100
 [<ffffffff80258f5e>] child_rip+0x0/0x12
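
One way to read the dump above: the garbage in foo and bar is made up
entirely of the bytes 0x24 and 0x3f, which are the ASCII codes for '$'
and '?'.  The helper below is only a user-space sketch for eyeballing
such values, not part of the kernel patch.  decode_bytes() is a
hypothetical name, and it emits the most significant byte first, which
on little-endian x86_64 is the reverse of the in-memory order.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Render the low `nbytes` bytes of `value` as text, most significant
 * byte first.  Printable ASCII bytes are copied as-is; anything else
 * is shown as '.'.  `out` must have room for nbytes + 1 characters.
 * Returns the number of printable bytes found.
 */
static size_t decode_bytes(uint64_t value, size_t nbytes, char *out)
{
	size_t i, printable = 0;

	assert(nbytes <= sizeof(value));

	for (i = 0; i < nbytes; i++) {
		/* Pick byte i, counting down from the most significant one. */
		unsigned char c = (unsigned char)(value >> (8 * (nbytes - 1 - i)));

		if (isprint(c)) {
			out[i] = (char)c;
			printable++;
		} else {
			out[i] = '.';
		}
	}
	out[nbytes] = '\0';
	return printable;
}
```

Calling decode_bytes(0x24242424243f243fULL, 8, buf) leaves "$$$$$?$?" in
buf, and 0x3f24243f decodes to "?$$?".  Whether that means some text
buffer is being written over the socket is only a guess at this point.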


It looks like someone is stepping all over the inet_sock.
We'll continue looking, but if anyone has any ideas of what might
be going on, I'd appreciate it.

It looks like a serious bug lurking somewhere.

-vlad

P.S. The mount is using NFSv3 over UDP (nothing fancy at all).