netdev - race in skb_splice

Message-Id: <200805270325.24323.opurdila@ixiacom.com>
Date:	Tue, 27 May 2008 03:25:23 +0300
From:	Octavian Purdila <opurdila@...acom.com>
To:	netdev@...r.kernel.org
Subject: race in skb_splice_bits?


Hi,

The following socket lock dropping in skb_splice_bits seems to open a race 
condition which causes an invalid kernel access:

>        if (spd.nr_pages) {
>                int ret;
>
>                /*
>                 * Drop the socket lock, otherwise we have reverse
>                 * locking dependencies between sk_lock and i_mutex
>                 * here as compared to sendfile(). We enter here
>                 * with the socket lock held, and splice_to_pipe() will
>                 * grab the pipe inode lock. For sendfile() emulation,
>                 * we call into ->sendpage() with the i_mutex lock held
>                 * and networking will grab the socket lock.
>                 */
>                release_sock(__skb->sk);
>                ret = splice_to_pipe(pipe, &spd);
>                lock_sock(__skb->sk);
>                return ret;
>        }

Setup: 

- powerpc, non-SMP, no preemption, 2.6.25
- RX side: LRO enabled, splice from socket to /dev/null; 
- TX side: MTU set to 128 bytes, GSO enabled, splice from file to socket

The oops - on the RX side: 

Unable to handle kernel paging request for data at address 0x00000030
Faulting instruction address: 0x80109ee0
Oops: Kernel access of bad area, sig: 11 [#1]
Ixia TCPX
Modules linked in: almfmanager(P) filtermanager ixnam_llm(P) ixnam_tcpx(P) hwstate ixllm ixhostm ixsysctl(P) nlproc_driver
NIP: 80109ee0 LR: 80109edc CTR: 8010c52c
REGS: bcd25b90 TRAP: 0300   Tainted: P          (2.6.25-00005-gf7b547d)
MSR: 00009032 <EE,ME,IR,DR>  CR: 24000822  XER: 20000000
DAR: 00000030, DSISR: 40000000
TASK = bfbe1bf0[156] 'splice' THREAD: bcd24000
GPR00: 8010c94c bcd25c40 bfbe1bf0 00000000 00000000 802835f8 00000001 0000004c 
GPR08: 00024000 00000100 00000032 bcd24000 00010dc4 100198b4 390046a8 0a5042f3 
GPR16: 8028238c bd18fe00 00000008 10010000 6fbcbac0 00000000 10001060 bcd25dd8 
GPR24: 8014b520 00000000 bcd25e30 bccefa00 bf33e300 fffffe00 bcd25d70 00000000 
NIP [80109ee0] lock_sock_nested+0x1c/0x50
LR [80109edc] lock_sock_nested+0x18/0x50
Call Trace:
[bcd25c60] [8010c94c] skb_splice_bits+0x130/0x134
[bcd25dc0] [8014b548] tcp_splice_data_recv+0x28/0x38
[bcd25dd0] [8014d08c] tcp_read_sock+0x108/0x1f8
[bcd25e20] [8014b58c] __tcp_splice_read+0x34/0x44
[bcd25e40] [8014b61c] tcp_splice_read+0x80/0x220
[bcd25e90] [80105730] sock_splice_read+0x2c/0x44
[bcd25ea0] [8008a374] do_splice_to+0x90/0xac
[bcd25ed0] [8008a850] do_splice+0x258/0x2f0
[bcd25f10] [8008b1d4] sys_splice+0xe0/0xe8
[bcd25f40] [8000ff14] ret_from_syscall+0x0/0x38
 --- Exception: c01 at 0x10000894
     LR = 0x10000e2c


Analysis: 

Printks show that __skb->sk is non-NULL before splice_to_pipe() and NULL after 
it. Using a hardware watchpoint I was able to see that the write to __skb->sk 
is caused by __alloc_skb()'s memset(), which seems to indicate that the __skb 
is freed between release_sock() and lock_sock(). Turning on slab debugging 
together with the hardware watchpoint shows that the free happens during 
tcp_collapse(), which was initiated as a result of a timer interrupt -> 
softirq -> NAPI polling -> lro_flush_all().

Commenting out the sequence that drops the socket lock seems to fix the 
problem on my setup.
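
To make the window easier to reason about, here is a rough user-space model 
of it. This is not kernel code: model_sock, model_skb and the model_* helpers 
are hypothetical names made up for illustration, and the "recycle" step just 
mimics an allocator memset() clearing the back-pointer. The sketch shows that 
re-reading the owner through the buffer after the unlocked section yields 
NULL, while a pointer cached before the unlock is still usable for relocking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's types. */
struct model_sock {
	int locked;             /* 1 while the modelled socket lock is held */
};

struct model_skb {
	struct model_sock *sk;  /* owner back-pointer, playing the role of skb->sk */
};

void model_lock(struct model_sock *sk)
{
	assert(sk != NULL);     /* the reported oops corresponds to sk == NULL here */
	assert(!sk->locked);
	sk->locked = 1;
}

void model_release(struct model_sock *sk)
{
	assert(sk != NULL && sk->locked);
	sk->locked = 0;
}

/*
 * Models the buffer being freed and handed out again while the lock is
 * dropped: the allocator zeroes the header, clearing the back-pointer.
 */
void model_recycle(struct model_skb *skb)
{
	memset(skb, 0, sizeof(*skb));
}

/*
 * Drop the lock, run the unlocked section, then relock through a pointer
 * cached while the lock was still held. Returns 1 if skb->sk still matches
 * the cached owner afterwards, 0 if the buffer changed underneath us.
 * In this model the buffer memory stays valid, so reading skb->sk at the
 * end is safe; with a really freed buffer it would be a use-after-free.
 */
int model_splice(struct model_skb *skb,
		 void (*unlocked_work)(struct model_skb *))
{
	struct model_sock *sk = skb->sk;  /* cache before unlocking */

	model_release(sk);
	if (unlocked_work)
		unlocked_work(skb);       /* window in which skb may be recycled */
	model_lock(sk);                   /* relock does not go through skb */

	return skb->sk == sk;
}
```

Calling model_splice() with model_recycle as the unlocked work returns 0 and 
leaves the lock held, whereas relocking via skb->sk would trip the NULL check 
in model_lock(). Note that caching the pointer only avoids the bad 
dereference in the relock; the caller would still be holding a stale buffer, 
so I would not treat this as a complete fix.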

Regards,
tavi