netdev - Re: [PATCH v2] net: rds: acquire refcount on TCP sockets

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7be0b6ae-08c8-59a6-7cc2-3caee827a7e6@I-love.SAKURA.ne.jp>
Date:   Tue, 3 May 2022 23:08:02 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>
Cc:     Santosh Shilimkar <santosh.shilimkar@...cle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        syzbot <syzbot+694120e1002c117747ed@...kaller.appspotmail.com>,
        netdev <netdev@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        OFED mailing list <linux-rdma@...r.kernel.org>
Subject: Re: [PATCH v2] net: rds: acquire refcount on TCP sockets

On 2022/05/03 22:45, Eric Dumazet wrote:
>> This looks equivalent to the fix presented here:
>>
>> https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/

I retested the fix above using

unshare -n sh -c '
ip link set lo up
iptables -A OUTPUT -p tcp --sport 16385 --tcp-flags SYN NONE -m state --state ESTABLISHED,RELATED -j DROP
ip6tables -A OUTPUT -p tcp --sport 16385 --tcp-flags SYN NONE -m state --state ESTABLISHED,RELATED -j DROP
telnet 127.0.0.1 16385
dmesg -c
netstat -tanpe' < /dev/null

as a test case, but it seems racy; sometimes timer function is called again and crashes.

[  426.086565][    C2] general protection fault, probably for non-canonical address 0x6b6af3ebcc3b6bc3: 0000 [#1] PREEMPT SMP KASAN
[  426.096339][    C2] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.18.0-rc5-dirty #807
[  426.103769][    C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  426.111851][    C2] RIP: 0010:__tcp_transmit_skb+0xe72/0x1b80
[  426.117512][    C2] Code: e8 b3 ea dc fd 48 8d 7d 30 45 0f b7 77 30 e8 95 ec dc fd 48 8b 5d 30 48 8d bb b8 02 00 00 e8 85 ec dc fd 48 8b 83 b8 02 00 00 <65> 4c 01 70 58 e9 67 fd ff ff e8 ef 56 ac fd 48 8d bd d0 09 00 00
[  426.124692][    C2] RSP: 0018:ffff888060d09ac8 EFLAGS: 00010246
[  426.126845][    C2] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8880145c8000 RCX: ffffffff838cc28b
[  426.129616][    C2] RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff8880145c82b8
[  426.132374][    C2] RBP: ffff8880129f8000 R08: 0000000000000000 R09: 0000000000000007
[  426.135077][    C2] R10: ffffffff838cbfd4 R11: 0000000000000001 R12: ffff8880129f8760
[  426.137793][    C2] R13: ffff88800f6e0118 R14: 0000000000000001 R15: ffff88800f6e00e8
[  426.140489][    C2] FS:  0000000000000000(0000) GS:ffff888060d00000(0000) knlGS:0000000000000000
[  426.143525][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  426.145792][    C2] CR2: 000055b5bb0adabc CR3: 000000000e003000 CR4: 00000000000506e0
[  426.148509][    C2] Call Trace:
[  426.149442][    C2]  <IRQ>
[  426.150183][    C2]  ? __tcp_select_window+0x710/0x710
[  426.151457][    C2]  ? __sanitizer_cov_trace_cmp4+0x1c/0x70
[  426.153007][    C2]  ? tcp_current_mss+0x165/0x280
[  426.154245][    C2]  ? tcp_trim_head+0x300/0x300
[  426.155396][    C2]  ? find_held_lock+0x85/0xa0
[  426.156734][    C2]  ? mark_held_locks+0x65/0x90
[  426.157967][    C2]  tcp_write_wakeup+0x2e2/0x340
[  426.159149][    C2]  tcp_send_probe0+0x2a/0x2c0
[  426.160368][    C2]  tcp_write_timer_handler+0x5cb/0x670
[  426.161740][    C2]  tcp_write_timer+0x86/0x250
[  426.162896][    C2]  ? tcp_write_timer_handler+0x670/0x670
[  426.164285][    C2]  call_timer_fn+0x15d/0x5f0
[  426.165481][    C2]  ? add_timer_on+0x2e0/0x2e0
[  426.166667][    C2]  ? lock_downgrade+0x3c0/0x3c0
[  426.167921][    C2]  ? mark_held_locks+0x24/0x90
[  426.169263][    C2]  ? _raw_spin_unlock_irq+0x1f/0x40
[  426.170564][    C2]  ? tcp_write_timer_handler+0x670/0x670
[  426.171920][    C2]  __run_timers.part.0+0x523/0x740
[  426.173181][    C2]  ? call_timer_fn+0x5f0/0x5f0
[  426.174321][    C2]  ? pvclock_clocksource_read+0xdc/0x1a0
[  426.175655][    C2]  run_timer_softirq+0x66/0xe0
[  426.176825][    C2]  __do_softirq+0x1c2/0x670
[  426.177944][    C2]  __irq_exit_rcu+0xf8/0x140
[  426.179120][    C2]  irq_exit_rcu+0x5/0x20
[  426.180150][    C2]  sysvec_apic_timer_interrupt+0x8e/0xc0
[  426.181486][    C2]  </IRQ>
[  426.182180][    C2]  <TASK>
[  426.182845][    C2]  asm_sysvec_apic_timer_interrupt+0x12/0x20

> 
> I think this is still needed for layers (NFS ?) that dismantle their
> TCP sockets whenever a netns
> is dismantled. But RDS case was different, only the listener is a kernel socket.

We can't apply the fix above.

I think that the fundamental problem is that we use net->ns.count for both
"avoiding use-after-free" purpose and "allowing dismantle from user event" purpose.
Why not to use separated counters?