Message-ID: <CAJPywTLwgXNEZ2dZVoa=udiZmtrWJ0q5SuBW64aYs0Y1khXX3A@mail.gmail.com>
Date:   Thu, 17 Jan 2019 22:54:14 +0100
From:   Marek Majkowski <marek@...udflare.com>
To:     netdev@...r.kernel.org
Subject: Using SOCKMAP as echo TCP server - kernel stack overflow (double-fault)

Hi,

perhaps you can tell me if I'm doing something wrong.

I'm playing with a BPF_MAP_TYPE_SOCKMAP map and trivial
BPF_SK_SKB_STREAM_PARSER and BPF_SK_SKB_STREAM_VERDICT eBPF programs
to build a basic TCP echo server.

The code:
https://gist.github.com/majek/a09bcbeb8ab548cde6c18c930895c3f2#file-sockmap-echo-kern-c-L17

>>>
SEC("prog_parser")
int _prog_parser(struct __sk_buff *skb) { (void)skb; return 4096; }

SEC("prog_verdict")
int _prog_verdict(struct __sk_buff *skb)
{
	uint32_t idx = 0;
	return bpf_sk_redirect_map(skb, &sock_map, idx, 0);
}
<<<
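
For illustration, here is a small user-space model of what a fixed
parser return value means for framing. It assumes, as the kernel's
strparser documentation describes, that a positive return value from the
parser is taken as the full length of the next message. The struct and
function names are hypothetical and this is not kernel code.

```c
#include <stddef.h>

/* Hypothetical model of fixed-length framing; not kernel code. */
struct framing {
	size_t full_msgs; /* complete messages passed on to the verdict program */
	size_t pending;   /* trailing bytes still waiting to fill a message */
};

/* Split a stream of total_bytes into messages of msg_len bytes each. */
static struct framing frame_stream(size_t total_bytes, size_t msg_len)
{
	struct framing f = { 0, 0 };

	if (msg_len == 0) {
		/* No usable length: everything stays pending. */
		f.pending = total_bytes;
		return f;
	}
	f.full_msgs = total_bytes / msg_len;
	f.pending = total_bytes % msg_len;
	return f;
}
```

Under that assumption, frame_stream(10000, 4096) gives two complete
messages and 1808 bytes still pending, regardless of how the bytes were
split across incoming segments.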

Notice the unconditional 4096 in prog_parser. To reproduce:

git clone git://gist.github.com/a09bcbeb8ab548cde6c18c930895c3f2.git
cd a09bcbeb8ab548cde6c18c930895c3f2/
make
sudo ./sockmap-echo
[+] Accepting on 0.0.0.0:4321

Then on the remote node:

$ yes | nc -q 1 -vvvv 10.133.8.66 4321 |wc -l

What I expect: no crash, and the same amount of data coming back as was
piped into nc.

What I get: a couple of things, but let's start with the simplest: a
kernel panic related to GRO/GSO.

The code works okay (doesn't crash) until I disable GRO and GSO on the
sockmap-echo node:

$ sudo ethtool -K int0  gro off gso off
Cannot get device udp-fragmentation-offload settings: Operation not supported
Cannot get device udp-fragmentation-offload settings: Operation not supported

I'm running this on an "sfc" network card, with VLANs:
03:00.0 Ethernet controller: Solarflare Communications SFC9120 (rev 01)

$ sudo modinfo sfc
filename:       /lib/modules/4.19.13-cloudflare-2019.1.4/kernel/drivers/net/ethernet/sfc/sfc.ko
version:        4.14.0.1014
license:        GPL
description:    Solarflare network driver
author:         Solarflare Communications and Michael Brown <mbrown@...systems.co.uk>
srcversion:     89349D4C6CB39AA17FE7FE3
...
depends:        mdio
name:           sfc
vermagic:       4.19.13-cloudflare-2019.1.4 SMP mod_unload

After disabling GRO/GSO I see:

[ 1378.455127] BUG: stack guard page was hit at 0000000045d41b2c (stack is 000000001743f701..000000002e3b4c89)
[ 1378.472896] kernel stack overflow (double-fault): 0000 [#1] SMP PTI
[ 1378.487149] CPU: 21 PID: 53397 Comm: kworker/21:1 Tainted: G    O      4.19.13-cloudflare-2019.1.4 #2019.1.4
[ 1378.505777] Hardware name: Quanta Computer Inc QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
[ 1378.523280] Workqueue: events smap_tx_work
[ 1378.535633] RIP: 0010:__put_page+0x1b/0x30
[ 1378.547935] Code: 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 07 f6 c4 80 74 02 eb b1 48 8b 47 08 a8 01 75 f6 53 48 89 fb <e8> 90 fd ff ff 48 89 df 5b e9 17 6c ff ff 0f 1f 80 00 00 00 00 0f
[ 1378.583257] RSP: 0018:ffff8def490e4000 EFLAGS: 00010246
[ 1378.596498] RAX: dead000000000100 RBX: ffffe1753f4a7640 RCX: ffffe1753e9bb688
[ 1378.611647] RDX: dead0000000000ff RSI: ffffe175403aa688 RDI: ffffe1753f4a7640
[ 1378.611648] RBP: ffff8bf27274f9c0 R08: ffff8c00fffdea00 R09: ffffffffb3eb4800
[ 1378.611648] R10: ffff8c005dc90100 R11: 0000000000000001 R12: ffff8c005dc91600
[ 1378.611649] R13: 0000000000002238 R14: ffff8c00a9024ec0 R15: 0ffff8c00bfb6480
[ 1378.611651] FS:  0000000000000000(0000) GS:ffff8c00bfb40000(0000) knlGS:0000000000000000
[ 1378.611651] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1378.611652] CR2: ffff8def490e3ff8 CR3: 0000001dcf00a002 CR4: 00000000001606e0
[ 1378.611653] Call Trace:
[ 1378.611662]  skb_release_data+0x88/0x140
[ 1378.611664]  ? skb_release_data+0xa5/0x140
[ 1378.611666]  kfree_skb+0x32/0xa0
[ 1378.611668]  skb_release_data+0xa5/0x140
[ 1378.611670]  ? skb_release_data+0xa5/0x140
[ 1378.611674]  kfree_skb+0x32/0xa0
[ 1378.795283]  skb_release_data+0xa5/0x140
[ 1378.795285]  ? skb_release_data+0xa5/0x140
[ 1378.795287]  kfree_skb+0x32/0xa0
[ 1378.795288]  skb_release_data+0xa5/0x140
[ 1378.795290]  ? skb_release_data+0xa5/0x140
[ 1378.795292]  kfree_skb+0x32/0xa0
[ 1378.795294]  skb_release_data+0xa5/0x140
[ 1378.795295]  ? skb_release_data+0xa5/0x140
[ 1378.795297]  kfree_skb+0x32/0xa0
...
[ 1385.038439]  smap_tx_work+0x236/0x2b0
[ 1385.046811]  process_one_work+0x1fa/0x3f0
[ 1385.055539]  ? rescuer_thread+0x330/0x330
[ 1385.064199]  worker_thread+0x2d/0x3d0
[ 1385.072532]  ? rescuer_thread+0x330/0x330
[ 1385.081204]  kthread+0x113/0x130
[ 1385.089095]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1385.098901]  ret_from_fork+0x35/0x40
[ 1385.107197] Modules linked in: ...
[ 1385.249536] ---[ end trace 2750106f41eb6d8e ]---
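
The repeating kfree_skb / skb_release_data pair in the trace is
consistent with a chain of buffers in which freeing one buffer frees the
next from inside the same call, so stack depth grows with chain length.
That reading is an assumption and has not been confirmed against the
kernel source for this build. The user-space sketch below models only
the general pattern; struct buf and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer that owns one nested buffer. */
struct buf {
	struct buf *nested;
};

/* Build a chain of n buffers, each owning the next one. */
static struct buf *make_chain(size_t n)
{
	struct buf *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct buf *b = malloc(sizeof(*b));
		if (!b)
			abort();
		b->nested = head;
		head = b;
	}
	return head;
}

/* Recursive free: one stack frame per link. Returns the depth reached. */
static size_t free_recursive(struct buf *b)
{
	if (!b)
		return 0;
	size_t depth = 1 + free_recursive(b->nested);
	free(b);
	return depth;
}

/* Iterative free: constant stack use. Returns the number of buffers freed. */
static size_t free_iterative(struct buf *b)
{
	size_t freed = 0;

	while (b) {
		struct buf *next = b->nested;
		free(b);
		b = next;
		freed++;
	}
	return freed;
}
```

With a chain of 1000 links, free_recursive() goes 1000 frames deep while
free_iterative() frees the same 1000 buffers in a flat loop. On a small,
fixed-size stack the recursive form eventually runs off the end once the
chain is long enough.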

$ cat /proc/version
Linux version 4.19.13-cloudflare-2019.1.4 (builder@...tc-agent) (gcc version 8.2.0 (GCC)) #2019.1.4 SMP Wed Jan 9 11:04:31 UTC 2019

I'm seeing other problems with this sockmap-echo code as well, but for
now this is the simplest one to reproduce.

Marek
