Date: Mon, 9 Sep 2019 13:46:20 -0400
From: Pooja Trivedi <poojatrivedi@...il.com>
To: netdev@...r.kernel.org
Subject: [PATCH net 0/1] net/tls(TLS_SW): double free in tls_tx_records
The TLS module crashes while running SSL record encryption through
ktls_send_[file] with a crypto accelerator (Nitrox).
Following are the preconditions and steps to reproduce the issue:
Preconditions:
1) Installed 5.3-rc4
2) Nitrox5 card plugged in (crypto accelerator)
Steps to reproduce the issue:
1) Installed n5pf.ko (drivers/crypto/cavium/nitrox)
2) Installed tls.ko if it is not installed by default (net/tls)
3) Obtained uperf tool from GitHub
3.1) Modified uperf to use the tls module via setsockopt.
3.2) Modified uperf tool to support sendfile with SSL.
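For reference, the documented way to attach the kernel TLS module to a
connected TCP socket is the TCP_ULP socket option. The uperf changes
themselves are not shown in this report, so the helper below is only a
minimal sketch of that step; the name enable_tls_ulp is hypothetical and
the fallback value for TCP_ULP is an assumption for older libc headers.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_ULP
#define TCP_ULP 31 /* assumption: value from linux/tcp.h for older libc headers */
#endif

/*
 * Hypothetical helper: attach the "tls" upper-layer protocol to a
 * connected TCP socket. Returns 0 on success, -1 with errno set on
 * failure (for example when the tls module is not available).
 */
static int enable_tls_ulp(int fd)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
}
```

After this call succeeds, the TX key material still has to be installed
with setsockopt(fd, SOL_TLS, TLS_TX, ...) before send or sendfile will
produce TLS records.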
Test:
1) Ran uperf with 4 threads
2) Each thread sends data using sendfile over SSL protocol.
A few seconds into the test, the kernel crashes because of record
list corruption:
[ 270.888952] ------------[ cut here ]------------
[ 270.890450] list_del corruption, ffff91cc3753a800->prev is
LIST_POISON2 (dead000000000122)
[ 270.891194] WARNING: CPU: 1 PID: 7387 at lib/list_debug.c:50
__list_del_entry_valid+0x62/0x90
[ 270.892037] Modules linked in: n5pf(OE) netconsole tls(OE) bonding
intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
aesni_intel crypto_simd mei_me cryptd glue_helper ipmi_si sg mei
lpc_ich pcspkr joydev ioatdma i2c_i801 ipmi_devintf ipmi_msghandler
wmi ip_tables xfs libcrc32c sd_mod mgag200 drm_vram_helper ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm isci
libsas ahci scsi_transport_sas libahci crc32c_intel serio_raw igb
libata ptp pps_core dca i2c_algo_bit dm_mirror dm_region_hash dm_log
dm_mod [last unloaded: nitrox_drv]
[ 270.896836] CPU: 1 PID: 7387 Comm: uperf Kdump: loaded Tainted: G
OE 5.3.0-rc4 #1
[ 270.897711] Hardware name: Supermicro SYS-1027R-N3RF/X9DRW, BIOS
3.0c 03/24/2014
[ 270.898597] RIP: 0010:__list_del_entry_valid+0x62/0x90
[ 270.899478] Code: 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 e0 f9 ee
8d e8 b2 cf c8 ff 0f 0b 31 c0 c3 48 89 fe 48 c7 c7 18 fa ee 8d e8 9e
cf c8 ff <0f> 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 50 fa ee 8d e8 87
cf c8
[ 270.901321] RSP: 0018:ffffb6ea86eb7c20 EFLAGS: 00010282
[ 270.902240] RAX: 0000000000000000 RBX: ffff91cc3753c000 RCX: 0000000000000000
[ 270.903157] RDX: ffff91bc3f867080 RSI: ffff91bc3f857738 RDI: ffff91bc3f857738
[ 270.904074] RBP: ffff91bc36020940 R08: 0000000000000560 R09: 0000000000000000
[ 270.904988] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 270.905902] R13: ffff91cc3753a800 R14: ffff91cc37cc6400 R15: ffff91cc3753a800
[ 270.906809] FS: 00007f454a88d700(0000) GS:ffff91bc3f840000(0000)
knlGS:0000000000000000
[ 270.907715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 270.908606] CR2: 00007f453c00292c CR3: 000000103554e003 CR4: 00000000001606e0
[ 270.909490] Call Trace:
[ 270.910373] tls_tx_records+0x138/0x1c0 [tls]
[ 270.911262] tls_sw_sendpage+0x3e0/0x420 [tls]
[ 270.912154] inet_sendpage+0x52/0x90
[ 270.913045] ? direct_splice_actor+0x40/0x40
[ 270.913941] kernel_sendpage+0x1a/0x30
[ 270.914831] sock_sendpage+0x20/0x30
[ 270.915714] pipe_to_sendpage+0x62/0x90
[ 270.916592] __splice_from_pipe+0x80/0x180
[ 270.917461] ? direct_splice_actor+0x40/0x40
[ 270.918334] splice_from_pipe+0x5d/0x90
[ 270.919208] direct_splice_actor+0x35/0x40
[ 270.920086] splice_direct_to_actor+0x103/0x230
[ 270.920966] ? generic_pipe_buf_nosteal+0x10/0x10
[ 270.921850] do_splice_direct+0x9a/0xd0
[ 270.922733] do_sendfile+0x1c9/0x3d0
[ 270.923612] __x64_sys_sendfile64+0x5c/0xc0
Observations:
1) This issue is observed after applying "Commit a42055e8d2c3: Add
support for async encryption of records for performance"
2) 5.2.2 kernel exhibits the same issue
Attached is the complete crash log.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204669
linux-crypto original post:
https://marc.info/?l=linux-crypto-vger&m=156700690108854&w=2
After adding custom profiling around lock_sock/release_sock and
collecting backtraces at and around the time of the crash, it seems
that the socket lock is not the best way to synchronize write access
in tls_tx_records, because the socket lock can be released while a
sender waits under TCP memory pressure.
One potential way for the race condition to appear:
When under tcp memory pressure, Thread 1 takes the following code path:
do_sendfile ---> ... ---> .... ---> tls_sw_sendpage --->
tls_sw_do_sendpage ---> tls_tx_records ---> tls_push_sg --->
do_tcp_sendpages ---> sk_stream_wait_memory ---> sk_wait_event
sk_wait_event releases the socket lock and sleeps waiting for memory:
#define sk_wait_event(__sk, __timeo, __condition, __wait)       \
        ({      int __rc;                                       \
                release_sock(__sk);                             \
                __rc = __condition;                             \
                if (!__rc) {                                    \
                        *(__timeo) = wait_woken(__wait,         \
                                        TASK_INTERRUPTIBLE,     \
                                        *(__timeo));            \
                }                                               \
                sched_annotate_sleep();                         \
                lock_sock(__sk);                                \
                __rc = __condition;                             \
                __rc;                                           \
        })
Thread 2 code path:
tx_work_handler ---> tls_tx_records
While Thread 1 sleeps, Thread 2 is able to obtain the socket lock and
transmit the records on ctx->tx_list, deleting each one after it has
been sent (as in the loop below).
int tls_tx_records(struct sock *sk, int flags)
{
        ....
        ....
        ....
        ....
        list_for_each_entry_safe(rec, tmp, &ctx->tx_list, list) {
                if (READ_ONCE(rec->tx_ready)) {
                        if (flags == -1)
                                tx_flags = rec->tx_flags;
                        else
                                tx_flags = flags;
                        msg_en = &rec->msg_encrypted;
                        rc = tls_push_sg(sk, tls_ctx,
                                         &msg_en->sg.data[msg_en->sg.curr],
                                         0, tx_flags);
                        if (rc)
                                goto tx_err;
                        list_del(&rec->list); // **** crash location ****
                        sk_msg_free(sk, &rec->msg_plaintext);
                        kfree(rec);
                } else {
                        break;
                }
        }
        ....
        ....
        ....
        ....
}
When Thread 1 wakes up and returns from the tls_push_sg call, it
attempts list_del on the record it grabbed before sleeping. That record
has already been sent and deleted by Thread 2, which causes the crash.
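The interleaving can be replayed deterministically in userspace. The
sketch below is not kernel code: struct node, list_del_checked and
replay_race are hypothetical stand-ins, the poison constants mirror the
x86-64 values (LIST_POISON2 as printed in the log above), and the two
"threads" are simply executed in the problematic order on one thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct node { struct node *prev, *next; };

/* Stand-ins for the kernel's LIST_POISON1 / LIST_POISON2 markers. */
#define POISON1 ((struct node *)(uintptr_t)0xdead000000000100ULL)
#define POISON2 ((struct node *)(uintptr_t)0xdead000000000122ULL)

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/*
 * Unlink n and poison its pointers. Returns false without touching the
 * list if n was already deleted, similar in spirit to the check that
 * produced the list_del corruption warning.
 */
static bool list_del_checked(struct node *n)
{
    if (n->next == POISON1 || n->prev == POISON2)
        return false;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = POISON1;
    n->prev = POISON2;
    return true;
}

/* Returns true if Thread 1's delete hits an already-deleted record. */
static bool replay_race(void)
{
    struct node tx_list, rec;

    list_init(&tx_list);
    list_add_tail(&rec, &tx_list);

    /* Thread 1: picks the first record, then sleeps in tls_push_sg
     * with the socket lock released. */
    struct node *t1_rec = tx_list.next;

    /* Thread 2: takes the lock, sends and deletes the same record. */
    bool t2_deleted = list_del_checked(tx_list.next);

    /* Thread 1: wakes up and deletes the record it grabbed earlier. */
    bool t1_deleted = list_del_checked(t1_rec);

    return t2_deleted && !t1_deleted;
}
```

In the kernel the record has also been passed to kfree by Thread 2, so
Thread 1 additionally touches freed memory; the sketch keeps the record
on the stack to stay well defined.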
To fix this race, a flag or bool inside of ctx can be used to
synchronize access to tls_tx_records.
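A minimal userspace sketch of that idea follows, using a C11
atomic_flag as the "busy" marker. It is an illustration under
assumptions, not the proposed patch: struct tx_ctx, tx_busy,
tx_records_guarded and push_with_intruder are hypothetical names, and
the pending/sent counters stand in for the records on ctx->tx_list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct tx_ctx {              /* hypothetical stand-in for the TLS TX context */
    atomic_flag tx_busy;     /* set while one caller is draining the list */
    int pending;             /* records ready to be transmitted */
    int sent;                /* records transmitted and deleted */
};

static void tx_ctx_init(struct tx_ctx *ctx, int pending)
{
    atomic_flag_clear(&ctx->tx_busy);
    ctx->pending = pending;
    ctx->sent = 0;
}

/*
 * Drain the pending records. Returns -1 without touching the list if
 * another caller is already draining it, otherwise 0 or the first
 * non-zero result of push. push models tls_push_sg, which may sleep
 * with the socket lock released; NULL means "send succeeds at once".
 */
static int tx_records_guarded(struct tx_ctx *ctx,
                              int (*push)(struct tx_ctx *))
{
    int rc = 0;

    if (atomic_flag_test_and_set(&ctx->tx_busy))
        return -1;

    while (ctx->pending > 0) {
        rc = push ? push(ctx) : 0;
        if (rc)
            break;
        ctx->pending--;      /* safe: no other caller can be in this loop */
        ctx->sent++;
    }

    atomic_flag_clear(&ctx->tx_busy);
    return rc;
}

/* Models Thread 2 entering while Thread 1 is blocked inside push. */
static int intruder_rc;

static int push_with_intruder(struct tx_ctx *ctx)
{
    intruder_rc = tx_records_guarded(ctx, NULL);
    return 0;
}
```

A caller that is turned away has to make sure the remaining records are
still transmitted later, for example by rescheduling the work; how that
is best done in the TLS code is left open here.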
View attachment "tls_crash_log_5_3_rc4.txt" of type "text/plain" (11274 bytes)