[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <f8fde7da-9402-4e17-1bcc-cba08f1ec18b@virtuozzo.com>
Date: Wed, 20 Feb 2019 16:34:21 +0300
From: Vasily Averin <vvs@...tuozzo.com>
To: "David S. Miller" <davem@...emloft.net>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
Ilya Dryomov <idryomov@...il.com>
Subject: skb_can_coalesce() merges tcp frags with XFS-related slab objects
Dear David,
currently do_tcp_sendpages() calls skb_can_coalesce() to merge proper tcp fragments.
If these fragments are slab objects and the data is not transferred out of the local host
then tcp_recvmsg() can crash host on BUG_ON (see [2] below).
There is known usecase when slab objects are provided to tcp_sendpage:
XFS over locally landed network blockdevice.
I found few such cases:
- _drbd_send_page() had PageSlab() check log time ago.
- recently Ilya Dryomov fixed it in ceph
by commit 7e241f647dc7 "libceph: fall back to sendmsg for slab pages"
Recently OpenVZ team noticed this problem during experiments with
XFS over locally-landed iscsi target.
I would note: triggered BUG is not a real problem but false alert,
that though crashes host.
I can fix last problem by adding PageSlab() into iscsi_tcp_segment_map(),
however it does not fix the problem completely,
there are chances that the problem will be reproduced again with some other filesystems
or with some other kind of network blockdevice.
David, what do you think, is it probably better to add PageSlab() check
directly into skb_can_coalesce()? (see [1] below)
Thank you,
Vasily Averin
[1] The patch preventing the merge of the Slab-based tcp fragments
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 95d25b010a25..e1d200ba1fef 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3089,7 +3089,7 @@ static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
if (i) {
const struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1];
- return page == skb_frag_page(frag) &&
+ return page == skb_frag_page(frag) && !PageSlab(page) &&
off == frag->page_offset + skb_frag_size(frag);
}
return false;
[2] oops example, RHEL7-based OpenVZ node, XFS over locally-landed iscsi target
[ 4902.545219] usercopy: kernel memory exposure attempt detected from ffff8c1497c92200 (kmalloc-512) (1024 bytes)
[ 4902.550585] ------------[ cut here ]------------
[ 4902.552472] kernel BUG at mm/usercopy.c:72!
[ 4902.554148] invalid opcode: 0000 [#1] SMP
[ 4902.555906] Modules linked in: nf_conntrack_netlink xt_mark raw_diag udp_diag netlink_diag af_packet_diag unix_diag dm_service_time iscsi_tcp libiscsi_tcp libiscsi xfs xt_CHECKSUM ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables binfmt_misc tun devlink tcp_diag inet_diag ip_set nfnetlink fuse kvm_intel ppdev kvm i2c_piix4 irqbypass sg virtio_balloon parport_pc parport joydev pcspkr libcrc32c br_netfilter veth overlay ip6_vzprivnet ip6_vznetstat
[ 4903.413397] ip_vznetstat ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod crct10dif_generic cdrom crct10dif_common ata_generic pata_acpi 8021q garp mrp stp llc virtio_net virtio_console virtio_scsi bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix scsi_transport_iscsi drm libata crc32c_intel serio_raw virtio_pci virtio_ring dm_multipath virtio drm_panel_orientation_quirks floppy sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
[ 4903.433902] CPU: 0 PID: 13793 Comm: tgtd ve: 0 Kdump: loaded Not tainted 3.10.0-957.1.3.vz7.83.4 #1 83.4
[ 4903.436975] Hardware name: Virtuozzo KVM, BIOS 1.10.2-3.1.vz7.3 04/01/2014
[ 4903.439474] task: ffff8c14f898c740 ti: ffff8c14f8a94000 task.ti: ffff8c14f8a94000
[ 4903.442278] RIP: 0010:[<ffffffff9ce59427>] [<ffffffff9ce59427>] __check_object_size+0x87/0x250
[ 4903.445370] RSP: 0018:ffff8c14f8a97b58 EFLAGS: 00010246
[ 4903.447547] RAX: 0000000000000062 RBX: ffff8c1497c92200 RCX: 0000000000000000
[ 4903.450108] RDX: 0000000000000000 RSI: ffff8c167fc138d8 RDI: ffff8c167fc138d8
[ 4903.452753] RBP: ffff8c14f8a97b78 R08: 0000000000000004 R09: ffff8c166d14af00
[ 4903.455451] R10: 0000000000000080 R11: ffff97b6016ffff8 R12: 0000000000000400
[ 4903.458142] R13: 0000000000000001 R14: ffff8c1497c92600 R15: 0000000000000400
[ 4903.460759] FS: 00007fd59b2b7740(0000) GS:ffff8c167fc00000(0000) knlGS:0000000000000000
[ 4903.463776] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4903.466111] CR2: 0000000002701b58 CR3: 00000000b8a98000 CR4: 00000000000006f0
[ 4903.468860] Call Trace:
[ 4903.470483] [<ffffffff9cfb310d>] memcpy_toiovec+0x4d/0xb0
[ 4903.472780] [<ffffffff9d2589e8>] skb_copy_datagram_iovec+0x128/0x280
[ 4903.475388] [<ffffffff9d2c0646>] tcp_recvmsg+0x246/0xbb0
[ 4903.477884] [<ffffffff9cdd42cd>] ? __alloc_pages_nodemask+0x26d/0x610
[ 4903.480432] [<ffffffff9d2ef1d0>] inet_recvmsg+0x80/0xb0
[ 4903.482718] [<ffffffff9d2467bc>] sock_aio_read.part.12+0x14c/0x170
[ 4903.485255] [<ffffffff9d246801>] sock_aio_read+0x21/0x30
[ 4903.487495] [<ffffffff9ce5edd6>] do_sync_read+0x96/0xe0
[ 4903.489710] [<ffffffff9ce5f8b5>] vfs_read+0x145/0x170
[ 4903.491975] [<ffffffff9ce606cf>] SyS_read+0x7f/0xf0
[ 4903.494179] [<ffffffff9d3a4de1>] ? system_call_after_swapgs+0xae/0x146
[ 4903.496684] [<ffffffff9d3a4e9b>] system_call_fastpath+0x22/0x27
[ 4903.499143] [<ffffffff9d3a4de1>] ? system_call_after_swapgs+0xae/0x146
[ 4903.501748] Code: 45 d1 48 c7 c6 f8 c3 68 9d 48 c7 c1 93 5e 69 9d 48 0f 45 f1 49 89 c0 4d 89 e1 48 89 d9 48 c7 c7 68 2b 69 9d 31 c0 e8 4f 1e 53 00 <0f> 0b 0f 1f 80 00 00 00 00 48 c7 c0 00 00 c0 9c 4c 39 f0 73 0d
[ 4903.510508] RIP [<ffffffff9ce59427>] __check_object_size+0x87/0x250
[ 4903.513149] RSP <ffff8c14f8a97b58>
Powered by blists - more mailing lists