[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2dc88af7-3d29-58e1-00e0-127d1f798c28@gmail.com>
Date: Wed, 20 Feb 2019 07:53:17 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Vasily Averin <vvs@...tuozzo.com>,
"David S. Miller" <davem@...emloft.net>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
Ilya Dryomov <idryomov@...il.com>
Subject: Re: skb_can_coalesce() merges tcp frags with XFS-related slab objects
On 02/20/2019 05:34 AM, Vasily Averin wrote:
> Dear David,
>
> currently do_tcp_sendpages() calls skb_can_coalesce() to merge proper tcp fragments.
> If these fragments are slab objects and the data is not transferred out of the local host
> then tcp_recvmsg() can crash host on BUG_ON (see [2] below).
>
> There is known usecase when slab objects are provided to tcp_sendpage:
> XFS over locally landed network blockdevice.
>
> I found few such cases:
> - _drbd_send_page() had PageSlab() check log time ago.
> - recently Ilya Dryomov fixed it in ceph
> by commit 7e241f647dc7 "libceph: fall back to sendmsg for slab pages"
>
> Recently OpenVZ team noticed this problem during experiments with
> XFS over locally-landed iscsi target.
>
> I would note: triggered BUG is not a real problem but false alert,
> that though crashes host.
>
> I can fix last problem by adding PageSlab() into iscsi_tcp_segment_map(),
> however it does not fix the problem completely,
> there are chances that the problem will be reproduced again with some other filesystems
> or with some other kind of network blockdevice.
>
> David, what do you think, is it probably better to add PageSlab() check
> directly into skb_can_coalesce()? (see [1] below)
>
No, this would be wrong.
There is no way a page fragment can be backed by slab object,
since a page fragment can be shared (the page refcount needs to be manipulated, without slab/slub
being aware of this)
Please fix the callers.
> Thank you,
> Vasily Averin
>
> [1] The patch preventing the merge of the Slab-based tcp fragments
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 95d25b010a25..e1d200ba1fef 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3089,7 +3089,7 @@ static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
> if (i) {
> const struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i - 1];
>
> - return page == skb_frag_page(frag) &&
> + return page == skb_frag_page(frag) && !PageSlab(page) &&
> off == frag->page_offset + skb_frag_size(frag);
> }
> return false;
>
> [2] oops example, RHEL7-based OpenVZ node, XFS over locally-landed iscsi target
>
> [ 4902.545219] usercopy: kernel memory exposure attempt detected from ffff8c1497c92200 (kmalloc-512) (1024 bytes)
> [ 4902.550585] ------------[ cut here ]------------
> [ 4902.552472] kernel BUG at mm/usercopy.c:72!
> [ 4902.554148] invalid opcode: 0000 [#1] SMP
> [ 4902.555906] Modules linked in: nf_conntrack_netlink xt_mark raw_diag udp_diag netlink_diag af_packet_diag unix_diag dm_service_time iscsi_tcp libiscsi_tcp libiscsi xfs xt_CHECKSUM ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables binfmt_misc tun devlink tcp_diag inet_diag ip_set nfnetlink fuse kvm_intel ppdev kvm i2c_piix4 irqbypass sg virtio_balloon parport_pc parport joydev pcspkr libcrc32c br_netfilter veth overlay ip6_vzprivnet ip6_vznetstat
> [ 4903.413397] ip_vznetstat ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod crct10dif_generic cdrom crct10dif_common ata_generic pata_acpi 8021q garp mrp stp llc virtio_net virtio_console virtio_scsi bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix scsi_transport_iscsi drm libata crc32c_intel serio_raw virtio_pci virtio_ring dm_multipath virtio drm_panel_orientation_quirks floppy sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
> [ 4903.433902] CPU: 0 PID: 13793 Comm: tgtd ve: 0 Kdump: loaded Not tainted 3.10.0-957.1.3.vz7.83.4 #1 83.4
> [ 4903.436975] Hardware name: Virtuozzo KVM, BIOS 1.10.2-3.1.vz7.3 04/01/2014
> [ 4903.439474] task: ffff8c14f898c740 ti: ffff8c14f8a94000 task.ti: ffff8c14f8a94000
> [ 4903.442278] RIP: 0010:[<ffffffff9ce59427>] [<ffffffff9ce59427>] __check_object_size+0x87/0x250
> [ 4903.445370] RSP: 0018:ffff8c14f8a97b58 EFLAGS: 00010246
> [ 4903.447547] RAX: 0000000000000062 RBX: ffff8c1497c92200 RCX: 0000000000000000
> [ 4903.450108] RDX: 0000000000000000 RSI: ffff8c167fc138d8 RDI: ffff8c167fc138d8
> [ 4903.452753] RBP: ffff8c14f8a97b78 R08: 0000000000000004 R09: ffff8c166d14af00
> [ 4903.455451] R10: 0000000000000080 R11: ffff97b6016ffff8 R12: 0000000000000400
> [ 4903.458142] R13: 0000000000000001 R14: ffff8c1497c92600 R15: 0000000000000400
> [ 4903.460759] FS: 00007fd59b2b7740(0000) GS:ffff8c167fc00000(0000) knlGS:0000000000000000
> [ 4903.463776] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4903.466111] CR2: 0000000002701b58 CR3: 00000000b8a98000 CR4: 00000000000006f0
> [ 4903.468860] Call Trace:
> [ 4903.470483] [<ffffffff9cfb310d>] memcpy_toiovec+0x4d/0xb0
> [ 4903.472780] [<ffffffff9d2589e8>] skb_copy_datagram_iovec+0x128/0x280
> [ 4903.475388] [<ffffffff9d2c0646>] tcp_recvmsg+0x246/0xbb0
> [ 4903.477884] [<ffffffff9cdd42cd>] ? __alloc_pages_nodemask+0x26d/0x610
> [ 4903.480432] [<ffffffff9d2ef1d0>] inet_recvmsg+0x80/0xb0
> [ 4903.482718] [<ffffffff9d2467bc>] sock_aio_read.part.12+0x14c/0x170
> [ 4903.485255] [<ffffffff9d246801>] sock_aio_read+0x21/0x30
> [ 4903.487495] [<ffffffff9ce5edd6>] do_sync_read+0x96/0xe0
> [ 4903.489710] [<ffffffff9ce5f8b5>] vfs_read+0x145/0x170
> [ 4903.491975] [<ffffffff9ce606cf>] SyS_read+0x7f/0xf0
> [ 4903.494179] [<ffffffff9d3a4de1>] ? system_call_after_swapgs+0xae/0x146
> [ 4903.496684] [<ffffffff9d3a4e9b>] system_call_fastpath+0x22/0x27
> [ 4903.499143] [<ffffffff9d3a4de1>] ? system_call_after_swapgs+0xae/0x146
> [ 4903.501748] Code: 45 d1 48 c7 c6 f8 c3 68 9d 48 c7 c1 93 5e 69 9d 48 0f 45 f1 49 89 c0 4d 89 e1 48 89 d9 48 c7 c7 68 2b 69 9d 31 c0 e8 4f 1e 53 00 <0f> 0b 0f 1f 80 00 00 00 00 48 c7 c0 00 00 c0 9c 4c 39 f0 73 0d
> [ 4903.510508] RIP [<ffffffff9ce59427>] __check_object_size+0x87/0x250
> [ 4903.513149] RSP <ffff8c14f8a97b58>
>
Powered by blists - more mailing lists