netdev - RE: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <F2E9EB7348B8264F86B6AB8151CE2D79280489AA05@shsmsx502.ccr.corp.intel.com>
Date:	Sat, 11 Sep 2010 15:41:14 +0800
From:	"Xin, Xiaohui" <xiaohui.xin@...el.com>
To:	"Xin, Xiaohui" <xiaohui.xin@...el.com>,
	"Michael S. Tsirkin" <mst@...hat.com>
CC:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"mingo@...e.hu" <mingo@...e.hu>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"herbert@...dor.hengli.com.au" <herbert@...dor.hengli.com.au>,
	"jdike@...ux.intel.com" <jdike@...ux.intel.com>
Subject: RE: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.

>>Playing with rlimit on data path, transparently to the application in this way
>>looks strange to me, I suspect this has unexpected security implications.
>>Further, applications may have other uses for locked memory
>>besides mpassthru - you should not just take it because it's there.
>>
>>Can we have an ioctl that lets userspace configure how much
>>memory to lock? This ioctl will decrement the rlimit and store
>>the data in the device structure so we can do accounting
>>internally. Put it back on close or on another ioctl.
>Yes, we can decrement the rlimit in ioctl in one time to avoid
>data path.
>
>>Need to be careful for when this operation gets called
>>again with 0 or another small value while we have locked memory -
>>maybe just fail with EBUSY?  or wait until it gets unlocked?
>>Maybe 0 can be special-cased and deactivate zero-copy?.
>>

How about we don't use a new ioctl, but just check the rlimit 
in one MPASSTHRU_BINDDEV ioctl? If we find mp device
break the rlimit, then we fail the bind ioctl, and thus can't do 
zero copy any more.

>In fact, if we choose RLIMIT_MEMLOCK to limit the lock memory,
>the default value is only 16 pages. It's too small to make the device to
>work. So we always to configure it with a large value.
>I think, if rlimit value after decrement is < 0, then deactivate zero-copy
>is better. 0 maybe ok.
>

>>> +
>>> +	if (ctor->lock_pages + count > lock_limit && npages) {
>>> +		printk(KERN_INFO "exceed the locked memory rlimit.");
>>> +		return NULL;
>>> +	}
>>> +
>>> +	info = kmem_cache_zalloc(ext_page_info_cache, GFP_KERNEL);
>>
>>You seem to fill in all memory, why zalloc? this is data path ...
>
>Ok, Let me check this.
>
>>
>>> +
>>> +	if (!info)
>>> +		return NULL;
>>> +
>>> +	for (i = j = 0; i < count; i++) {
>>> +		base = (unsigned long)iov[i].iov_base;
>>> +		len = iov[i].iov_len;
>>> +
>>> +		if (!len)
>>> +			continue;
>>> +		n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
>>> +
>>> +		rc = get_user_pages_fast(base, n, npages ? 1 : 0,
>>
>>npages controls whether this is a write? Why?
>
>We use npages as a flag here. In mp_sendmsg(), we called alloc_page_info() with npages =
>0.
>
>>
>>> +				&info->pages[j]);
>>> +		if (rc != n)
>>> +			goto failed;
>>> +
>>> +		while (n--) {
>>> +			frags[j].offset = base & ~PAGE_MASK;
>>> +			frags[j].size = min_t(int, len,
>>> +					PAGE_SIZE - frags[j].offset);
>>> +			len -= frags[j].size;
>>> +			base += frags[j].size;
>>> +			j++;
>>> +		}
>>> +	}
>>> +
>>> +#ifdef CONFIG_HIGHMEM
>>> +	if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
>>> +		for (i = 0; i < j; i++) {
>>> +			if (PageHighMem(info->pages[i]))
>>> +				goto failed;
>>> +		}
>>> +	}
>>> +#endif
>>
>>Are non-highdma devices worth bothering with? If yes -
>>are there other limitations devices might have that we need to handle?
>>E.g. what about non-s/g devices or no checksum offloading?.
>
>Basically I think there is no limitations for both, but let me have a check.
>
>>
>>> +		skb_push(skb, ETH_HLEN);
>>> +
>>> +		if (skb_is_gso(skb)) {
>>> +			hdr.hdr.hdr_len = skb_headlen(skb);
>>> +			hdr.hdr.gso_size = shinfo->gso_size;
>>> +			if (shinfo->gso_type & SKB_GSO_TCPV4)
>>> +				hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>>> +			else if (shinfo->gso_type & SKB_GSO_TCPV6)
>>> +				hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
>>> +			else if (shinfo->gso_type & SKB_GSO_UDP)
>>> +				hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
>>> +			else
>>> +				BUG();
>>> +			if (shinfo->gso_type & SKB_GSO_TCP_ECN)
>>> +				hdr.hdr.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
>>> +
>>> +		} else
>>> +			hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;
>>> +
>>> +		if (skb->ip_summed == CHECKSUM_PARTIAL) {
>>> +			hdr.hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>>> +			hdr.hdr.csum_start =
>>> +				skb->csum_start - skb_headroom(skb);
>>> +			hdr.hdr.csum_offset = skb->csum_offset;
>>> +		}
>>
>>We have this code in tun, macvtap and packet socket already.
>>Could this be a good time to move these into networking core?
>>I'm not asking you to do this right now, but could this generic
>>virtio-net to skb stuff be encapsulated in functions?
>
>It seems reasonable.
>
>>
>>--
>>MST
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@...r.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html