netdev - Re: [RFC PATCH v2 1/2] net: af_packet support for direct ring access in user space

Message-Id: <20150114.153509.1264618607573705890.davem@davemloft.net>
Date:	Wed, 14 Jan 2015 15:35:09 -0500 (EST)
From:	David Miller <davem@...emloft.net>
To:	john.fastabend@...il.com
Cc:	netdev@...r.kernel.org, danny.zhou@...el.com,
	nhorman@...driver.com, dborkman@...hat.com, john.ronciak@...el.com,
	hannes@...essinduktion.org, brouer@...hat.com
Subject: Re: [RFC PATCH v2 1/2] net: af_packet support for direct ring
 access in user space

From: John Fastabend <john.fastabend@...il.com>
Date: Mon, 12 Jan 2015 20:35:11 -0800

> +		if ((region.direction != DMA_BIDIRECTIONAL) &&
> +		    (region.direction != DMA_TO_DEVICE) &&
> +		    (region.direction != DMA_FROM_DEVICE))
> +			return -EFAULT;
 ...
> +		if ((umem->nmap == npages) &&
> +		    (0 != dma_map_sg(dev->dev.parent, umem->sglist,
> +				     umem->nmap, region.direction))) {
> +			region.iova = sg_dma_address(umem->sglist) + offset;

I am having trouble seeing how this can work.

dma_map_{single,sg}() mappings need synchronization after a DMA
transfer takes place.

For example, if the DMA occurs to the device, then that region can
be cached in the PCI controller's internal caches, and thus future
CPU writes into that memory region will not be seen until a
dma_sync_*() is invoked.

That isn't going to happen when the device transmit queue is
being completely managed in userspace.

And this takes us back to the issue of protection, which I don't
think is addressed properly yet.

CAP_NET_ADMIN privileges do not mean "can crap all over memory",
yet with this feature that can still happen.

If we are dealing with a device which cannot provide strict protection
to only the process's locked local pages, you have to do something
to implement that protection.

And you have _exactly_ one option to do that: abstract the page
addresses and eat a system call to trigger the sends, so that the
kernel can read from the user's (fake) descriptors, write into the
real descriptors (translating the DMA addresses along the way), and
trigger the TX doorbell.
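
A minimal user-space model of that translation step may make the idea
concrete. It is a sketch under stated assumptions, not code from the
patch: the struct layouts, tx_submit(), and the doorbell field are all
hypothetical, and a real implementation would live behind a system
call, copy the descriptors out of user memory before validating them,
and check ring occupancy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, for illustration only. */
struct user_desc {          /* "fake" descriptor written by user space */
	uint64_t offset;    /* byte offset into the process's own region */
	uint32_t len;       /* bytes to send */
};

struct hw_desc {            /* "real" descriptor read by the device */
	uint64_t dma_addr;
	uint32_t len;
};

struct tx_ctx {
	uint64_t region_iova;   /* DMA base address of the pinned region */
	uint64_t region_size;   /* size of that region in bytes */
	struct hw_desc *ring;   /* real ring, never mapped to user space */
	uint32_t ring_size;     /* number of slots, power of two */
	uint32_t tail;          /* next free slot (free-running counter) */
	uint32_t doorbell;      /* stand-in for the device tail register */
};

/* Nonzero if [offset, offset + len) lies inside the region.
 * Written so that offset + len cannot overflow. */
static int desc_in_bounds(const struct tx_ctx *ctx,
			  const struct user_desc *d)
{
	return d->len != 0 &&
	       d->len <= ctx->region_size &&
	       d->offset <= ctx->region_size - d->len;
}

/* Models the body of the "send" system call. Assumes udesc is already
 * a private copy, so user space cannot change it after validation.
 * Ring occupancy is not tracked in this sketch.
 * Returns the number of descriptors queued, or -1 if any is invalid. */
int tx_submit(struct tx_ctx *ctx, const struct user_desc *udesc, uint32_t n)
{
	uint32_t i;

	assert(ctx->ring_size != 0 &&
	       (ctx->ring_size & (ctx->ring_size - 1)) == 0);

	if (n > ctx->ring_size)
		return -1;

	/* Validate everything first, so a bad batch queues nothing. */
	for (i = 0; i < n; i++)
		if (!desc_in_bounds(ctx, &udesc[i]))
			return -1;

	/* Translate offsets to DMA addresses and fill the real ring. */
	for (i = 0; i < n; i++) {
		struct hw_desc *hw =
			&ctx->ring[ctx->tail & (ctx->ring_size - 1)];

		hw->dma_addr = ctx->region_iova + udesc[i].offset;
		hw->len = udesc[i].len;
		ctx->tail++;
	}

	/* "Ring" the doorbell with the new tail index. */
	ctx->doorbell = ctx->tail & (ctx->ring_size - 1);
	return (int)n;
}
```

The point of the model is that user space only ever names offsets
within its own region, so no value it writes can make the device DMA
outside that region; the price is one call per batch of sends.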

I am not going to consider seriously an implementation that says "yeah
sometimes the user can crap onto other people's memory". This isn't
MS-DOS; it's a system where proper memory protections are mandatory
rather than optional.