Message-ID: <AANLkTi=WF2hAGAufv_Anc=b=Fm2WOpOMOv1UrDRvaTHp@mail.gmail.com>
Date:	Fri, 19 Nov 2010 12:04:31 -0800
From:	Tom Herbert <therbert@...gle.com>
To:	Linux Netdev List <netdev@...r.kernel.org>
Subject: Generalizing mmap'ed sockets

This is a project I'm contemplating. If you have any comments, or can
point me to prior work in this area, that would be appreciated.

It seems like it should be fairly straightforward to extend the mmap
packet ring mechanism so that it can be used with arbitrary sockets
(TCP, UDP, etc.). The idea is to create a ring buffer for a socket and
mmap it so that it is shared between user space and the kernel. This
can be done on both the transmit and receive sides, and is basically
modeled as a producer/consumer queue. There are semantic differences
between stream and datagram sockets that need to be considered, but I
don't think anything here is untenable.
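To make the producer/consumer model concrete, here is a minimal userspace sketch of one direction of such a ring. It assumes offsets that wrap at the ring size and leaves one byte permanently unused so that equal pointers unambiguously mean "empty". RING_SIZE, struct ring and the ring_* helpers are hypothetical names, and the kernel side is simulated with ordinary function calls rather than a real socket.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: a userspace model of one direction of the proposed
 * socket ring. Field names mirror struct mmap_sock_hdr; RING_SIZE and the
 * helpers are hypothetical. One byte is always left unused so that
 * prod_ptr == consumer_ptr means "empty" and never "full". */
#define RING_SIZE 16u

struct ring {
    uint32_t prod_ptr;      /* next offset the producer will write */
    uint32_t consumer_ptr;  /* next offset the consumer will read */
    unsigned char data[RING_SIZE];
};

/* Bytes written by the producer but not yet consumed. */
static uint32_t ring_used(const struct ring *r)
{
    return (r->prod_ptr + RING_SIZE - r->consumer_ptr) % RING_SIZE;
}

/* Bytes the producer may still write without overrunning the consumer. */
static uint32_t ring_free(const struct ring *r)
{
    return RING_SIZE - 1 - ring_used(r);
}

/* Producer side: append len bytes, wrapping at the end of data[]. */
static void ring_produce(struct ring *r, const void *buf, uint32_t len)
{
    const unsigned char *p = buf;

    assert(len <= ring_free(r));
    for (uint32_t i = 0; i < len; i++)
        r->data[(r->prod_ptr + i) % RING_SIZE] = p[i];
    r->prod_ptr = (r->prod_ptr + len) % RING_SIZE;      /* publish */
}

/* Consumer side: take up to len bytes, then give the space back. */
static uint32_t ring_consume(struct ring *r, void *buf, uint32_t len)
{
    unsigned char *p = buf;
    uint32_t n = ring_used(r);

    if (n > len)
        n = len;
    for (uint32_t i = 0; i < n; i++)
        p[i] = r->data[(r->consumer_ptr + i) % RING_SIZE];
    r->consumer_ptr = (r->consumer_ptr + n) % RING_SIZE; /* release */
    return n;
}
```

Only the producer ever writes prod_ptr and only the consumer ever writes consumer_ptr, which is what lets the two sides share the ring without a lock.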

The expected benefits of this are:

TX:
 - Zero-copy transmit (already supported by vmsplice(), but this
might be simpler)
 - Only one system call needed on transmit, which can cover multiple
datagrams or what would otherwise have been multiple writes (the call
just kicks the kernel to start sending)
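For datagram sockets the ring would also have to preserve message boundaries, and the single-syscall benefit comes from queuing many messages before one trigger. The sketch below assumes a hypothetical framing (a 2-byte big-endian length before each payload) and, for brevity, a linear buffer that does not wrap. queue_datagram() and kick() are illustrative stand-ins; kick() plays the role of the proposed send() trigger and only counts what it would transmit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing: each datagram is a 2-byte big-endian length
 * followed by the payload. The buffer is linear (no wrap-around) to keep
 * the sketch short. */
#define TX_RING_BYTES 256u

struct tx_ring {
    uint32_t prod_ptr;      /* end of data queued by the application */
    uint32_t consumer_ptr;  /* end of data already taken by the "kernel" */
    unsigned char data[TX_RING_BYTES];
};

static unsigned kicks;      /* number of simulated system calls */

/* Application side: queue one datagram with plain memory writes only. */
static int queue_datagram(struct tx_ring *r, const void *buf, uint16_t len)
{
    if (r->prod_ptr + 2u + len > TX_RING_BYTES)
        return -1;                                  /* ring full */
    r->data[r->prod_ptr] = (unsigned char)(len >> 8);
    r->data[r->prod_ptr + 1] = (unsigned char)(len & 0xff);
    memcpy(&r->data[r->prod_ptr + 2], buf, len);
    r->prod_ptr += 2u + len;
    return 0;
}

/* Stand-in for the single trigger call: walk every record between
 * consumer_ptr and prod_ptr and report how many datagrams would be sent. */
static unsigned kick(struct tx_ring *r)
{
    unsigned sent = 0;

    kicks++;
    while (r->consumer_ptr < r->prod_ptr) {
        uint16_t len = (uint16_t)((r->data[r->consumer_ptr] << 8) |
                                  r->data[r->consumer_ptr + 1]);
        r->consumer_ptr += 2u + len;   /* "transmit" and release space */
        sent++;
    }
    assert(r->consumer_ptr == r->prod_ptr);  /* everything was drained */
    return sent;
}
```

Three datagrams queued this way cost no system calls; a single kick() then drains all of them.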

RX:
 - No system calls needed to receive (data readiness is determined by
polling)
 - Data can always be placed immediately by the kernel, including
out-of-order (OOO) placement
 - Potential for true zero-copy receive with device support (such as
per-flow queues or UDP queues)
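On the receive side, readiness can be checked by comparing the two pointers in shared memory, and the ready bytes can be processed in place rather than copied. Because the ready region may wrap past the end of the buffer, it is exposed as at most two contiguous segments. This is a sketch under assumptions: offsets wrap at RX_RING_BYTES, the producer keeps one byte free so a full ring is never confused with an empty one, the kernel is simulated by direct writes, and rx_ready(), rx_peek() and rx_release() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive-side model: the "kernel" has placed bytes in
 * data[] and advanced prod_ptr; the application only reads shared memory. */
#define RX_RING_BYTES 8u

struct rx_ring {
    uint32_t prod_ptr;      /* advanced by the kernel */
    uint32_t consumer_ptr;  /* advanced by the application */
    unsigned char data[RX_RING_BYTES];
};

/* Readiness test: a plain memory read, no system call. */
static int rx_ready(const struct rx_ring *r)
{
    return r->prod_ptr != r->consumer_ptr;
}

/* Describe the ready bytes as at most two in-place segments (tail of
 * data[], then head) and return the total count. Nothing is copied. */
static size_t rx_peek(const struct rx_ring *r,
                      const unsigned char **seg1, size_t *len1,
                      const unsigned char **seg2, size_t *len2)
{
    uint32_t prod = r->prod_ptr, cons = r->consumer_ptr;

    assert(prod < RX_RING_BYTES && cons < RX_RING_BYTES);
    *seg1 = &r->data[cons];
    *seg2 = r->data;
    if (prod >= cons) {             /* contiguous (or empty) */
        *len1 = prod - cons;
        *len2 = 0;
    } else {                        /* wrapped: tail first, then head */
        *len1 = RX_RING_BYTES - cons;
        *len2 = prod;
    }
    return *len1 + *len2;
}

/* Give everything that was peeked back to the producer. */
static void rx_release(struct rx_ring *r)
{
    r->consumer_ptr = r->prod_ptr;
}
```

Processing the segments where they lie is what would make device-assisted zero-copy receive possible; the application only writes consumer_ptr once it is done with the data.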

Userland use of this for TCP might look something like:

struct mmap_sock_hdr {
   __u32 prod_ptr;      /* offset where the producer writes next */
   __u32 consumer_ptr;  /* offset where the consumer reads next */
};

int s;
size_t size;                       /* size of each ring's data area */
struct mmap_sock_hdr *tx, *rx;
unsigned char *tx_base, *rx_base;  /* data area follows each header */

struct s_mmap_req {
   size_t size;
} mmap_req;

s = socket(AF_INET, SOCK_STREAM, 0);

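/* Set up ring buffer on socket and mmap into user space for TX.
   TX_RING and RX_RING are the proposed new socket options. */
size = (1 << 19) - sizeof(struct mmap_sock_hdr);  /* 512 KB minus header */
mmap_req.size = size;
setsockopt(s, SOL_SOCKET, TX_RING, (char *)&mmap_req,
           sizeof(mmap_req));
tx = mmap(NULL, sizeof(struct mmap_sock_hdr) + size,
          PROT_READ|PROT_WRITE, MAP_SHARED, s, 0);
tx_base = (unsigned char *)(tx + 1);  /* data starts right after the header */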

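/* Now do same thing for RX. The RX ring is assumed to be mapped at
   offset 1 << 19, after the TX ring, so the two mappings are distinct. */
size = (1 << 19) - sizeof(struct mmap_sock_hdr);
mmap_req.size = size;
setsockopt(s, SOL_SOCKET, RX_RING, (char *)&mmap_req,
           sizeof(mmap_req));
rx = mmap(NULL, sizeof(struct mmap_sock_hdr) + size,
          PROT_READ|PROT_WRITE, MAP_SHARED, s, 1 << 19);
rx_base = (unsigned char *)(rx + 1);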

bind(s, ...);    /* Normal bind */
connect(s, ...); /* Normal connect */

/* Transmit */

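/* Application fills some of the available buffer (it must stay behind
   the consumer pointer), wrapping modulo the ring size */
for (int i = 0; i < 10000; i++)
   tx_base[(tx->prod_ptr + i) % size] = i % 256;

/* Advance producer pointer; a write barrier is needed before this store
   so the kernel never sees the pointer ahead of the data */
tx->prod_ptr = (tx->prod_ptr + 10000) % size;

send(s, NULL, 0, 0); /* Tells stack to send new data indicated by prod
                        pointer, just a trigger */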

/* Polling for POLLOUT should work as expected */

/*********** Receive */

struct pollfd fds[1] = { { .fd = s, .events = POLLIN } };

while (1) {
   poll(fds, 1, -1);
   if (fds[0].revents & POLLIN) {
       /* Process data from rx_base[rx->consumer_ptr] to
          rx_base[rx->prod_ptr], modulo size of buffer of course */
       rx->consumer_ptr = rx->prod_ptr;    /* Gives back buffer space
                                              to the kernel */
   }
}
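The pseudocode above updates prod_ptr and consumer_ptr with plain stores. On real hardware the producer's data must become visible before its pointer update, and the consumer must finish reading before it hands space back. Below is a minimal sketch using C11 acquire/release atomics, assuming the shared header could hold _Atomic counters. Unlike the proposal's wrapped offsets, it uses free-running 32-bit counters with a power-of-two ring size, which avoids sacrificing a byte to tell "full" from "empty". struct shared_ring, produce() and consume() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Power of two, so (counter % RING) stays consistent when the 32-bit
 * free-running counters wrap. */
#define RING 64u

struct shared_ring {
    _Atomic uint32_t prod_ptr;      /* total bytes ever produced */
    _Atomic uint32_t consumer_ptr;  /* total bytes ever consumed */
    unsigned char data[RING];
};

/* Producer: copy in as much as fits, then publish with a release store
 * so the bytes are visible before the new prod_ptr is. */
static uint32_t produce(struct shared_ring *r, const unsigned char *buf,
                        uint32_t len)
{
    uint32_t prod = atomic_load_explicit(&r->prod_ptr, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->consumer_ptr,
                                         memory_order_acquire);

    assert(prod - cons <= RING);
    uint32_t space = RING - (prod - cons);
    if (len > space)
        len = space;
    for (uint32_t i = 0; i < len; i++)
        r->data[(prod + i) % RING] = buf[i];
    atomic_store_explicit(&r->prod_ptr, prod + len, memory_order_release);
    return len;
}

/* Consumer: acquire prod_ptr, read the bytes, then release the space. */
static uint32_t consume(struct shared_ring *r, unsigned char *buf,
                        uint32_t len)
{
    uint32_t cons = atomic_load_explicit(&r->consumer_ptr,
                                         memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->prod_ptr, memory_order_acquire);
    uint32_t avail = prod - cons;

    if (len > avail)
        len = avail;
    for (uint32_t i = 0; i < len; i++)
        buf[i] = r->data[(cons + i) % RING];
    atomic_store_explicit(&r->consumer_ptr, cons + len, memory_order_release);
    return len;
}
```

Each side loads its own counter relaxed (only it writes that counter) and the other side's counter with acquire, pairing with the other side's release store.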