netdev - Re: [PATCH v2] xmit_compl_seq: information to reclaim vmsplice buffers

Message-ID: <1285105131.6378.19.camel@edumazet-laptop>
Date:	Tue, 21 Sep 2010 23:38:51 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Tom Herbert <therbert@...gle.com>
Cc:	netdev@...r.kernel.org, davem@...emloft.net, sridharr@...gle.com
Subject: Re: [PATCH v2] xmit_compl_seq: information to reclaim vmsplice
 buffers

On Tuesday, 21 September 2010 at 11:57 -0700, Tom Herbert wrote:
> In this patch we propose to add a socket API to retrieve the
> "transmit completion sequence number", essentially a byte counter
> for the number of bytes that have been transmitted and will not be
> retransmitted.  In the case of TCP, this should correspond to snd_una.
> 
> The purpose of this API is to provide information to userspace about
> which buffers can be reclaimed when sending with vmsplice() on a
> socket.
> 
> There are two methods for retrieving the completed sequence number:
> through a simple getsockopt (implemented here for TCP), as well as
> returning the value in the ancillary data of a recvmsg.
> 
> The expected flow would be something like:
>    - Connection is created
>    - Initial completion seq # is retrieved through the sockopt, and is
>      stored in userspace "compl_seq" variable for the connection.
>    - Whenever a send is done, compl_seq += # bytes sent.
>    - When doing a vmsplice the completion sequence number is saved
>      for each user space buffer, buffer_compl_seq = compl_seq.
>    - When recvmsg returns with a completion sequence number in
>      ancillary data, any buffers covered by that sequence number
>      (where buffer_compl_seq < recvmsg_compl_seq) are reclaimed
>      and can be written to again.
>    - If no data is received on a connection (recvmsg does not
>      return), a timeout can be used to call the getsockopt and
>      reclaim buffers as a fallback.
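
The bookkeeping in the flow above could look roughly like the sketch
below. It is only an illustration under stated assumptions: struct
compl_tracker, the tracker_* helpers and MAX_INFLIGHT are hypothetical
names, not part of the patch. It also assumes that each buffer records
the sequence number just past its last byte, and that comparisons must
survive 32-bit wraparound (snd_una is a u32), so it uses a signed
difference rather than a plain "<".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 64	/* hypothetical cap on outstanding buffers */

/* One vmspliced buffer that the kernel may still reference. */
struct inflight_buf {
	void *addr;
	uint32_t end_seq;	/* sequence number just past the last byte */
};

/* Per-connection state kept in userspace (hypothetical, not in the patch). */
struct compl_tracker {
	uint32_t compl_seq;	/* initial value plus all bytes sent so far */
	struct inflight_buf ring[MAX_INFLIGHT];
	size_t head;		/* index of the oldest outstanding buffer */
	size_t count;		/* number of outstanding buffers */
};

/* Wraparound-safe "a is at or after b" for 32-bit sequence numbers. */
static inline int seq_after_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* initial_seq is the value first read through the getsockopt. */
static void tracker_init(struct compl_tracker *t, uint32_t initial_seq)
{
	t->compl_seq = initial_seq;
	t->head = 0;
	t->count = 0;
}

/* Record a buffer of len bytes that was just passed to vmsplice(). */
static void tracker_sent(struct compl_tracker *t, void *addr, uint32_t len)
{
	assert(t->count < MAX_INFLIGHT);
	size_t tail = (t->head + t->count) % MAX_INFLIGHT;

	t->compl_seq += len;
	t->ring[tail].addr = addr;
	t->ring[tail].end_seq = t->compl_seq;
	t->count++;
}

/*
 * Drop every buffer fully covered by reported_seq (from recvmsg ancillary
 * data or the getsockopt fallback). Returns how many were reclaimed.
 */
static size_t tracker_reclaim(struct compl_tracker *t, uint32_t reported_seq)
{
	size_t n = 0;

	while (t->count > 0 &&
	       seq_after_eq(reported_seq, t->ring[t->head].end_seq)) {
		t->head = (t->head + 1) % MAX_INFLIGHT;
		t->count--;
		n++;
	}
	return n;
}
```

Buffers are reclaimed strictly in send order, which matches a byte
stream: a later buffer cannot complete before an earlier one.
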
> 
> Using recvmsg data in this manner is sort of a cheap way to get a
> "callback" for when a vmspliced buffer is consumed.  It will work
> well for a client where the response causes recvmsg to return.
> On the server side it works well if there are a sufficient
> number of requests coming on the connection (resorting to the
> timeout if necessary as described above).
> 
> Signed-off-by: Tom Herbert <therbert@...gle.com>
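
For the recvmsg path described above, the userspace side would walk
the control messages with the standard CMSG_* macros. A minimal
sketch follows. Note the assumptions: SCM_XMIT_COMPL_SEQ is only
defined by this patch, so the fallback value below is a made-up
placeholder, and find_xmit_compl_seq() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * HYPOTHETICAL placeholder: the real value would come from the patched
 * kernel headers. Do not rely on this number.
 */
#ifndef SCM_XMIT_COMPL_SEQ
#define SCM_XMIT_COMPL_SEQ 0x7f01
#endif

/*
 * Scan the ancillary data of a msghdr already filled in by recvmsg().
 * Returns 1 and stores the sequence number in *seq if found, else 0.
 */
static int find_xmit_compl_seq(struct msghdr *msg, uint32_t *seq)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL && seq != NULL);

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_XMIT_COMPL_SEQ &&
		    cmsg->cmsg_len >= CMSG_LEN(sizeof(*seq))) {
			/* CMSG_DATA() may be unaligned for u32, so copy. */
			memcpy(seq, CMSG_DATA(cmsg), sizeof(*seq));
			return 1;
		}
	}
	return 0;
}
```

The caller would need to supply a control buffer of at least
CMSG_SPACE(sizeof(uint32_t)) bytes in msg_control before calling
recvmsg().
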


> + * Copy the first unacked seq into the receive msg control part.
> + */
> +static inline void tcp_sock_xmit_compl_seq(struct msghdr *msg,
> +					   struct sock *sk)
> +{
> +	if (sock_flag(sk, SOCK_XMIT_COMPL_SEQ)) {
> +		struct tcp_sock *tp = tcp_sk(sk);
> +		if (msg->msg_controllen >= sizeof(tp->snd_una)) {
> +			put_cmsg(msg, SOL_SOCKET, SCM_XMIT_COMPL_SEQ,
> +			    sizeof(tp->snd_una), &tp->snd_una);
> +		}
> +	}
> +}

I am wondering if this part could be done outside of the socket lock,
provided you latch the tp->snd_una value right before release_sock():

u32 snd_una;
...
tcp_cleanup_rbuf(sk, copied);
TCP_CHECK_TIMER(sk);
snd_una = tp->snd_una;		/* latch while the socket lock is held */
release_sock(sk);
tcp_sock_xmit_compl_seq(msg, sk, snd_una);	/* runs after unlock */
return copied;


