netdev - Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALx6S37+Zj-OmApDHzb2hq0jSLsbat4M9Xv6ASzB5VGP9HrQqw@mail.gmail.com>
Date:	Fri, 11 Mar 2016 19:21:26 -0800
From:	Tom Herbert <tom@...bertland.com>
To:	Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc:	Stephen Hemminger <stephen@...workplumber.org>,
	"David S. Miller" <davem@...emloft.net>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] rds-tcp: Add module parameters to control
 sndbuf/rcvbuf size of RDS-TCP socket

On Fri, Mar 11, 2016 at 6:43 PM, Sowmini Varadhan
<sowmini.varadhan@...cle.com> wrote:
> On (03/11/16 11:09), Stephen Hemminger wrote:
>> Module parameters are a problem for distributions and should only be used
>> as a last resort.
>
> I dont know the history of what the distibution problem is, but I
> did try to use sysctl as an alternative for this. I'm starting to
> believe that this is one case where module params, with all their
> problems, are the least evil option. Here's what I find if I use sysctl:
>
> - being able to tune the sndbuf and rcvbuf actually gives me a noticeable
>   2X perf improvement over the default for DB/Cluster request/response
>   transactions, where the classic req size is 8K bytes, response is 256
>   bytes, and there are a large number of such concurrent transactions
>   queued up on the kernel tcp socket. (The defaults work well for
>   larger packet sizes, but as I noted in the commit, packet sizes vary
>   in actual deployment).
>
> Assuming I use sysctl:
>
> - by the time the admin gets to execute the sysctl, the kernel listen
>   socket for RDS_TCP_PORT would already have been created, and an
>   arbitrary number of accept/connect (kernel) endpoints may have been
>   created.  According to tcp(7), you should be setting the buf sizes before
>   connect/listen. So using sysctl means you have to reset a variable
>   number of existing cluster connections. All this can be done, but
>   adds a large mass of confusing code to reset kernel sockets and
>   also get the cluster HA/failover right.
>
> - at first I thought sysctl was attractive because it was netns aware,
>   but found that it is  only superficially so. The ->proc_handler does
>   not pass in the struct net *, and setting up the ctl_table's ->data
>   to a per-net var is not simple thing to do. Thus, even though
>   register_net_sysctl() takes a net * pointer, my handler has to do
>   extra ugly things to get to per-net vars.
>
> I dont know if there is a better alternative than sysctl/module_param
> for achieving what I'm trying to do, which is to set up the params
> for the kernel sockets before creating them. If yes, some
> hints/rtfms would be helpful.
>
You could consider and alternate model where connection management
(connect, listen, etc.) is performed in a userspace daemon and the
attached to the kernel (pretty much like KCM does). This moves
complexity out of kernel, and gives you the flexibility to configure
arbitrarily configure TCP sockets (like the 4 setsockopts in
rds_tcp_keepalive could be configurable). The other cool thing this
would allow is switching an existing TCP connection into RDS mode
(like is done with several protocols on an http port).

Tom

> --Sowmini
>