Message-ID: <CANn89i+vYp22bUjfB21V_Rqmo9ZAKAmJxyarFWngq=cCmBQUrQ@mail.gmail.com>
Date: Wed, 24 Jul 2024 11:59:49 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: jiang.kun2@....com.cn
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, corbet@....net, 
	dsahern@...nel.org, netdev@...r.kernel.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, fan.yu9@....com.cn, xu.xin16@....com.cn, 
	zhang.yunkai@....com.cn, tu.qiang35@....com.cn, he.peilin@....com.cn, 
	yang.yang29@....com.cn
Subject: Re: [PATCH] net: Provide sysctl to tune local port range to IANA specification

On Wed, Jul 24, 2024 at 8:04 AM <jiang.kun2@....com.cn> wrote:
>
> From: Fan Yu <fan.yu9@....com.cn>
>
> The Importance of Following IANA Standards
> ========================================
> IANA specifies User ports as 1024-49151, and it just so happens
> that my application uses port 33060 (reserved for MySQL Database Extended),
> which conflicts with the Linux default dynamic port range (32768-60999)[1].
>
> In fact, IANA assigns port numbers in the range from 32768 to 49151,
> and these assignments are widely accepted by the industry. To avoid such
> conflicts, the kernel needs to follow the IANA specification.
>
> Drawbacks of existing implementations
> ========================================
> In past discussions, following the IANA specification by changing the
> system defaults was discouraged, because it would greatly affect
> existing users[2].
>
> Theoretically, this can be done by tuning net.ipv4.ip_local_port_range,
> but there are inconveniences such as:
> (1) For cloud-native scenarios, each container is expected to follow
> the IANA specification uniformly, so the sysctl has to be configured
> in each container individually, which increases the user's
> resource management costs.
> (2) For new applications, since sysctl(net.ipv4.ip_local_port_range) is
> isolated across namespaces, the container cannot inherit the host's value,
> so after startup, it remains at the kernel default value of 32768-60999,
> which reduces the ease of use of the system.
>
> Solution
> ========================================
> In order to maintain compatibility, we provide a sysctl interface in
> host namespace, which makes it easy to tune local port range to
> IANA specification.
>
> When ip_local_port_range_use_iana=1, the local port range of all network
> namespaces is tuned to IANA specification (49152-60999), and IANA
> specification is also used for newly created network namespaces. Therefore,
> each container does not need to do sysctl settings separately, which
> improves the convenience of configuration.
> When ip_local_port_range_use_iana=0, the local port range of all network
> namespaces is tuned to the original kernel defaults (32768-60999).
> For example:
>         # cat /proc/sys/net/ipv4/ip_local_port_range
>         32768   60999
>         # echo 1 > /proc/sys/net/ipv4/ip_local_port_range_use_iana
>         # cat /proc/sys/net/ipv4/ip_local_port_range
>         49152   60999
>
>         # unshare -n
>         # cat /proc/sys/net/ipv4/ip_local_port_range
>         49152   60999
>
> Notes
> ========================================
> The lower limit (49152) matches the lower limit of the IANA dynamic port
> range. The upper limit (60999) differs from the IANA dynamic upper limit
> (65535) because Linux uses 61000-65535 for masquerading/NAT,
> but this does not conflict with the IANA specification[3].
>
> Note that following the above specification reduces the number of ephemeral
> ports by more than half (from 28232 to 11848), increasing the risk of port
> exhaustion[2].
>
> [1]:https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt
> [2]:https://lore.kernel.org/all/bf42f6fd-cd06-02d6-d7b6-233a0602c437@gmail.com/
> [3]:https://lore.kernel.org/all/20070512210830.514c7709@the-village.bc.nu/
>
> Co-developed-by: Kun Jiang <jiang.kun2@....com.cn>
> Signed-off-by: Fan Yu <fan.yu9@....com.cn>
> Signed-off-by: Kun Jiang <jiang.kun2@....com.cn>
> Reviewed-by: xu xin <xu.xin16@....com.cn>
> Reviewed-by: Yunkai Zhang <zhang.yunkai@....com.cn>
> Reviewed-by: Qiang Tu <tu.qiang35@....com.cn>
> Reviewed-by: Peilin He <he.peilin@....com.cn>
> Cc: Yang Yang <yang.yang29@....com.cn>
> ---
>  Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
>  net/ipv4/af_inet.c                     |  7 ++++++-
>  net/ipv4/sysctl_net_ipv4.c             | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index bd50df6a5a42..27f4928c2a1d 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1320,6 +1320,19 @@ ip_local_port_range - 2 INTEGERS
>         Must be greater than or equal to ip_unprivileged_port_start.
>         The default values are 32768 and 60999 respectively.
>
> +ip_local_port_range_use_iana - BOOLEAN
> +       Tune ip_local_port_range to IANA specification easily.
> +       When ip_local_port_range_use_iana=1, the local port range of
> +       all network namespaces is tuned to IANA specification (49152-60999),
> +       and IANA specification is also used for newly created network namespaces.
> +       Therefore, each container does not need to do sysctl settings separately,
> +       which improves the convenience of configuration.
> +       When ip_local_port_range_use_iana=0, the local port range of
> +       all network namespaces are tuned to the original kernel
> +       defaults (32768-60999).
> +

IANA means: Internet Assigned Numbers Authority.

It is quite possible that a future RFC will change the actual ranges.

I would have used RFC 6335 instead, because when a new RFC comes in 2030, we
will have to add a new sysctl, right?

> +       Default: 0
> +
>  ip_local_reserved_ports - list of comma separated ranges
>         Specify the ports which are reserved for known third-party
>         applications. These ports will not be used by automatic port
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index b24d74616637..42b6bc58dc45 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -123,6 +123,8 @@
>
>  #include <trace/events/sock.h>
>
> +extern u8 sysctl_ip_local_port_range_use_iana;
> +
>  /* The inetsw table contains everything that inet_create needs to
>   * build a new socket.
>   */
> @@ -1802,7 +1804,10 @@ static __net_init int inet_init_net(struct net *net)
>         /*
>          * Set defaults for local port range
>          */
> -       net->ipv4.ip_local_ports.range = 60999u << 16 | 32768u;
> +       if (sysctl_ip_local_port_range_use_iana)
> +               net->ipv4.ip_local_ports.range = 60999u << 16 | 49152u;
> +       else
> +               net->ipv4.ip_local_ports.range = 60999u << 16 | 32768u;
>
>         seqlock_init(&net->ipv4.ping_group_range.lock);
>         /*
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 162a0a3b6ba5..a38447889072 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -45,6 +45,8 @@ static unsigned int tcp_child_ehash_entries_max = 16 * 1024 * 1024;
>  static unsigned int udp_child_hash_entries_max = UDP_HTABLE_SIZE_MAX;
>  static int tcp_plb_max_rounds = 31;
>  static int tcp_plb_max_cong_thresh = 256;
> +u8 sysctl_ip_local_port_range_use_iana;
> +EXPORT_SYMBOL(sysctl_ip_local_port_range_use_iana);
>
>  /* obsolete */
>  static int sysctl_tcp_low_latency __read_mostly;
> @@ -95,6 +97,26 @@ static int ipv4_local_port_range(struct ctl_table *table, int write,
>         return ret;
>  }
>
> +static int ipv4_local_port_range_use_iana(struct ctl_table *table, int write,
> +                                         void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +       struct net *net;
> +       int ret;
> +
> +       ret = proc_dou8vec_minmax(table, write, buffer, lenp, ppos);
> +
> +       if (write && ret == 0) {
> +               for_each_net(net) {

This is quite buggy.

for_each_net() can only be used with care; otherwise the list can be
corrupted, or a netns can disappear under you.
