Message-ID: <99d458640912022013l473690dax3f497248072cce0d@mail.gmail.com>
Date:	Wed, 2 Dec 2009 20:13:24 -0800
From:	kapil dakhane <kdakhane@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: soft lockup in inet_csk_get_port

>
> Hmm, I did an one hour audit and could not yet find the bug.
>
> Is it a reproductible error, and any chance I can have a snapshot of
> "netstat -atn" before the lockup ? (maybe privately, since it might be
> too big for netdev)
>
It happens every time. I am working on it and will send it to you very soon.

> What is the 'fast' program, is it freely available somewhere ?
>
It's something we cooked up in-house to test Linux network stack
scalability. It's a single-threaded transparent proxy that uses libevent
(which in turn uses epoll) and forwards data from each transparently
captured incoming connection to its original destination. It does this
without any further copy of the data once it has been read from the
socket. Clients simply send data to the content server's IP address,
unaware of the proxy in the middle, and content servers see the
client's IP address as the source IP. mpstat output shows very little
user-space CPU overhead.
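
For illustration, here is a minimal sketch of how a transparent proxy can
learn where to forward a captured connection: on a socket accepted via
TPROXY, getsockname() returns the address the client originally dialed.
This is not code from 'fast'; the helper name original_dst is
hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fetch the local address of an accepted socket.
 * For a connection diverted by TPROXY this is the address the client
 * originally dialed; for an ordinary connection it is simply the
 * address the listener is bound to. Returns 0 on success, -1 on error
 * or if the socket is not AF_INET. */
static int original_dst(int accepted_fd, struct sockaddr_in *dst)
{
    socklen_t len = sizeof(*dst);

    assert(dst != NULL);
    memset(dst, 0, sizeof(*dst));
    if (getsockname(accepted_fd, (struct sockaddr *)dst, &len) == -1)
        return -1;
    return dst->sin_family == AF_INET ? 0 : -1;
}
```

The result can then be used as the target of the outbound connect().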

Let me know what part of the program you are interested in, and I can
give you the code snippets. IMO, here's what you should know:
sysctl options:
net.ipv4.ip_local_port_range = 1024 65535
# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
# set max to at least 4MB, or higher if you use very high BDP paths
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 180
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_max_syn_backlog = 8192
error: "net.ipv4.netfilter.ip_conntrack_max" is an unknown key
error: "net.ipv4.ip_conntrack_max" is an unknown key
net.ipv4.tcp_max_tw_buckets = 360000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 0
net.core.netdev_max_backlog = 5000
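
As a rough sense of scale, the ip_local_port_range value above is an
inclusive range, so it spans 64512 ports. A minimal sketch of that
arithmetic follows; the helper name port_range_size is hypothetical, and
it only parses the "low high" text format used by the sysctl.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the two-number format used by
 * /proc/sys/net/ipv4/ip_local_port_range ("low<whitespace>high") and
 * return how many ports the inclusive range contains, or -1 if the
 * text is malformed or out of bounds. */
static int port_range_size(const char *text)
{
    int low, high;

    assert(text != NULL);
    if (sscanf(text, "%d %d", &low, &high) != 2)
        return -1;
    if (low < 1 || high > 65535 || low > high)
        return -1;
    return high - low + 1;
}
```

With the setting shown, port_range_size("1024 65535") gives 64512.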

/etc/security/limits.conf :
* - nofile 262144
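
A process can confirm at startup that the nofile limit actually took
effect. This is only a sketch under the assumption that the check is
done with getrlimit(); the helper name nofile_at_least is hypothetical.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: report whether the current soft limit on open
 * file descriptors is at least `wanted`. Returns 1 if so, 0 if the
 * limit is lower, and -1 if getrlimit() fails. */
static int nofile_at_least(rlim_t wanted)
{
    struct rlimit rl;

    assert(wanted > 0);
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return rl.rlim_cur >= wanted ? 1 : 0;
}
```

For the limits.conf line above, the call would be
nofile_at_least(262144).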



Listen sockets options:
  if ((sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP))
...
  if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one))
  if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one))
  if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one))
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val))
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val))

  int flags = fcntl(fd, F_GETFL, 0);
  if (flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK)

    if (setsockopt(sock, SOL_IP, IP_TRANSPARENT, &one, sizeof(one))
...
  if (bind(sock, (struct sockaddr *)&sain, sizeof(sain))
//listen port == 4002
  if (listen(sock, config.listen_backlog) == -1) { //backlog == 1024

Client socket options:
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK)
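
The fcntl() pattern used for all three kinds of socket can be wrapped in
one small helper. A minimal sketch, with the name set_nonblocking being
hypothetical rather than taken from our source:

```c
#include <fcntl.h>

/* Hypothetical helper mirroring the fcntl() pattern above: add
 * O_NONBLOCK to the descriptor's existing status flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    if (flags & O_NONBLOCK)
        return 0;               /* already non-blocking */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```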

Outbound connections:
  if (unlikely((conn->ssock = socket(AF_INET, SOCK_STREAM,  IPPROTO_TCP))
  if (unlikely(setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one))
  if (unlikely(setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one))
  if (unlikely(setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val))
  if (unlikely(setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val))
  int flags = fcntl(fd, F_GETFL, 0);
//O_NONBLOCK
  if (flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0)
   if (setsockopt(sock, SOL_IP, IP_TRANSPARENT, &one, sizeof(one))
//masquerade client
  local_sain.sin_family = AF_INET;
  local_sain.sin_addr.s_addr = tctx->client_sain->sin_addr.s_addr;
  local_sain.sin_port = tctx->client_sain->sin_port;
  int attempts = 0;
  again:
  attempts++;
  UPDATE_MAX_STAT(server.peak_masqattempts, attempts);
  if (unlikely(bind(*(tctx->ssock), (struct sockaddr *)&local_sain,
                    sizeof(local_sain)) == -1)) {
    if (errno == EADDRINUSE && attempts < config.max_masq_attempts) {
      local_sain.sin_port = htons(generate_random_port());
      goto again;
    }
  if (unlikely(connect(conn->ssock, (struct sockaddr *)&conn->server_sain,
                       sizeof(conn->server_sain))

// Get the local address
  socklen_t sl = sizeof(conn->sproxy_sain);
  if (unlikely(getsockname(conn->ssock, (struct sockaddr *)&conn->sproxy_sain,
                           &sl) == -1)) {
    LOG_ERROR(NULL, "Unable to get the local address for server connection "
              "(%d): %s.\n", errno, strerror(errno));
    return -1;
  }
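
The masquerade step above boils down to "bind to the client's address,
and on EADDRINUSE fall back to a random port". Here is a self-contained
sketch of just that retry logic. It is not our actual code:
pick_random_port is a hypothetical stand-in for generate_random_port(),
and the sketch leaves out IP_TRANSPARENT (which needs CAP_NET_ADMIN), so
as written it only binds to addresses that are local to the host.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical stand-in for generate_random_port(): a port in
 * [1024, 65535], returned in host byte order. */
static unsigned short pick_random_port(void)
{
    return (unsigned short)(1024 + rand() % (65535 - 1024 + 1));
}

/* Try to bind `sock` to *addr. On EADDRINUSE, swap in a random port and
 * retry, making at most max_attempts bind() calls in total. Returns the
 * number of attempts used on success, or -1 on failure with errno set.
 * On success addr->sin_port holds the port that was actually requested. */
static int bind_with_retry(int sock, struct sockaddr_in *addr, int max_attempts)
{
    int attempts;

    assert(addr != NULL && max_attempts > 0);
    for (attempts = 1; attempts <= max_attempts; attempts++) {
        if (bind(sock, (struct sockaddr *)addr, sizeof(*addr)) == 0)
            return attempts;
        if (errno != EADDRINUSE)
            return -1;          /* some other error: give up at once */
        addr->sin_port = htons(pick_random_port());
    }
    errno = EADDRINUSE;
    return -1;
}
```

The return value plays the role of the attempts counter that feeds
peak_masqattempts in the excerpt.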


Network setup:
Clients and servers are segregated into several vlans:
ifcfg-eth1.147  ifcfg-eth2.143  ifcfg-eth2.43  ifcfg-eth4.145
ifcfg-eth4.45  ifcfg-eth5.153  ifcfg-eth6.139  ifcfg-eth6.39
ifcfg-eth7.141  ifcfg-eth7.41
ifcfg-eth1.148  ifcfg-eth2.144  ifcfg-eth2.44  ifcfg-eth4.146
ifcfg-eth4.46  ifcfg-eth5.154  ifcfg-eth6.140  ifcfg-eth6.40
ifcfg-eth7.142  ifcfg-eth7.42
ifcfg-eth1.47   ifcfg-eth2.149  ifcfg-eth2.49  ifcfg-eth4.151
ifcfg-eth4.51  ifcfg-eth5.53   ifcfg-eth6.155  ifcfg-eth6.55
ifcfg-eth7.157  ifcfg-eth7.57
ifcfg-eth1.48   ifcfg-eth2.150  ifcfg-eth2.50  ifcfg-eth4.152
ifcfg-eth4.52  ifcfg-eth5.54   ifcfg-eth6.156  ifcfg-eth6.56
ifcfg-eth7.158  ifcfg-eth7.58

Traffic comes in through ifcfg-ethx.xx and goes out via ifcfg-ethx.1xx
on the same NIC. Reverse traffic goes from 1xx to xx.
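
To make the naming convention concrete: the egress VLAN ID is the
ingress VLAN ID plus 100 on the same NIC, so eth6.39 pairs with
eth6.139. A small sketch of that mapping, with the helper name
egress_ifname being hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: given an ingress interface such as "eth6.39",
 * write the matching egress interface ("eth6.139") into out.
 * Assumes the ethX.NN -> ethX.1NN convention described above.
 * Returns 0 on success, -1 if the name does not parse or does not fit. */
static int egress_ifname(const char *ingress, char *out, size_t outlen)
{
    int nic, vlan, n;

    assert(ingress != NULL && out != NULL);
    if (strlen(ingress) >= 16)  /* Linux interface names: at most 15 chars */
        return -1;
    if (sscanf(ingress, "eth%d.%d", &nic, &vlan) != 2)
        return -1;
    if (nic < 0 || vlan < 1 || vlan > 99)
        return -1;
    n = snprintf(out, outlen, "eth%d.%d", nic, vlan + 100);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```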

iptables and ip rules are set like so:
Printing MANGLE rules...
Chain PREROUTING (policy ACCEPT 20M packets, 8970M bytes)
 pkts bytes target     prot opt in     out     source               destination
 681M  586G DIVERT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           socket
1290K   76M TPROXY     tcp  --  eth6.39 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.39.1:4002 mark 0x1/0x1
1902K  112M TPROXY     tcp  --  eth6.40 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.40.1:4002 mark 0x1/0x1
1348K   79M TPROXY     tcp  --  eth7.41 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.41.1:4002 mark 0x1/0x1
 867K   51M TPROXY     tcp  --  eth7.42 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.42.1:4002 mark 0x1/0x1
 850K   49M TPROXY     tcp  --  eth2.43 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.43.1:4002 mark 0x1/0x1
 847K   49M TPROXY     tcp  --  eth2.44 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.44.1:4002 mark 0x1/0x1
1334K   78M TPROXY     tcp  --  eth4.45 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.45.1:4002 mark 0x1/0x1
 858K   50M TPROXY     tcp  --  eth4.46 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.46.1:4002 mark 0x1/0x1
 818K   46M TPROXY     tcp  --  eth1.47 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.47.1:4002 mark 0x1/0x1
 807K   45M TPROXY     tcp  --  eth1.48 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.48.1:4002 mark 0x1/0x1
 804K   45M TPROXY     tcp  --  eth2.49 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.49.1:4002 mark 0x1/0x1
 906K   51M TPROXY     tcp  --  eth2.50 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.50.1:4002 mark 0x1/0x1
1569K   89M TPROXY     tcp  --  eth4.51 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.51.1:4002 mark 0x1/0x1
1851K  105M TPROXY     tcp  --  eth4.52 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.52.1:4002 mark 0x1/0x1
1798K  101M TPROXY     tcp  --  eth5.53 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.53.1:4002 mark 0x1/0x1
1829K  103M TPROXY     tcp  --  eth5.54 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.54.1:4002 mark 0x1/0x1
1752K   99M TPROXY     tcp  --  eth6.55 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.55.1:4002 mark 0x1/0x1
1777K  101M TPROXY     tcp  --  eth6.56 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.56.1:4002 mark 0x1/0x1
1800K  102M TPROXY     tcp  --  eth7.57 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.57.1:4002 mark 0x1/0x1
1794K  101M TPROXY     tcp  --  eth7.58 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:80 TPROXY redirect 192.168.58.1:4002 mark 0x1/0x1

Chain INPUT (policy ACCEPT 708M packets, 588G bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 20M packets, 8970M bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 713M packets, 593G bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 733M packets, 602G bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain DIVERT (1 references)
 pkts bytes target     prot opt in     out     source               destination
 681M  586G MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0           MARK xset 0x1/0xffffffff
 681M  586G ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Printing IP rules...
0:      from all lookup 255
32765:  from all fwmark 0x1 lookup 100
32766:  from all lookup main
32767:  from all lookup default
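
To spell out how the marks and the fwmark rule fit together: both the
TPROXY "mark 0x1/0x1" and the DIVERT "MARK xset 0x1/0xffffffff" actions
leave the packet with bit 0x1 set, and rule 32765 then sends any packet
whose mark equals 0x1 to table 100. A sketch of the bit arithmetic as I
understand it from the kernel sources; the function names are
hypothetical.

```c
#include <stdint.h>

/* Update used by the MARK "xset" and TPROXY "mark value/mask" actions
 * in the listing above: clear the bits selected by mask, then XOR in
 * the value. */
static uint32_t apply_mark(uint32_t old, uint32_t value, uint32_t mask)
{
    return (old & ~mask) ^ value;
}

/* Match test for an "ip rule ... fwmark" entry: the rule applies when
 * the packet mark and the rule mark agree on every bit in rule_mask.
 * A rule written as plain "fwmark 0x1" compares all 32 bits. */
static int fwmark_rule_matches(uint32_t pkt_mark, uint32_t rule_mark,
                               uint32_t rule_mask)
{
    return ((pkt_mark ^ rule_mark) & rule_mask) == 0;
}
```

Starting from an unmarked packet, apply_mark(0, 0x1, 0x1) yields 0x1,
which fwmark_rule_matches(0x1, 0x1, 0xffffffff) accepts.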


Each fast proxy handles traffic from two client-facing vlans.

Hope this helps.

Regards,
Kapil