Date:	Fri, 07 Jun 2013 14:41:11 +0200
From:	Mike Galbraith <bitbucket@...ine.de>
To:	"Vitaly V. Bursov" <vitalyb@...enet.dn.ua>
Cc:	linux-kernel@...r.kernel.org, netdev <netdev@...r.kernel.org>
Subject: Re: Scaling problem with a lot of AF_PACKET sockets on different
 interfaces

(CC's net-fu dojo) 

On Fri, 2013-06-07 at 14:56 +0300, Vitaly V. Bursov wrote: 
> Hello,
> 
> I have a Linux router with a lot of interfaces (hundreds or
> thousands of VLANs) and an application that creates an AF_PACKET
> socket per interface and bind()s each socket to its interface.
> 
> Each socket also has a BPF filter attached.
> 
> The problem is observed on linux-3.8.13, but as far as I can see
> from the source, the latest version behaves similarly.
> 
> I noticed that the box has strange performance problems, with
> most of the CPU time spent in __netif_receive_skb:
>   86.15%  [k] __netif_receive_skb
>    1.41%  [k] _raw_spin_lock
>    1.09%  [k] fib_table_lookup
>    0.99%  [k] local_bh_enable_ip
> 
> and this is the assembly with the "hot spot":
>         │       shr    $0x8,%r15w
>         │       and    $0xf,%r15d
>    0.00 │       shl    $0x4,%r15
>         │       add    $0xffffffff8165ec80,%r15
>         │       mov    (%r15),%rax
>    0.09 │       mov    %rax,0x28(%rsp)
>         │       mov    0x28(%rsp),%rbp
>    0.01 │       sub    $0x28,%rbp
>         │       jmp    5c7
>    1.72 │5b0:   mov    0x28(%rbp),%rax
>    0.05 │       mov    0x18(%rsp),%rbx
>    0.00 │       mov    %rax,0x28(%rsp)
>    0.03 │       mov    0x28(%rsp),%rbp
>    5.67 │       sub    $0x28,%rbp
>    1.71 │5c7:   lea    0x28(%rbp),%rax
>    1.73 │       cmp    %r15,%rax
>         │       je     640
>    1.74 │       cmp    %r14w,0x0(%rbp)
>         │       jne    5b0
>   81.36 │       mov    0x8(%rbp),%rax
>    2.74 │       cmp    %rax,%r8
>         │       je     5eb
>    1.37 │       cmp    0x20(%rbx),%rax
>         │       je     5eb
>    1.39 │       cmp    %r13,%rax
>         │       jne    5b0
>    0.04 │5eb:   test   %r12,%r12
>    0.04 │       je     6f4
>         │       mov    0xc0(%rbx),%eax
>         │       mov    0xc8(%rbx),%rdx
>         │       testb  $0x8,0x1(%rdx,%rax,1)
>         │       jne    6d5
> 
> This corresponds to:
> 
> net/core/dev.c:
>          type = skb->protocol;
>          list_for_each_entry_rcu(ptype,
>                          &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
>                  if (ptype->type == type &&
>                      (ptype->dev == null_or_dev || ptype->dev == skb->dev ||
>                       ptype->dev == orig_dev)) {
>                          if (pt_prev)
>                                  ret = deliver_skb(skb, pt_prev, orig_dev);
>                          pt_prev = ptype;
>                  }
>          }
> 
> This works perfectly well until there are a lot of AF_PACKET sockets, since
> each socket adds an entry to the ptype list:
> 
> # cat /proc/net/ptype
> Type Device      Function
> 0800 eth2.1989 packet_rcv+0x0/0x400
> 0800 eth2.1987 packet_rcv+0x0/0x400
> 0800 eth2.1986 packet_rcv+0x0/0x400
> 0800 eth2.1990 packet_rcv+0x0/0x400
> 0800 eth2.1995 packet_rcv+0x0/0x400
> 0800 eth2.1997 packet_rcv+0x0/0x400
> .......
> 0800 eth2.1004 packet_rcv+0x0/0x400
> 0800          ip_rcv+0x0/0x310
> 0011          llc_rcv+0x0/0x3a0
> 0004          llc_rcv+0x0/0x3a0
> 0806          arp_rcv+0x0/0x150
> 
> And this obviously results in a huge performance penalty.
> 
> ptype_all, by the looks of it, should behave the same way.
> 
> Probably one way to fix this is to perform interface name matching in
> the af_packet handler, but there could be other cases, other protocols.
> 
> Ideas are welcome :)
> 
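
The cost of the quoted list walk can be reproduced with a small userspace
model. This is only a sketch, not kernel code: struct ptype_model,
model_add() and model_receive() are hypothetical names, an ifindex of 0
stands in for a NULL ptype->dev, and the 16-bucket table is assumed from
the "and $0xf" in the disassembly. Every handler for type 0800 hashes to
the same bucket, so each received packet has to visit all of them.

```c
#include <assert.h>
#include <arpa/inet.h>   /* htons, ntohs */
#include <stdint.h>
#include <stdlib.h>

#define PTYPE_HASH_SIZE 16                      /* assumed from "and $0xf" */
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)

/* Hypothetical stand-in for one registered protocol handler. */
struct ptype_model {
    uint16_t type;              /* protocol, network byte order */
    int ifindex;                /* 0 = wildcard, models dev == NULL */
    struct ptype_model *next;
};

static struct ptype_model *ptype_buckets[PTYPE_HASH_SIZE];

/* Register a handler: models what each bound AF_PACKET socket adds. */
static void model_add(uint16_t type_be, int ifindex)
{
    struct ptype_model *pt;
    unsigned int h = ntohs(type_be) & PTYPE_HASH_MASK;

    assert(ifindex >= 0);
    pt = malloc(sizeof(*pt));
    if (pt == NULL)
        abort();
    pt->type = type_be;
    pt->ifindex = ifindex;
    pt->next = ptype_buckets[h];
    ptype_buckets[h] = pt;
}

/*
 * Walk the bucket for one received packet, as the quoted loop does.
 * Returns the number of entries visited; *matches receives the number
 * of handlers that would actually be called.
 */
static unsigned int model_receive(uint16_t type_be, int ifindex,
                                  unsigned int *matches)
{
    const struct ptype_model *pt;
    unsigned int visited = 0;

    *matches = 0;
    for (pt = ptype_buckets[ntohs(type_be) & PTYPE_HASH_MASK];
         pt != NULL; pt = pt->next) {
        visited++;
        if (pt->type == type_be &&
            (pt->ifindex == 0 || pt->ifindex == ifindex))
            (*matches)++;
    }
    return visited;
}
```

With one wildcard entry (the ip_rcv role) plus 1000 per-interface entries
registered for type 0800, model_receive() for a packet on interface 42
visits 1001 entries to find 2 matches. The per-packet work grows linearly
with the number of bound sockets, even though only one of them cares.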


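The idea of doing the interface matching inside the af_packet handler
could look roughly like the sketch below: a single wildcard entry in the
ptype list, which then finds the interested sockets through a table keyed
by interface index. This is an assumption about one possible shape of
such a fix, not what the kernel does; struct demux, struct sock_model,
demux_bind() and demux_deliver() are hypothetical names, and the 256-slot
table size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define DEV_HASH_SIZE 256u      /* arbitrary power of two for this sketch */

/* Hypothetical stand-in for one bound AF_PACKET socket. */
struct sock_model {
    int ifindex;
    unsigned long delivered;    /* packets handed to this socket */
    struct sock_model *next;
};

/* Hypothetical per-protocol demux table owned by the single handler. */
struct demux {
    struct sock_model *slots[DEV_HASH_SIZE];
};

static unsigned int demux_hash(int ifindex)
{
    return (unsigned int)ifindex & (DEV_HASH_SIZE - 1);
}

/* Bind a socket to an interface: insert it into that interface's slot. */
static void demux_bind(struct demux *d, struct sock_model *sk, int ifindex)
{
    unsigned int h = demux_hash(ifindex);

    assert(ifindex > 0);
    sk->ifindex = ifindex;
    sk->delivered = 0;
    sk->next = d->slots[h];
    d->slots[h] = sk;
}

/*
 * Deliver one packet that arrived on ifindex. Only the sockets sharing
 * that hash slot are examined. Returns the number of entries visited.
 */
static unsigned int demux_deliver(struct demux *d, int ifindex)
{
    struct sock_model *sk;
    unsigned int visited = 0;

    for (sk = d->slots[demux_hash(ifindex)]; sk != NULL; sk = sk->next) {
        visited++;
        if (sk->ifindex == ifindex)
            sk->delivered++;
    }
    return visited;
}
```

With sockets bound to interfaces 1 through 1000, a packet on interface 42
examines only the 4 entries sharing its slot (42, 298, 554 and 810)
instead of all 1000. As the original report notes, this only helps
af_packet; other protocols registering many per-device handlers would
still hit the linear walk.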
