First a few runs with Eric's code + epoll/libevent

-------------------------------------------------------------------------------
   PerfTop:    4009 irqs/sec  kernel:83.4% [1000Hz cycles],  (all, 8 CPUs)
-------------------------------------------------------------------------------

   samples  pcnt function                     DSO
   _______ _____ ___________________________  ____________________

   2097.00  8.6% sky2_poll                    [sky2]
   1742.00  7.2% _raw_spin_lock_irqsave       [kernel]
    831.00  3.4% system_call                  [kernel]
    654.00  2.7% copy_user_generic_string     [kernel]
    654.00  2.7% datagram_poll                [kernel]
    647.00  2.7% fget                         [kernel]
    623.00  2.6% _raw_spin_unlock_irqrestore  [kernel]
    547.00  2.3% _raw_spin_lock_bh            [kernel]
    506.00  2.1% sys_epoll_ctl                [kernel]
    475.00  2.0% kmem_cache_free              [kernel]
    466.00  1.9% schedule                     [kernel]
    436.00  1.8% vread_tsc                    [kernel].vsyscall_fn
    417.00  1.7% fput                         [kernel]
    415.00  1.7% sys_epoll_wait               [kernel]
    402.00  1.7% _raw_spin_lock               [kernel]

-------------------------------------------------------------------------------
   PerfTop:     616 irqs/sec  kernel:98.7% [1000Hz cycles],  (all, cpu: 0)
-------------------------------------------------------------------------------

   samples  pcnt function                DSO
   _______ _____ ______________________  ________

   2534.00 28.6% sky2_poll               [sky2]
    503.00  5.7% ip_route_input          [kernel]
    438.00  4.9% _raw_spin_lock_irqsave  [kernel]
    418.00  4.7% __udp4_lib_lookup       [kernel]
    378.00  4.3% __alloc_skb             [kernel]
    364.00  4.1% ip_rcv                  [kernel]
    323.00  3.6% _raw_spin_lock          [kernel]
    315.00  3.5% sock_queue_rcv_skb      [kernel]
    284.00  3.2% __netif_receive_skb     [kernel]
    281.00  3.2% __udp4_lib_rcv          [kernel]
    266.00  3.0% __wake_up_common        [kernel]
    238.00  2.7% sock_def_readable       [kernel]
    181.00  2.0% __kmalloc               [kernel]
    163.00  1.8% kmem_cache_alloc        [kernel]
    150.00  1.7% ep_poll_callback        [kernel]

-------------------------------------------------------------------------------
   PerfTop:     854 irqs/sec  kernel:80.2% [1000Hz cycles],  (all, cpu: 2)
-------------------------------------------------------------------------------

   samples  pcnt function                     DSO
   _______ _____ ___________________________  ____________________

    341.00  8.0% _raw_spin_lock_irqsave       [kernel]
    235.00  5.5% system_call                  [kernel]
    174.00  4.1% datagram_poll                [kernel]
    174.00  4.1% fget                         [kernel]
    173.00  4.1% copy_user_generic_string     [kernel]
    135.00  3.2% _raw_spin_unlock_irqrestore  [kernel]
    125.00  2.9% _raw_spin_lock_bh            [kernel]
    122.00  2.9% schedule                     [kernel]
    113.00  2.6% sys_epoll_ctl                [kernel]
    113.00  2.6% kmem_cache_free              [kernel]
    108.00  2.5% vread_tsc                    [kernel].vsyscall_fn
    105.00  2.5% sys_epoll_wait               [kernel]
    102.00  2.4% udp_recvmsg                  [kernel]
     95.00  2.2% mutex_lock                   [kernel]

Average 97.55% of 10M packets at 750Kpps

Turn on rps mask ee and irq affinity to cpu0

-------------------------------------------------------------------------------
   PerfTop:    3885 irqs/sec  kernel:83.6% [1000Hz cycles],  (all, 8 CPUs)
-------------------------------------------------------------------------------

   samples  pcnt function                        DSO
   _______ _____ ______________________________  ________

   2945.00 16.7% sky2_poll                       [sky2]
    653.00  3.7% _raw_spin_lock_irqsave          [kernel]
    460.00  2.6% system_call                     [kernel]
    420.00  2.4% _raw_spin_unlock_irqrestore     [kernel]
    414.00  2.3% sky2_intr                       [sky2]
    392.00  2.2% fget                            [kernel]
    360.00  2.0% ip_rcv                          [kernel]
    324.00  1.8% sys_epoll_ctl                   [kernel]
    323.00  1.8% __netif_receive_skb             [kernel]
    310.00  1.8% schedule                        [kernel]
    292.00  1.7% ip_route_input                  [kernel]
    292.00  1.7% _raw_spin_lock                  [kernel]
    291.00  1.7% copy_user_generic_string        [kernel]
    284.00  1.6% kmem_cache_free                 [kernel]
    262.00  1.5% call_function_single_interrupt  [kernel]

-------------------------------------------------------------------------------
   PerfTop:    1000 irqs/sec  kernel:98.1% [1000Hz cycles],  (all, cpu: 0)
-------------------------------------------------------------------------------

   samples  pcnt function                             DSO
   _______ _____ ___________________________________  ________

   4170.00 61.9% sky2_poll                            [sky2]
    723.00 10.7% sky2_intr                            [sky2]
    159.00  2.4% __alloc_skb                          [kernel]
    140.00  2.1% get_rps_cpu                          [kernel]
    106.00  1.6% __kmalloc                            [kernel]
     95.00  1.4% enqueue_to_backlog                   [kernel]
     86.00  1.3% kmem_cache_alloc                     [kernel]
     85.00  1.3% irq_entries_start                    [kernel]
     85.00  1.3% _raw_spin_lock_irqsave               [kernel]
     82.00  1.2% _raw_spin_lock                       [kernel]
     66.00  1.0% swiotlb_sync_single                  [kernel]
     58.00  0.9% sky2_remove                          [sky2]
     49.00  0.7% default_send_IPI_mask_sequence_phys  [kernel]
     47.00  0.7% sky2_rx_submit                       [sky2]
     36.00  0.5% _raw_spin_unlock_irqrestore          [kernel]

-------------------------------------------------------------------------------
   PerfTop:     344 irqs/sec  kernel:84.3% [1000Hz cycles],  (all, cpu: 2)
-------------------------------------------------------------------------------

   samples  pcnt function                        DSO
   _______ _____ ______________________________  ____________________

    114.00  5.2% _raw_spin_lock_irqsave          [kernel]
     79.00  3.6% fget                            [kernel]
     78.00  3.6% ip_rcv                          [kernel]
     78.00  3.6% system_call                     [kernel]
     75.00  3.4% _raw_spin_unlock_irqrestore     [kernel]
     67.00  3.1% sys_epoll_ctl                   [kernel]
     65.00  3.0% schedule                        [kernel]
     61.00  2.8% ip_route_input                  [kernel]
     48.00  2.2% vread_tsc                       [kernel].vsyscall_fn
     48.00  2.2% call_function_single_interrupt  [kernel]
     46.00  2.1% kmem_cache_free                 [kernel]
     45.00  2.1% __netif_receive_skb             [kernel]
     41.00  1.9% process_recv                    snkudp
     40.00  1.8% kfree                           [kernel]
     39.00  1.8% _raw_spin_lock                  [kernel]

92.97% of 10M packets at 750Kpps

Ok, so this is exactly what I saw with my app. Non-rps is better.
To summarize: It used to be the opposite on net-next before around Apr14.
rps has gotten worse.
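
For anyone reading along, here is a small sketch of what "rps mask ee" and "irq affinity to cpu0" mean in terms of CPU numbers. Both settings are hex CPU bitmasks where bit N stands for CPU N. The helper name cpus_in_mask is made up for illustration, and the sysfs/procfs paths in the comments use placeholders because the device name and IRQ number are not given in this mail.

```python
# Decode a hex CPU bitmask of the kind written to
#   /sys/class/net/<dev>/queues/rx-<n>/rps_cpus   (RPS mask)
#   /proc/irq/<irq>/smp_affinity                  (IRQ affinity)
# <dev>, <n> and <irq> are placeholders; the actual values are not
# stated in this mail.

def cpus_in_mask(mask_hex):
    """Return the sorted list of CPU numbers whose bit is set in mask_hex."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]

# RPS mask used in the second set of runs.
print(cpus_in_mask("ee"))  # [1, 2, 3, 5, 6, 7]

# An IRQ affinity of cpu0 corresponds to mask "1".
print(cpus_in_mask("1"))   # [0]
```

So with mask ee, cpu0 takes the interrupt and hands packets off to CPUs 1-3 and 5-7; cpu4 is left out of the mask as well.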