Date:	Mon, 2 Feb 2009 08:45:23 -0500
From:	Neil Horman <nhorman@...driver.com>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	Kenny Chang <kchang@...enacr.com>, netdev@...r.kernel.org
Subject: Re: Multicast packet loss

On Sun, Feb 01, 2009 at 01:40:39PM +0100, Eric Dumazet wrote:
> Eric Dumazet wrote:
> > Kenny Chang wrote:
> >> Ah, sorry, here's the test program attached.
> >>
> >> We've tried 2.6.28.1, but no, we haven't tried the 2.6.28.2 or the
> >> 2.6.29-rcX.
> >>
> >> Right now, we are trying to step through the kernel versions until we
> >> see where the performance drops significantly.  We'll try 2.6.29-rc soon
> >> and post the result.
> > 
> 
> I tried your program on my dev machines with 2.6.29 (each machine: two quad-core CPUs, 32-bit kernel).
> 
> With 8 clients, I see about 10% packet loss.
> 
> Might be a scheduling problem, not sure... 50,000 packets per second x 8 CPUs = 400,000
> wakeups per second... But at least the UDP receive path seems OK.
> 
> The thing is, the receiver (the softirq that queues the packet) seems to fight over the socket lock
> with the readers...
> 
> I tried to set up IRQ affinities, but that no longer works on bnx2 (unless disable_msi=1 is used).
> 
> I tried playing with the ethtool -c/-C (coalescing) and -g/-G (ring) parameters,
> and /proc/sys/net/core/rmem_max (plus setsockopt(SO_RCVBUF) in your program to set bigger receive buffers).
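
(A minimal sketch of the buffer-sizing side of that experiment, assuming root;
the 8 MB value is only an example.  The kernel silently caps SO_RCVBUF requests
at rmem_max, so the sysctl has to be raised before the program asks for more:)

  # raise the global cap so a larger per-socket request can take effect
  echo 8388608 > /proc/sys/net/core/rmem_max    # or: sysctl -w net.core.rmem_max=8388608

  # then, in the test program, request the bigger buffer before bind():
  #   int sz = 8388608;
  #   setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz));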
> 
> I can get 0% packet loss if I boot with disable_msi and
> 
> echo 1 >/proc/irq/16/smp_affinity
> 
> (16 being the interrupt of the eth0 NIC)
> 
> Then a second run gave me errors, about 2%, oh well...
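
(For anyone reproducing the affinity setup, a minimal sketch; IRQ 16 is taken
from the message above and will differ per machine, and ./mcast_rx is only a
stand-in name for the attached test program:)

  # confirm which IRQ line the NIC actually uses
  grep eth0 /proc/interrupts

  # pin that IRQ to CPU0; the value written is a hex bitmask of allowed CPUs
  echo 1 > /proc/irq/16/smp_affinity

  # optionally keep the readers off CPU0, so the softirq and the receiving
  # threads are not bouncing the socket lock and queue between caches
  taskset -c 1-7 ./mcast_rx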
> 
> 
> oprofile numbers without playing with IRQ affinities:
> 
> CPU: Core 2, speed 2999.89 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
> samples  %        symbol name
> 327928   10.1427  schedule
> 259625    8.0301  mwait_idle
> 187337    5.7943  __skb_recv_datagram
> 109854    3.3977  lock_sock_nested
> 104713    3.2387  tick_nohz_stop_sched_tick
> 98831     3.0568  select_nohz_load_balancer
> 88163     2.7268  skb_release_data
> 78552     2.4296  update_curr
> 75241     2.3272  getnstimeofday
> 71400     2.2084  set_next_entity
> 67629     2.0917  get_next_timer_interrupt
> 67375     2.0839  sched_clock_tick
> 58112     1.7974  enqueue_entity
> 56462     1.7463  udp_recvmsg
> 55049     1.7026  copy_to_user
> 54277     1.6788  sched_clock_cpu
> 54031     1.6712  __copy_skb_header
> 51859     1.6040  __slab_free
> 51786     1.6017  prepare_to_wait_exclusive
> 51776     1.6014  sock_def_readable
> 50062     1.5484  try_to_wake_up
> 42182     1.3047  __switch_to
> 41631     1.2876  read_tsc
> 38337     1.1857  tick_nohz_restart_sched_tick
> 34358     1.0627  cpu_idle
> 34194     1.0576  native_sched_clock
> 33812     1.0458  pick_next_task_fair
> 33685     1.0419  resched_task
> 33340     1.0312  sys_recvfrom
> 33287     1.0296  dst_release
> 32439     1.0033  kmem_cache_free
> 32131     0.9938  hrtimer_start_range_ns
> 29807     0.9219  udp_queue_rcv_skb
> 27815     0.8603  task_rq_lock
> 26875     0.8312  __update_sched_clock
> 23912     0.7396  sock_queue_rcv_skb
> 21583     0.6676  __wake_up_sync
> 21001     0.6496  effective_load
> 20531     0.6350  hrtick_start_fair
> 
> 
> 
> 
> With IRQ affinities set and disable_msi (no packet drops):
> 
> CPU: Core 2, speed 3000.13 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
> samples  %        symbol name
> 79788    10.3815  schedule
> 69422     9.0328  mwait_idle
> 44877     5.8391  __skb_recv_datagram
> 28629     3.7250  tick_nohz_stop_sched_tick
> 27252     3.5459  select_nohz_load_balancer
> 24320     3.1644  lock_sock_nested
> 20833     2.7107  getnstimeofday
> 20666     2.6889  skb_release_data
> 18612     2.4217  set_next_entity
> 17785     2.3141  get_next_timer_interrupt
> 17691     2.3018  udp_recvmsg
> 17271     2.2472  sched_clock_tick
> 16032     2.0860  copy_to_user
> 14785     1.9237  update_curr
> 12512     1.6280  prepare_to_wait_exclusive
> 12498     1.6262  __slab_free
> 11380     1.4807  read_tsc
> 11145     1.4501  sched_clock_cpu
> 10598     1.3789  __switch_to
> 9588      1.2475  pick_next_task_fair
> 9480      1.2335  cpu_idle
> 9218      1.1994  sys_recvfrom
> 9008      1.1721  tick_nohz_restart_sched_tick
> 8977      1.1680  dst_release
> 8930      1.1619  native_sched_clock
> 8392      1.0919  kmem_cache_free
> 8124      1.0570  hrtimer_start_range_ns
> 7274      0.9464  bnx2_interrupt
> 7175      0.9336  __copy_skb_header
> 7006      0.9116  try_to_wake_up
> 6949      0.9042  sock_def_readable
> 6787      0.8831  enqueue_entity
> 6772      0.8811  __update_sched_clock
> 6349      0.8261  finish_task_switch
> 6164      0.8020  copy_from_user
> 5096      0.6631  resched_task
> 5007      0.6515  sysenter_past_esp
> 
> 
> I will try to investigate a little more in the following days, time permitting.
> 
I'm not 100% versed on this, but IIRC some hardware simply can't set IRQ
affinity when operating in MSI interrupt mode.  If that is the case with this
particular bnx2 card, then I would expect some packet loss, simply due to the
constant cache misses.  It would be interesting to re-run your oprofile cases,
counting L2 cache hits/misses (if your CPU supports that class of counter), for
bnx2 running both with MSI enabled and with MSI disabled.  It would also be
interesting to try a different card that can set IRQ affinity, and compare loss
with irqbalance on versus irqbalance off with the IRQ affinity set to all CPUs.
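
A rough sketch of how that comparison could be collected with the classic
opcontrol interface (the L2 event name below is only an example; opcontrol
--list-events shows the event names and unit masks the CPU actually supports):

  opcontrol --shutdown                    # stop any previous session
  opcontrol --vmlinux=/path/to/vmlinux    # same kernel image as the test run
  opcontrol --event=L2_LINES_IN:100000    # an L2 miss event instead of plain cycles
  opcontrol --start
  # ... run the multicast test, once with MSI enabled and once with it disabled ...
  opcontrol --stop
  opreport -l | head -40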

Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
