Date:	Mon, 11 Jun 2007 12:05:02 -0700 (PDT)
From:	Philip Romanov <philip_romanov@...oo.com>
To:	Stephen Hemminger <shemminger@...ux-foundation.org>
Cc:	netdev@...r.kernel.org
Subject: Re: SKY2 vs SK98LIN performance on 88E8053 MAC



> > We are doing pure IPv4 forwarding between two Ethernet
> > interfaces:
> > 
> >  IXIA port A<--->System Under Test<--->IXIA Port B
> > 
> > Traffic has two IP destinations for each direction and
> > the L4 protocol is UDP. There are two static ARP entries
> > and only interface routes. The two tests are identical
> > except that we switch from one driver to the other.
> > 
> > Ethernet ports on the SUT are oversubscribed -- I'm
> > sending 60% of line rate (of 256-byte packets) and
> > measuring the percentage of pass-through traffic that
> > makes it to the IXIA port on the other side. Traffic is
> > bidirectional and system load is close to 100%.
> > 

> 
> Could you post the profiles? Hopefully, others have good ideas
> as well.
> 
> 256 bytes is the size where the copybreak optimization kicks in,
> so you might want to experiment with the copybreak module option
> to the sky2 driver. copybreak=0 would cause no packets to be
> copied; copybreak=1514 would cause all packets to be copied.
> Copying is an optimization that helps when receiving small
> packets locally, but may slow down the forwarding path.
> 
 

Profiles were attached to the previous posting in the
thread; I'm pasting them in plain text at the end now.
There are four profiles: two for vmlinux and two
for the sky2 and sk98lin drivers.

Regarding the copybreak parameter: it appears that it
kicks in at 128 bytes by default, not 256:

...
static int copybreak __read_mostly = 128;
module_param(copybreak, int, 0);
MODULE_PARM_DESC(copybreak, "Receive copy threshold");
...

Anyway, I tried copybreak settings of both 0 and 1500:
there is a significant slowdown when copybreak is set to
1500 with 256-byte traffic. Another clarification: the
256-byte packet size refers to the entire Ethernet frame
including the FCS, so by the time packets make it into the
driver they are 252 bytes long. I also tried switching the
driver from MSI to IRQ mode (SK98LIN runs in IRQ mode) --
that did not have any significant effect on forwarding
performance.


Oprofile results:
================================================
profile for vmlinux 2.6.21.3 running with sk98lin driver:

CPU: PIII, speed 2000.1 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1626     14.3222  _raw_spin_trylock
935       8.2357  dev_hard_start_xmit
756       6.6590  sub_preempt_count
574       5.0559  __alloc_skb
507       4.4658  _raw_spin_unlock
462       4.0694  add_preempt_count
452       3.9813  dev_queue_xmit
432       3.8052  ip_output
416       3.6642  ip_rcv
406       3.5761  preempt_schedule
380       3.3471  netif_receive_skb
364       3.2062  __qdisc_run
283       2.4927  skb_release_data
274       2.4135  debug_smp_processor_id
265       2.3342  kfree
219       1.9290  kmem_cache_free
211       1.8585  __kmalloc
181       1.5943  ip_route_input
177       1.5591  pfifo_fast_dequeue
164       1.4446  ip_forward
150       1.3212  kmem_cache_alloc
141       1.2420  __kfree_skb
128       1.1275  ide_insw
121       1.0658  rt_hash_code
100       0.8808  pfifo_fast_requeue
96        0.8456  nf_iterate
94        0.8280  pfifo_fast_enqueue
91        0.8016  eth_type_trans
80        0.7047  nf_hook_slow
78        0.6870  cache_alloc_refill
72        0.6342  dev_kfree_skb_any
68        0.5990  local_bh_enable
58        0.5109  kfree_skb
58        0.5109  kfree_skbmem
52        0.4580  free_block
49        0.4316  selinux_ipv4_postroute_last
48        0.4228  delay_tsc
38        0.3347  page_fault
36        0.3171  kunmap_atomic
33        0.2907  memcpy
27        0.2378  __handle_mm_fault
27        0.2378  __netif_schedule
27        0.2378  cache_flusharray
26        0.2290  do_wp_page
25        0.2202  net_rx_action
21        0.1850  __d_lookup
16        0.1409  __copy_to_user_ll
16        0.1409  unmap_vmas
15        0.1321  default_idle
15        0.1321  kmap_atomic
14        0.1233  get_page_from_freelist
12        0.1057  __link_path_walk
12        0.1057  flush_tlb_mm
12        0.1057  strnlen_user
11        0.0969  avc_has_perm_noaudit
11        0.0969  do_page_fault
11        0.0969  sysenter_past_esp
10        0.0881  inode_has_perm
10        0.0881  net_tx_action
10        0.0881  selinux_inode_permission
9         0.0793  __might_sleep
9         0.0793  filemap_nopage
8         0.0705  cache_reap
8         0.0705  find_get_page
8         0.0705  find_vma
8         0.0705  local_bh_disable
7         0.0617  _atomic_dec_and_lock
6         0.0528  __copy_from_user_ll
6         0.0528  do_lookup
6         0.0528  do_timer
6         0.0528  free_hot_cold_page
6         0.0528  hrtimer_run_queues
6         0.0528  run_rebalance_domains
5         0.0440  apic_timer_interrupt
5         0.0440  error_code
5         0.0440  find_busiest_group
5         0.0440  task_rq_lock
4         0.0352  __do_softirq
4         0.0352  _spin_lock_irq
4         0.0352  copy_page_range
4         0.0352  do_mmap_pgoff
4         0.0352  do_softirq
4         0.0352  irq_entries_start
4         0.0352  put_page
4         0.0352  radix_tree_lookup
4         0.0352  raise_softirq
4         0.0352  restore_nocheck
4         0.0352  sched_clock
4         0.0352  schedule
3         0.0264  __pagevec_lru_add_active
3         0.0264  account_system_time
3         0.0264  apm_bios_call_simple
3         0.0264  avc_audit
3         0.0264  avc_has_perm
3         0.0264  do_IRQ
3         0.0264  drain_array
3         0.0264  getname
3         0.0264  handle_IRQ_event
3         0.0264  handle_fasteoi_irq
3         0.0264  mutex_lock
3         0.0264  page_remove_rmap
3         0.0264  prio_tree_insert
3         0.0264  run_timer_softirq
3         0.0264  serial_in
3         0.0264  set_cpus_allowed
3         0.0264  shrink_dcache_sb
3         0.0264  strncpy_from_user
2         0.0176  __wake_up_bit
2         0.0176  _raw_read_trylock
2         0.0176  _raw_read_unlock
2         0.0176  alloc_inode
2         0.0176  apm_cpu_idle
2         0.0176  clocksource_get_next
2         0.0176  common_interrupt
2         0.0176  copy_process
2         0.0176  do_sigaction
2         0.0176  dup_fd
2         0.0176  file_move
2         0.0176  flush_tlb_page
2         0.0176  free_pages_bulk
2         0.0176  mark_page_accessed
2         0.0176  mntput_no_expire
2         0.0176  raise_softirq_irqoff
2         0.0176  resume_userspace
2         0.0176  ret_from_intr
2         0.0176  serial_out
2         0.0176  softlockup_tick
2         0.0176  tick_handle_periodic
2         0.0176  unlink_file_vma
2         0.0176  up_read
2         0.0176  vfs_read
2         0.0176  vm_normal_page
1         0.0088  __add_entropy_words
1         0.0088  __alloc_pages
1         0.0088  __const_udelay
1         0.0088  __first_cpu
1         0.0088  __follow_mount
1         0.0088  __lookup_hash
1         0.0088  __mod_zone_page_state
1         0.0088  __next_cpu
1         0.0088  __pte_alloc
1         0.0088  __remove_shared_vm_struct
1         0.0088  __rmqueue
1         0.0088  __wake_up
1         0.0088  __wake_up_common
1         0.0088  ack_ioapic_quirk_irq
1         0.0088  anon_vma_link
1         0.0088  anon_vma_unlink
1         0.0088  arch_get_unmapped_area_topdown
1         0.0088  can_share_swap_page
1         0.0088  cfq_exit_cfqq
1         0.0088  cfq_queue_empty
1         0.0088  copy_mount_options
1         0.0088  copy_strings
1         0.0088  create_write_pipe
1         0.0088  current_kernel_time
1         0.0088  current_tick_length
1         0.0088  d_lookup
1         0.0088  do_filp_open
1         0.0088  do_munmap
1         0.0088  do_notify_parent
1         0.0088  do_notify_resume
1         0.0088  do_path_lookup
1         0.0088  down_read_trylock
1         0.0088  exec_keys
1         0.0088  find_vma_prev
1         0.0088  free_page_and_swap_cache
1         0.0088  free_pgd_range
1         0.0088  generic_file_aio_read
1         0.0088  get_task_mm
1         0.0088  get_unmapped_area
1         0.0088  get_write_access
1         0.0088  hrtimer_cancel
1         0.0088  ide_inb
1         0.0088  ide_outb
1         0.0088  inode_init_once
1         0.0088  internal_add_timer
1         0.0088  irq_enter
1         0.0088  kmem_cache_zalloc
1         0.0088  kprobe_flush_task
1         0.0088  lookup_mnt
1         0.0088  may_create
1         0.0088  memmove
1         0.0088  new_inode
1         0.0088  notifier_call_chain
1         0.0088  number
1         0.0088  open_namei
1         0.0088  pid_task
1         0.0088  pipe_write_fasync
1         0.0088  proc_delete_inode
1         0.0088  proc_flush_task
1         0.0088  proc_lookup
1         0.0088  profile_tick
1         0.0088  put_files_struct
1         0.0088  rb_insert_color
1         0.0088  rcu_process_callbacks
1         0.0088  recalc_task_prio
1         0.0088  resched_task
1         0.0088  resume_kernel
1         0.0088  selinux_bprm_alloc_security
1         0.0088  selinux_bprm_set_security
1         0.0088  selinux_file_permission
1         0.0088  selinux_inode_getattr
1         0.0088  shmem_get_inode
1         0.0088  shmem_mknod
1         0.0088  sigprocmask
1         0.0088  slab_destroy
1         0.0088  sock_attach_fd
1         0.0088  sync_dquots
1         0.0088  sys_mmap2
1         0.0088  sys_munmap
1         0.0088  sys_rt_sigprocmask
1         0.0088  tick_periodic
1         0.0088  vfs_create
1         0.0088  vma_merge
1         0.0088  write_chan
1         0.0088  zone_watermark_ok

==================================================
profile for vmlinux 2.6.21.3 running with sky2 driver:

CPU: PIII, speed 2000.22 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
7894      9.0213  __alloc_skb
6475      7.3997  skb_release_data
5706      6.5208  dev_hard_start_xmit
5656      6.4637  ip_output
5652      6.4591  eth_type_trans
5432      6.2077  ip_rcv
5278      6.0317  netif_receive_skb
3499      3.9987  kfree
3195      3.6513  _raw_spin_trylock
3003      3.4318  kmem_cache_free
2675      3.0570  debug_smp_processor_id
2669      3.0501  __kmalloc
2383      2.7233  sub_preempt_count
2348      2.6833  ip_route_input
2263      2.5862  ip_forward
2185      2.4970  add_preempt_count
2105      2.4056  dev_queue_xmit
1994      2.2788  kmem_cache_alloc
1587      1.8136  __kfree_skb
1479      1.6902  rt_hash_code
1409      1.6102  nf_iterate
1300      1.4856  pfifo_fast_enqueue
1262      1.4422  preempt_schedule
1084      1.2388  nf_hook_slow
986       1.1268  _raw_spin_unlock
939       1.0731  kfree_skb
935       1.0685  kfree_skbmem
926       1.0582  __qdisc_run
897       1.0251  local_bh_enable
792       0.9051  pfifo_fast_dequeue
503       0.5748  __netdev_alloc_skb
451       0.5154  selinux_ipv4_postroute_last
411       0.4697  dev_kfree_skb_any
298       0.3406  __copy_to_user_ll
269       0.3074  cache_alloc_refill
263       0.3006  free_block
174       0.1988  local_bh_disable
122       0.1394  cache_flusharray
82        0.0937  net_rx_action
75        0.0857  delay_tsc
65        0.0743  memcpy
48        0.0549  net_tx_action
41        0.0469  kunmap_atomic
39        0.0446  do_wp_page
28        0.0320  __link_path_walk
27        0.0309  __d_lookup
27        0.0309  page_fault
22        0.0251  __do_softirq
22        0.0251  get_page_from_freelist
20        0.0229  kmap_atomic
19        0.0217  __handle_mm_fault
15        0.0171  __netif_schedule
14        0.0160  avc_has_perm_noaudit
14        0.0160  find_vma
13        0.0149  hrtimer_run_queues
11        0.0126  flush_tlb_mm
10        0.0114  schedule
9         0.0103  inode_has_perm
8         0.0091  do_timer
8         0.0091  strnlen_user
7         0.0080  avc_has_perm
7         0.0080  do_page_fault
7         0.0080  run_timer_softirq
6         0.0069  __might_sleep
6         0.0069  apic_timer_interrupt
6         0.0069  default_idle
6         0.0069  filemap_nopage
6         0.0069  find_busiest_group
6         0.0069  find_get_page
6         0.0069  mod_zone_page_state
6         0.0069  profile_tick
6         0.0069  sched_clock
5         0.0057  __rmqueue
5         0.0057  _spin_lock_irq
5         0.0057  apm_bios_call_simple
5         0.0057  error_code
5         0.0057  raise_softirq_irqoff
5         0.0057  serial_out
5         0.0057  set_cpus_allowed
5         0.0057  softlockup_tick
5         0.0057  tick_periodic
5         0.0057  unmap_vmas
4         0.0046  __rcu_process_callbacks
4         0.0046  _atomic_dec_and_lock
4         0.0046  account_system_time
4         0.0046  cache_reap
4         0.0046  copy_process
4         0.0046  do_mmap_pgoff
4         0.0046  kmem_cache_zalloc
4         0.0046  memmove
4         0.0046  read_tsc
4         0.0046  scheduler_tick
4         0.0046  shrink_dcache_sb
4         0.0046  smp_apic_timer_interrupt
4         0.0046  strncpy_from_user
3         0.0034  _raw_read_unlock
3         0.0034  avc_audit
3         0.0034  clocksource_get_next
3         0.0034  dput
3         0.0034  file_has_perm
3         0.0034  inode_doinit_with_dentry
3         0.0034  inode_init_once
3         0.0034  put_page
3         0.0034  raise_softirq
3         0.0034  rb_insert_color
3         0.0034  restore_nocheck
3         0.0034  ret_from_intr
3         0.0034  run_posix_cpu_timers
3         0.0034  selinux_inode_permission
3         0.0034  serial_in
3         0.0034  vm_normal_page
2         0.0023  __follow_mount
2         0.0023  __mod_zone_page_state
2         0.0023  __switch_to
2         0.0023  __wake_up_bit
2         0.0023  anon_vma_prepare
2         0.0023  anon_vma_unlink
2         0.0023  atomic_notifier_call_chain
2         0.0023  copy_page_range
2         0.0023  drain_array
2         0.0023  dummy_file_mmap
2         0.0023  find_vma_prepare
2         0.0023  free_hot_cold_page
2         0.0023  irq_entries_start
2         0.0023  irq_exit
2         0.0023  msecs_to_jiffies
2         0.0023  neigh_lookup
2         0.0023  page_add_file_rmap
2         0.0023  radix_tree_lookup
2         0.0023  resume_userspace
2         0.0023  selinux_vm_enough_memory
2         0.0023  shmem_get_inode
1         0.0011  __alloc_pages
1         0.0011  __copy_from_user_ll
1         0.0011  __copy_user_intel
1         0.0011  __dec_zone_page_state
1         0.0011  __dentry_open
1         0.0011  __free_pages_ok
1         0.0011  __mutex_init
1         0.0011  __netif_rx_schedule
1         0.0011  __next_cpu
1         0.0011  __pagevec_lru_add_active
1         0.0011  __rcu_pending
1         0.0011  __wake_up_common
1         0.0011  _raw_read_trylock
1         0.0011  _read_lock_irq
1         0.0011  acpi_pm_read
1         0.0011  add_to_page_cache
1         0.0011  add_wait_queue
1         0.0011  alloc_pid
1         0.0011  anon_vma_link
1         0.0011  apm_bios_call
1         0.0011  apm_cpu_idle
1         0.0011  arch_get_unmapped_area_topdown
1         0.0011  blockable_page_cache_readahead
1         0.0011  cap_bprm_set_security
1         0.0011  cap_capable
1         0.0011  cdev_get
1         0.0011  clear_user
1         0.0011  copy_from_user
1         0.0011  copy_strings
1         0.0011  copy_to_user
1         0.0011  cp_new_stat64
1         0.0011  cpuset_exit
1         0.0011  current_tick_length
1         0.0011  d_alloc
1         0.0011  deny_write_access
1         0.0011  dequeue_task
1         0.0011  do_exit
1         0.0011  do_lookup
1         0.0011  do_path_lookup
1         0.0011  do_sigaction
1         0.0011  do_softirq
1         0.0011  do_wait
1         0.0011  dummy_inode_setattr
1         0.0011  dup_fd
1         0.0011  exit_itimers
1         0.0011  exit_mm
1         0.0011  ext3_follow_link
1         0.0011  ext3_release_file
1         0.0011  fib_semantic_match
1         0.0011  file_read_actor
1         0.0011  filp_close
1         0.0011  find_inode_fast
1         0.0011  find_next_bit
1         0.0011  find_pid
1         0.0011  find_vma_prev
1         0.0011  flush_thread
1         0.0011  flush_tlb_page
1         0.0011  fn_hash_lookup
1         0.0011  free_page_and_swap_cache
1         0.0011  free_pages
1         0.0011  half_md4_transform
1         0.0011  handle_edge_irq
1         0.0011  hrtimer_init
1         0.0011  hweight32
1         0.0011  idle_cpu
1         0.0011  insert_vm_struct
1         0.0011  iput
1         0.0011  irq_enter
1         0.0011  ksoftirqd
1         0.0011  lookup_create
1         0.0011  lookup_mnt
1         0.0011  lru_cache_add_active
1         0.0011  move_native_irq
1         0.0011  mutex_unlock
1         0.0011  open_exec
1         0.0011  page_remove_rmap
1         0.0011  permission
1         0.0011  pipe_poll
1         0.0011  pipe_read
1         0.0011  prepare_to_wait
1         0.0011  proc_lookup
1         0.0011  radix_tree_preload
1         0.0011  rb_erase
1         0.0011  read_chan
1         0.0011  release_pages
1         0.0011  remove_vma
1         0.0011  restore_sigcontext
1         0.0011  run_rebalance_domains
1         0.0011  rw_verify_area
1         0.0011  save_i387
1         0.0011  second_overflow
1         0.0011  security_compute_sid
1         0.0011  selinux_file_mmap
1         0.0011  selinux_sysctl
1         0.0011  send_signal
1         0.0011  seq_escape
1         0.0011  shmem_swp_alloc
1         0.0011  shmem_truncate
1         0.0011  show_vfsmnt
1         0.0011  sig_ignored
1         0.0011  slab_destroy
1         0.0011  snprintf
1         0.0011  sys_access
1         0.0011  sys_clone
1         0.0011  sys_close
1         0.0011  sys_faccessat
1         0.0011  sys_fcntl64
1         0.0011  sys_mkdirat
1         0.0011  sys_mprotect
1         0.0011  sys_rt_sigaction
1         0.0011  sysenter_past_esp
1         0.0011  task_rq_lock
1         0.0011  task_running_tick
1         0.0011  unix_stream_connect
1         0.0011  unlink_file_vma
1         0.0011  up_read
1         0.0011  vma_adjust
1         0.0011  vma_link
1         0.0011  vma_prio_tree_add
1         0.0011  worker_thread
1         0.0011  zone_watermark_ok

====================================================
profile for SK98LIN driver:

CPU: PIII, speed 2000.1 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
2357     43.3910  sk98lin.ko               SkY2Poll
870      16.0162  sk98lin.ko               GiveTxBufferToHw
811      14.9300  sk98lin.ko               SkY2Xmit
677      12.4632  sk98lin.ko               FillReceiveTableYukon2
206       3.7923  sk98lin.ko               SkGmPhyRead
113       2.0803  sk98lin.ko               SkY2Isr
103       1.8962  sk98lin.ko               SkCsGetReceiveInfo
80        1.4728  sk98lin.ko               SkMacIrq
70        1.2887  sk98lin.ko               SkGmPhyWrite
53        0.9757  sk98lin.ko               SkGmMacStatistic
30        0.5523  sk98lin.ko               SkGmResetCounter
25        0.4602  sk98lin.ko               CheckRXCounters
8         0.1473  sk98lin.ko               SkY2FreeRxBuffers
5         0.0920  sk98lin.ko               SkHwtRead
4         0.0736  sk98lin.ko               SkGmInitMac
2         0.0368  sk98lin.ko               SkEventDispatcher
2         0.0368  sk98lin.ko               SkMacHashing
2         0.0368  sk98lin.ko               SkMacRxTxDisable
2         0.0368  sk98lin.ko               SkMacSoftRst
2         0.0368  sk98lin.ko               SkYuk2SirqIsr
1         0.0184  sk98lin.ko               DoInitRamQueue
1         0.0184  sk98lin.ko               SkAddrGmacMcUpdate
1         0.0184  sk98lin.ko               SkDrvEvent
1         0.0184  sk98lin.ko               SkGeCheckTimer
1         0.0184  sk98lin.ko               SkGeInitMacFifo
1         0.0184  sk98lin.ko               SkGeStopPort
1         0.0184  sk98lin.ko               SkTimerStop
1         0.0184  sk98lin.ko               SkY2PortStop
1         0.0184  sk98lin.ko               SkYuk2PortSirq
1         0.0184  sk98lin.ko               timer_done


=============================================
SKY2 profile:

CPU: PIII, speed 2000.22 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
69576    64.4634  sky2.ko                  sky2_xmit_frame
27759    25.7192  sky2.ko                  sky2_poll
5782      5.3571  sky2.ko                  sky2_rx_unmap_skb
1310      1.2137  sky2.ko                  sky2_tx_complete
1276      1.1822  sky2.ko                  sky2_rx_add
1018      0.9432  sky2.ko                  sky2_rx_submit
687       0.6365  sky2.ko                  sky2_rx_map_skb
521       0.4827  sky2.ko                  .text
2         0.0019  sky2.ko                  sky2_intr







       