Message-ID: <20130627224444.GE5936@sbohrermbp13-local.rgmadvisors.com>
Date:	Thu, 27 Jun 2013 17:44:44 -0500
From:	Shawn Bohrer <sbohrer@...advisors.com>
To:	Rick Jones <rick.jones2@...com>
Cc:	netdev@...r.kernel.org
Subject: Re: Understanding lock contention in __udp4_lib_mcast_deliver

On Thu, Jun 27, 2013 at 03:03:15PM -0700, Rick Jones wrote:
> On 06/27/2013 02:54 PM, Shawn Bohrer wrote:
> >On Thu, Jun 27, 2013 at 01:46:58PM -0700, Rick Jones wrote:
> >>How do you know that time is actually contention and not simply
> >>acquire and release overhead?
> >
> >Excellent point, and that could be the problem with my thinking.  I
> >just now tried (unsuccessfully) to use lockstat to see if there was
> >any contention reported.  I read Documentation/lockstat.txt and
> >followed the instructions but the lock in question did not appear to
> >be in the output.  I think I'm going to have to go with the assumption
> >that this is just acquire and release overhead.
> 
> I think there is a way to get perf to "annotate" (iirc that is the
> term it uses) the report to show hits at the instruction level.
> Ostensibly one could then look and see how many of the hits were for
> the acquire/release part of the routine, and how much was for the
> actual contention.

Yep, so ~1% of my total time is in _raw_spin_lock, and using perf
annotate it appears that maybe only 5-6% of that is actually
contention and the rest is acquire/release.  Looks like I need to look
elsewhere for my performance improvements.  Thanks Rick for your help!
Below is the output of perf annotate if you're curious.

 Percent |      Source code & Disassembly of vmlinux
------------------------------------------------
         :
         :
         :
         :      Disassembly of section .text:
         :
         :      ffffffff814c72d0 <_raw_spin_lock>:
         :      EXPORT_SYMBOL(_raw_spin_trylock_bh);
         :      #endif
         :
         :      #ifndef CONFIG_INLINE_SPIN_LOCK
         :      void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
         :      {
    2.43 :      ffffffff814c72d0:       callq  ffffffff814cf440 <__fentry__>
    1.23 :      ffffffff814c72d5:       push   %rbp
    1.66 :      ffffffff814c72d6:       mov    %rsp,%rbp
         :       */
         :      static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
         :      {
         :              register struct __raw_tickets inc = { .tail = 1 };
         :
         :              inc = xadd(&lock->tickets, inc);
    0.71 :      ffffffff814c72d9:       mov    $0x10000,%eax
    0.00 :      ffffffff814c72de:       lock xadd %eax,(%rdi)
   86.07 :      ffffffff814c72e2:       mov    %eax,%edx
    0.05 :      ffffffff814c72e4:       shr    $0x10,%edx
         :
         :              for (;;) {
         :                      if (inc.head == inc.tail)
    0.00 :      ffffffff814c72e7:       cmp    %ax,%dx
    0.00 :      ffffffff814c72ea:       je     ffffffff814c72fa <_raw_spin_lock+0x2a>
    0.04 :      ffffffff814c72ec:       nopl   0x0(%rax)
         :      }
         :
         :      /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
         :      static inline void rep_nop(void)
         :      {
         :              asm volatile("rep; nop" ::: "memory");
    0.47 :      ffffffff814c72f0:       pause  
         :                              break;
         :                      cpu_relax();
         :                      inc.head = ACCESS_ONCE(lock->tickets.head);
    2.85 :      ffffffff814c72f2:       movzwl (%rdi),%eax
         :              register struct __raw_tickets inc = { .tail = 1 };
         :
         :              inc = xadd(&lock->tickets, inc);
         :
         :              for (;;) {
         :                      if (inc.head == inc.tail)
    3.53 :      ffffffff814c72f5:       cmp    %ax,%dx
    0.00 :      ffffffff814c72f8:       jne    ffffffff814c72f0 <_raw_spin_lock+0x20>
         :              __raw_spin_lock(lock);
         :      }
    0.91 :      ffffffff814c72fa:       pop    %rbp
    0.00 :      ffffffff814c72fb:       retq   
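
For anyone who wants to redo the split by hand, here is a minimal
sketch in Python that sums the per-instruction percentages from the
listing above. The percentages are copied from the listing; the
dictionary layout, the SPIN_LOOP range and the split() helper are
illustrative names, not part of perf. The key assumption is that only
the pause/movzwl/cmp/jne loop (offsets 0x2f0 through 0x2f8) counts as
contention and everything else is acquire/release overhead. The large
hit on the mov right after lock xadd is most likely the cost of the
atomic itself attributed one instruction late, so it is counted as
acquire overhead here.

```python
# Percent column from the perf annotate listing, keyed by the low
# 12 bits of each instruction address inside _raw_spin_lock.
SAMPLES = {
    0x2d0: 2.43,   # callq __fentry__
    0x2d5: 1.23,   # push %rbp
    0x2d6: 1.66,   # mov %rsp,%rbp
    0x2d9: 0.71,   # mov $0x10000,%eax
    0x2de: 0.00,   # lock xadd %eax,(%rdi)
    0x2e2: 86.07,  # mov %eax,%edx
    0x2e4: 0.05,   # shr $0x10,%edx
    0x2e7: 0.00,   # cmp %ax,%dx
    0x2ea: 0.00,   # je (lock acquired without spinning)
    0x2ec: 0.04,   # nopl (alignment padding)
    0x2f0: 0.47,   # pause
    0x2f2: 2.85,   # movzwl (%rdi),%eax
    0x2f5: 3.53,   # cmp %ax,%dx
    0x2f8: 0.00,   # jne (back to pause)
    0x2fa: 0.91,   # pop %rbp
    0x2fb: 0.00,   # retq
}

# Assumption: only the busy-wait loop counts as contention.
SPIN_LOOP = range(0x2f0, 0x2f9)


def split(samples):
    """Return (spin-loop percent, percent for everything else)."""
    spin = sum(p for off, p in samples.items() if off in SPIN_LOOP)
    total = sum(samples.values())
    return spin, total - spin


spin, rest = split(SAMPLES)
print(f"spin loop: {spin:.2f}%  everything else: {rest:.2f}%")
```

With that split the loop comes out a little under 7% of the samples
in _raw_spin_lock, in the same ballpark as the estimate above.
Changing SPIN_LOOP moves the boundary if you count contention
differently.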

--
Shawn
