Message-ID: <AANLkTi=uQTQjrXn4_w-YwKCutpENFdFrxQed5kVKXTDF@mail.gmail.com>
Date:	Sat, 19 Feb 2011 12:07:30 +0800
From:	Cypher Wu <cypher.w@...il.com>
To:	Chris Metcalf <cmetcalf@...era.com>
Cc:	David Miller <davem@...emloft.net>, xiyou.wangcong@...il.com,
	linux-kernel@...r.kernel.org, eric.dumazet@...il.com,
	netdev@...r.kernel.org
Subject: Re: IGMP and rwlock: Deadlock occurred again on TILEPro

On Sat, Feb 19, 2011 at 5:51 AM, Chris Metcalf <cmetcalf@...era.com> wrote:
> On 2/17/2011 10:16 PM, Cypher Wu wrote:
>> On Fri, Feb 18, 2011 at 7:18 AM, Chris Metcalf <cmetcalf@...era.com> wrote:
>>> The interrupt architecture on Tile allows a write to a special-purpose
>>> register to put you into a "critical section" where no interrupts or faults
>>> are delivered.  So we just need to bracket the read_lock operations with
>>> two SPR writes; each takes six machine cycles, so we're only adding 12
> cycles to the total cost of taking or releasing a read lock on an rwlock.
>> I agree that disabling interrupts just for the read operations should
>> be enough, but read_unlock() is also a place where we should disable
>> interrupts, right? If an interrupt occurs while we hold the lock value
>> after the TNS, deadlock can still occur.
>
> Correct; that's what I meant by "read_lock operations".  This includes
> lock, trylock, and unlock.
>
>> When will you release that patch? Time is tight, so I may have to fix
>> it up myself.
>
> I heard from one of our support folks that you were asking through that
> channel, so I asked him to go ahead and give you the spinlock sources
> directly.  I will be spending time next week syncing up our internal tree
> with the public git repository so you'll see it on LKML at that time.
>
>> 1. If we use SPR_INTERRUPT_CRITICAL_SECTION, it will disable all the
>> interrupts that are marked 'CM', is that right? Do we have to save its
>> original value and restore it later?
>
> We don't need to save and restore, since INTERRUPT_CRITICAL_SECTION is
> almost always zero except in very specific situations.
>
>> 2. Should we disable interrupts for the whole read_lock()/read_unlock()
>> operation, or should we leave the interrupt critical section when we
>> fall into __raw_read_lock_slow() and have to delay_backoff() for some
>> time, re-entering the interrupt critical section before the TNS?
>
> Correct, the fix only holds the critical section around the tns and the
> write-back, not during the delay_backoff().
>
>> By the way, other RISC platforms, say ARM and MIPS, use store-conditional
>> rather than TNS'ing a temporary value into the lock word; does Gx have
>> similar instructions?
>
> TILEPro does not have anything more than test-and-set; TILE-Gx (the 64-bit
> processor) has a full array of atomic instructions.
>
>> Adding those SPR writes should be fine, but might it delay interrupts a
>> little more than other platforms' read_lock()?
>
> A little, but I think it's in the noise relative to the basic cost of
> read_lock in the absence of full-fledged atomic instructions.
>
>> Another question: what does "NMI" in the earlier mail mean?
>
> Non-maskable interrupt, such as performance counter interrupts.
>
> --
> Chris Metcalf, Tilera Corp.
> http://www.tilera.com
>
>

I've got your source code; thank you very much.

There are still two more questions:
1. Why do we merge the inlined code and the *_slow paths into
non-inlined functions?
2. I've seen the use of mb() in the unlock operation, but we don't use
it in the lock operation (see the sketch below).
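
To illustrate question 2, here are the two original fast paths side by
side (a sketch of the unmodified header, with the #pragma bits omitted,
not my patch):

        static inline void __raw_read_lock(raw_rwlock_t *rwlock)
        {
                u32 val = __insn_tns((int *)&rwlock->lock);  /* no barrier here */
                if (unlikely(val << _RD_COUNT_WIDTH)) {
                        __raw_read_lock_slow(rwlock, val);
                        return;
                }
                rwlock->lock = val + (1 << _RD_COUNT_SHIFT);
        }

        static inline void __raw_read_unlock(raw_rwlock_t *rwlock)
        {
                u32 val;
                mb();  /* full barrier, but only on the unlock side */
                val = __insn_tns((int *)&rwlock->lock);
                if (unlikely(val & 1)) {
                        __raw_read_unlock_slow(rwlock);
                        return;
                }
                rwlock->lock = val - (1 << _RD_COUNT_SHIFT);
        }

Shouldn't the lock side also order the critical section's accesses
against taking the lock?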

I've released a temporary version with that modification at our
customer's request, since they want to run a long test over this
weekend. I'd appreciate any comments on my modifications:

*** /opt/TileraMDE-2.1.0.98943/tilepro/src/sys/linux/include/asm-tile/spinlock_32.h	2010-04-02 11:07:47.000000000 +0800
--- include/asm-tile/spinlock_32.h      2011-02-18 17:09:40.000000000 +0800
***************
*** 12,17 ****
--- 12,27 ----
   *   more details.
   *
   * 32-bit SMP spinlocks.
+  *
+  * The TNS instruction creates a race between system calls and
+  * interrupts, so we have to disable interrupts while we hold the
+  * tns'ed lock value.
+  * However, write_lock() is exclusive: if we really need to take it
+  * in interrupt context, the system-call path has to use
+  * write_lock_irqsave(), so it doesn't need interrupts disabled here.
+  * Spinlocks are exclusive too, so we don't handle them either.
+  *
+  * Modified by Cyberman Wu on Feb 18th, 2011.
   */

  #ifndef _ASM_TILE_SPINLOCK_32_H
*************** void __raw_read_unlock_slow(raw_rwlock_t
*** 86,91 ****
--- 96,114 ----
  void __raw_write_lock_slow(raw_rwlock_t *, u32);
  void __raw_write_unlock_slow(raw_rwlock_t *, u32);

+
+ static inline void __raw_read_lock_enter_critical(void)
+ {
+       BUG_ON(__insn_mfspr(SPR_INTERRUPT_CRITICAL_SECTION));
+       __insn_mtspr(SPR_INTERRUPT_CRITICAL_SECTION, 1);
+ }
+
+ static inline void __raw_read_lock_leave_critical(void)
+ {
+       __insn_mtspr(SPR_INTERRUPT_CRITICAL_SECTION, 0);
+ }
+
+
  /**
   * __raw_read_can_lock() - would read_trylock() succeed?
   */
*************** static inline int __raw_write_can_lock(r
*** 107,121 ****
   */
  static inline void __raw_read_lock(raw_rwlock_t *rwlock)
  {
!       u32 val = __insn_tns((int *)&rwlock->lock);
        if (unlikely(val << _RD_COUNT_WIDTH)) {
  #ifdef __TILECC__
  #pragma frequency_hint NEVER
  #endif
                __raw_read_lock_slow(rwlock, val);
                return;
        }
        rwlock->lock = val + (1 << _RD_COUNT_SHIFT);
  }

  /**
--- 130,148 ----
   */
  static inline void __raw_read_lock(raw_rwlock_t *rwlock)
  {
!       u32 val;
!       __raw_read_lock_enter_critical();
!       val = __insn_tns((int *)&rwlock->lock);
        if (unlikely(val << _RD_COUNT_WIDTH)) {
  #ifdef __TILECC__
  #pragma frequency_hint NEVER
  #endif
                __raw_read_lock_slow(rwlock, val);
+               __raw_read_lock_leave_critical();
                return;
        }
        rwlock->lock = val + (1 << _RD_COUNT_SHIFT);
+       __raw_read_lock_leave_critical();
  }

  /**
*************** static inline void __raw_write_lock(raw_
*** 140,154 ****
  static inline int __raw_read_trylock(raw_rwlock_t *rwlock)
  {
        int locked;
!       u32 val = __insn_tns((int *)&rwlock->lock);
        if (unlikely(val & 1)) {
  #ifdef __TILECC__
  #pragma frequency_hint NEVER
  #endif
!               return __raw_read_trylock_slow(rwlock);
        }
        locked = (val << _RD_COUNT_WIDTH) == 0;
        rwlock->lock = val + (locked << _RD_COUNT_SHIFT);
        return locked;
  }

--- 167,187 ----
  static inline int __raw_read_trylock(raw_rwlock_t *rwlock)
  {
        int locked;
!       u32 val;
!       __raw_read_lock_enter_critical();
!       val = __insn_tns((int *)&rwlock->lock);
        if (unlikely(val & 1)) {
  #ifdef __TILECC__
  #pragma frequency_hint NEVER
  #endif
!               locked = __raw_read_trylock_slow(rwlock);
!               __raw_read_lock_leave_critical();
!               return locked;
        }
        locked = (val << _RD_COUNT_WIDTH) == 0;
        rwlock->lock = val + (locked << _RD_COUNT_SHIFT);
+       __raw_read_lock_leave_critical();
        return locked;
  }

*************** static inline void __raw_read_unlock(raw
*** 184,198 ****
--- 217,234 ----
  {
        u32 val;
        mb();
+       __raw_read_lock_enter_critical();
        val = __insn_tns((int *)&rwlock->lock);
        if (unlikely(val & 1)) {
  #ifdef __TILECC__
  #pragma frequency_hint NEVER
  #endif
                __raw_read_unlock_slow(rwlock);
+               __raw_read_lock_leave_critical();
                return;
        }
        rwlock->lock = val - (1 << _RD_COUNT_SHIFT);
+       __raw_read_lock_leave_critical();
  }



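For reference, the race that the critical section guards against, as I
understand it (a sketch, not actual source):

        u32 val = __insn_tns((int *)&rwlock->lock);  /* lock word is now 1 */
        /*
         * If an interrupt arrives here and the handler calls read_lock()
         * on the same rwlock, it spins forever on the tns'ed word: only
         * this CPU can restore the saved value, and it can't run again
         * until the handler returns.
         */
        rwlock->lock = val + (1 << _RD_COUNT_SHIFT);

And the changes to the slow paths:
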
--- /opt/TileraMDE-2.1.0.98943/tilepro/src/sys/linux/arch/tile/lib/spinlock_32.c	2010-04-02 11:08:02.000000000 +0800
+++ arch/tile/lib/spinlock_32.c 2011-02-18 16:05:31.000000000 +0800
@@ -98,7 +98,18 @@ static inline u32 get_rwlock(raw_rwlock_
 #ifdef __TILECC__
 #pragma frequency_hint NEVER
 #endif
+            /*
+             * get_rwlock() now has to be called inside an interrupt
+             * critical section, so it can no longer be called from
+             * the __raw_write_xxx() routines!
+             *
+             * We leave the interrupt critical section here to keep
+             * interrupt latency minimal.
+             * Is that really needed?
+             */
+            __raw_read_lock_leave_critical();
                        delay_backoff(iterations++);
+            __raw_read_lock_enter_critical();
                        continue;
                }
                return val;
@@ -152,7 +163,14 @@ void __raw_read_lock_slow(raw_rwlock_t *
        do {
                if (!(val & 1))
                        rwlock->lock = val;
+        /*
+         * We leave the interrupt critical section here to keep
+         * interrupt latency minimal.
+         * Is that really needed?
+         */
+        __raw_read_lock_leave_critical();
                delay_backoff(iterations++);
+        __raw_read_lock_enter_critical();
                val = __insn_tns((int *)&rwlock->lock);
        } while ((val << RD_COUNT_WIDTH) != 0);
        rwlock->lock = val + (1 << RD_COUNT_SHIFT);
@@ -166,23 +184,30 @@ void __raw_write_lock_slow(raw_rwlock_t
         * when we compare them.
         */
        u32 my_ticket_;
+       u32 iterations = 0;

-       /* Take out the next ticket; this will also stop would-be readers. */
-       if (val & 1)
-               val = get_rwlock(rwlock);
-       rwlock->lock = __insn_addb(val, 1 << WR_NEXT_SHIFT);
+       /*
+        * Wait until there are no readers, then bump up the next
+        * field and capture the ticket value.
+        */
+       for (;;) {
+               if (!(val & 1)) {
+                       if ((val >> RD_COUNT_SHIFT) == 0)
+                               break;
+                       rwlock->lock = val;
+               }
+               delay_backoff(iterations++);
+               val = __insn_tns((int *)&rwlock->lock);
+       }

-       /* Extract my ticket value from the original word. */
+       /* Take out the next ticket and extract my ticket value. */
+       rwlock->lock = __insn_addb(val, 1 << WR_NEXT_SHIFT);
        my_ticket_ = val >> WR_NEXT_SHIFT;

-       /*
-        * Wait until the "current" field matches our ticket, and
-        * there are no remaining readers.
-        */
+       /* Wait until the "current" field matches our ticket. */
        for (;;) {
                u32 curr_ = val >> WR_CURR_SHIFT;
-               u32 readers = val >> RD_COUNT_SHIFT;
-               u32 delta = ((my_ticket_ - curr_) & WR_MASK) + !!readers;
+               u32 delta = ((my_ticket_ - curr_) & WR_MASK);
                if (likely(delta == 0))
                        break;


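In short, the pattern I applied everywhere the lock word is tns'ed is
(a sketch of the pattern, not a literal excerpt):

        __raw_read_lock_enter_critical();   /* SPR write: inhibit interrupts */
        val = __insn_tns((int *)&rwlock->lock);
        /* ... decide, then write back rwlock->lock ... */
        __raw_read_lock_leave_critical();   /* SPR write: allow interrupts */

with the critical section dropped around delay_backoff() so that
interrupt latency stays small. Does that match the fix in your tree?
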
-- 
Cyberman Wu
