Message-ID: <CAOGi=dPLk1HdpnZUq=BqH=sw5xaQ79WWaywuyfzwb1t04JiRuA@mail.gmail.com>
Date:	Thu, 26 Nov 2015 11:49:14 +0800
From:	Ling Ma <ling.ma.program@...il.com>
To:	Waiman Long <waiman.long@....com>
Cc:	Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
	linux-kernel@...r.kernel.org, Ling <ling.ml@...baba-inc.com>
Subject: Re: Improve spinlock performance by moving work to one core

Hi Longman,

All compared data come from the two operations below in spinlock-test.patch:

+#if ORG_QUEUED_SPINLOCK
+       org_queued_spin_lock((struct qspinlock *)&pa.n->list_lock);
+       refill_fn(&pa);
+       org_queued_spin_unlock((struct qspinlock *)&pa.n->list_lock);
+#else
+       new_spin_lock((struct nspinlock *)&pa.n->list_lock, refill_fn, &pa);
+#endif

and

+#if ORG_QUEUED_SPINLOCK
+       org_queued_spin_lock((struct qspinlock *)&pa.n->list_lock);
+       flusharray_fn(&pa);
+       org_queued_spin_unlock((struct qspinlock *)&pa.n->list_lock);
+#else
+       new_spin_lock((struct nspinlock *)&pa.n->list_lock, flusharray_fn, &pa);
+#endif

So the results are correct and the comparison is fair.

Yes, we updated the code in include/asm-generic/qspinlock.h to simplify the
modification and to avoid a kernel crash. For example, suppose a lock is taken
at 10 call sites but the bottleneck comes from only one or two of them: we
convert only those to the new spin lock, and the other call sites keep using
the lock in qspinlock.h. We must modify that code, because otherwise an
operation could be parked in the slowpath queue and never be woken up.
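
For reference, here is a minimal sketch of the delegation idea behind
new_spin_lock. The struct layout, field names and queue handling below are
only illustrative assumptions; the actual code is in spinlock-test.patch:

/* Illustrative sketch only.  The idea: instead of every CPU taking the
 * lock and touching the shared data itself, waiters hand their critical
 * section (fn, arg) to the current lock holder, which runs all pending
 * work on one core while the data stays hot in that core's cache. */
struct nspinlock_node {
	void (*fn)(void *);		/* critical-section work */
	void *arg;
	int done;			/* set once fn(arg) has been run */
	struct nspinlock_node *next;
};

struct nspinlock {
	struct nspinlock_node *tail;	/* NULL when the lock is free */
};

static void new_spin_lock(struct nspinlock *lock,
			  void (*fn)(void *), void *arg)
{
	struct nspinlock_node node = { .fn = fn, .arg = arg };
	struct nspinlock_node *curr, *next;
	struct nspinlock_node *prev = xchg(&lock->tail, &node);

	if (prev) {
		/* A combiner already exists: publish the request and
		 * wait until it has executed fn(arg) on our behalf. */
		WRITE_ONCE(prev->next, &node);
		while (!smp_load_acquire(&node.done))
			cpu_relax();
		return;
	}

	/* We are the combiner: run our own work first, ... */
	fn(arg);

	/* ... then drain the requests queued behind us. */
	curr = &node;
	for (;;) {
		if (cmpxchg(&lock->tail, curr, NULL) == curr) {
			/* No more waiters: the lock is now free. */
			if (curr != &node)
				smp_store_release(&curr->done, 1);
			return;
		}
		while (!(next = READ_ONCE(curr->next)))
			cpu_relax();	/* waiter is still linking in */
		if (curr != &node)
			smp_store_release(&curr->done, 1);
		next->fn(next->arg);
		curr = next;
	}
}

The waiters never touch the protected data themselves; the combiner runs
every pending critical section back to back on one core, which is where the
cache benefit of "moving work to one core" comes from.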

Thanks
Ling



2015-11-26 3:05 GMT+08:00 Waiman Long <waiman.long@....com>:
> On 11/23/2015 04:41 AM, Ling Ma wrote:
>> Hi Longman,
>>
>> The attachments include the user-space application thread.c and the
>> kernel patch spinlock-test.patch, based on kernel 4.3.0-rc4.
>>
>> We ran thread.c with the kernel patch and tested the original and the
>> new spinlock respectively. perf top -G indicates that with the original
>> spinlock, thread.c causes the cache_alloc_refill and cache_flusharray
>> functions to spend ~25% of the time; after introducing the new spinlock
>> in those two functions, the cost drops to ~22%.
>>
>> The printed data also show that the new spinlock improves performance
>> by about 15% (93841765576 / 81036259588) on an E5-2699 v3.
>>
>> Appreciate your comments.
>>
>>
>
> I saw that you made the following change in the code:
>
> static __always_inline void queued_spin_lock(struct qspinlock *lock)
> {
> u32 val;
> -
> +repeat:
> val = atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL);
> if (likely(val == 0))
> return;
> - queued_spin_lock_slowpath(lock, val);
> + goto repeat;
> + //queued_spin_lock_slowpath(lock, val);
> }
>
>
> This effectively changes the queued spinlock into an unfair byte lock.
> Without a pause to moderate the cmpxchg() call, that is especially bad
> for performance. Does the performance data above compare this unfair
> byte lock against your new spinlock?
>
> Cheers,
> Longman
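
To illustrate the pause Longman mentions, an unfair test-and-set spin would
normally back off between cmpxchg() attempts, roughly as sketched below.
This is illustrative only; unfair_byte_lock is a hypothetical name, not code
from the patch:

/* Illustrative sketch: an unfair lock that at least pauses with
 * cpu_relax() instead of hammering cmpxchg() on a contended line.
 * It only shows the "pause" Longman refers to; it is not a
 * replacement for queued_spin_lock_slowpath(). */
static __always_inline void unfair_byte_lock(struct qspinlock *lock)
{
	for (;;) {
		if (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) == 0)
			return;
		/* Spin read-only until the lock looks free, so the
		 * cache line is not bounced by failed cmpxchg()es. */
		while (atomic_read(&lock->val))
			cpu_relax();
	}
}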
