linux-kernel - Re: [RFC v1 0/5] fs/locks: Use plain percpu spinlocks instead of lglock to protect file

Date:	Mon, 2 Mar 2015 13:58:17 +0100
From:	Daniel Wagner <daniel.wagner@...-carit.de>
To:	Jeff Layton <jlayton@...chiereds.net>
CC:	Andi Kleen <andi@...stfloor.org>, <linux-fsdevel@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, John Kacur <jkacur@...hat.com>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	"J. Bruce Fields" <bfields@...ldses.org>
Subject: Re: [RFC v1 0/5] fs/locks: Use plain percpu spinlocks instead of
 lglock to protect file_lock

On 02/27/2015 04:30 PM, Jeff Layton wrote:
> On Fri, 27 Feb 2015 16:01:30 +0100
> Daniel Wagner <daniel.wagner@...-carit.de> wrote:
>> On 02/24/2015 10:06 PM, Jeff Layton wrote:
>>> On Tue, 24 Feb 2015 16:58:26 +0100
>>> Daniel Wagner <daniel.wagner@...-carit.de> wrote:
>>>> On 02/20/2015 05:05 PM, Andi Kleen wrote:
>>>>> Daniel Wagner <daniel.wagner@...-carit.de> writes:
>>>>>>
>>>>>> I am looking at how to get rid of lglock. Reason being -rt is not too
>>>>>> happy with that lock, especially that it uses arch_spinlock_t and
>>>>>
>>>>> AFAIK it could just use normal spinlock. Have you tried that?
>>>>
>>>> I have tried it. At least fs/locks.c didn't blow up. The benchmark
>>>> results (lockperf) indicated that using normal spinlocks is even
>>>> slightly faster. Simply converting felt like cheating. It might be
>>>> necessary for the other user (kernel/stop_machine.c). Currently it looks
>>>> like there is some additional benefit getting lglock away in fs/locks.c.
>>>>
>>>
>>> What would that benefit be?
>>>
>>> lglocks are basically percpu spinlocks. Fixing some underlying
>>> infrastructure that provides that seems like it might be a better
>>> approach than declaring them "manually" and avoiding them altogether.
>>>
>>> Note that you can still do basically what you're proposing here with
>>> lglocks as well. Avoid using lg_global_* and just lock each one in
>>> turn.
>>
>> Yes, that was what I was referring to as the benefit. My main point is
>> that if there are only lg_local_* calls, we could as well use normal
>> spinlocks. No need to be fancy.
>>
> 
> Sure, but the lg_lock wrappers are a nice abstraction for this. I don't
> think that we gain much by eliminating them. Changing the lglock code
> to use normal spinlocks would also have the benefit of fixing up the
> other user of that code.

Obviously, you only need lglock if you take all locks at once. As pointed
out, accessing /proc/locks is not something that happens very often. My
hope was to get a bigger box in time to measure how expensive such an
operation could get if many cores are involved. On my small system,
there is no real gain or loss from this change.

>>> That said, now that I've thought about this, I'm not sure that's really
>>> something we want to do when accessing /proc/locks. If you lock each
>>> one in turn, then you aren't freezing the state of the file_lock_list
>>> percpu lists. Won't that mean that you aren't necessarily getting a
>>> consistent view of the locks on those lists when you cat /proc/locks?
>>
>> Maybe I am overlooking something here but I don't see a consistency
>> problem. We list a blocker and all its waiters in one go since only the
>> blocker is added to flock_lock_list and the waiters are added to the
>> blocker's fl_block list.
>>
> 
> Other locking activity could be happening at the same time. For
> instance, between when you drop one CPU's spinlock and pick up another,
> the lock that you just printed could be acquired by another thread on
> another CPU and then you go print it again. Now you're showing
> conflicting locks in /proc/locks output.

Hmm, are you sure about that? The way I read the code, when a lock is
added to flock_list it stays on that CPU. The locks are not moved from
one flock_list to another during their existence.
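
To illustrate that reading, here is a minimal userspace sketch, not the
actual fs/locks.c code. It assumes each entry records the CPU whose list
it was inserted on and that removal always uses the recorded CPU. The
names (struct entry, link_cpu, insert_entry, delete_entry, list_len) are
placeholders of mine, and locking is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* assumption: small fixed CPU count for the sketch */

/* Hypothetical stand-in for a file_lock on a per-CPU list. */
struct entry {
	struct entry *next;
	int link_cpu;	/* which per-CPU list this entry lives on */
};

static struct entry *heads[NR_CPUS];

/* Insert on the list of the CPU the caller runs on and remember it. */
static void insert_entry(struct entry *e, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);
	e->link_cpu = this_cpu;
	e->next = heads[this_cpu];
	heads[this_cpu] = e;
}

/*
 * Remove from the list recorded at insert time. The CPU the caller
 * currently runs on plays no role. Returns 1 if the entry was found.
 */
static int delete_entry(struct entry *e)
{
	struct entry **pp = &heads[e->link_cpu];

	while (*pp && *pp != e)
		pp = &(*pp)->next;
	if (!*pp)
		return 0;
	*pp = e->next;
	e->next = NULL;
	return 1;
}

/* Count the entries currently on one CPU's list. */
static int list_len(int cpu)
{
	int n = 0;

	for (struct entry *e = heads[cpu]; e; e = e->next)
		n++;
	return n;
}
```

In this model an entry is only ever on the list it was first added to,
so a walker that visits the lists one after another cannot see the same
entry on two lists. It says nothing about a lock being released and a
new, different entry showing up on another CPU in the meantime.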

> Is that a real problem? I've no idea -- we don't have a lot of guidance
> for what sort of atomicity /proc/locks needs, but that seems wrong to
> me.

While all the locks are being taken, locks can be created or destroyed
on those CPUs whose spinlock is not taken yet. I don't know if I would
even use the word 'atomicity' here since any short lived process is
likely to be missed.

Even a busy loop reading /proc/locks is likely to miss the flock02
processes, for example.

My point is that you do not really gain anything from taking all locks
before iterating over the percpu flock_list vs locking/unlocking while
iterating.
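
The two strategies can be put side by side in a rough userspace sketch.
This is not kernel code: the spinlock is a plain C11 atomic_flag, the
"per-CPU" lists are an ordinary array, and all names (struct entry,
lists, walk_all_locked, walk_one_at_a_time) are made up for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4	/* assumption: small fixed CPU count for the sketch */

/* Hypothetical stand-in for a file_lock entry. */
struct entry {
	struct entry *next;
	int id;
};

/* One list and one spinlock per CPU. */
struct percpu_list {
	atomic_flag lock;
	struct entry *head;
};

static struct percpu_list lists[NR_CPUS];

static void spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* busy-wait until the holder clears the flag */
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void lists_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		atomic_flag_clear(&lists[cpu].lock);
		lists[cpu].head = NULL;
	}
}

/* Add an entry to one CPU's list, taking only that CPU's lock. */
static void list_add(int cpu, struct entry *e)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	spin_lock(&lists[cpu].lock);
	e->next = lists[cpu].head;
	lists[cpu].head = e;
	spin_unlock(&lists[cpu].lock);
}

/* Strategy 1: take every lock, walk all lists, then drop every lock. */
static int walk_all_locked(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		spin_lock(&lists[cpu].lock);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (struct entry *e = lists[cpu].head; e; e = e->next)
			n++;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		spin_unlock(&lists[cpu].lock);
	return n;
}

/* Strategy 2: lock one list, walk it, unlock, move on to the next. */
static int walk_one_at_a_time(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		spin_lock(&lists[cpu].lock);
		for (struct entry *e = lists[cpu].head; e; e = e->next)
			n++;
		spin_unlock(&lists[cpu].lock);
	}
	return n;
}
```

With no concurrent updates both walkers visit the same entries. They can
only differ while other threads add or remove entries. The first one
blocks every updater once it holds all locks, whereas the second one
blocks only the updaters of the list it is currently walking.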

> I also just don't see much benefit in optimizing /proc/locks access.

FWIW we could avoid taking a non scaling lock.

> That's only done very rarely under most workloads. Locking all of the
> spinlocks when you want to read from it sounds 100% fine to me and that
> may help prevent these sorts of consistency problems. 

If they exist :)

> It also has the benefit of keeping the /proc/locks seqfile code simpler.

The resulting code is almost the same. The locking part is hidden in
seq_hlist_start_percpu_locked() and friends.

>>> I think having a consistent view there might trump any benefit to
>>> performance. Reading /proc/locks is a *very* rare activity in the big
>>> scheme of things.
>>
>> I agree, but I hope that I got it right with my consistency argument;
>> then there shouldn't be a problem.
>>
>>> I do however like the idea of moving more to be protected by the
>>> lglocks, and minimizing usage of the blocked_lock_lock.
>>
>> Good to hear. I am trying to write a new test case (a variation of the
>> dining philosophers 'problem') which benchmarks blocked_lock_lock
>> after the re-factoring.
>>
> 
> Sounds good. I may go ahead and pick up the first couple of patches and
> queue them for v4.1 since they seem like reasonable cleanups. I'll let
> you know once I've done that.

Great.

cheers,
daniel