

Message-ID: <87induxd3u.fsf@notabene.neil.brown.name>
Date:   Wed, 29 Nov 2017 09:17:09 +1100
From:   NeilBrown <neilb@...e.com>
To:     paulmck@...ux.vnet.ibm.com, Florian Weimer <fweimer@...hat.com>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>
Subject: Re: [PATCH] VFS: use synchronize_rcu_expedited() in namespace_unlock()

On Mon, Nov 27 2017, Paul E. McKenney wrote:

> On Mon, Nov 27, 2017 at 12:27:04PM +0100, Florian Weimer wrote:
>> On 10/26/2017 02:27 PM, Paul E. McKenney wrote:
>> >But just for completeness, one way to make this work across the board
>> >might be to instead use call_rcu(), with the callback function kicking
>> >off a workqueue handler to do the rest of the unmount.  Of course,
>> >in saying that, I am ignoring any mutexes that you might be holding
>> >across this whole thing, and also ignoring any problems that might arise
>> >when returning to userspace with some portion of the unmount operation
>> >still pending.  (For example, someone unmounting a filesystem and then
>> >immediately remounting that same filesystem.)
>> 
>> You really need to complete all side effects of deallocating a
>> resource before returning to user space.  Otherwise, it will never
>> be possible to allocate and deallocate resources in a tight loop
>> because you either get spurious failures because too many
>> unaccounted deallocations are stuck somewhere in the system (and the
>> user can't tell that this is due to a race), or you get an OOM
>> because the user manages to queue up too much state.
>> 
>> We already have this problem with RLIMIT_NPROC, where waitpid etc.
>> return before the process is completely gone.  On some
>> kernels/configurations, the resulting race is so wide that parallel
>> make no longer works reliably because it runs into fork failures.
>
> Or alternatively, use rcu_barrier() occasionally to wait for all
> preceding deferred deallocations.  And there are quite a few other
> ways to take on this problem.

So, supposing we could package up everything that has to happen after
the current synchronize_rcu() and put it in a call_rcu() callback,
then instead of calling synchronize_rcu_expedited() at the end of
namespace_unlock(), we could possibly call call_rcu() there and
rcu_barrier() at the start of namespace_lock().

That would mean a single unmount would have low impact, but it would
still slow down a sequence of 1000 consecutive unmounts.
Maybe we would only need the rcu_barrier() before select
namespace_lock() calls.  I would need to study the code closely to
form an opinion.  Interesting idea though.
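
To make the ordering concrete, here is a minimal userspace toy model of
that idea. It is only a sketch: it is single-threaded, it does not use
real RCU, and every name in it (toy_call_rcu(), toy_rcu_barrier(),
toy_namespace_lock(), toy_namespace_unlock(), toy_umount(), the two
counters) is hypothetical and stands in for the kernel function of the
similar name. It shows only when the deferred cleanup runs relative to
the lock and unlock calls, not how fs/namespace.c would really do it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct rcu_head: one queued, not-yet-run callback. */
struct deferred {
	struct deferred *next;
	void (*func)(struct deferred *);
};

/* FIFO of callbacks that have been queued but have not run yet. */
static struct deferred *pending_head;
static struct deferred **pending_tail = &pending_head;

/* Bookkeeping so the effect of the ordering can be observed. */
static int cleanups_pending;
static int cleanups_done;

/* Stand-in for call_rcu(): queue the callback and return immediately. */
static void toy_call_rcu(struct deferred *d, void (*func)(struct deferred *))
{
	d->func = func;
	d->next = NULL;
	*pending_tail = d;
	pending_tail = &d->next;
}

/*
 * Stand-in for rcu_barrier(): return only once every callback queued
 * so far has run.  In this toy model we simply run them here.
 */
static void toy_rcu_barrier(void)
{
	while (pending_head) {
		struct deferred *d = pending_head;

		pending_head = d->next;
		if (!pending_head)
			pending_tail = &pending_head;
		d->func(d);	/* the callback frees d */
	}
}

/* The work that today happens after synchronize_rcu(). */
static void cleanup_cb(struct deferred *d)
{
	cleanups_pending--;
	cleanups_done++;
	free(d);
}

/* Barrier at the start of the lock: earlier cleanups finish first. */
static void toy_namespace_lock(void)
{
	toy_rcu_barrier();
}

/* Unlock defers the cleanup instead of waiting for it. */
static void toy_namespace_unlock(void)
{
	struct deferred *d = malloc(sizeof(*d));

	if (!d)
		abort();
	cleanups_pending++;
	toy_call_rcu(d, cleanup_cb);
}

/* One unmount: take the lock, (do the unmount), drop the lock. */
static void toy_umount(void)
{
	toy_namespace_lock();
	toy_namespace_unlock();
}
```

In this model a single toy_umount() returns with its cleanup still
queued, and the next toy_umount() pays for it in toy_namespace_lock().
That is the trade-off described above: one unmount is cheap, a run of
consecutive unmounts still waits once per unmount.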

Hopefully the _expedited() patch will be accepted - I haven't had a
"nak" yet...

thanks,
NeilBrown

