Message-ID: <87pnejf6fz.fsf@x220.int.ebiederm.org>
Date: Wed, 12 Feb 2020 22:37:52 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Al Viro <viro@...iv.linux.org.uk>,
LKML <linux-kernel@...r.kernel.org>,
Kernel Hardening <kernel-hardening@...ts.openwall.com>,
Linux API <linux-api@...r.kernel.org>,
Linux FS Devel <linux-fsdevel@...r.kernel.org>,
Linux Security Module <linux-security-module@...r.kernel.org>,
Akinobu Mita <akinobu.mita@...il.com>,
Alexey Dobriyan <adobriyan@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Daniel Micay <danielmicay@...il.com>,
Djalal Harouni <tixxdz@...il.com>,
"Dmitry V . Levin" <ldv@...linux.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Ingo Molnar <mingo@...nel.org>,
"J . Bruce Fields" <bfields@...ldses.org>,
Jeff Layton <jlayton@...chiereds.net>,
Jonathan Corbet <corbet@....net>,
Kees Cook <keescook@...omium.org>,
Oleg Nesterov <oleg@...hat.com>,
Solar Designer <solar@...nwall.com>
Subject: Re: [PATCH v8 07/11] proc: flush task dcache entries from all procfs instances

Linus Torvalds <torvalds@...ux-foundation.org> writes:
> On Wed, Feb 12, 2020 at 1:48 PM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> The good news is proc_flush_task isn't exactly called from process exit.
>> proc_flush_task is called during zombie cleanup, AKA release_task.
>
> Yeah, that at least avoids some of the nasty locking-while-dying debug problems.
>
> But the one I was more worried about was actually the lock contention
> issue with lots of processes. The lock is basically a single global
> lock in many situations - yes, it's technically per-ns, but in a lot
> of cases you really only have one namespace anyway.
>
> And we've had problems with global locks in this area before, notably
> the one you call out:
>
>> Further, after proc_flush_task does its thing, the code goes
>> and does "write_lock_irq(&tasklist_lock);"
>
> Yeah, so it's not introducing a new issue, but it is potentially
> making something we already know is bad even worse.
>
>> What would be the downside of having a mutex for a list of proc superblocks?
>> A mutex that is taken for both reading and writing the list.
>
> That's what the original patch actually was, and I was hoping we could
> avoid that thing.
>
> An rwsem would possibly be better, since most cases by far are likely
> about reading.
>
> And yes, I'm very aware of the tasklist_lock, but it's literally why
> I don't want to make a new one.
>
> I'm _hoping_ we can some day come up with something better than
> tasklist_lock.

Yes. I understand that.

I occasionally play with ideas, and have converted all of proc to RCU
to help with the situation, but I haven't come up with anything clearly
better.

All of this is why I was really hoping we could change strategy and see
whether we can make the shrinker better at pruning proc inodes.

I think I have an alternate idea that could work: add some extra code
to proc_task_readdir that looks for dentries that no longer point to
tasks and d_invalidates them, with the same logic probably being called
from a few more places as well, like proc_pid_readdir, proc_task_lookup,
and proc_pid_lookup. A completely untested sketch is below.
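
Roughly something like this (proc_prune_stale_children is a made-up
name; it only prunes one entry per call so that d_invalidate can run
outside the locks, and a real version would need to loop):

	static void proc_prune_stale_children(struct dentry *dir)
	{
		struct dentry *child, *stale = NULL;

		rcu_read_lock();
		spin_lock(&dir->d_lock);
		list_for_each_entry(child, &dir->d_subdirs, d_child) {
			struct inode *inode = d_inode_rcu(child);

			/* proc pid inodes pin a struct pid; the task
			 * is gone once pid_task() returns NULL. */
			if (inode &&
			    !pid_task(proc_pid(inode), PIDTYPE_PID)) {
				spin_lock_nested(&child->d_lock,
						 DENTRY_D_LOCK_NESTED);
				stale = dget_dlock(child);
				spin_unlock(&child->d_lock);
				break;
			}
		}
		spin_unlock(&dir->d_lock);
		rcu_read_unlock();

		if (stale) {
			/* d_invalidate can block, call it unlocked */
			d_invalidate(stale);
			dput(stale);
		}
	}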

We could even optimize it by setting a process-died flag in the
superblock.
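
All names invented for illustration, but I am imagining something like
a bit in the per-superblock info that release_task sets instead of
walking every instance's dcache:

	struct proc_fs_info {
		...
		bool pids_released;	/* set when a task is reaped */
	};

	/* In release_task(), for each mounted proc instance: */
	WRITE_ONCE(fs_info->pids_released, true);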

That would batch up the freeing work until the next time someone reads
from proc in a way that would create more dentries, so it would keep
dentries from reaped zombies from growing without bound.

Hmm. Given the existence of proc_fill_cache it would really be a good
idea if readdir and lookup performed some of the freeing work as well,
as on readdir we always populate the dcache for all of the directory
entries. Something like the fragment below.
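
Reusing the invented names from above, the check could sit at the top
of the readdir loops, before proc_fill_cache repopulates the dcache (a
lost flag is harmless, it just delays pruning to a later readdir):

	if (READ_ONCE(fs_info->pids_released)) {
		WRITE_ONCE(fs_info->pids_released, false);
		proc_prune_stale_children(file->f_path.dentry);
	}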

I am booked solid for the next little while, but if no one beats me to
it I will try to code something like that up, where at least readdir
looks for and invalidates stale dentries.

Eric