linux-kernel - Re: [PATCH v8 07/11] proc: flush task dcache entries from all procfs instances

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87pnejf6fz.fsf@x220.int.ebiederm.org>
Date:   Wed, 12 Feb 2020 22:37:52 -0600
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Al Viro <viro@...iv.linux.org.uk>,
        LKML <linux-kernel@...r.kernel.org>,
        Kernel Hardening <kernel-hardening@...ts.openwall.com>,
        Linux API <linux-api@...r.kernel.org>,
        Linux FS Devel <linux-fsdevel@...r.kernel.org>,
        Linux Security Module <linux-security-module@...r.kernel.org>,
        Akinobu Mita <akinobu.mita@...il.com>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Daniel Micay <danielmicay@...il.com>,
        Djalal Harouni <tixxdz@...il.com>,
        "Dmitry V . Levin" <ldv@...linux.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Ingo Molnar <mingo@...nel.org>,
        "J . Bruce Fields" <bfields@...ldses.org>,
        Jeff Layton <jlayton@...chiereds.net>,
        Jonathan Corbet <corbet@....net>,
        Kees Cook <keescook@...omium.org>,
        Oleg Nesterov <oleg@...hat.com>,
        Solar Designer <solar@...nwall.com>
Subject: Re: [PATCH v8 07/11] proc: flush task dcache entries from all procfs instances

Linus Torvalds <torvalds@...ux-foundation.org> writes:

> On Wed, Feb 12, 2020 at 1:48 PM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> The good news is proc_flush_task isn't exactly called from process exit.
>> proc_flush_task is called during zombie clean up. AKA release_task.
>
> Yeah, that at least avoids some of the nasty locking while dying debug problems.
>
> But the one I was more worried about was actually the lock contention
> issue with lots of processes. The lock is basically a single global
> lock in many situations - yes, it's technically per-ns, but in a lot
> of cases you really only have one namespace anyway.
>
> And we've had problems with global locks in this area before, notably
> the one you call out:
>
>> Further after proc_flush_task does it's thing the code goes
>> and does "write_lock_irq(&task_list_lock);"
>
> Yeah, so it's not introducing a new issue, but it is potentially
> making something we already know is bad even worse.
>
>> What would be downside of having a mutex for a list of proc superblocks?
>> A mutex that is taken for both reading and writing the list.
>
> That's what the original patch actually was, and I was hoping we could
> avoid that thing.
>
> An rwsem would be possibly better, since most cases by far are likely
> about reading.
>
> And yes, I'm very aware of the task_list_lock, but it's literally why
> I don't want to make a new one.
>
> I'm _hoping_ we can some day come up with something better than
> task_list_lock.

Yes.  I understand that.

I occassionally play with ideas, and converted all of proc to rcu
to help with situation but I haven't come up with anything clearly
better.

All of this is why I was really hoping we could have a change in
strategy and see if we can make the shrinker be able to better prune
proc inodes.

I think I have an alternate idea that could work.  Add some extra code
into proc_task_readdir, that would look for dentries that no longer
point to tasks and d_invalidate them.  With the same logic probably
being called from a few more places as well like proc_pid_readdir,
proc_task_lookup, and proc_pid_lookup.

We could even optimize it and have a process died flag we set in the
superblock.

That would would batch up the freeing work until the next time someone
reads from proc in a way that would create more dentries.  So it would
prevent dentries from reaped zombies from growing without bound.

Hmm.  Given the existence of proc_fill_cache it would really be a good
idea if readdir and lookup performed some of the freeing work as well.
As on readdir we always populate the dcache for all of the directory
entries.

I am booked solid for the next little while but if no one beats me to it
I will try and code something like that up where at least readdir
looks for and invalidates stale dentries.

Eric