linux-kernel - Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread


Message-ID: <CAD=FV=X39vVZ0v2_Z5rDrJt1JcdW+N8jRkhc0bz4yqqtox2=4Q@mail.gmail.com>
Date:   Fri, 31 Mar 2017 14:00:13 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     arve@...roid.com, riandrews@...roid.com, tkjos@...gle.com,
        devel@...verdev.osuosl.org,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

Hi,

On Fri, Mar 31, 2017 at 12:29 PM, Greg KH <gregkh@...uxfoundation.org> wrote:
> On Fri, Mar 31, 2017 at 10:53:41AM -0700, Douglas Anderson wrote:
>> Sometimes when we're out of memory the OOM killer decides to kill a
>> process that's in binder_thread_read().  If we happen to be waiting
>> for work we'll get the kill signal and wake up.  That's good.  ...but
>> then we try to grab the binder lock before we return.  That's bad.
>>
>> The problem is that someone else might be holding the one true global
>> binder lock.  If that one other process is blocked then we can't
>> finish exiting.  In the worst case, the other process might be blocked
>> waiting for memory.  In that case we'll have a really hard time
>> exiting.
>>
>> On older kernels that don't have the OOM reaper (or something
>> similar), like kernel 4.4, this is a really big problem and we end up
>> with a simple deadlock because:
>> * Once we pick a process to OOM kill we won't pick another--we first
>>   wait for the process we picked to die.  The reasoning is that we've
>>   given the doomed process access to special memory pools so it can
>>   quit quickly and we don't have special pool memory to go around.
>> * We don't have any type of "special access donation" that would give
>>   the mutex holder our special access.
>>
>> On kernel 4.4 w/ binder patches, we easily see this happen:
>
> <snip>
>
> How does your change interact with the recent "break up the binder big
> lock" patchset:
>         https://android-review.googlesource.com/#/c/354698/
>
> Have you tried that series out to see if it helps out any?

I wasn't aware of that patchset.  Someone else on my team mentioned
that fine-grained locking was being worked on but I didn't know
patches were actually posted...  Probably it makes sense to just drop
my patch, then.  It was only making things marginally better even on
kernel 4.4 because I would just hit the next task that would refuse to
quit for a non-binder related reason.  :(
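
For anyone skimming the thread, the idea behind the (now dropped) patch can
be modeled in userspace roughly as below.  This is only a sketch, not the
patch itself: global_lock, wake_path_old() and wake_path_patched() are
made-up names, the lock is a C11 atomic_flag standing in for the binder
mutex, and a trylock is used so the "old" path reports -EBUSY instead of
hanging the way a real blocking lock would.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the single global binder lock. */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

/* Returns true if we took the lock, false if someone else holds it. */
bool global_trylock(void)
{
	return !atomic_flag_test_and_set(&global_lock);
}

void global_unlock(void)
{
	atomic_flag_clear(&global_lock);
}

/*
 * Model of the old wake-up path: the lock is always retaken before
 * returning, even if we were woken by a fatal signal.  The real code
 * would block; this model returns -EBUSY where it would have stalled.
 */
int wake_path_old(bool fatal_signal)
{
	(void)fatal_signal;	/* ignored: that is the problem */

	if (!global_trylock())
		return -EBUSY;
	global_unlock();
	return 0;
}

/*
 * Model of the patched wake-up path: a killed task returns right away
 * without touching the lock, so it cannot get stuck behind a holder
 * that is itself blocked waiting for memory.
 */
int wake_path_patched(bool fatal_signal)
{
	if (fatal_signal)
		return -EINTR;

	if (!global_trylock())
		return -EBUSY;
	global_unlock();
	return 0;
}
```

With the lock held by someone else, wake_path_old(true) reports the stall
while wake_path_patched(true) gets out.  As noted above, that only removes
one place where a dying task can get stuck.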

BTW: I presume that nobody has decided that it would be a wise idea to
backport the OOM reaper code to any stable trees?  It seemed a bit
too scary to me, so I wrote a dumber (but easier to backport) solution
that avoided the deadlocks I was seeing.  See http://crosreview.com/465189
and the 3 patches above it, in case anyone else stumbles on this thread
and is curious.


-Doug