linux-kernel - Re: [PATCH] android: binder: Disable preemption while holding the global binder lock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAMP5Xgfo86rWxVdhdOCg6D7W4-T60jQR+-5b6W12QWaz7VL7jQ@mail.gmail.com>
Date:   Tue, 13 Sep 2016 12:52:35 -0700
From:   Arve Hjønnevåg <arve@...roid.com>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     "devel@...verdev.osuosl.org" <devel@...verdev.osuosl.org>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Christoph Hellwig <hch@...radead.org>,
        Riley Andrews <riandrews@...roid.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Todd Kjos <tkjos@...gle.com>
Subject: Re: [PATCH] android: binder: Disable preemption while holding the
 global binder lock

On Mon, Sep 12, 2016 at 11:42 PM, Greg Kroah-Hartman
<gregkh@...uxfoundation.org> wrote:
> On Mon, Sep 12, 2016 at 08:44:09PM -0700, Arve Hjønnevåg wrote:
>> On Sat, Sep 10, 2016 at 10:28 AM, Greg Kroah-Hartman
>> <gregkh@...uxfoundation.org> wrote:
>> > On Sat, Sep 10, 2016 at 06:37:29PM +0200, Thomas Gleixner wrote:
>> >> On Sat, 10 Sep 2016, Peter Zijlstra wrote:
>> >>
>> >> > On Sat, Sep 10, 2016 at 09:16:59AM -0700, Christoph Hellwig wrote:
>> >> > > On Thu, Sep 08, 2016 at 09:12:50AM -0700, Todd Kjos wrote:
>> >> > > > In Android systems, the display pipeline relies on low
>> >> > > > latency binder transactions and is therefore sensitive to
>> >> > > > delays caused by contention for the global binder lock.
>> >> > > > Jank is siginificantly reduced by disabling preemption
>> >> > > > while the global binder lock is held.
>> >> > >
>> >> > > That's now how preempt_disable is supposed to use.  It is for critical
>> >> >
>> >> > not, that's supposed to be _not_. Just to be absolutely clear, this is
>> >> > NOT how you're supposed to use preempt_disable().
>> >> >
>> >> > > sections that use per-cpu or similar resources.
>> >> > >
>> >> > > >
>> >> > > > Originally-from: Riley Andrews <riandrews@...gle.com>
>> >> > > > Signed-off-by: Todd Kjos <tkjos@...gle.com>
>> >> >
>> >> > > > @@ -389,7 +390,11 @@ static int task_get_unused_fd_flags(struct
>> >> > > > binder_proc *proc, int flags)
>> >> > > >   rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
>> >> > > >   unlock_task_sighand(proc->tsk, &irqs);
>> >> > > >
>> >> > > > - return __alloc_fd(files, 0, rlim_cur, flags);
>> >> > > > + preempt_enable_no_resched();
>> >> > > > + ret = __alloc_fd(files, 0, rlim_cur, flags);
>> >> > > > + preempt_disable();
>> >> >
>> >> > And the fact that people want to use preempt_enable_no_resched() shows
>> >> > that they're absolutely clueless.
>> >> >
>> >> > This is so broken its not funny.
>> >> >
>> >> > NAK NAK NAK
>> >>
>> >> Indeed. Sprinkling random preempt_enabe/disable() pairs all over the place
>> >> documents clearly that this is tinkering and not proper software
>> >> engineering.
>> >
>> > I have pointed out in the other thread for this patch (the one that had
>> > a patch that could be applied) that the single lock in the binder code
>> > is the main problem here, it should be solved instead of this messing
>> > around with priorities.
>> >
>>
>> While removing the single lock in the binder driver would help reduce
>> the problem that this patch tries to work around, it would not fix it.
>> The largest problems occur when a very low priority thread gets
>> preempted while holding the lock. When a high priority thread then
>> needs the same lock it can't get it. Changing the driver to use more
>> fine-grained locking would reduce the set of threads that can trigger
>> this problem, but there are processes that receive work from both high
>> and low priority threads and could still end up in the same situation.
>
> Ok, but how would this patch solve the problem?  It only reduces the
> window of when the preemption could occur, it doesn't stop it from
> happening entirely.
>

I agree, this patch does not _solve_ the problem either. It only
reduces it, as there are many places where it re-enables preemption
for significant work.

>> A previous attempt to fix this problem, changed the lock to use
>> rt_mutex instead of mutex, but this apparently did not work as well as
>> this patch. I believe the added overhead was noticeable, and it did
>> not work when the preempted thread was in a different cgroup (I don't
>> know if this is still the case).
>
> Why would rt_mutex solve this?

If the task that holds the lock would run when you try to get the
lock, there would be no long delay to get the lock. rt_mutex seems to
be designed to solve this problem.

>
> I worry that this is only going to get worse as more portions of the
> Android userspace/HAL get rewritten to use binder more and more.  What
> about trying to get rid of the lock entirely?  That way the task
> priority levels would work "properly" here.
>

I think a good first step would be to switch to locks with a smaller
scope. I'm not how useful it would be to eliminate locks in this
driver since it calls into other code that uses locks.

> thanks,
>
> greg k-h

-- 
Arve Hjønnevåg