linux-kernel - Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that try to kill a task

Message-ID: <a42b1bc3882c4693ad8eadfa68659601@cnbox3.mioffice.cn>
Date:	Tue, 14 Oct 2014 06:51:26 +0000
From:	朱辉 <zhuhui@...omi.com>
To:	Rik van Riel <riel@...hat.com>,
	朱辉 <zhuhui@...omi.com>,
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"rientjes@...gle.com" <rientjes@...gle.com>,
	"vinayakm.list@...il.com" <vinayakm.list@...il.com>,
	"weijie.yang@...sung.com" <weijie.yang@...sung.com>
CC:	"devel@...verdev.osuosl.org" <devel@...verdev.osuosl.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"teawater@...il.com" <teawater@...il.com>
Subject: Re: [PATCH] Fix the issue that lowmemkiller fell into a cycle that
 try to kill a task

On 2014-09-24 23:36, Rik van Riel wrote:
> On 09/22/2014 10:57 PM, Hui Zhu wrote:
>> The cause of this issue is that when free memory is low and a lot of tasks are
>> trying to shrink memory, the task that is killed by lowmemkiller cannot get
>> CPU time to exit.
>>
>> Fix this issue by changing the scheduling policy to SCHED_FIFO if a task's flag
>> is TIF_MEMDIE in lowmemkiller.
>
> Is it actually true that the task that was killed by lowmemkiller
> cannot get CPU time?

I am sorry for answering this mail so late; I was trying to run more tests
around it.
This issue is really hard to reproduce.  I have a special app that makes it
easier to reproduce, but I still need to retry many times before it shows up.

I found that most of the time the task cannot be killed because it is
blocked on binder_lock.
It looks like something is wrong with a task that holds binder_lock and is
itself blocked by something else.

So I made a patch that changes binder_lock to binder_lock_killable to
handle this issue (I will post it later).
It works sometimes, but I am not sure it is right.
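
To illustrate what "killable" means for a lock, here is a minimal userspace
sketch. It is not the binder patch. The names demo_lock, kill_pending and
demo_lock_killable are hypothetical, and the polling loop only imitates the
behaviour of the kernel's mutex_lock_killable(), which returns -EINTR when a
fatal signal arrives while the caller is waiting.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for "a fatal signal is pending for this task". */
atomic_int kill_pending;

/* Hypothetical lock, playing the role of the contended lock. */
pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Try to take the lock, but give up if a kill has been requested.
 * Returns 0 with the lock held, or EINTR without the lock.
 * (The kernel API returns a negative errno; this sketch returns a
 * positive one to follow the pthread convention.)
 */
int demo_lock_killable(pthread_mutex_t *lock)
{
	const struct timespec pause = { 0, 1000000 };	/* 1 ms */

	for (;;) {
		if (pthread_mutex_trylock(lock) == 0)
			return 0;		/* lock acquired */
		if (atomic_load(&kill_pending))
			return EINTR;		/* stop waiting and let the task exit */
		nanosleep(&pause, NULL);	/* back off before retrying */
	}
}
```

A caller has to check the return value and take an error path without
touching the protected data when it is non-zero; that is the part of such a
change that needs careful review.
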
Only once, on a kernel with the binder patch and without the lowmemkiller
SCHED_FIFO patch, did I see a task that was not blocked by a lock while
different tasks that called lowmemkiller kept trying to kill it.
I think the root cause in that case is that the killed task cannot get CPU.
But I have seen this only once.
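
For readers who want to see what switching a task to SCHED_FIFO looks like,
here is a userspace analogue using sched_setscheduler(2). This is only a
sketch and not the code from the lowmemkiller patch; the helper name
boost_to_fifo is hypothetical, and choosing the minimum FIFO priority is an
assumption made for the example.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Move the task with the given pid (0 means the caller) to SCHED_FIFO
 * at the lowest real-time priority.
 * Returns 0 on success, or the errno value on failure (typically EPERM
 * when the caller lacks the privilege to set a real-time policy).
 */
int boost_to_fifo(pid_t pid)
{
	struct sched_param param;
	int prio = sched_get_priority_min(SCHED_FIFO);

	if (prio < 0)
		return errno;

	param.sched_priority = prio;
	if (sched_setscheduler(pid, SCHED_FIFO, &param) != 0)
		return errno;

	return 0;
}
```

A SCHED_FIFO task runs ahead of all normal SCHED_OTHER tasks on its CPU
until it blocks or exits, which is why it would let a dying task finish
quickly, and also why it can hide a task that is looping in the kernel.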

>
> It is also possible that the task is busy in the kernel, for example
> in the reclaim code, and is not breaking out of some loop fast enough,
> despite the TIF_MEMDIE flag being set.
>
> I suspect SCHED_FIFO simply papers over that kind of issue, by not
> letting anything else run until the task is gone, instead of fixing
> the root cause of the problem.
>
>

Based on what I described above, I think the lowmemkiller SCHED_FIFO patch
may still handle some cases of this issue.

Thanks,
Hui