lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <881b990a-2198-8e80-036e-bfa6f88070ff@linux.alibaba.com>
Date:   Thu, 28 May 2020 14:50:09 +0800
From:   wetp <wetp.zy@...ux.alibaba.com>
To:     HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>
Cc:     "n-horiguchi@...jp.nec.com" <n-horiguchi@...jp.nec.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm, memory_failure: only send BUS_MCEERR_AO to early-kill
 process


On 2020/5/28 上午10:22, HORIGUCHI NAOYA(堀口 直也) wrote:
> Hi Zhang,
>
> Sorry for my late response.
>
> On Tue, May 26, 2020 at 03:06:41PM +0800, Wetp Zhang wrote:
>> From: Zhang Yi <wetpzy@...il.com>
>>
>> If a process don't need early-kill, it may not care the BUS_MCEERR_AO.
>> Let the process to be killed when it really access the corrupted memory.
>>
>> Signed-off-by: Zhang Yi <wetpzy@...il.com>
> Thank you for pointing this. This looks to me a bug (per-process flag
> is ignored when system-wide flag is set).

The flag is not problem for me.

In my case, two processes share memory with no any flag setting, both 
will be killed when only one

access the fail memory.

>> ---
>>   mm/memory-failure.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index a96364be8ab4..2db13d48865c 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -210,7 +210,7 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
>>   {
>>   	struct task_struct *t = tk->tsk;
>>   	short addr_lsb = tk->size_shift;
>> -	int ret;
>> +	int ret = 0;
>>
>>   	pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
>>   		pfn, t->comm, t->pid);
>> @@ -225,8 +225,9 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
>>   		 * This could cause a loop when the user sets SIGBUS
>>   		 * to SIG_IGN, but hopefully no one will do that?
>>   		 */
>> -		ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
>> -				      addr_lsb, t);  /* synchronous? */
>> +		if ((t->flags & PF_MCE_PROCESS) && (t->flags & PF_MCE_EARLY))
>> +			ret = send_sig_mceerr(BUS_MCEERR_AO,
>> +				(void __user *)tk->addr, addr_lsb, t);
> kill_proc() could be called only for processes that are selected by
> collect_procs() with task_early_kill().  So I think that we should fix
> task_early_kill(), maybe by reordering sysctl_memory_failure_early_kill
> check and find_early_kill_thread() check.
>
>      static struct task_struct *task_early_kill(struct task_struct *tsk,
>                                                 int force_early)
>      {
>              struct task_struct *t;
>              if (!tsk->mm)
>                      return NULL;
>              if (force_early)
>                      return tsk;

The force_early is rely the flag MF_ACTION_REQUIRED, so it is always 
true when MCE occurs.

This leads always sending SIGBUS to processes even if those are not 
current or no flag setting.

  I think it could keep the non-current processes which has no flag 
setting running.


Besides, base on your recommendation I reorder the force_early check and 
find_early_kill_thread()

check, to send the signal to the right thread.

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2db13d48865c..33a87d7b3e61 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c

@@ -399,11 +402,11 @@ static struct task_struct *task_early_kill(struct task_struct *tsk,
  	struct task_struct *t;
  	if (!tsk->mm)
  		return NULL;
-	if (force_early)
-		return tsk;
  	t = find_early_kill_thread(tsk);
  	if (t)
  		return t;
+	if (force_early && tsk->mm == current->mm)
+		return tsk;
  	if (sysctl_memory_failure_early_kill)
  		return tsk;
  	return NULL;

>              t = find_early_kill_thread(tsk);
>              if (t)
>                      return t;
>              if (sysctl_memory_failure_early_kill)
>                      return tsk;
>              return NULL;
>      }
>
> One subtleness is to make sure that find_early_kill_thread() should distinguish
> default value and explicitly set value, so we might need some modification
> on find_early_kill_thread().
>
> Can you try that?
>
> Thanks,
> Naoya Horiguchi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ