Open Source and information security mailing list archives
 
Date:   Tue, 31 Oct 2017 00:44:26 +0000
From:   Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:     gengdongjiu <gengdongjiu@...wei.com>
CC:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: consult a question about action_result() in memory_failure()

Hi gengdongjiu,

On Tue, Oct 24, 2017 at 08:47:41PM +0800, gengdongjiu wrote:
> Hi Naoya,
>    Very sorry to disturb you. I'd like to ask you about how memory_failure() handles error page types.
> If the error page is one of the current task's page table pages, does memory_failure() not handle it?
> In my test, I found that memory_failure() treats the physical address of the failing page table as an unknown page.
> Why does it not handle errors on page table pages? Thanks a lot.

I think that's because it's handled not in the context of memory
error handling, but in the MCE context.

When your hardware detects a memory error on a page table page
(e.g. through memory scrubbing running in the background), an MCE SRAO
(Software Recoverable Action Optional) is sent to the kernel, and the
kernel invokes the memory error handler.
But the memory error handler does nothing, because there's currently
no way to isolate a page table page. I think the main problem is that
there is no easy way to know which processes own the page table page.
So the error page remains accessible, and later, when some CPU tries
to access the page table page, a more severe MCE SRAR (Software
Recoverable Action Required) is triggered. At that point, the MCE
handler tries to kill the process in the current context (hoping that
it's the right process to kill).
# For errors on "kernel" page table pages, there's no choice other
# than to panic...

So the current situation is not the worst, but it is still open for
improvement. Any suggestions for handling it in the memory error
handler would be wonderful.

Thanks,
Naoya Horiguchi


> 
> commit 64d37a2baf5e5c0f1009c0ef290a9027de721d66
> Author: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
> Date:   Wed Apr 15 16:13:05 2015 -0700
> 
>     mm/memory-failure.c: define page types for action_result() in one place
> 
>     This cleanup patch moves all strings passed to action_result() into a
>     single array action_page_type so that a reader can easily find which
>     kind of action results are possible.  And this patch also fixes the
>     odd lines to be printed out, like "unknown page state page" or "free
>     buddy, 2nd try page".
> 
>     [akpm@...ux-foundation.org: rename messages, per David]
>     [akpm@...ux-foundation.org: s/DIRTY_UNEVICTABLE_LRU/CLEAN_UNEVICTABLE_LRU', per Andi]
>     Signed-off-by: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
>     Reviewed-by: Andi Kleen <ak@...ux.intel.com>
>     Cc: Tony Luck <tony.luck@...el.com>
>     Cc: "Xie XiuQi" <xiexiuqi@...wei.com>
>     Cc: Steven Rostedt <rostedt@...dmis.org>
>     Cc: Chen Gong <gong.chen@...ux.intel.com>
>     Cc: David Rientjes <rientjes@...gle.com>
>     Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d487f8d..5fd8931 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -521,6 +521,52 @@ static const char *action_name[] = {
>         [RECOVERED] = "Recovered",
>  };
> 
> +enum action_page_type {
> +       MSG_KERNEL,
> +       MSG_KERNEL_HIGH_ORDER,
> +       MSG_SLAB,
> +       MSG_DIFFERENT_COMPOUND,
> +       MSG_POISONED_HUGE,
> +       MSG_HUGE,
> +       MSG_FREE_HUGE,
> +       MSG_UNMAP_FAILED,
> +       MSG_DIRTY_SWAPCACHE,
> +       MSG_CLEAN_SWAPCACHE,
> +       MSG_DIRTY_MLOCKED_LRU,
> +       MSG_CLEAN_MLOCKED_LRU,
> +       MSG_DIRTY_UNEVICTABLE_LRU,
> +       MSG_CLEAN_UNEVICTABLE_LRU,
> +       MSG_DIRTY_LRU,
> +       MSG_CLEAN_LRU,
> +       MSG_TRUNCATED_LRU,
> +       MSG_BUDDY,
> +       MSG_BUDDY_2ND,
> +       MSG_UNKNOWN,
> +};
> +
> +static const char * const action_page_types[] = {
> +       [MSG_KERNEL]                    = "reserved kernel page",
> +       [MSG_KERNEL_HIGH_ORDER]         = "high-order kernel page",
> +       [MSG_SLAB]                      = "kernel slab page",
> +       [MSG_DIFFERENT_COMPOUND]        = "different compound page after locking",
> +       [MSG_POISONED_HUGE]             = "huge page already hardware poisoned",
> +       [MSG_HUGE]                      = "huge page",
> +       [MSG_FREE_HUGE]                 = "free huge page",
> +       [MSG_UNMAP_FAILED]              = "unmapping failed page",
> +       [MSG_DIRTY_SWAPCACHE]           = "dirty swapcache page",
> +       [MSG_CLEAN_SWAPCACHE]           = "clean swapcache page",
> +       [MSG_DIRTY_MLOCKED_LRU]         = "dirty mlocked LRU page",
> +       [MSG_CLEAN_MLOCKED_LRU]         = "clean mlocked LRU page",
> +       [MSG_DIRTY_UNEVICTABLE_LRU]     = "dirty unevictable LRU page",
> +       [MSG_CLEAN_UNEVICTABLE_LRU]     = "clean unevictable LRU page",
> +       [MSG_DIRTY_LRU]                 = "dirty LRU page",
> +       [MSG_CLEAN_LRU]                 = "clean LRU page",
> +       [MSG_TRUNCATED_LRU]             = "already truncated LRU page",
> +       [MSG_BUDDY]                     = "free buddy page",
> +       [MSG_BUDDY_2ND]                 = "free buddy page (2nd try)",
> +       [MSG_UNKNOWN]                   = "unknown page",
> +};
> 
> 
