linux-kernel - Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v5

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090604051346.GM1065@one.firstfloor.org>
Date:	Thu, 4 Jun 2009 07:13:46 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Andi Kleen <andi@...stfloor.org>,
	"hugh.dickins@...cali.co.uk" <hugh.dickins@...cali.co.uk>,
	"npiggin@...e.de" <npiggin@...e.de>,
	"riel@...hat.com" <riel@...hat.com>,
	"chris.mason@...cle.com" <chris.mason@...cle.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v5

On Thu, Jun 04, 2009 at 11:24:41AM +0800, Wu Fengguang wrote:
> On Thu, Jun 04, 2009 at 02:46:47AM +0800, Andi Kleen wrote:
> 
> [snip]
> 
> This patch is full of this style error (the old version didn't have
> this problem though):

I don't see that here. At least nothing new compared to old.

> 
>         ERROR: code indent should use tabs where possible

It's checkpath clean for me, except for a few > 80 lines on printks,
one list_for_each_entry_safe (which I think checkpatch is wrong on) and
the meminfo comma error which I also think checkpath.pl is wrong on too.

> > +               page_cache_release(p);
> > +
> > +       /*
> > +        * Now truncate the page in the page cache. This is really
> > +        * more like a "temporary hole punch"
> > +        * Don't do this for block devices when someone else
> > +        * has a reference, because it could be file system metadata
> > +        * and that's not safe to truncate.
> > +        */
> > +       mapping = page_mapping(p);
> > +       if (mapping && S_ISBLK(mapping->host->i_mode) && page_count(p) > 1) {
> 
> Shall use (page_count > 2) to count for the page cache reference.

I think the page cache reference got dropped in

	  if (!isolate_lru_page(p))
                page_cache_release(p);

So it should be only one if there are no other users

> Or can we base the test on busy buffers instead of page count?  Nick?

At least the S_ISBLK test is the best one I came up with. I'm not 
saying it's the absolutely best.

> > +       SetPageError(p);
> > +       /* TBD: print more information about the file. */
> > +       if (mapping) {
> > +               /*
> > +                * IO error will be reported by write(), fsync(), etc.
> > +                * who check the mapping.
> 
> btw, here are some side notes on EIO.
> 
> close() *may* also report it. NFS will sync file on close.

I think the comment is already too verbose, sure there are other
details too that it doesn't describe. It's not trying to be a
full reference on linux error reporting. So I prefer to not
add more cases.

> > +                * at the wrong time.
> > +                *
> > +                * So right now we assume that the application DTRT on
> 
> DTRT = do the return value test?

Do The Right Thing

> > +};
> > +
> > +static void action_result(unsigned long pfn, char *msg, int ret)
> 
> rename 'ret' to 'action'?

But's not an action (as in a page state handler), it's a return value?
(RECOVERED, FAILED etc.) I can name it result.

> > +        * need this to decide if we should kill or just drop the page.
> > +        */
> > +       mapping = page_mapping(p);
> > +       if (!PageDirty(p) && !PageAnon(p) && !PageSwapBacked(p) &&
> 
> !PageAnon(p) could be removed: the below non-zero mapping check will
> do the work implicitly.

You mean !page_mapped?  Ok.

> > +                       kill = 0;
> > +                       printk(KERN_INFO
> > +       "MCE %#lx: corrupted page was clean: dropped without side effects\n",
> > +                               pfn);
> > +                       ttu |= TTU_IGNORE_HWPOISON;
> 
> Why not put the two assignment lines together? :)

Ok. But that was your patch @)

> > +        * Try a few times (RED-PEN better strategy?)
> > +        */
> > +       for (i = 0; i < N_UNMAP_TRIES; i++) {
> > +               ret = try_to_unmap(p, ttu);
> > +               if (ret == SWAP_SUCCESS)
> > +                       break;
> > +               pr_debug("MCE %#lx: try_to_unmap retry needed %d\n", pfn,  ret);
> 
> Can we make it a printk? This is a serious accident.

I think it can actually happen due to races, e.g. when a remap
is currently in process.

> > +        */
> > +       hwpoison_user_mappings(p, pfn, trapno);
> > +
> > +       /*
> > +        * Torn down by someone else?
> > +        */
> > +       if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) {
> > +               action_result(pfn, "already unmapped LRU", IGNORED);
> 
> "NULL mapping LRU" or "already truncated page"?
> At least page_mapped != page_mapping.

It's "already truncated" now.

> > @@ -1311,6 +1311,20 @@
> >                 .mode           = 0644,
> >                 .proc_handler   = &scan_unevictable_handler,
> >         },
> > +#ifdef CONFIG_MEMORY_FAILURE
> > +       {
> > +               .ctl_name       = CTL_UNNUMBERED,
> > +               .procname       = "memory_failure_early_kill",
> > +               .data           = &sysctl_memory_failure_early_kill,
> > +               .maxlen         = sizeof(vm_highmem_is_dirtyable),
> 
> s/vm_highmem_is_dirtyable/sysctl_memory_failure_early_kill/

Fixed thanks.

> >   * Documentation/sysctl/ctl_unnumbered.txt
> > Index: linux/fs/proc/meminfo.c
> > ===================================================================
> > --- linux.orig/fs/proc/meminfo.c        2009-06-03 19:37:38.000000000 +0200
> > +++ linux/fs/proc/meminfo.c     2009-06-03 20:13:43.000000000 +0200
> > @@ -95,7 +95,11 @@
> >                 "Committed_AS:   %8lu kB\n"
> >                 "VmallocTotal:   %8lu kB\n"
> >                 "VmallocUsed:    %8lu kB\n"
> > -               "VmallocChunk:   %8lu kB\n",
> > +               "VmallocChunk:   %8lu kB\n"
> > +#ifdef CONFIG_MEMORY_FAILURE
> > +               "BadPages:       %8lu kB\n"
> 
> "HWPoison:" or something like that? 
> People is more likely to misinterpret "BadPages".

I'll name it HardwareCorrupted. That makes it too long, but it's hopefully
clearer.

> >                 vmi.used >> 10,
> >                 vmi.largest_chunk >> 10
> > +#ifdef CONFIG_MEMORY_FAILURE
> > +               ,atomic_long_read(&mce_bad_pages) << (PAGE_SHIFT - 10)
> 
> ERROR: space required after that ','

That's one of the cases where checkpatch.pl is stupid. The lone comma
with a space looks absolutely ridiculous to me. I refuse to do ridiculous
things things just for checkpatch.pl deficiencies.

> >           Enable the KSM kernel module to allow page sharing of equal pages
> >           among different tasks.
> > 
> > +config MEMORY_FAILURE
> > +       bool
> > +
> 
> Do we have code to automatically enable/disable CONFIG_MEMORY_FAILURE
> based on hardware capability?

Yes the architecture can enable it. There's also another patch
which always enables it for testing.

> > +
> > +Control how to kill processes when uncorrected memory error (typically
> > +a 2bit error in a memory module) is detected in the background by hardware.
> > +
> > +1: Kill all processes that have the corrupted page mapped as soon as the
> > +corruption is detected.
> > +
> > +0: Only unmap the page from all processes and only kill a process
> > +who tries to access it.
> 
> Note that
> - no process will be killed if the page data is clean and can be
>   safely reloaded from disk
> - pages in swap cache is always late killed.

I clarified that

Thanks,
-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/