lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1273737326-21211-1-git-send-email-n-horiguchi@ah.jp.nec.com>
Date:	Thu, 13 May 2010 16:55:19 +0900
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	n-horiguchi@...jp.nec.com
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: [PATCH 0/7] HWPOISON for hugepage (v5)

This patchset enables error handling for hugepage by containing error
in the affected hugepage.

Until now, memory error (classified as SRAO in MCA language) on hugepage
was simply ignored, which means if someone accesses the error page later,
the second MCE (severer than the first one) occurs and the system panics.

It's useful for some aggressive hugepage users if only affected processes
are killed.  Then other unrelated processes aren't disturbed by the error
and can continue operation.

Moreover, for other extensive hugetlb users which have own "pagecache"
on hugepage, the most valued feature would be being able to receive
the early kill signal BUS_MCEERR_AO, because the cache pages have
good opportunity to be dropped without side effects on BUS_MCEERR_AO.


The design of hugepage error handling is based on that of non-hugepage
error handling, where we:
 1. mark the error page as hwpoison,
 2. unmap the hwpoisoned page from processes using it,
 3. invalidate error page, and
 4. block later accesses to the hwpoisoned pages.

Similarities and differences between huge and non-huge case are
summarized below:

 1. (Difference) when error occurs on a hugepage, PG_hwpoison bits on all pages
    in the hugepage are set, because we have no simple way to break up
    hugepage into individual pages for now. This means there is a some
    risk to be killed by touching non-guilty pages within the error hugepage.

 2. (Similarity) hugetlb entry for the error hugepage is replaced by hwpoison
    swap entry, with which we can detect hwpoisoned memory in VM code.
    This is accomplished by adding rmapping code for hugepage, which enables
    to use try_to_unmap() for hugepage.

 3. (Difference) since hugepage is not linked to LRU list and is unswappable,
    there are not many things to do for page invalidation (only dequeuing
    free/reserved hugepage from freelist. See patch 5/7.)
    If we want to contain the error into one page, there may be more to do.

 4. (Similarity) we block later accesses by forcing page requests for
    hwpoisoned hugepage to fail as done in non-hugepage case in do_wp_page().

ToDo:
- Narrow down the containment region into one raw page.
- Soft-offlining for hugepage is not supported due to the lack of migration
  for hugepage.
- Counting file-mapped/anonymous hugepage in NR_FILE_MAPPED/NR_ANON_PAGES.

 [PATCH 1/7] hugetlb, rmap: add reverse mapping for hugepage
 [PATCH 2/7] HWPOISON, hugetlb: enable error handling path for hugepage
 [PATCH 3/7] HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
 [PATCH 4/7] HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
 [PATCH 5/7] HWPOISON, hugetlb: isolate corrupted hugepage
 [PATCH 6/7] HWPOISON, hugetlb: detect hwpoison in hugetlb code
 [PATCH 7/7] HWPOISON, hugetlb: support hwpoison injection for hugepage

Dependency:
- patch 2 depends on patch 1.
- patch 3 to patch 6 depend on patch 2.

 include/linux/hugetlb.h |    3 +
 mm/hugetlb.c            |   98 ++++++++++++++++++++++++++++++++++++++-
 mm/hwpoison-inject.c    |   15 ++++--
 mm/memory-failure.c     |  120 +++++++++++++++++++++++++++++++++++------------
 mm/rmap.c               |   16 ++++++
 5 files changed, 215 insertions(+), 37 deletions(-)

ChangeLog from v4:
- rebased to 2.6.34-rc7
- add isolation code for free/reserved hugepage in me_huge_page()
- set/clear PG_hwpoison bits of all pages in hugepage.
- mce_bad_pages counts all pages in hugepage.
- rename __hugepage_set_anon_rmap() to hugepage_add_anon_rmap()
- add huge_pte_offset() dummy function in header file on !CONFIG_HUGETLBFS

ChangeLog from v3:
- rebased to 2.6.34-rc5
- support for privately mapped hugepage

ChangeLog from v2:
- rebase to 2.6.34-rc3
- consider mapcount of hugepage
- rename pointer "head" into "hpage"

ChangeLog from v1:
- rebase to 2.6.34-rc1
- add comment from Wu Fengguang

Thanks,
Naoya Horiguchi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ