Message-Id: <20220509105641.491313-1-pizhenwei@bytedance.com>
Date:   Mon,  9 May 2022 18:56:36 +0800
From:   zhenwei pi <pizhenwei@...edance.com>
To:     akpm@...ux-foundation.org, naoya.horiguchi@....com
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        zhenwei pi <pizhenwei@...edance.com>
Subject: [PATCH v2 0/5] memory-failure: fix hwpoison_filter

v1 -> v2:
- move clear_hwpoisoned_pages() near the definitions of
  hwpoison_filter*.
- fix typo.
- remove "mm/memory-failure.c: optimize hwpoison_filter".
- call hwpoison_filter() after get_hwpoison_page().
- disable the hwpoison filter when removing the hwpoison-inject module.
- simplify num_poisoned_pages_inc/dec.

Background of this work:
As is well known, the memory failure mechanism handles memory corruption events and tries to send SIGBUS to the user process that uses the corrupted page.

In the virtualization case, QEMU catches the SIGBUS and tries to inject an MCE into the guest, and the guest then handles the memory failure again. This way the guest is only minimally affected by the hardware memory corruption.

The further steps I'm working on:
1, modify the code so that the poisoned page count is decreased in a single place (mm/memory-failure.c: simplify num_poisoned_pages_dec in this series).

2, try to use page_handle_poison() to handle SetPageHWPoison() and num_poisoned_pages_inc() together. It would be best to call num_poisoned_pages_inc() in a single place too.

3, introduce a memory failure notifier list in memory-failure.c: notify anyone registered on this list of the corrupted PFN.
If I can complete parts [1] and [2], [3] will be quite easy (just call the notifier list after incrementing the poisoned page count).

4, introduce a memory recovery VQ for the memory balloon device, which registers on the memory failure notifier list. While the guest kernel handles a memory failure, the balloon device gets notified through the notifier list and asks the host to recover the corrupted PFN (GPA) via the new VQ.

5, the host side remaps the corrupted page (HVA) and tells the guest side to unpoison the PFN (GPA). The guest then fixes the corrupted page (GPA) dynamically.

Thanks to Naoya & David for the suggestions!

v1:
- move clear_hwpoisoned_pages() from sparse.c to memory-failure.c.
- simplify num_poisoned_pages_dec().
- call hwpoison_filter() early in memory_failure().
- add hwpoison_filter for soft offline.

zhenwei pi (5):
  mm/memory-failure.c: move clear_hwpoisoned_pages
  mm/memory-failure.c: simplify num_poisoned_pages_dec
  mm/memory-failure.c: add hwpoison_filter for soft offline
  mm/hwpoison: disable hwpoison filter during removing
  mm/memory-failure.c: simplify num_poisoned_pages_inc/dec

 mm/hwpoison-inject.c |  1 +
 mm/internal.h        | 11 ++++++
 mm/memory-failure.c  | 85 ++++++++++++++++++++++++--------------------
 mm/page_alloc.c      |  1 -
 mm/sparse.c          | 27 --------------
 5 files changed, 59 insertions(+), 66 deletions(-)

-- 
2.20.1