Message-ID: <20200717144913.GA31206@lca.pw>
Date:   Fri, 17 Jul 2020 10:49:28 -0400
From:   Qian Cai <cai@....pw>
To:     Oscar Salvador <osalvador@...e.de>
Cc:     akpm@...ux-foundation.org, mhocko@...e.com, linux-mm@...ck.org,
        mike.kravetz@...cle.com, david@...hat.com,
        aneesh.kumar@...ux.vnet.ibm.com, naoya.horiguchi@....com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 00/15] Hwpoison soft-offline rework

On Thu, Jul 16, 2020 at 02:37:54PM +0200, Oscar Salvador wrote:
> Hi all,
> 
> this is a follow-up version on [1].
> That version had some flaws wrt. handling hugetlb pages, so this version
> fixes it.
> I checked that the case reported by Qian seems to work fine now.

I am still getting EIO from madvise on some x86 NUMA systems with next-20200717
which includes this patchset.

# git clone https://gitlab.com/cailca/linux-mm
# cd linux-mm; make

# ./random 1
- start: migrate_huge_offline
- use NUMA nodes 0,3.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 3
madvise: Input/output error

== serial console output ==
[  100.149531][ T1644] Soft offlining pfn 0x3a5000 at process virtual address 0x7f1bf7e00000
[  100.193804][ T1644] Soft offlining pfn 0x3a5200 at process virtual address 0x7f1bf8000000
[  100.263446][ T1644] Soft offlining pfn 0x1fa3e00 at process virtual address 0x7f1bf7e00000
[  100.302745][ T1644] __get_any_page: 0x1fa3e00 free huge page
[  100.330226][ T1644] Soft offlining pfn 0x3bd600 at process virtual address 0x7f1bf8000000
[  100.373717][ T1644] Soft offlining pfn 0x202dc00 at process virtual address 0x7f1bf7e00000
[  100.414605][ T1644] Soft offlining pfn 0x1fa3c00 at process virtual address 0x7f1bf8000000
[  100.457675][ T1644] Soft offlining pfn 0x1fd3a00 at process virtual address 0x7f1bf7e00000
[  100.498519][ T1644] Soft offlining pfn 0x1fd3800 at process virtual address 0x7f1bf8000000
[  100.541750][ T1644] Soft offlining pfn 0x1fa3800 at process virtual address 0x7f1bf7e00000
[  100.582207][ T1644] Soft offlining pfn 0x1ffde00 at process virtual address 0x7f1bf8000000
[  100.625221][ T1644] Soft offlining pfn 0x1f7fe00 at process virtual address 0x7f1bf7e00000
[  100.665768][ T1644] Soft offlining pfn 0x1f7fc00 at process virtual address 0x7f1bf8000000
[  100.712181][ T1644] Soft offlining pfn 0x202d800 at process virtual address 0x7f1bf7e00000
[  100.753815][ T1644] Soft offlining pfn 0x205da00 at process virtual address 0x7f1bf8000000
[  100.796807][ T1644] Soft offlining pfn 0x1f7fa00 at process virtual address 0x7f1bf7e00000
[  100.837403][ T1644] Soft offlining pfn 0x1f7f800 at process virtual address 0x7f1bf8000000
[  100.880442][ T1644] Soft offlining pfn 0x1ffd800 at process virtual address 0x7f1bf7e00000
[  100.921584][ T1644] Soft offlining pfn 0x1fd3600 at process virtual address 0x7f1bf8000000
[  100.964444][ T1644] Soft offlining pfn 0x1fa3600 at process virtual address 0x7f1bf7e00000
[  101.005009][ T1644] Soft offlining pfn 0x1fa3400 at process virtual address 0x7f1bf8000000
[  101.047984][ T1644] Soft offlining pfn 0x1f7f400 at process virtual address 0x7f1bf7e00000
[  101.088665][ T1644] Soft offlining pfn 0x205d600 at process virtual address 0x7f1bf8000000
[  101.131368][ T1644] Soft offlining pfn 0x202d600 at process virtual address 0x7f1bf7e00000
[  101.171717][ T1644] Soft offlining pfn 0x202d400 at process virtual address 0x7f1bf8000000
[  101.216780][ T1644] Soft offlining pfn 0x1fa3000 at process virtual address 0x7f1bf7e00000
[  101.258755][ T1644] Soft offlining pfn 0x1fd3200 at process virtual address 0x7f1bf8000000
[  101.302585][ T1644] Soft offlining pfn 0x1ffd600 at process virtual address 0x7f1bf7e00000
[  101.344729][ T1644] Soft offlining pfn 0x1ffd400 at process virtual address 0x7f1bf8000000
[  101.388958][ T1644] Soft offlining pfn 0x205d000 at process virtual address 0x7f1bf7e00000
[  101.430995][ T1644] Soft offlining pfn 0x1f7f200 at process virtual address 0x7f1bf8000000
[  101.474513][ T1644] Soft offlining pfn 0x1fd2e00 at process virtual address 0x7f1bf7e00000
[  101.515333][ T1644] Soft offlining pfn 0x1fd2c00 at process virtual address 0x7f1bf8000000
[  101.558119][ T1644] Soft offlining pfn 0x1fa2c00 at process virtual address 0x7f1bf7e00000
[  101.600051][ T1644] Soft offlining pfn 0x202d200 at process virtual address 0x7f1bf8000000
[  101.643046][ T1644] Soft offlining pfn 0x1ffd200 at process virtual address 0x7f1bf7e00000
[  101.683842][ T1644] Soft offlining pfn 0x1ffd000 at process virtual address 0x7f1bf8000000
[  101.730551][ T1644] Soft offlining pfn 0x205cc00 at process virtual address 0x7f1bf7e00000
[  101.772575][ T1644] Soft offlining pfn 0x1f7ee00 at process virtual address 0x7f1bf8000000
[  101.818438][ T1644] Soft offlining pfn 0x1fd2a00 at process virtual address 0x7f1bf7e00000
[  101.861488][ T1644] Soft offlining pfn 0x1fd2800 at process virtual address 0x7f1bf8000000
[  101.904410][ T1644] Soft offlining pfn 0x1fa2800 at process virtual address 0x7f1bf7e00000
[  101.946639][ T1644] Soft offlining pfn 0x202ce00 at process virtual address 0x7f1bf8000000
[  101.989523][ T1644] Soft offlining pfn 0x1ffce00 at process virtual address 0x7f1bf7e00000
[  102.030092][ T1644] Soft offlining pfn 0x1ffcc00 at process virtual address 0x7f1bf8000000
[  102.076592][ T1644] Soft offlining pfn 0x437600 at process virtual address 0x7f1bf7e00000
[  102.116941][ T1644] Soft offlining pfn 0x433800 at process virtual address 0x7f1bf8000000
[  102.161314][ T1644] Soft offlining pfn 0x1f7e800 at process virtual address 0x7f1bf7e00000
[  102.200495][ T1644] Soft offlining pfn 0x1fa2a00 at process virtual address 0x7f1bf8000000
[  102.247260][ T1644] Soft offlining pfn 0x1fd2600 at process virtual address 0x7f1bf7e00000
[  102.290189][ T1644] Soft offlining pfn 0x1fd2400 at process virtual address 0x7f1bf8000000
[  102.328558][ T1644] __get_any_page: 0x1fd2400: unknown zero refcount page type 3bfffc000000000
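
To make a log like the one above easier to scan, the following Python sketch pulls out the pfns that were soft-offlined and any __get_any_page note attached to them. It is an illustration only and not part of the linux-mm test suite: the function name `summarize`, the regular expressions, and the constant `PAGES_PER_HUGEPAGE` are assumptions introduced here, and only four lines of the log are reproduced. The alignment constant assumes 2 MB hugepages on 4 KiB base pages.

```python
import re

# Four sample lines copied verbatim from the serial console output above.
LOG = """\
[  100.263446][ T1644] Soft offlining pfn 0x1fa3e00 at process virtual address 0x7f1bf7e00000
[  100.302745][ T1644] __get_any_page: 0x1fa3e00 free huge page
[  102.290189][ T1644] Soft offlining pfn 0x1fd2400 at process virtual address 0x7f1bf8000000
[  102.328558][ T1644] __get_any_page: 0x1fd2400: unknown zero refcount page type 3bfffc000000000
"""

# Hypothetical patterns matching the two message shapes seen in this log.
SOFT_OFFLINE = re.compile(
    r"Soft offlining pfn (0x[0-9a-f]+) at process virtual address (0x[0-9a-f]+)"
)
GET_ANY_PAGE = re.compile(r"__get_any_page: (0x[0-9a-f]+):? (.*)$")

# Assumption: 2 MB hugepages on 4 KiB base pages, so 512 (0x200) pfns each.
PAGES_PER_HUGEPAGE = (2048 * 1024) // 4096


def summarize(log):
    """Return (list of soft-offlined pfns, {pfn: __get_any_page note})."""
    attempts = []
    notes = {}
    for line in log.splitlines():
        m = SOFT_OFFLINE.search(line)
        if m:
            attempts.append(int(m.group(1), 16))
            continue
        m = GET_ANY_PAGE.search(line)
        if m:
            notes[int(m.group(1), 16)] = m.group(2)
    return attempts, notes


if __name__ == "__main__":
    attempts, notes = summarize(LOG)
    for pfn in attempts:
        aligned = pfn % PAGES_PER_HUGEPAGE == 0
        print(hex(pfn), "hugepage-aligned" if aligned else "unaligned",
              notes.get(pfn, ""))
```

Fed the full console output instead of the four sample lines, the same function would list every attempt in order, which makes it easier to see that the last pfn is the one that hit the "unknown zero refcount page type" path.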

# numactl -H
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 24 25 26 27 28 29
node 0 size: 15646 MB
node 0 free: 14779 MB
node 1 cpus: 6 7 8 9 10 11 30 31 32 33 34 35
node 1 size: 31966 MB
node 1 free: 29825 MB
node 2 cpus: 12 13 14 15 16 17 36 37 38 39 40 41
node 2 size: 32253 MB
node 2 free: 31029 MB
node 3 cpus: 18 19 20 21 22 23 42 43 44 45 46 47
node 3 size: 32252 MB
node 3 free: 31360 MB
node distances:
node   0   1   2   3 
  0:  10  21  21  21 
  1:  21  10  21  21 
  2:  21  21  10  21 
  3:  21  21  21  10

# cat /proc/meminfo 
MemTotal:       114809324 kB
MemFree:        109554164 kB
MemAvailable:   109256800 kB
Buffers:            5376 kB
Cached:           166932 kB
SwapCached:            0 kB
Active:           119804 kB
Inactive:         107788 kB
Active(anon):      56356 kB
Inactive(anon):     9308 kB
Active(file):      63448 kB
Inactive(file):    98480 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4194300 kB
Dirty:               108 kB
Writeback:             0 kB
AnonPages:         55296 kB
Mapped:            82252 kB
Shmem:             10380 kB
KReclaimable:     114784 kB
Slab:            4401424 kB
SReclaimable:     114784 kB
SUnreclaim:      4286640 kB
KernelStack:       18560 kB
PageTables:         5648 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    61547760 kB
Committed_AS:     286384 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      168220 kB
VmallocChunk:          0 kB
Percpu:            39424 kB
HardwareCorrupted:   196 kB
AnonHugePages:     10240 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:      50
HugePages_Free:        2
HugePages_Rsvd:        0
HugePages_Surp:       25
Hugepagesize:       2048 kB
Hugetlb:          102400 kB
DirectMap4k:      578044 kB
DirectMap2M:    22358016 kB
DirectMap1G:    113246208 kB
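
The hugetlb figures above can be cross-checked with a little arithmetic. The sketch below is an illustration only: it hard-codes a few fields copied from the output above, the helper name `parse_meminfo` is made up for this example, and it assumes 4 kB base pages, which is the usual x86 configuration but is not stated in this report.

```python
# Selected fields copied from the /proc/meminfo output above.
MEMINFO = """\
HardwareCorrupted:   196 kB
HugePages_Total:      50
HugePages_Free:        2
HugePages_Surp:       25
Hugepagesize:       2048 kB
Hugetlb:          102400 kB
"""

BASE_PAGE_KB = 4  # assumption: 4 kB base pages


def parse_meminfo(text):
    """Parse 'Key:   value [kB]' lines into a {key: int} dict."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields[key.strip()] = int(rest.split()[0])
    return fields


fields = parse_meminfo(MEMINFO)

# Size of the hugetlb pool implied by the page count and page size.
pool_kb = fields["HugePages_Total"] * fields["Hugepagesize"]

# Number of base pages accounted as HardwareCorrupted.
poisoned_pages = fields["HardwareCorrupted"] // BASE_PAGE_KB

if __name__ == "__main__":
    print("pool_kb =", pool_kb, "Hugetlb =", fields["Hugetlb"])
    print("poisoned base pages =", poisoned_pages)
```

With the values above this gives a 102400 kB pool, matching the Hugetlb line, and 49 base pages' worth of HardwareCorrupted. The console log shows 50 soft-offline attempts with the last one failing, so 49 would be consistent with one base page poisoned per earlier attempt, though the log alone does not confirm that.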

> 
> Cover letter:
> 
> This patchset was initially based on Naoya's hwpoison rework [1], so
> thanks to him for the initial work.
> I would also like to thank Naoya for testing the patchset off-line
> and reporting the issues he found; that was quite helpful.
> 
> This patchset aims to fix some issues lying in soft-offline handling,
> and it also takes the opportunity to do some cleanups and refactoring
> along the way.
> 
> 
>  - Motivation:
> 
>    A customer and I were facing an issue where processes were killed
>    after some of their pages had been soft-offlined.
>    This should not happen when soft-offlining, as it is meant to be non-disruptive.
>    I was able to reproduce the issue by stressing memory while
>    soft-offlining pages at the same time.
> 
>    After debugging the issue, I saw that the problem was that pages were handed
>    back to user-space after they had been properly offlined.
>    So, when those pages were faulted in, the fault handler returned VM_FAULT_HWPOISON
>    all the way down to the arch handler, which simply killed the process.
> 
>    After further analysis, it became clear that when kcompactd kicked in
>    to migrate pages over, the compaction_alloc callback was handing
>    poisoned pages to the migrate routine.
> 
>    All this could happen because isolate_freepages_block and
>    fast_isolate_freepages only check whether the page is PageBuddy,
>    and since 1) poisoned pages can be part of a higher-order page
>    and 2) poisoned pages are also PageBuddy, they can sneak in easily.
> 
>    I also saw some other problems with swap pages, but I suspected them
>    to be the same sort of problem, so I did not follow that trace.
> 
>    The above refers to soft-offline.
>    But I also saw problems with hard-offline, especially hugetlb corruption,
>    and some other weird stuff. (I could paste the logs)
> 
>    The full explanation referring to the soft-offline case can be found at [2].
> 
>  - Approach:
> 
>    The approach taken is to contain those pages and never let them reach
>    either the pcplists or the buddy freelists.
>    Only once they are completely out of reach do we flag them as poisoned.
> 
>    A full explanation of this can be found in patch#11 and patch#12.
> 
>  - Outcome:
> 
>    With this patchset, I no longer see the issues with soft-offline.
> 
> [1] https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u
> 
> Naoya Horiguchi (6):
>   mm,hwpoison: cleanup unused PageHuge() check
>   mm, hwpoison: remove recalculating hpage
>   mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
>   mm,hwpoison-inject: don't pin for hwpoison_filter
>   mm,hwpoison: remove MF_COUNT_INCREASED
>   mm,hwpoison: remove flag argument from soft offline functions
> 
> Oscar Salvador (9):
>   mm,madvise: Refactor madvise_inject_error
>   mm,hwpoison: Un-export get_hwpoison_page and make it static
>   mm,hwpoison: Kill put_hwpoison_page
>   mm,hwpoison: Unify THP handling for hard and soft offline
>   mm,hwpoison: Rework soft offline for free pages
>   mm,hwpoison: Rework soft offline for in-use pages
>   mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
>   mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
>   mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
> 
>  drivers/base/memory.c      |   2 +-
>  include/linux/mm.h         |  12 +-
>  include/linux/page-flags.h |   6 +-
>  include/ras/ras_event.h    |   3 +
>  mm/hugetlb.c               |  60 +++++++-
>  mm/hwpoison-inject.c       |  18 +--
>  mm/madvise.c               |  37 ++---
>  mm/memory-failure.c        | 307 +++++++++++++++----------------------
>  mm/migrate.c               |  11 +-
>  mm/page_alloc.c            |  70 +++++++--
>  10 files changed, 270 insertions(+), 256 deletions(-)
> 
> -- 
> 2.26.2
> 
> 
