Message-ID: <YY5we7CKKS0g4d/s@kroah.com>
Date: Fri, 12 Nov 2021 14:47:39 +0100
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Naresh Kamboju <naresh.kamboju@...aro.org>
Cc: Sudip Mukherjee <sudipm.mukherjee@...il.com>, f.fainelli@...il.com,
torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
lkft-triage@...ts.linaro.org, patches@...nelci.org,
stable@...r.kernel.org, pavel@...x.de, akpm@...ux-foundation.org,
jonathanh@...dia.com, shuah@...nel.org, linux@...ck-us.net,
Yang Shi <shy828301@...il.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Hugh Dickins <hughd@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
Oscar Salvador <osalvador@...e.de>,
Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH 5.10 00/21] 5.10.79-rc1 review
On Thu, Nov 11, 2021 at 08:24:42PM +0530, Naresh Kamboju wrote:
> On Thu, 11 Nov 2021 at 18:32, Sudip Mukherjee
> <sudipm.mukherjee@...il.com> wrote:
> >
> > Hi Greg,
> >
> > On Wed, Nov 10, 2021 at 07:43:46PM +0100, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 5.10.79 release.
> > > There are 21 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Fri, 12 Nov 2021 18:19:54 +0000.
> > > Anything received after that time might be too late.
> >
> > systemd-journal-flush.service failed due to a timeout, resulting in a very
> > slow boot on my test laptop. The qemu test on openqa failed due to the same problem.
> >
> > https://openqa.qa.codethink.co.uk/tests/365
> >
> > A bisect showed the problem to be 8615ff6dd1ac ("mm: filemap: check if THP has
> > hwpoisoned subpage for PMD page fault"). Reverting it on top of 5.10.79-rc1
> > fixed the problem.
> > Incidentally, I have been having a similar problem with Linus's tree
> > for the last few days; it has been failing since 20211106 (I did not get the time to check).
> > I will test mainline again with this commit reverted.
>
> I have also noticed this problem; Anders bisected it and found this
> first bad commit.
>
> Failed test log link:
> A start job is running for Journal Service (5s / 1min 27s)
> https://lkft.validation.linaro.org/scheduler/job/3901980#L2234
>
> Reported-by: Linux Kernel Functional Testing <lkft@...aro.org>
>
> Bisect log:
>
> # bad: [b85617a6291f710807d0cd078c230626dee60b16] Linux 5.10.79-rc1
> # good: [5040520482a594e92d4f69141229a6dd26173511] Linux 5.10.78
> git bisect start 'b85617a6291f710807d0cd078c230626dee60b16' '5040520482a594e92d4f69141229a6dd26173511'
> # bad: [7ceeda856035991a6c9804916987a03759745fb0] staging: rtl8712: fix use-after-free in rtl8712_dl_fw
> git bisect bad 7ceeda856035991a6c9804916987a03759745fb0
> # bad: [8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> git bisect bad 8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed
> # good: [e9cb6ce4690749d42013f1d56874c624d7241740] Revert "x86/kvm: fix vcpu-id indexed array sizes"
> git bisect good e9cb6ce4690749d42013f1d56874c624d7241740
> # good: [dc385dfc126d51d7a93db694f8e151afe60eb06a] mm: hwpoison: remove the unnecessary THP check
> git bisect good dc385dfc126d51d7a93db694f8e151afe60eb06a
> # first bad commit: [8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> commit 8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed
> Author: Yang Shi <shy828301@...il.com>
> Date: Thu Oct 28 14:36:11 2021 -0700
>
> mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
>
> commit eac96c3efdb593df1a57bb5b95dbe037bfa9a522 upstream.
>
> When handling a shmem page fault, a THP with a corrupted subpage could be
> PMD-mapped if certain conditions are satisfied. But the kernel is supposed
> to send SIGBUS when trying to map a hwpoisoned page.
>
> There are two paths which may do PMD map: fault around and regular
> fault.
>
> Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
> codepaths"), things were even worse in the fault-around path: the THP
> could be PMD-mapped as long as the VMA fit, regardless of which subpage
> was accessed or corrupted. After that commit, the THP can be PMD-mapped
> as long as the head page is not corrupted.
>
> In the regular fault path, the THP could be PMD-mapped as long as the
> corrupted page is not accessed and the VMA fits.
>
> This loophole could be fixed by iterating over every subpage to check
> whether any of them is hwpoisoned, but that is somewhat costly in the
> page fault path.
>
> So introduce a new page flag called HasHWPoisoned on the first tail
> page. It indicates that the THP has one or more hwpoisoned subpages. It
> is set when memory failure finds any subpage of the THP hwpoisoned and
> the refcount has been bumped successfully, and it is cleared when the
> THP is freed or split.
>
> The soft offline path doesn't need this, since the soft offline handler
> only marks a subpage hwpoisoned after the subpage has been migrated
> successfully. But a shmem THP is not split and then migrated at all.
>
> Link: https://lkml.kernel.org/r/20211020210755.23964-3-shy828301@gmail.com
> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> Signed-off-by: Yang Shi <shy828301@...il.com>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@....com>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> Cc: Hugh Dickins <hughd@...gle.com>
> Cc: Matthew Wilcox <willy@...radead.org>
> Cc: Oscar Salvador <osalvador@...e.de>
> Cc: Peter Xu <peterx@...hat.com>
> Cc: <stable@...r.kernel.org>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
>
> include/linux/page-flags.h | 23 +++++++++++++++++++++++
> mm/huge_memory.c           |  2 ++
> mm/memory-failure.c        | 14 ++++++++++++++
> mm/memory.c                |  9 +++++++++
> mm/page_alloc.c            |  4 +++-
> 5 files changed, 51 insertions(+), 1 deletion(-)
>
Thanks, I'm going to go drop this patch again.
This is the second time we have tried to add it. Yang, are you
_SURE_ it needs to be in the 5.10.y tree? So far it's been nothing but
build and boot failures :(
thanks,
greg k-h
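The HasHWPoisoned scheme in the quoted commit message trades a per-fault scan of every subpage for a single flag kept on the first tail page. The following is a minimal user-space sketch of that trade-off, assuming a 512-subpage THP (2 MiB of 4 KiB pages). All `hypo_*` names and structures are hypothetical illustrations, not the kernel's `struct page` or page-flag API, and the refcount handling is deliberately simplified.

```c
/* Hypothetical user-space model of the HasHWPoisoned idea; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a 2 MiB THP made of 4 KiB base pages. */
#define HYPO_THP_NR_PAGES 512

/* Hypothetical, heavily simplified stand-in for a subpage. */
struct hypo_page {
	bool hwpoison;       /* this subpage is hwpoisoned */
	bool has_hwpoisoned; /* only used on the first tail page, pages[1] */
};

/* Hypothetical THP: pages[0] is the head, pages[1] the first tail page. */
struct hypo_thp {
	int refcount;
	struct hypo_page pages[HYPO_THP_NR_PAGES];
};

static void hypo_thp_init(struct hypo_thp *thp)
{
	thp->refcount = 1;
	for (size_t i = 0; i < HYPO_THP_NR_PAGES; i++) {
		thp->pages[i].hwpoison = false;
		thp->pages[i].has_hwpoisoned = false;
	}
}

/* The costly alternative: scan every subpage on each PMD fault. */
static bool hypo_thp_scan_for_poison(const struct hypo_thp *thp)
{
	for (size_t i = 0; i < HYPO_THP_NR_PAGES; i++)
		if (thp->pages[i].hwpoison)
			return true;
	return false;
}

/* Fails if the THP is already on its way to being freed. */
static bool hypo_try_get_ref(struct hypo_thp *thp)
{
	if (thp->refcount == 0)
		return false;
	thp->refcount++; /* this sketch never drops the extra reference */
	return true;
}

/*
 * Memory-failure side: poison the subpage, then record it on the first
 * tail page only if the refcount bump succeeds. Returns true if the
 * flag was recorded.
 */
static bool hypo_memory_failure(struct hypo_thp *thp, size_t idx)
{
	assert(idx < HYPO_THP_NR_PAGES);
	thp->pages[idx].hwpoison = true;
	if (!hypo_try_get_ref(thp))
		return false;
	thp->pages[1].has_hwpoisoned = true;
	return true;
}

/* Fault side: one flag test replaces the full subpage scan. */
static bool hypo_can_pmd_map(const struct hypo_thp *thp)
{
	return !thp->pages[1].has_hwpoisoned;
}

/* Free/split side: clear the flag once the THP stops being one unit. */
static void hypo_thp_free_or_split(struct hypo_thp *thp)
{
	thp->pages[1].has_hwpoisoned = false;
}
```

In this model the rare memory-failure path pays the cost of recording the poison, so the hot page-fault path only tests one flag before deciding whether a PMD mapping is allowed. When `hypo_can_pmd_map()` returns false, the fault would instead fall back to mapping individual subpages, so an access to the poisoned subpage itself can still be reported with SIGBUS.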