Message-ID: <a20e7bdb-7344-306d-e8f5-5ee69af7d5ea@oracle.com>
Date: Wed, 10 Jan 2024 18:32:40 -0800
From: Sidhartha Kumar <sidhartha.kumar@...cle.com>
To: Muhammad Usama Anjum <usama.anjum@...labora.com>,
        Jiaqi Yan <jiaqiyan@...gle.com>
Cc: linmiaohe@...wei.com, mike.kravetz@...cle.com, naoya.horiguchi@....com,
        akpm@...ux-foundation.org, songmuchun@...edance.com,
        shy828301@...il.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        jthoughton@...gle.com, "kernel@...labora.com" <kernel@...labora.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>
Subject: Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs
 read

On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>> <usama.anjum@...labora.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to convert this test to TAP as I think the failures sometimes go
>>>> unnoticed on CI systems if we only depend on the return value of the
>>>> application. I've enabled the following configurations which aren't already
>>>> present in tools/testing/selftests/mm/config:
>>>> CONFIG_MEMORY_FAILURE=y
>>>> CONFIG_HWPOISON_INJECT=m
>>>>
>>>> I'll send a patch to add these configs later. Right now I'm trying to
>>>> investigate the failure when we are trying to inject the poison page by
>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The test
>>>> fails as it doesn't expect the hugetlb memory to be busy. I'm not
>>>> sure if the poison handling code has issues or the test isn't robust enough.
>>>>
>>>> ./hugetlb-read-hwpoison
>>>> Write/read chunk size=0x800
>>>>   ... HugeTLB read regression test...
>>>>   ...  ... expect to read 0x200000 bytes of data in total
>>>>   ...  ... actually read 0x200000 bytes of data in total
>>>>   ... HugeTLB read regression test...TEST_PASSED
>>>>   ... HugeTLB read HWPOISON test...
>>>> [    9.280854] Injecting memory failure for pfn 0x102f01 at process virtual
>>>> address 0x7f28ec101000
>>>> [    9.282029] Memory failure: 0x102f01: huge page still referenced by 511
>>>> users
>>>> [    9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
>>>>   ...  !!! MADV_HWPOISON failed: Device or resource busy
>>>>   ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>
>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>
>>> Thanks for reporting this, Usama!
>>>
>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>> writeback disabling."
>>>
>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>> MADV_HWPOISON injection works and the test passes:
>>>
>>>   ... HugeTLB read HWPOISON test...
>>>   ...  ... expect to read 0x101000 bytes of data in total
>>>   ...  !!! read failed: Input/output error
>>>   ...  ... actually read 0x101000 bytes of data in total
>>>   ... HugeTLB read HWPOISON test...TEST_PASSED
>>>   ... HugeTLB seek then read HWPOISON test...
>>>   ...  ... init val=4 with offset=0x102000
>>>   ...  ... expect to read 0xfe000 bytes of data in total
>>>   ...  ... actually read 0xfe000 bytes of data in total
>>>   ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>>   ...
>>>
>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
>>> virtual address 0x7f75e3101000
>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
>>> page: Recovered
>>> ...
>>>
>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>>> should be able to figure it out via bisection (and of course by
>>> reading delta commits between them, probably related to page
>>> refcount).
>> Thank you for this information.
>>
>>>
>>> That being said, I will be on vacation from tomorrow until the end of
>>> next week. So I will get back to this after next weekend. Meanwhile if
>>> you want to go ahead and bisect the problematic commit, that will be
>>> very much appreciated.
>> I'll try to bisect and post here if I find something.
> Found the culprit commit by bisection:
> 
> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
> mm/filemap: remove hugetlb special casing in filemap.c
> 
> hugetlb-read-hwpoison started failing from this patch. I've added the
> author of this patch to this bug report.
> 
Hi Usama,

Thanks for pointing this out. After debugging, the diff below seems to fix the
issue and allows the tests to pass again. Could you test it on your
configuration as well, just to confirm?

Thanks,
Sidhartha

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 36132c9125f9..3a248e4f7e93 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
                 } else {
                         folio_unlock(folio);

-                       if (!folio_test_has_hwpoisoned(folio))
+                       if (!folio_test_hwpoison(folio))
                                 want = nr;
                         else {
                                 /*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d8c853b35dbb..87f6bf7d8bc1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -973,7 +973,7 @@ struct page_state {
  static bool has_extra_refcount(struct page_state *ps, struct page *p,
                                bool extra_pins)
  {
-       int count = page_count(p) - 1;
+       int count = page_count(p) - folio_nr_pages(page_folio(p));

         if (extra_pins)
                 count -= 1;


>>
>>>
>>> Thanks,
>>> Jiaqi
>>>
>>>
>>>>
>>>> Regards,
>>>> Usama
>>>>

