lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 5 Oct 2021 17:03:00 +0800
From:   Rongwei Wang <rongwei.wang@...ux.alibaba.com>
To:     Hugh Dickins <hughd@...gle.com>,
        Matthew Wilcox <willy@...radead.org>
Cc:     Song Liu <song@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        William Kucharski <william.kucharski@...cle.com>
Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page
 cache



On 10/5/21 10:58 AM, Hugh Dickins wrote:
> On Tue, 5 Oct 2021, Rongwei Wang wrote:
> 
>> Hi,
>> I have run our cases these two days to stress test new Patch #1. The new Patch
>> #1 mainly add filemap_invalidate_{un}lock before and after
>> truncate_pagecache(), basing on original Patch #1. And the crash has not
>> happened.
>>
>> Now, I keep the original Patch #1, then adding the code below which suggested
>> by liu song (I'm not sure which one I should add in the next version,
>> Suggested-by or Signed-off-by? If you know, please remind me).
>>
>> -               if (filemap_nr_thps(inode->i_mapping))
>> +               if (filemap_nr_thps(inode->i_mapping)) {
>> +                       filemap_invalidate_lock(inode->i_mapping);
>>                          truncate_pagecache(inode, 0);
>> +                       filemap_invalidate_unlock(inode->i_mapping);
>> +               }
> 
> I won't NAK that patch; but I still believe it's unnecessary, and don't
> see how it protects against all the races (collapse_file() does not use
> that lock, whereas collapse_file() does use page lock).  And if you're
> hoping to fix 5.10, then you will have to backport those invalidate_lock
> patches there too (they're really intended to protect hole-punching).
> 
>>
>> And the reason for keeping the original Patch #1 is mainly to fix the race
>> between collapse_file and truncate_pagecache. It seems necessary. Despite the
>> two-day test, I did not reproduce this race any more.
>>
>> In addition, I also test the below method:
>>
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 3f47190f98a8..33604e4ce60a 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -210,8 +210,6 @@ invalidate_complete_page(struct address_space *mapping,
>> struct page *page)
>>
>>   int truncate_inode_page(struct address_space *mapping, struct page *page)
>>   {
>> -       VM_BUG_ON_PAGE(PageTail(page), page);
>> -
>>          if (page->mapping != mapping)
>>                  return -EIO;
>>
>> I am not very sure this VM_BUG_ON_PAGE(PageTail) is what Hugh means. And
>> the test results show that only removing this VM_BUG_ON_PAGE(PageTail) has no
>> effect. So, I still keep the original Patch #1 to fix one race.
> 
> Yes, that's exactly what I meant, and thank you for intending to try it.
> 
> But if that patch had "no effect", then I think you were not running the
> kernel with that patch applied: because it deletes the BUG on line 213
> of mm/truncate.c, which is what you reported in the first mail!
> 
> Or, is line 213 of mm/truncate.c in your 5.10.46-hugetext+ kernel
> something else?  I've been looking at 5.15-rc.
Hi, Hugh

I'm sorry the confusing '5.10.46-hugetext+'. I am also look and test at 
5.15-rc.
> 
> But I wasn't proposing to delete it merely to hide the BUG: as I hope
> I explained, we could move it below the page->mapping check, but it
> wouldn't really be of any value there since tails have NULL page->mapping
> anyway (well, I didn't check first and second tails, maybe mapping gets
> reused for some compound page field in those). I was proposing to delete
> it because the page->mapping check then weeds out the racy case once
> we're holding page lock, without the need for adding anything special.
> 
> Hugh
Today, I try again to create some cases to reproduce the race, such as 
ensuring that multiple processes are always executing 
‘truncate_pagecache’ and only mapping 2M DSO. In this way, I try to 
ensure that the target of 'collapse_file' and 'truncate_pagecache' can 
only be the same VMA, to increase the probability of reproducing that 
race. But, I can't reproduce that race any more.

In fact, according to the previous experience, the current number of 
attempts should be able to reproduce that race.

If you have the idea about creating this case, please tell me, and I can 
try again. Or we can solve it when it appears again.

Thanks!
> 

Powered by blists - more mailing lists