linux-kernel - Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <BC145393-93AC-4DF4-9CF4-2FB1C736B70C@linux.alibaba.com>
Date:   Thu, 23 Sep 2021 01:04:54 +0800
From:   Rongwei Wang <rongwei.wang@...ux.alibaba.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>, song@...nel.org,
        william.kucharski@...cle.com, Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page
 cache



> On Sep 22, 2021, at 7:37 PM, Matthew Wilcox <willy@...radead.org> wrote:
> 
> On Wed, Sep 22, 2021 at 03:06:44PM +0800, Rongwei Wang wrote:
>> Transparent huge page has supported read-only non-shmem files. The file-
>> backed THP is collapsed by khugepaged and truncated when written (for
>> shared libraries).
>> 
>> However, there is race in two possible places.
>> 
>> 1) multiple writers truncate the same page cache concurrently;
>> 2) collapse_file rolls back when writer truncates the page cache;
> 
> As I've said before, the bug here is that somehow there is a writable fd
> to a file with THPs.  That's what we need to track down and fix.
Hi, Matthew
I am not sure get your means. We know “mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs"
Introduced file-backed THPs for DSO. It is possible {very rarely} for DSO to be opened in writeable way. So, the
Pagecache of DSO need to be cleaned when someone opened this DSO in a writeable way. The process of
Pagecache cleaned mainly refers to ’truncate_inode_pages_range’.

Of course, the above is consensus for us.

The ’somehow’ that your said is sure for us. Maybe I can try to describe roughly here:

In collapse_file <khugepaged>, when PTEs scan failed (e.g. SCAN_FAIL, SCAN_TRUNCATED..), the following
code will be run:

        } else {
                struct page *page;

                /* Something went wrong: roll back page cache changes */
                xas_lock_irq(&xas);
                mapping->nrpages -= nr_none;

                if (is_shmem)
                        shmem_uncharge(mapping->host, nr_none);

                xas_set(&xas, start);
                xas_for_each(&xas, page, end - 1) {
                        page = list_first_entry_or_null(&pagelist,
                                        struct page, lru);
                        if (!page || xas.xa_index < page->index) {
                                if (!nr_none)
                                        break;
                                nr_none--;
                                /* Put holes back where they were */
                                xas_store(&xas, NULL);
                                continue;
                        }

                        VM_BUG_ON_PAGE(page->index != xas.xa_index, page);

                        /* Unfreeze the page. */
                        list_del(&page->lru);
                        page_ref_unfreeze(page, 2);
  line1               xas_store(&xas, page);
                        xas_pause(&xas);
                        xas_unlock_irq(&xas);
                        unlock_page(page);
                        putback_lru_page(page);
                        xas_lock_irq(&xas);
                }
		….
		new_page->mapping = NULL;
	   }
line2   unlock_page(new_page);
---

The above code refers to roll back. When someone starts open a DSO in a writeable way, and the collapse_file
is scanning PTEs. The hugepage had been locked and was mapped in page cache before line1. And the hugepage
not be in pagecache and be unlocked at line2. The race that between ‘collapse_file’ and ’truncate_inode_pages_range’
will happen when ‘collapse_file' is executing line1 and line2. 
This race can be shown concisely:

Khugepaged:                                                writer
Collapse_file:                                                truncate_inode_pages_range

lock_page(hugepage)                                   skip hugepage because locked by others

					                              scan_left_pages() and wait_lock_page(hugepage)
scan_fail() & xas_store(&xas, 4k_page)

unlock_page(hugepage)
						                      get_lock_page(hugepage)
						                      hugepage is not in pagecache here, but not be checked

This situation triggers easily if we setting a very small ‘scan_sleep_millisecs’.

The above descriptions are roughly my understanding of ’somehow’. It says a lot of nonsense, maybe it helps!
> 
> https://lore.kernel.org/linux-mm/YUdL3lFLFHzC80Wt@casper.infradead.org/
All in all, what you mean is that we should solve this race at the source? However, a writer happens to appear in the middle
of a process in ‘collapse_file', so this bug solved when roll back. The above is my understanding, but it may be
wrong.

Thanks