lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <BC145393-93AC-4DF4-9CF4-2FB1C736B70C@linux.alibaba.com>
Date:   Thu, 23 Sep 2021 01:04:54 +0800
From:   Rongwei Wang <rongwei.wang@...ux.alibaba.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>, song@...nel.org,
        william.kucharski@...cle.com, Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page
 cache



> On Sep 22, 2021, at 7:37 PM, Matthew Wilcox <willy@...radead.org> wrote:
> 
> On Wed, Sep 22, 2021 at 03:06:44PM +0800, Rongwei Wang wrote:
>> Transparent huge page has supported read-only non-shmem files. The file-
>> backed THP is collapsed by khugepaged and truncated when written (for
>> shared libraries).
>> 
>> However, there is race in two possible places.
>> 
>> 1) multiple writers truncate the same page cache concurrently;
>> 2) collapse_file rolls back when writer truncates the page cache;
> 
> As I've said before, the bug here is that somehow there is a writable fd
> to a file with THPs.  That's what we need to track down and fix.
Hi, Matthew
I am not sure get your means. We know “mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs"
Introduced file-backed THPs for DSO. It is possible {very rarely} for DSO to be opened in writeable way. So, the
Pagecache of DSO need to be cleaned when someone opened this DSO in a writeable way. The process of
Pagecache cleaned mainly refers to ’truncate_inode_pages_range’.

Of course, the above is consensus for us.

The ’somehow’ that your said is sure for us. Maybe I can try to describe roughly here:

In collapse_file <khugepaged>, when PTEs scan failed (e.g. SCAN_FAIL, SCAN_TRUNCATED..), the following
code will be run:

        } else {
                struct page *page;

                /* Something went wrong: roll back page cache changes */
                xas_lock_irq(&xas);
                mapping->nrpages -= nr_none;

                if (is_shmem)
                        shmem_uncharge(mapping->host, nr_none);

                xas_set(&xas, start);
                xas_for_each(&xas, page, end - 1) {
                        page = list_first_entry_or_null(&pagelist,
                                        struct page, lru);
                        if (!page || xas.xa_index < page->index) {
                                if (!nr_none)
                                        break;
                                nr_none--;
                                /* Put holes back where they were */
                                xas_store(&xas, NULL);
                                continue;
                        }

                        VM_BUG_ON_PAGE(page->index != xas.xa_index, page);

                        /* Unfreeze the page. */
                        list_del(&page->lru);
                        page_ref_unfreeze(page, 2);
  line1               xas_store(&xas, page);
                        xas_pause(&xas);
                        xas_unlock_irq(&xas);
                        unlock_page(page);
                        putback_lru_page(page);
                        xas_lock_irq(&xas);
                }
		….
		new_page->mapping = NULL;
	   }
line2   unlock_page(new_page);
---

The above code refers to roll back. When someone starts open a DSO in a writeable way, and the collapse_file
is scanning PTEs. The hugepage had been locked and was mapped in page cache before line1. And the hugepage
not be in pagecache and be unlocked at line2. The race that between ‘collapse_file’ and ’truncate_inode_pages_range’
will happen when ‘collapse_file' is executing line1 and line2. 
This race can be shown concisely:

Khugepaged:                                                writer
Collapse_file:                                                truncate_inode_pages_range

lock_page(hugepage)                                   skip hugepage because locked by others

					                              scan_left_pages() and wait_lock_page(hugepage)
scan_fail() & xas_store(&xas, 4k_page)

unlock_page(hugepage)
						                      get_lock_page(hugepage)
						                      hugepage is not in pagecache here, but not be checked

This situation triggers easily if we setting a very small ‘scan_sleep_millisecs’.

The above descriptions are roughly my understanding of ’somehow’. It says a lot of nonsense, maybe it helps!
> 
> https://lore.kernel.org/linux-mm/YUdL3lFLFHzC80Wt@casper.infradead.org/
All in all, what you mean is that we should solve this race at the source? However, a writer happens to appear in the middle
of a process in ‘collapse_file', so this bug solved when roll back. The above is my understanding, but it may be
wrong.

Thanks

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ