lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LSU.2.11.2008021705240.29126@eggly.anvils>
Date:   Sun, 2 Aug 2020 17:35:23 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
cc:     Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Song Liu <songliubraving@...com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH] khugepaged: retract_page_tables() remember to test
 exit

On Mon, 3 Aug 2020, Kirill A. Shutemov wrote:
> On Sun, Aug 02, 2020 at 12:16:53PM -0700, Hugh Dickins wrote:
> > Only once have I seen this scenario (and forgot even to notice what
> > forced the eventual crash): a sequence of "BUG: Bad page map" alerts
> > from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
> > pmd:00000000, pte values corresponding to data in physical page 0.
> > 
> > The pte mappings being zapped in this case were supposed to be from a
> > huge page of ext4 text (but could as well have been shmem): my belief
> > is that it was racing with collapse_file()'s retract_page_tables(),
> > found *pmd pointing to a page table, locked it, but *pmd had become
> > 0 by the time start_pte was decided.
> > 
> > In most cases, that possibility is excluded by holding mmap lock;
> > but exit_mmap() proceeds without mmap lock.  Most of what's run by
> > khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
> > khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
> > do so, for example.  But retract_page_tables() did not: fix that
> > (using an mm variable instead of vma->vm_mm repeatedly).
> 
> Hm. I'm not sure I follow. vma->vm_mm has to be valid as long as we hold
> i_mmap lock, no? Unlinking a VMA requires it.

Ah, my wording is misleading, yes.  That comment
"(using an mm variable instead of vma->vm_mm repeatedly)"
was nothing more than a note, that the patch is bigger than it could be,
because I decided to use an mm variable, instead of vma->vm_mm repeatedly.
But it looks as if I'm saying there used to be a need for READ_ONCE() or
something, and by using the mm variable I was fixing the problem.

No, sorry: delete that line now the point is made: the mm variable is
just a patch detail, it's not important.

The fix (as the subject suggested) is for retract_page_tables() to check
khugepaged_test_exit(), after acquiring mmap lock, before doing anything
to the page table.  Getting the mmap lock serializes with __mmput(),
which briefly takes and drops it in __khugepaged_exit(); then the
khugepaged_test_exit() check on mm_users makes sure we don't touch the
page table once exit_mmap() might reach it, since exit_mmap() will be
proceeding without mmap lock, not expecting anyone to be racing with it.

(I devised that protocol for ksmd, then Andrea adopted it for khugepaged:
back then it was important for these daemons to have a hold on the mm,
without an actual reference to mm_users, because that would prevent the
OOM killer from reaching exit_mmap().  Nowadays with the OOM reaper, it's
probably less crucial to avoid mm_users, but I think still worthwhile.)

Thanks a lot for looking at these patches so quickly,
Hugh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ