linux-kernel - Re: [PATCH 0/2] mm/thp: rework the pmd protect changing flow

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <50493410-c44c-7ef0-81f9-d4ce9a525c1f@arm.com>
Date:   Thu, 23 Jan 2020 13:54:32 +0530
From:   Anshuman Khandual <anshuman.khandual@....com>
To:     Xuefeng Wang <wxf.wang@...ilicon.com>, catalin.marinas@....com,
        will@...nel.org, mark.rutland@....com, arnd@...db.de,
        akpm@...ux-foundation.org
Cc:     linux-arch@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        chenzhou10@...wei.com
Subject: Re: [PATCH 0/2] mm/thp: rework the pmd protect changing flow



On 01/23/2020 01:25 PM, Xuefeng Wang wrote:
> On KunPeng920 board. When changing permission of a large range region,
> pmdp_invalidate() takes about 65% in profile (with hugepages) in JIT tool.
> Kernel will flush tlb twice: first flush happens in pmdp_invalidate, second
> flush happens at the end of change_protect_range(). The first pmdp_invalidate
> is not necessary if the hardware support atomic pmdp changing. The atomic
> changing pimd to zero can prevent the hardware from update asynchronous.
> So reconstruct it and remove the first pmdp_invalidate. And the second tlb
> flush can make sure the new tlb entry valid.
> 
> This patch series add a pmdp_modify_prot transaction abstraction firstly.
> Then add pmdp_modify_prot_start() in arm64, which uses pmdp_huge_get_and_clear()
> to atomically fetch the pmd and zero the entry.

There is a comment section in change_huge_pmd() which details how clearing
the PMD entry there (in prot_numa case) can potentially race with another
concurrent madvise(MADV_DONTNEED, ..) call. Here is the comment block for
reference.

        /*
         * In case prot_numa, we are under down_read(mmap_sem). It's critical
         * to not clear pmd intermittently to avoid race with MADV_DONTNEED
         * which is also under down_read(mmap_sem):
         *
         *      CPU0:                           CPU1:
         *                              change_huge_pmd(prot_numa=1)
         *                               pmdp_huge_get_and_clear_notify()
         * madvise_dontneed()
         *  zap_pmd_range()
         *   pmd_trans_huge(*pmd) == 0 (without ptl)
         *   // skip the pmd
         *                               set_pmd_at();
         *                               // pmd is re-established
         *
         * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
         * which may break userspace.
         *
         * pmdp_invalidate() is required to make sure we don't miss
         * dirty/young flags set by hardware.
         */

By defining the new override with pmdp_huge_get_and_clear(), are not we
now exposed to above race condition ?

- Anshuman