linux-kernel - Re: [RFC] ARM: lockless get_user_pages


Message-ID: <CAM1oe53oQ5OBww=89vqcyUt_NqsYfCHDVZkKv0p9=-TchhkHSA@mail.gmail.com>
Date:	Thu, 3 Oct 2013 11:07:44 -0700
From:	Zi Shen Lim <zishen.lim@...aro.org>
To:	Will Deacon <will.deacon@....com>
Cc:	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	"linux@....linux.org.uk" <linux@....linux.org.uk>,
	Catalin Marinas <Catalin.Marinas@....com>,
	"steve.capper@...aro.org" <steve.capper@...aro.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linaro-kernel@...ts.linaro.org" <linaro-kernel@...ts.linaro.org>,
	"linaro-networking@...aro.org" <linaro-networking@...aro.org>
Subject: Re: [RFC] ARM: lockless get_user_pages_fast()

Thanks for your feedback, Will.

On Thu, Oct 3, 2013 at 10:27 AM, Will Deacon <will.deacon@....com> wrote:
> On Thu, Oct 03, 2013 at 06:15:15PM +0100, Zi Shen Lim wrote:
>> Futex uses GUP. Currently on ARM, the default __get_user_pages_fast
>> being used always returns 0, leading to a forever loop in get_futex_key :(
>>
>> Implementing GUP solves this problem.
>>
>> Tested on vexpress-A15 on QEMU.
>> 8<---------------------------------------------------->8
>>
>> Implement get_user_pages_fast without locking in the fastpath on ARM.
>> This work is derived from the x86 version and adapted to ARM.
>
> This looks pretty much like an exact copy of the x86 version, which will
> likely also result in another exact copy for arm64. Can none of this code be
> made common? Furthermore, the fact that you've lifted the code and not
> provided much of an explanation in the cover letter hints that you might not
> be aware of all the subtleties involved here...
>

You are right; I was wondering the same thing. Hopefully this RFC will
lead to the desired solution.

x86 does this:
--8<-----
        unsigned long mask;
        pte_t *ptep;

        mask = _PAGE_PRESENT|_PAGE_USER;
        if (write)
                mask |= _PAGE_RW;

        ptep = pte_offset_map(&pmd, addr);
        do {
                pte_t pte = gup_get_pte(ptep);
                struct page *page;

                if ((pte_flags(pte) & (mask | _PAGE_SPECIAL)) != mask) {
                        pte_unmap(ptep);
                        return 0;
                }
-->8-----
The ARM adaptation replaces this flag-mask test with pte_* macros.

x86 also has its own, more optimized pmd_large and pud_large helpers
instead of reusing pmd_huge or pud_huge.

>> +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
>> +             int write, struct page **pages, int *nr)
>> +{
>> +     unsigned long next;
>> +     pmd_t *pmdp;
>> +
>> +     pmdp = pmd_offset(&pud, addr);
>> +     do {
>> +             pmd_t pmd = *pmdp;
>> +
>> +             next = pmd_addr_end(addr, end);
>> +             /*
>> +              * The pmd_trans_splitting() check below explains why
>> +              * pmdp_splitting_flush has to flush the tlb, to stop
>> +              * this gup-fast code from running while we set the
>> +              * splitting bit in the pmd. Returning zero will take
>> +              * the slow path that will call wait_split_huge_page()
>> +              * if the pmd is still in splitting state. gup-fast
>> +              * can't because it has irq disabled and
>> +              * wait_split_huge_page() would never return as the
>> +              * tlb flush IPI wouldn't run.
>> +              */
>> +             if (pmd_none(pmd) || pmd_trans_splitting(pmd))
>> +                     return 0;
>> +             if (unlikely(pmd_huge(pmd))) {
>> +                     if (!gup_huge_pmd(pmd, addr, next, write, pages, nr))
>> +                             return 0;
>> +             } else {
>> +                     if (!gup_pte_range(pmd, addr, next, write, pages, nr))
>> +                             return 0;
>> +             }
>> +     } while (pmdp++, addr = next, addr != end);
>
> ...case in point: we don't (usually) require IPIs to shoot down TLB entries
> in SMP systems, so this is racy under thp splitting.
>

Ok. I learned something new :)
Suggestions on how to proceed?

Thanks for your patience.

> Will