linux-kernel - Re: [PATCH] vmap(): don't allow invalid pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAH8bW_=KtrwLpg4tx+1tdMjN2mUiuLOoZH6nvbn8qNjU6aB3A@mail.gmail.com>
Date:   Wed, 19 Jan 2022 09:22:58 -0800
From:   Yury Norov <yury.norov@...il.com>
To:     Anshuman Khandual <anshuman.khandual@....com>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Ding Tianhong <dingtianhong@...wei.com>,
        Alexey Klimov <aklimov@...hat.com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Uladzislau Rezki <urezki@...il.com>
Subject: Re: [PATCH] vmap(): don't allow invalid pages

On Tue, Jan 18, 2022 at 10:17 PM Anshuman Khandual
<anshuman.khandual@....com> wrote:
>
>
>
> On 1/19/22 6:21 AM, Matthew Wilcox wrote:
> > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> >> vmap() takes struct page *pages as one of arguments, and user may provide
> >> an invalid pointer which would lead to DABT at address translation later.
> >
> > Could we spell out 'DABT'?  Presumably that's an ARM-specific thing.
> > Just like we don't say #PF for Intel page faults, I think this is
> > probably a 'data abort'?
>
> Right, it is data abort.
>
> >
> >> Currently, kernel checks the pages against NULL. In my case, however, the
> >> address was not NULL, and was big enough so that the hardware generated
> >> Address Size Abort on arm64.
>
> Could you please provide the actual abort stack here with details.

[  665.484101] Unhandled fault at 0xffff8000252cd000
[  665.488807] Mem abort info:
[  665.491617]   ESR = 0x96000043
[  665.494675]   EC = 0x25: DABT (current EL), IL = 32 bits
[  665.499985]   SET = 0, FnV = 0
[  665.503039]   EA = 0, S1PTW = 0
[  665.506167] Data abort info:
[  665.509047]   ISV = 0, ISS = 0x00000043
[  665.512882]   CM = 0, WnR = 1
[  665.515851] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000818cb000
[  665.522550] [ffff8000252cd000] pgd=000000affcfff003,
pud=000000affcffe003, pmd=0000008fad8c3003, pte=00688000a5217713
[  665.533160] Internal error: level 3 address size fault: 96000043 [#1] SMP
[  665.539936] Modules linked in: [...]
[  665.616212] CPU: 178 PID: 13199 Comm: test Tainted: P           OE
   5.4.0-84-generic #94~18.04.1-Ubuntu
[  665.626806] Hardware name: HPE Apollo 70             /C01_APACHE_MB
        , BIOS L50_5.13_1.0.6 07/10/2018
[  665.636618] pstate: 80400009 (Nzcv daif +PAN -UAO)
[  665.641407] pc : __memset+0x38/0x188
[  665.645146] lr : test+0xcc/0x3f8
[  665.650184] sp : ffff8000359bb840
[  665.653486] x29: ffff8000359bb840 x28: 0000000000000000
[  665.658785] x27: 0000000000000000 x26: 0000000000231000
[  665.664083] x25: ffff00ae660f6110 x24: ffff00ae668cb800
[  665.669382] x23: 0000000000000001 x22: ffff00af533e5000
[  665.674680] x21: 0000000000001000 x20: 0000000000000000
[  665.679978] x19: ffff00ae66950000 x18: ffffffffffffffff
[  665.685276] x17: 00000000588636a5 x16: 0000000000000013
[  665.690574] x15: ffffffffffffffff x14: 000000000007ffff
[  665.695872] x13: 0000000080000000 x12: 0140000000000000
[  665.701170] x11: 0000000000000041 x10: ffff8000652cd000
[  665.706468] x9 : ffff8000252cf000 x8 : ffff8000252cd000
[  665.711767] x7 : 0303030303030303 x6 : 0000000000001000
[  665.717065] x5 : ffff8000252cd000 x4 : 0000000000000000
[  665.722363] x3 : ffff8000252cdfff x2 : 0000000000000001
[  665.727661] x1 : 0000000000000003 x0 : ffff8000252cd000
[  665.732960] Call trace:
[  665.735395]  __memset+0x38/0x188
[...]

TCR_EL1 is 34b5503510, and so TCR_EL1.IPS is 0b100. The pfn that caused
address size abort has bit#47 set, which is far above 16TB that is allowed by
ips == 100.

> >>
> >> Interestingly, this abort happens even if copy_from_kernel_nofault() is
> >> used, which is quite inconvenient for debugging purposes.
> >>
> >> This patch adds a pfn_valid() check into vmap() path, so that invalid
> >> mapping will not be created.
> >>
> >> RFC: https://lkml.org/lkml/2022/1/18/815
> >> v1: use pfn_valid() instead of adding an arch-specific
> >>     arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
>
> This should be after the '---' below.
>
> >>
> >> Signed-off-by: Yury Norov <yury.norov@...il.com>
> >
> > Suggested-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> >
> >> ---
> >>  mm/vmalloc.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> >> index d2a00ad4e1dd..a4134ee56b10 100644
> >> --- a/mm/vmalloc.c
> >> +++ b/mm/vmalloc.c
> >> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> >>                      return -EBUSY;
> >>              if (WARN_ON(!page))
> >>                      return -ENOMEM;
> >> +            if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> >> +                    return -EINVAL;
> >>              set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
> >>              (*nr)++;
> >>      } while (pte++, addr += PAGE_SIZE, addr != end);
> >> --
> >> 2.30.2
> >>
>
> Why should not this just scan over the entire user provided struct page
> array and make sure that all pages there in are valid via above method,
> but in vmap() itself before calling vmap_pages_range(). Because seems
> like a single invalid page detected in vmap_pages_pte_range() will
> anyways abort the entire vmap(). This will also enable us to drop the
> existing NULL check above.

I can do this, but why is it any better than the current approach?

Thanks,
Yury