linux-kernel - Re: [PATCH -next] arm64/mm: fix a bogus GFP flag in pgd

Message-ID: <20190604142338.GC24467@lakrids.cambridge.arm.com>
Date:   Tue, 4 Jun 2019 15:23:38 +0100
From:   Mark Rutland <mark.rutland@....com>
To:     Qian Cai <cai@....pw>, rppt@...ux.ibm.com
Cc:     akpm@...ux-foundation.org, catalin.marinas@....com,
        will.deacon@....com, linux-kernel@...r.kernel.org,
        mhocko@...nel.org, linux-mm@...ck.org, vdavydov.dev@...il.com,
        hannes@...xchg.org, cgroups@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH -next] arm64/mm: fix a bogus GFP flag in pgd_alloc()

On Tue, Jun 04, 2019 at 10:00:36AM -0400, Qian Cai wrote:
> The commit "arm64: switch to generic version of pte allocation"
> introduced endless failures during boot, such as:
> 
> kobject_add_internal failed for pgd_cache(285:chronyd.service) (error: -2 parent: cgroup)
> 
> It turns out that __GFP_ACCOUNT is passed to kernel page table allocations,
> and memcg later finds out that those don't belong to any cgroup.
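A minimal user-space model of why the flag matters. It assumes, based on the `gfp` variable in the quoted hunk and the commit being fixed, that `pgd_alloc()` initialises `gfp` to `GFP_PGTABLE_USER`, defined as `GFP_PGTABLE_KERNEL | __GFP_ACCOUNT`. The `MODEL_*` bit values are invented and do not match `include/linux/gfp.h`; `wants_memcg_charge()` and `model_pgd_gfp_after_patch()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model only: the bit values are invented and do NOT match the
 * kernel's real gfp_t encoding.
 */
typedef unsigned int gfp_t;

#define MODEL_GFP_KERNEL   0x1u  /* stands in for GFP_KERNEL */
#define MODEL_GFP_ZERO     0x2u  /* stands in for __GFP_ZERO */
#define MODEL_GFP_ACCOUNT  0x4u  /* stands in for __GFP_ACCOUNT */

/* Assumed composition of the generic page-table flags. */
#define MODEL_GFP_PGTABLE_KERNEL (MODEL_GFP_KERNEL | MODEL_GFP_ZERO)
#define MODEL_GFP_PGTABLE_USER   (MODEL_GFP_PGTABLE_KERNEL | MODEL_GFP_ACCOUNT)

/* Kernel page-table flags must never carry the memcg accounting bit. */
static_assert((MODEL_GFP_PGTABLE_KERNEL & MODEL_GFP_ACCOUNT) == 0,
	      "kernel page-table flags must not be accounted");

/* memcg only considers charging allocations that carry the account bit. */
static bool wants_memcg_charge(gfp_t gfp)
{
	return (gfp & MODEL_GFP_ACCOUNT) != 0;
}

/*
 * Flags reaching the allocator from the two branches of pgd_alloc() after
 * the quoted patch, ASSUMING gfp starts out as GFP_PGTABLE_USER.
 */
static gfp_t model_pgd_gfp_after_patch(bool pgd_is_full_page)
{
	gfp_t gfp = MODEL_GFP_PGTABLE_USER;  /* assumption, see lead-in */

	if (pgd_is_full_page)
		return gfp;                  /* __get_free_page(gfp): unchanged */
	return MODEL_GFP_PGTABLE_KERNEL;     /* kmem_cache_alloc(): patched */
}
```

Under these assumptions, `model_pgd_gfp_after_patch(true)` still carries the account bit because the `__get_free_page()` branch is untouched, while `model_pgd_gfp_after_patch(false)` does not, so the patch only changes the `pgd_cache` branch.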

Mike, I understood from [1] that this wasn't expected to be a problem,
as the accounting should bypass kernel threads.

Was that assumption wrong, or is something different happening here?
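The assumption that accounting bypasses kernel threads can be sketched as a charge decision, loosely modeled on the `memcg_kmem_bypass()` check in `mm/memcontrol.c` from around this time (treat the exact conditions as an assumption). All types, flag values, and function names below are hypothetical user-space stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; values are invented. */
typedef unsigned int gfp_t;
#define MODEL_GFP_ACCOUNT 0x4u  /* stands in for __GFP_ACCOUNT */
#define MODEL_PF_KTHREAD  0x1u  /* stands in for PF_KTHREAD */

struct model_mm { int unused; };

struct model_task {
	unsigned int flags;    /* task flags, e.g. MODEL_PF_KTHREAD */
	struct model_mm *mm;   /* NULL for kernel threads */
};

/* Loosely modeled on memcg_kmem_bypass(): skip IRQs and kernel threads. */
static bool model_kmem_bypass(const struct model_task *cur,
			      bool in_interrupt_ctx)
{
	assert(cur != NULL);
	return in_interrupt_ctx || cur->mm == NULL ||
	       (cur->flags & MODEL_PF_KTHREAD) != 0;
}

/* Charge only accounted allocations made from ordinary process context. */
static bool model_should_charge(const struct model_task *cur,
				bool in_interrupt_ctx, gfp_t gfp)
{
	if (!(gfp & MODEL_GFP_ACCOUNT))
		return false;
	return !model_kmem_bypass(cur, in_interrupt_ctx);
}
```

In this model a kernel thread (no `mm`, or `PF_KTHREAD` set) is never charged, but an accounted allocation made on behalf of a user task is. The per-memcg cache name `pgd_cache(285:chronyd.service)` in the log suggests the charge here was attributed to a service's cgroup rather than bypassed.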

> 
> backtrace:
>   kobject_add_internal
>   kobject_init_and_add
>   sysfs_slab_add+0x1a8
>   __kmem_cache_create
>   create_cache
>   memcg_create_kmem_cache
>   memcg_kmem_cache_create_func
>   process_one_work
>   worker_thread
>   kthread
> 
> Signed-off-by: Qian Cai <cai@....pw>
> ---
>  arch/arm64/mm/pgd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c
> index 769516cb6677..53c48f5c8765 100644
> --- a/arch/arm64/mm/pgd.c
> +++ b/arch/arm64/mm/pgd.c
> @@ -38,7 +38,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
>  	if (PGD_SIZE == PAGE_SIZE)
>  		return (pgd_t *)__get_free_page(gfp);
>  	else
> -		return kmem_cache_alloc(pgd_cache, gfp);
> +		return kmem_cache_alloc(pgd_cache, GFP_PGTABLE_KERNEL);

This is used to allocate PGDs for both user and kernel page tables (e.g.
for the EFI runtime services), so while this may fix the regression, I'm
not sure it's the right fix.

Do we need a separate pgd_alloc_kernel()?
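One hypothetical shape for such a split: user PGDs keep the accounted flags while a kernel-only entry point drops them. `model_pgd_alloc()`, `model_pgd_alloc_kernel()`, the flag values, and the PGD size are all invented for illustration, and `calloc()` stands in for `kmem_cache_alloc(pgd_cache, ...)`; this is a user-space sketch, not a proposed kernel patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; values are invented. */
typedef unsigned int gfp_t;

#define MODEL_GFP_KERNEL   0x1u  /* stands in for GFP_KERNEL */
#define MODEL_GFP_ZERO     0x2u  /* stands in for __GFP_ZERO */
#define MODEL_GFP_ACCOUNT  0x4u  /* stands in for __GFP_ACCOUNT */
#define MODEL_GFP_PGTABLE_KERNEL (MODEL_GFP_KERNEL | MODEL_GFP_ZERO)
#define MODEL_GFP_PGTABLE_USER   (MODEL_GFP_PGTABLE_KERNEL | MODEL_GFP_ACCOUNT)

#define MODEL_PGD_SIZE 512       /* hypothetical PGD size in bytes */

static_assert((MODEL_GFP_PGTABLE_KERNEL & MODEL_GFP_ACCOUNT) == 0,
	      "kernel PGDs must not be memcg-accounted");

typedef struct {
	gfp_t gfp_used;                        /* flags the caller chose */
	unsigned char entries[MODEL_PGD_SIZE];
} model_pgd_t;

/* Shared helper: calloc() zeroes memory, standing in for __GFP_ZERO. */
static model_pgd_t *model_pgd_alloc_gfp(gfp_t gfp)
{
	model_pgd_t *pgd = calloc(1, sizeof(*pgd));

	if (pgd)
		pgd->gfp_used = gfp;
	return pgd;
}

/* User page tables (a process's mm): accounted to its memcg. */
static model_pgd_t *model_pgd_alloc(void)
{
	return model_pgd_alloc_gfp(MODEL_GFP_PGTABLE_USER);
}

/* Kernel-owned page tables (e.g. for EFI runtime services): not accounted. */
static model_pgd_t *model_pgd_alloc_kernel(void)
{
	return model_pgd_alloc_gfp(MODEL_GFP_PGTABLE_KERNEL);
}
```

Routing both entry points through one helper that takes the flags means they differ only in the `gfp_t` they pass, so kernel-only callers cannot silently inherit `__GFP_ACCOUNT` from the user path.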

Thanks,
Mark.

[1] https://lkml.kernel.org/r/20190505061956.GE15755@rapoport-lnx