[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrUTjGzQ1Aax0R0cZgUvdow_pUksXd5UrbmqBAK27tN2xg@mail.gmail.com>
Date:   Tue, 16 May 2017 16:55:28 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        "Luis R. Rodriguez" <mcgrof@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Mateusz Guzik <mguzik@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: kmemleak splat on copy_process()
On Tue, May 16, 2017 at 6:39 AM, Catalin Marinas
<catalin.marinas@....com> wrote:
> Thanks for cc'ing me. The vmalloc allocations have always been tricky
> for kmemleak since there are 2-3 other memory locations with the same
> value as the vmalloc'ed object: vm_struct.addr and vmap_area.va_start;
> occasionally we have vmap_area.va_end pointing to the next
> vmap_area.va_start.
>
> To have a more reliable object reference counting, kmemleak avoids
> scanning most of the vmap_area structure (apart from vmap_area.vm) and
> requires a minimum count of 2 references for a vmalloc'ed object to not
> be considered a leak (that is vm_struct.addr and the caller's location,
> in this case task->stack).
>
>> On Mon 15-05-17 23:53:18, Luis R. Rodriguez wrote:
>> > root@...gy:~# cat /sys/kernel/debug/kmemleak
>> > unreferenced object 0xffffa07500d4c000 (size 16384):
>> >   comm "bash", pid 1349, jiffies 4294895999 (age 263.204s)
>> >   hex dump (first 32 bytes):
>> >     9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00  .n.W............
>> >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> >   backtrace:
>> >     [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
>> >     [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
>> >     [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
>> >     [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
>> >     [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
>> >     [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
>> >     [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
>> >     [<ffffffffffffffff>] 0xffffffffffffffff
>
> This stack here was allocated by bash for a child process via
> alloc_thread_stack_node() when CONFIG_VMAP_STACK is enabled. When the
> child died, free_stack_thread() did not call vfree_atomic() but rather
> stored the corresponding vm_struct pointer in cached_stacks[i]. However,
> the original task->stack pointer was lost (child task_struct freed), so
> when scanning the memory kmemleak can only find a single reference to
> the original stack (via vm_struct cached_stacks[i]) rather than 2 as
> required for vmalloc'ed objects.
>
> BTW, you can get more info about a specific object, including the number
> of references found, with:
>
>   echo dump=0xffffa07500d4c000 > /sys/kernel/debug/kmemleak
>
> So basically kernel/fork.c has its own thread stack allocator on top of
> vmalloc and kmemleak gets confused. A quick workaround:
>
> ------------8<------------------------------
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 06d759ab4c62..785907fccf67 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -214,6 +214,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>                 this_cpu_write(cached_stacks[i], NULL);
>
>                 tsk->stack_vm_area = s;
> +               kmemleak_alloc(s->addr, THREAD_SIZE, 2, GFP_ATOMIC);
>                 local_irq_enable();
>                 return s->addr;
>         }
> @@ -254,6 +255,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
>                                 continue;
>
>                         this_cpu_write(cached_stacks[i], tsk->stack_vm_area);
> +                       kmemleak_free(tsk->stack);
>                         local_irq_restore(flags);
>                         return;
>                 }
> ------------8<------------------------------
I would do this differently.  The problem only affects cached stacks,
right?  How about kmemleal_ignoring when caching a stack and
unignoring when uncaching a stack?
Also, this needs a serious comment.  How about "kmemleak does not
presently understand that a reference to a vm_area_struct implies a
reference to the vmalloced memory"?
> A more complex solution (which I'm not yet convinced it works) may be to
> pass the number of references of an objects down to the objects it
> refers. In the vmalloc case we normally have a single reference to
> vm_struct and two to the vmalloc'ed object. If a reference to the
> vmalloc'ed object disappeared we could balance it with an increased
> number of references to the vm_struct object. But this option requires
> more research.
The idea being that, with this change, kmemleak would start to
understand that surplus references to vm_area_struct cause it to think
the memory itself is reachable?
--Andy
Powered by blists - more mailing lists
 
