linux-kernel - Re: Memory leaks on atom-based boards?


Message-ID: <545F566F.7010102@suse.cz>
Date:	Sun, 09 Nov 2014 12:56:31 +0100
From:	Vlastimil Babka <vbabka@...e.cz>
To:	AL13N <alien@...il.be>, linux-kernel@...r.kernel.org
Subject: Re: Memory leaks on atom-based boards?

On 10/27/2014 07:44 PM, AL13N wrote:
> I have several machines with the same OS and kernel (3.14.22).
> 
> 2 of those machines are both atom-based boards and they get OOM, without
> swap being used (MemAvail crawls down towards 0, even though not more
> memory is used on processes).
> 
> Specifically, this one machine I need to reboot every 3 to 5 days.
> 
> It has 4GB RAM and 4GB swap(SSD), but:
>  - sum of all vmRSS < 500MB
>  - sum of all tmpfs < 100MB
>  - Slab is around 16MB
>  - Cache will usually crawl down towards 0 (just like MemAvail)
>  - I couldn't find another explanation for the loss of Memory
>  - I also asked
> http://serverfault.com/questions/616856/where-did-my-memory-go-on-linux-no-cache-slab-shm-ipcs
> (the other machine)
>  - This problem existed on this hardware at least from 3.12.* upwards.
> 
> I've recompiled kernel to include kmemleak (i figured it'd be some module
> that i've only got with this board), but it didn't point to anything (i
> tested also with the test module, to see if it was working).
> 
> My questions are:
>  - Is this a kernel memory leak somewhere?

Hi, this does look like a kernel memory leak. A known one was recently
fixed by the patch at https://lkml.org/lkml/2014/10/15/447, which made
it into 3.18-rc3 and should be backported to stable kernels 3.8+ soon.
You can tell whether this fix applies to you by checking the
thp_zero_page_alloc value in /proc/vmstat. A value X > 1 basically means
that X*2 MB of memory has leaked.
You say in the serverfault post that 3.17.2 helped, but the fix is not
in 3.17.2... it could just be that the circumstances changed and THP
zero pages are no longer being freed and reallocated.
So if you want to be sure, I would suggest going back to a version where
the problem appeared on your system and checking thp_zero_page_alloc
there. You may see a value > 1 even on 3.17.2, which would mean some
leak occurred there as well, though perhaps a less severe one.

>  - How can i find out what is allocating all this memory?

There's no simple way, unfortunately. Checking /proc/kpageflags might
help. IIRC there used to be a patch in the -mm tree that recorded who
allocated each page, but it may have bitrotted.
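
As a rough starting point for looking at kpageflags, something like the
Python sketch below tallies pages by flag. /proc/kpageflags holds one
64-bit flags word per page frame and needs root to read. The bit
numbers are the ones I remember from the kernel's pagemap
documentation, so please verify them against your tree; tally_flags and
the "other" bucket are my own naming, not anything the kernel provides.

```python
import struct
from collections import Counter

# Subset of KPF_* bit numbers, as listed in the kernel's pagemap
# documentation (assumption: check them against your kernel version).
KPF_BITS = {
    "LRU": 5,
    "SLAB": 7,
    "BUDDY": 10,
    "MMAP": 11,
    "ANON": 12,
    "NOPAGE": 20,
    "THP": 22,
}


def tally_flags(raw):
    """Count pages per flag in a buffer of native-endian 64-bit entries.

    Pages with none of the flags in KPF_BITS set land in "other".
    """
    usable = len(raw) - len(raw) % 8  # ignore a trailing partial entry
    counts = Counter()
    for (flags,) in struct.iter_unpack("=Q", raw[:usable]):
        counts["total"] += 1
        matched = False
        for name, bit in KPF_BITS.items():
            if (flags >> bit) & 1:
                counts[name] += 1
                matched = True
        if not matched:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/kpageflags", "rb") as f:
            counts = tally_flags(f.read())
    except OSError as err:
        print("cannot read /proc/kpageflags (root needed?):", err)
    else:
        for name, pages in counts.most_common():
            print("%-8s %10d pages" % (name, pages))
```

A large and growing "other" count (pages that are neither free, on the
LRU, nor slab) would at least be consistent with pages allocated
directly by some kernel code and never freed, but it does not tell you
who allocated them.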

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
