linux-kernel - Re: Memory leaks on atom-based boards?

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <27d4dc6a448169861446f8c1b3c3cadd.squirrel@mail.rmail.be>
Date:	Sun, 9 Nov 2014 16:38:34 -0000
From:	"AL13N" <alien@...il.be>
To:	linux-kernel@...r.kernel.org
Cc:	"Vlastimil Babka" <vbabka@...e.cz>
Subject: Re: Memory leaks on atom-based boards?

> On 10/27/2014 07:44 PM, AL13N wrote:
>> I have several machines with the same OS and kernel (3.14.22).
>>
>> Two of those machines are Atom-based boards, and they hit OOM without
>> swap being used (MemAvailable crawls down towards 0, even though the
>> processes are not using more memory).
>>
>> This one machine in particular I need to reboot every 3 to 5 days.
>>
>> It has 4GB RAM and 4GB swap (on SSD), but:
>>  - the sum of all VmRSS is < 500MB
>>  - the sum of all tmpfs is < 100MB
>>  - Slab is around 16MB
>>  - Cache usually crawls down towards 0 (just like MemAvailable)
>>  - I couldn't find any other explanation for the lost memory
>>  - I also asked about the other machine at
>> http://serverfault.com/questions/616856/where-did-my-memory-go-on-linux-no-cache-slab-shm-ipcs
>>  - This problem has existed on this hardware at least since 3.12.*.
>>
>> I recompiled the kernel with kmemleak (I figured it would be some module
>> that only this board uses), but it didn't point to anything (I also
>> tested it with the test module to check that it was working).
>>
>> My questions are:
>>  - Is this a kernel memory leak somewhere?
>
> Hi, this does look like a kernel memory leak. A known one was recently
> fixed by the patch at https://lkml.org/lkml/2014/10/15/447, which made
> it into 3.18-rc3 and should soon be backported to the 3.8+ stable kernels.
> You can tell whether this is the fix for you by checking the
> thp_zero_page_alloc value in /proc/vmstat: a value X > 1 basically means
> that X*2 MB of memory has leaked.
> You say in the serverfault post that 3.17.2 helped, but the fix is not
> in 3.17.2... it could just be that the circumstances changed and THP
> zero pages are no longer freed and reallocated.
> So if you want to be sure, I would suggest trying again with a version
> where the problem appeared on your system and checking
> thp_zero_page_alloc. Perhaps you'll see a value > 1 even on 3.17.2, which
> would mean some leak occurred there as well, though perhaps a less
> severe one.


I was going to tell you guys, but I was waiting until I was sure. Indeed,
3.17.2 fixed it: I had been hitting OOM after 3, maybe 4 days (for at
least 2 months), and now I've been up for more than 4 days and
MemAvailable is still high, at about 3.5GB, whereas before it would
dwindle to 0 (at about 1GB/day).

Well, thp_zero_page_alloc reads 0 on 3.17.2... so I guess not? I'll keep
this value under observation...
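To keep thp_zero_page_alloc under observation, a small script can read /proc/vmstat periodically. This is a minimal sketch, not a tool from this thread: the helper names are hypothetical, and the leak estimate simply applies the quoted rule of thumb (X > 1 means roughly X*2 MB leaked), assuming 2 MB huge pages as on x86.

```python
from typing import Dict


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat contents, one "name value" pair per line."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def estimated_thp_zero_leak_mb(stats: Dict[str, int], huge_page_mb: int = 2) -> int:
    """Apply the quoted rule of thumb: a thp_zero_page_alloc value X > 1
    suggests roughly X * huge_page_mb MB leaked (2 MB assumes x86 THP)."""
    x = stats.get("thp_zero_page_alloc", 0)
    return x * huge_page_mb if x > 1 else 0


def check_thp_zero_page(path: str = "/proc/vmstat") -> str:
    """Read the live counter and format a one-line report (Linux only)."""
    with open(path) as f:
        stats = parse_vmstat(f.read())
    if "thp_zero_page_alloc" not in stats:
        # The counter only exists on kernels built with transparent hugepages.
        return "thp_zero_page_alloc missing (kernel built without THP?)"
    return "thp_zero_page_alloc=%d, ~%d MB possibly leaked" % (
        stats["thp_zero_page_alloc"],
        estimated_thp_zero_leak_mb(stats),
    )
```

Running check_thp_zero_page() once a day (e.g. from cron) and logging it next to MemAvailable would show whether the counter grows along with the lost memory.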


>>  - How can I find out what is allocating all this memory?
>
> There's no simple way, unfortunately. Checking the /proc/kpageflags file
> might help. IIRC there used to be a patch in the -mm tree to record who
> allocated which page, but it may have bitrotted.


I checked what was in kpageflags (and kpagecount), but it's all some kind
of binary data...

Do I need some tool to interpret these values?
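Per the kernel's pagemap documentation, /proc/kpageflags holds one 64-bit flags word per page frame, indexed by PFN, with bit numbers defined in include/uapi/linux/kernel-page-flags.h; the kernel source tree also ships a page-types tool (tools/vm/page-types.c in kernels of this era) that summarizes them. As a minimal hand-rolled sketch, assuming native-endian entries and using only a subset of flag bits, the Python below counts how many frames carry each flag. The helper names are hypothetical, and reading the real file requires root.

```python
import struct
from collections import Counter

# A subset of bit numbers from include/uapi/linux/kernel-page-flags.h.
KPF_BITS = {
    "LRU": 5,
    "SLAB": 7,
    "BUDDY": 10,
    "MMAP": 11,
    "ANON": 12,
    "COMPOUND_HEAD": 15,
    "COMPOUND_TAIL": 16,
    "HUGE": 17,
    "NOPAGE": 20,
    "THP": 22,
}


def count_page_flags(raw: bytes) -> Counter:
    """Count how many page frames in `raw` have each flag set.

    Each entry is one native-endian unsigned 64-bit word per PFN.
    """
    counts = Counter()
    usable = len(raw) - len(raw) % 8  # ignore a trailing partial entry
    for (flags,) in struct.iter_unpack("=Q", raw[:usable]):
        counts["TOTAL"] += 1
        if flags == 0:
            counts["NO_FLAGS"] += 1
        for name, bit in KPF_BITS.items():
            if (flags >> bit) & 1:
                counts[name] += 1
    return counts


def read_kpageflags(path: str = "/proc/kpageflags", chunk: int = 8 * 4096) -> Counter:
    """Scan the whole file (root only) and sum the per-flag counts."""
    counts = Counter()
    # Unbuffered, so each read() is a single syscall of an 8-byte-aligned
    # size, which kpageflags expects.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            counts.update(count_page_flags(block))
    return counts
```

Taking a snapshot with read_kpageflags() each day and comparing the counts could hint at which category grows along with the lost memory.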
