linux-kernel - Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

Message-ID: <20161129230135.GM7179@merlins.org>
Date:   Tue, 29 Nov 2016 15:01:35 -0800
From:   Marc MERLIN <marc@...lins.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Michal Hocko <mhocko@...nel.org>, Vlastimil Babka <vbabka@...e.cz>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Tejun Heo <tj@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

On Tue, Nov 29, 2016 at 09:40:19AM -0800, Marc MERLIN wrote:
> Thanks for the reply and suggestions.
> 
> On Tue, Nov 29, 2016 at 09:07:03AM -0800, Linus Torvalds wrote:
> > On Tue, Nov 29, 2016 at 8:34 AM, Marc MERLIN <marc@...lins.org> wrote:
> > > Now, to be fair, this is not a new problem, it's just varying degrees of
> > > bad and usually only happens when I do a lot of I/O with btrfs.
> > 
> > One situation where I've seen something like this happen is
> > 
> >  (a) lots and lots of dirty data queued up
> >  (b) horribly slow storage
> 
> In my case, it is a 5x 4TB HDD with 
> software raid 5 < bcache < dmcrypt < btrfs
> bcache is currently half disabled (as in I removed the actual cache);
> otherwise too many bcache requests pile up, and the kernel dies once too
> many workqueues have piled up.
> I'm just kind of worried that since I'm going through 4 subsystems
> before my data can hit disk, that's a lot of memory allocations and
> places where data can accumulate and cause bottlenecks if the next
> subsystem isn't as fast.
> 
> But this shouldn't be "horribly slow", should it? (it does copy a few
> terabytes per day, not fast, but not horrible, about 30MB/s or so)
> 
> > Sadly, our defaults for "how much dirty data do we allow" are somewhat
> > buggered. The global defaults are in "percent of memory", and are
> > generally _much_ too high for big-memory machines:
> > 
> >     [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_ratio
> >     20
> >     [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_background_ratio
> >     10
> 
> I can confirm I have the same.
> 
> > says that it only starts really throttling writes when you hit 20% of
> > all memory used. You don't say how much memory you have in that
> > machine, but if it's the same one you talked about earlier, it was
> > 24GB. So you can have 4GB of dirty data waiting to be flushed out.
> 
> Correct, 24GB and 4GB.
> 
> > And we *try* to do this per-device backing-dev congestion thing to
> > make things work better, but it generally seems to not work very well.
> > Possibly because of inconsistent write speeds (ie _sometimes_ the SSD
> > does really well, and we want to open up, and then it shuts down).
> > 
> > One thing you can try is to just make the global limits much lower. As in
> > 
> >    echo 2 > /proc/sys/vm/dirty_ratio
> >    echo 1 > /proc/sys/vm/dirty_background_ratio
> 
> I will give that a shot, thank you.

And after 5 hours of copying, there has not been a single hang, USB
disconnect, or anything else.
Obviously this seems to point to other problems in the code, and I have no
idea which layer is the culprit here, but reducing the buffers absolutely
helped a lot.
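
For anyone following along, here is a rough sketch of the arithmetic behind
the numbers quoted above. The helper name dirty_limit_mib is made up for
illustration, and the calculation assumes the ratio is applied to total RAM.
The kernel actually applies it to available memory, so the real threshold is
somewhat lower; treat the result as an upper bound only.

```shell
#!/bin/sh
# Hypothetical helper: approximate ceiling on dirty page cache, in MiB,
# for a given RAM size (MiB) and dirty ratio (percent).
# Assumption: ratio is taken against total RAM, which overestimates the
# kernel's real threshold (it uses available memory, not total).
dirty_limit_mib() {
    total_mib=$1
    ratio_pct=$2
    echo $(( total_mib * ratio_pct / 100 ))
}

# The 24GB machine from this thread: default vs. lowered dirty_ratio.
dirty_limit_mib 24576 20
dirty_limit_mib 24576 2
```

With the default of 20 this comes out at a bit under 5GB (the "4GB" above is
a round figure), and with 2 it drops to roughly half a gigabyte, which is a
lot less data for the layers underneath to absorb at once.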

Thanks much,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/