linux-kernel - Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS


Message-ID: <20170228151535.GE26792@dhcp22.suse.cz>
Date:   Tue, 28 Feb 2017 16:15:35 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Robert Kudyba <rkudyba@...dham.edu>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS

On Tue 28-02-17 09:59:35, Robert Kudyba wrote:
> 
> > On Feb 28, 2017, at 9:40 AM, Michal Hocko <mhocko@...nel.org> wrote:
> > 
> > On Tue 28-02-17 09:33:49, Robert Kudyba wrote:
> >> 
> >>> On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@...nel.org> wrote:
> >>> and this one is hitting the min watermark while there is not really
> >>> much to reclaim. Only the page cache which might be pinned and not
> >>> reclaimable from this context because this is GFP_NOFS request. It is
> >>> not all that surprising the reclaim context fights to get some memory.
> >>> There is a huge amount of reclaimable slab, which is probably just
> >>> making slow progress.
> >>> 
> >>> That is not completely surprising on a 32b system, I am afraid.
> >>> 
> >>> Btw. does the stall keep repeating with increasing stall times, or
> >>> does it get resolved eventually?
> >> 
> >> Yes, and if by repeating you mean spreading: it is not only affecting
> >> rsync. Just now automount and NetworkManager got these page allocation
> >> stalls as well, and kswapd0 is under heavy CPU load. Are there any
> >> other settings I can adjust?
> > 
> > None that I am aware of. You might want to talk to FS guys, maybe they
> > can figure out who is pinning file pages so that they cannot be
> > reclaimed. They do not seem to be dirty or under writeback. It would
> > also be interesting to see whether that is a regression. The warning is
> > relatively new, so you might have had this problem before and just
> > haven't noticed it.
> 
> We have been getting out of memory errors for a while but those seem
> to have gone away.

This sounds suspicious. Are you really sure that this is a new problem?
Btw. is there any reason to use a 32b kernel at all? It will always suffer
from a really small lowmem...
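
A minimal sketch for checking how small lowmem actually is on such a
machine. It assumes the LowTotal and LowFree fields of /proc/meminfo,
which are only reported by kernels built with a lowmem/highmem split
(typically 32b with HIGHMEM). The function names are made up for
illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Values are the first integer on each line (kB for most fields).
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lowmem_summary(fields):
    """Return (low_total_kb, low_free_kb), or None when the kernel
    does not report a lowmem/highmem split (e.g. a 64b kernel)."""
    if "LowTotal" not in fields or "LowFree" not in fields:
        return None
    return fields["LowTotal"], fields["LowFree"]
```

Feeding it the contents of /proc/meminfo, e.g.
lowmem_summary(parse_meminfo(open("/proc/meminfo").read())), gives the
two numbers to watch while the stalls are happening.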

> We did just replace the controller in the VessRAID
> as there were some timeouts observed and multiple login/logout
> attempts.
> 
> By FS guys do you mean the linux-fsdevel or linux-fsf list?

Yeah, linux-fsdevel. No idea what linux-fsf is. It would be great if you
could collect some tracepoints before reporting the issue. At least
those in events/vmscan/*.
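
A minimal sketch of collecting those tracepoints, to be run as root. It
assumes tracefs is reachable at /sys/kernel/debug/tracing (some systems
mount it at /sys/kernel/tracing instead). The helper names and the
output path are hypothetical.

```python
import os

# Assumption: tracefs location; adjust if it is mounted elsewhere.
TRACING_ROOT = "/sys/kernel/debug/tracing"


def set_vmscan_events(enabled, root=TRACING_ROOT):
    """Enable or disable all tracepoints under events/vmscan/.

    Writing "1" or "0" to the subsystem-level 'enable' file toggles
    every event in that directory at once.
    """
    path = os.path.join(root, "events", "vmscan", "enable")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path


def save_trace(dest, root=TRACING_ROOT):
    """Copy the current trace buffer to dest; return characters written."""
    with open(os.path.join(root, "trace")) as src:
        data = src.read()
    with open(dest, "w") as out:
        out.write(data)
    return len(data)
```

The idea would be to call set_vmscan_events(True), reproduce the stall
with rsync, call save_trace("vmscan-trace.txt"), and then
set_vmscan_events(False) before attaching the file to the report.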

-- 
Michal Hocko
SUSE Labs