Message-ID: <20170228151535.GE26792@dhcp22.suse.cz>
Date: Tue, 28 Feb 2017 16:15:35 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Robert Kudyba <rkudyba@...dham.edu>
Cc: linux-kernel@...r.kernel.org
Subject: Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS

On Tue 28-02-17 09:59:35, Robert Kudyba wrote:
>
> > On Feb 28, 2017, at 9:40 AM, Michal Hocko <mhocko@...nel.org> wrote:
> >
> > On Tue 28-02-17 09:33:49, Robert Kudyba wrote:
> >>
> >>> On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@...nel.org> wrote:
> >>> and this one is hitting the min watermark while there is not really
> >>> much to reclaim. Only the page cache which might be pinned and not
> >>> reclaimable from this context because this is GFP_NOFS request. It is
> >>> not all that surprising the reclaim context fights to get some memory.
> >>> There is a huge amount of the reclaimable slab which probably just makes
> >>> a slow progress.
> >>>
> >>> That is not something completely surprising on a 32b system I am afraid.
> >>>
> >>> Btw. is the stall repeating with the increased time or it gets resolved
> >>> eventually?
> >>
> >> Yes and if you mean by repeating it’s not only affecting rsync but
> >> you can see just now automount and NetworkManager get these page
> >> allocation stalls and kswapd0 is getting heavy CPU load, are there any
> >> other settings I can adjust?
> >
> > None that I am aware of. You might want to talk to FS guys, maybe they
> > can figure out who is pinning file pages so that they cannot be
> > reclaimed. They do not seem to be dirty or under writeback. It would be
> > also interesting to see whether that is a regression. The warning is
> > relatively new so you might have had this problem before just haven't
> > noticed it.
>
> We have been getting out of memory errors for a while but those seem
> to have gone away.

This sounds suspicious. Are you really sure that this is a new problem?
Btw. is there any reason to use a 32b kernel at all? It will always suffer
from a really small lowmem...
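
To put a number on the lowmem pressure, a minimal sketch along these lines
could read /proc/meminfo. It assumes a 32b kernel built with CONFIG_HIGHMEM,
which is what exposes the LowTotal and LowFree fields; the helper names
parse_meminfo and lowmem_summary are made up for illustration.

```python
import os


def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lowmem_summary(fields):
    """Return lowmem figures, or None when the kernel does not report them."""
    # LowTotal/LowFree only show up on kernels built with CONFIG_HIGHMEM.
    if "LowTotal" not in fields or "LowFree" not in fields:
        return None
    return {
        "low_total_kb": fields["LowTotal"],
        "low_free_kb": fields["LowFree"],
        # Reclaimable slab, if reported; 0 is a placeholder when it is absent.
        "slab_reclaimable_kb": fields.get("SReclaimable", 0),
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(lowmem_summary(parse_meminfo(f.read())))
```

Running it a few times while the stall is happening would show whether
LowFree stays pinned near the bottom.
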
> We did just replace the controller in the VessRAID
> as there were some timeouts observed and multiple login/logout
> attempts.
>
> By FS guys do you mean the linux-fsdevel or linux-fsf list?

Yeah, linux-fsdevel. No idea what linux-fsf is. It would be great if you
could collect some tracepoints before reporting the issue. At least
those in events/vmscan/*.
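
A rough sketch of how the vmscan tracepoints could be collected, assuming
tracefs is mounted at /sys/kernel/tracing or under debugfs at
/sys/kernel/debug/tracing and that the script runs as root. The function
names and the output file are hypothetical; only the events/vmscan/enable
and trace_pipe files are the kernel's own interface.

```python
import os

# Usual mount points; which one exists depends on the system setup.
TRACEFS_CANDIDATES = ("/sys/kernel/tracing", "/sys/kernel/debug/tracing")


def find_tracefs(candidates=TRACEFS_CANDIDATES):
    """Return the first candidate directory that contains an events/ subtree."""
    for root in candidates:
        if os.path.isdir(os.path.join(root, "events")):
            return root
    return None


def set_vmscan_events(root, enabled):
    """Switch every tracepoint under events/vmscan on or off."""
    path = os.path.join(root, "events", "vmscan", "enable")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path


def capture(root, out_path, max_lines):
    """Copy up to max_lines lines from trace_pipe into out_path."""
    copied = 0
    with open(os.path.join(root, "trace_pipe")) as src, open(out_path, "w") as dst:
        while copied < max_lines:
            line = src.readline()  # blocks on the real trace_pipe until data arrives
            if not line:
                break
            dst.write(line)
            copied += 1
    return copied
```

Usage would be something like calling set_vmscan_events(root, True) with
the result of find_tracefs(), reproducing the stall with rsync, running
capture(root, "vmscan-trace.txt", 100000), and then disabling the events
again with set_vmscan_events(root, False).
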
--
Michal Hocko
SUSE Labs