linux-kernel - Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS

Message-ID: <20170228165638.GA27726@dhcp22.suse.cz>
Date:   Tue, 28 Feb 2017 17:56:39 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Robert Kudyba <rkudyba@...dham.edu>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS

On Tue 28-02-17 11:19:33, Robert Kudyba wrote:
> 
> > On Feb 28, 2017, at 10:15 AM, Michal Hocko <mhocko@...nel.org> wrote:
> > 
> > On Tue 28-02-17 09:59:35, Robert Kudyba wrote:
> >> 
> >>> On Feb 28, 2017, at 9:40 AM, Michal Hocko <mhocko@...nel.org> wrote:
> >>> 
> >>> On Tue 28-02-17 09:33:49, Robert Kudyba wrote:
> >>>> 
> >>>>> On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@...nel.org> wrote:
> >>>>> and this one is hitting the min watermark while there is not really
> >>>>> much to reclaim. There is only the page cache, which might be pinned and
> >>>>> not reclaimable from this context because this is a GFP_NOFS request. It
> >>>>> is not all that surprising that the reclaim context fights to get some
> >>>>> memory. There is a huge amount of reclaimable slab, which is probably
> >>>>> just making slow progress.
> >>>>> 
> >>>>> That is not something completely surprising on a 32b system, I am afraid.
> >>>>> 
> >>>>> Btw. is the stall repeating with increasing time, or does it get resolved
> >>>>> eventually?
> >>>> 
> >>>> Yes. And if by repeating you mean whether it affects more than rsync:
> >>>> just now automount and NetworkManager also got these page allocation
> >>>> stalls, and kswapd0 is under heavy CPU load. Are there any other
> >>>> settings I can adjust?
> >>> 
> >>> None that I am aware of. You might want to talk to the FS guys; maybe
> >>> they can figure out who is pinning file pages so that they cannot be
> >>> reclaimed. They do not seem to be dirty or under writeback. It would
> >>> also be interesting to see whether this is a regression. The warning is
> >>> relatively new, so you might have had this problem before and just not
> >>> noticed it.
> >> 
> >> We have been getting out of memory errors for a while but those seem
> >> to have gone away.
> > 
> > This sounds suspicious. Are you really sure that this is a new problem?
> > Btw. is there any reason to use a 32b kernel at all? It will always
> > suffer from a really small lowmem…
> 
> No, this has been a problem for a while. Not sure if this server can
> handle 64b; it’s a bit old.

Ok, this is unfortunate. There is usually not much interest in fixing
32b issues which are inherent to the memory model in use and which are
not fixable regressions, I am afraid.

> >> We did just replace the controller in the VessRAID
> >> as there were some timeouts observed and multiple login/logout
> >> attempts.
> >> 
> >> By FS guys, do you mean the linux-fsdevel or linux-fsf list?
> > 
> > Yeah, linux-fsdevel. No idea what linux-fsf is. It would be great if you
> > could collect some tracepoints before reporting the issue, at least
> > those in events/vmscan/*.
> 
> Will do. Here’s a perf report:

This will not tell us much. Tracepoints have a much better chance of
telling us how reclaim is progressing.
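Collecting the events/vmscan/* tracepoints only needs tracefs. The following is a minimal Python sketch, assuming it runs as root and that tracefs is mounted at /sys/kernel/tracing or reachable through debugfs at /sys/kernel/debug/tracing. The helper names are hypothetical. It enables every vmscan event through its per-event enable file, copies trace_pipe to stdout until interrupted, and then disables the events again.

```python
import os
import sys
from pathlib import Path

# tracefs is usually at /sys/kernel/tracing; older setups reach it through
# debugfs at /sys/kernel/debug/tracing.
CANDIDATE_ROOTS = ("/sys/kernel/tracing", "/sys/kernel/debug/tracing")


def find_tracefs(candidates=CANDIDATE_ROOTS):
    """Return the first tracing directory that has an events/vmscan subsystem."""
    for root in candidates:
        if os.path.isdir(os.path.join(root, "events", "vmscan")):
            return Path(root)
    return None


def set_vmscan_events(root: Path, enabled: bool) -> list[str]:
    """Write 1 or 0 to each events/vmscan/<event>/enable file.

    Returns the names of the events that were toggled.
    """
    toggled = []
    for event in sorted((root / "events" / "vmscan").iterdir()):
        enable_file = event / "enable"
        # Skip subsystem-level files such as events/vmscan/enable and filter;
        # only per-event directories are toggled here.
        if event.is_dir() and enable_file.is_file():
            enable_file.write_text("1" if enabled else "0")
            toggled.append(event.name)
    return toggled


def stream_trace(root: Path) -> None:
    """Copy trace_pipe to stdout until interrupted (Ctrl-C)."""
    with open(root / "trace_pipe") as pipe:
        for line in pipe:
            sys.stdout.write(line)


if __name__ == "__main__":
    root = find_tracefs()
    if root is None:
        sys.exit("no tracefs with events/vmscan found (run as root, mount tracefs)")
    names = set_vmscan_events(root, True)
    print("enabled: " + ", ".join(names), file=sys.stderr)
    try:
        stream_trace(root)
    except KeyboardInterrupt:
        pass
    finally:
        set_vmscan_events(root, False)
```

Run it while reproducing the rsync stall and redirect stdout to a file for the report. Writing 1 to events/vmscan/enable and reading trace_pipe from a shell achieves the same thing; the per-event loop is only there so the script can report which events it touched.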
-- 
Michal Hocko
SUSE Labs