linux-kernel - Re: Interactivity regression since v3.11 in mm/vmscan.c

Message-ID: <20140606091620.GC26253@dhcp22.suse.cz>
Date:	Fri, 6 Jun 2014 11:16:20 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	Felipe Contreras <felipe.contreras@...il.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Rik van Riel <riel@...hat.com>
Subject: Re: Interactivity regression since v3.11 in mm/vmscan.c

On Thu 05-06-14 09:00:10, Felipe Contreras wrote:
> On Thu, Jun 5, 2014 at 8:37 AM, Michal Hocko <mhocko@...e.cz> wrote:
> > On Thu 05-06-14 06:33:40, Felipe Contreras wrote:
> 
> >> For a while I've noticed that my machine bogs down in certain
> >> situations, usually while doing heavy I/O operations, it is not just the
> >> I/O operations, but everything, including the graphical interface, even
> >> the mouse pointer.
> >>
> >> As far as I can recall this did not happen in the past.
> >>
> >> I noticed this especially on certain operations, for example updating
> >> a game on Steam (to an external USB 3.0 device), or copying TV episodes
> >> to a USB memory stick (probably flash-based).
> >
> > We had a similar report for opensuse. The common part was that there was
> > an IO to a slow USB device going on.
> 
> Well, it's a USB 3.0 device, I can write at 250 MB/s, so it's not
> really that slow.
> 
> And in fact, when I read and write to and from the same USB 3.0
> device, I don't see the issue.
> 
> >> Then I went back to the latest stable version (v3.14.5), and commented
> >> out the line I think is causing the slow down:
> >>
> >>   if (nr_unqueued_dirty == nr_taken || nr_immediate)
> >>         congestion_wait(BLK_RW_ASYNC, HZ/10);
> >
> > Yes, I came to the same check. I didn't have any confirmation yet, so
> > thanks for confirming it. I've suggested restricting this
> > congestion_wait to kswapd only:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 32c661d66a45..ef6a1c0e788c 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1566,7 +1566,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> >                  * implies that pages are cycling through the LRU faster than
> >                  * they are written so also forcibly stall.
> >                  */
> > -               if (nr_unqueued_dirty == nr_taken || nr_immediate)
> > +               if ((nr_unqueued_dirty == nr_taken || nr_immediate) && current_is_kswapd())
> >                         congestion_wait(BLK_RW_ASYNC, HZ/10);
> >         }
> 
> Unfortunately that doesn't fix the issue for me.

That is really interesting. So removing the test completely helps, but
restricting it to kswapd doesn't. I would expect the stalls to come from
direct reclaimers, not from kswapd.
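
To spell out why this is surprising, here is a minimal userspace sketch of
the three variants discussed above. It is not kernel code: the struct, the
enum and would_stall() are hypothetical names made up for illustration, and
only the field names are borrowed from the quoted diff. It assumes the stall
decision depends on nothing but the quoted condition.

```c
#include <stdbool.h>

/* Hypothetical model of the check in shrink_inactive_list(); illustration only. */
enum stall_variant {
	STALL_ORIGINAL,     /* condition as in v3.14.5 */
	STALL_KSWAPD_ONLY,  /* condition with the current_is_kswapd() test added */
	STALL_REMOVED       /* the two lines commented out */
};

struct reclaim_state {
	unsigned long nr_taken;          /* pages isolated from the LRU */
	unsigned long nr_unqueued_dirty; /* dirty pages not yet queued for writeback */
	unsigned long nr_immediate;      /* pages flagged for immediate reclaim */
	bool is_kswapd;                  /* stands in for current_is_kswapd() */
};

/* Returns true if this variant would call congestion_wait() for the given state. */
bool would_stall(enum stall_variant v, const struct reclaim_state *s)
{
	bool cond = (s->nr_unqueued_dirty == s->nr_taken) || s->nr_immediate;

	switch (v) {
	case STALL_ORIGINAL:
		return cond;
	case STALL_KSWAPD_ONLY:
		return cond && s->is_kswapd;
	case STALL_REMOVED:
		return false;
	}
	return false;
}
```

Under this model the only caller that still sleeps with the kswapd-only
patch is kswapd itself, so if the regression persists with that patch, either
kswapd stalling is what hurts or some other path is involved.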

Mel has a nice systemtap script (attached) to watch for stalls. Maybe
you can give it a try?

-- 
Michal Hocko
SUSE Labs

View attachment "watch-dstate-new.pl" of type "text/x-perl" (11167 bytes)