Message-ID: <20170612062918.GA4145@dhcp22.suse.cz>
Date:   Mon, 12 Jun 2017 08:29:18 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Larry Finger <Larry.Finger@...inger.net>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org
Subject: Re: Sleeping BUG in khugepaged for i586

On Sun 11-06-17 16:28:11, David Rientjes wrote:
> On Sat, 10 Jun 2017, Michal Hocko wrote:
> 
> > > > I would just pull the cond_resched out of __collapse_huge_page_copy
> > > > right after pte_unmap. But I am not really sure why this cond_resched is
> > > > really needed because the changelog of the patch which adds it is quite
> > > > terse on details.
> > > 
> > > I'm not sure what could possibly be added to the changelog.  We have 
> > > encountered need_resched warnings during the iteration.
> > 
> > Well, the part the changelog is not really clear about is whether the
> > HPAGE_PMD_NR loop itself is the source of the stall. This would be
> > quite surprising because doing 512 iterations taking up to 20+s sounds
> > way too much.
> 
> I have no idea where you come up with 20+ seconds.

OK, I misread your report as a soft lockup.

> These are not soft lockups, these are need_resched warnings.  We monitor 
> how long need_resched has been set and when a thread takes an excessive 
> amount of time to reschedule after it has been set.  A loop of 512 pages 
> with ptl contention and doing {clear,copy}_user_highpage() shows that 
> need_resched can sit without scheduling for an excessive amount of time.

How much is excessive here?
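
For a rough sense of scale, the following is a minimal userspace model, not kernel code and not taken from any patch in this thread. It estimates how long a need_resched-style flag would stay set while a 512-iteration loop runs, with and without a reschedule point inside the loop. The function name, the parameter names, and the 10 us per-page cost are hypothetical; real costs depend on ptl contention and on {clear,copy}_user_highpage().

```python
HPAGE_PMD_NR = 512  # iteration count quoted in the thread


def resched_delay_us(page_costs_us, set_at_us, resched_every=None):
    """Return how long a simulated need_resched flag stays set.

    page_costs_us: assumed cost of each loop iteration, in microseconds.
    set_at_us:     time (from loop start) at which the flag gets set.
    resched_every: reschedule point after every N iterations, or None
                   for a loop with no reschedule point inside it.
    """
    now = 0.0
    for i, cost in enumerate(page_costs_us, start=1):
        now += cost
        at_resched_point = resched_every is not None and i % resched_every == 0
        if at_resched_point and now >= set_at_us:
            # First reschedule point reached after the flag was set.
            return now - set_at_us
    # No usable reschedule point inside the loop: the flag is only
    # honoured once the whole loop has finished.
    return max(now - set_at_us, 0.0)


if __name__ == "__main__":
    costs = [10.0] * HPAGE_PMD_NR  # hypothetical 10 us per page
    print("no resched point in loop:", resched_delay_us(costs, 0.0), "us")
    print("resched point every page:", resched_delay_us(costs, 0.0, 1), "us")
```

Under these assumed numbers the flag sits for the full duration of the loop when there is no reschedule point, and for at most one iteration when there is one per page. Whether either figure counts as excessive depends on the monitoring threshold, which the report does not state.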
-- 
Michal Hocko
SUSE Labs
