Message-Id: <20161130074014.GN3924@linux.vnet.ibm.com>
Date:   Tue, 29 Nov 2016 23:40:14 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Sudeep Holla <sudeep.holla@....com>,
        Boris Zhmurov <bb@...nelpanic.ru>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...org,
        kernel test robot <xiaolong.ye@...el.com>
Subject: Re: [lkp] [mm] e7c1db75fe:
 BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c

On Wed, Nov 30, 2016 at 08:16:02AM +0100, Michal Hocko wrote:
> On Tue 29-11-16 11:14:48, Paul E. McKenney wrote:
> > On Tue, Nov 29, 2016 at 05:21:19PM +0000, Sudeep Holla wrote:
> > > On Sun, Nov 27, 2016 at 6:16 PM, kernel test robot
> > > <xiaolong.ye@...el.com> wrote:
> > > >
> > > > FYI, we noticed the following commit:
> > > >
> > > > commit e7c1db75fed821a961ce1ca2b602b08e75de0cd8 ("mm: Prevent __alloc_pages_nodemask() RCU CPU stall warnings")
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next
> > > >
> > > > in testcase: boot
> > > >
> > > > on test machine: qemu-system-x86_64 -enable-kvm -cpu Nehalem -smp 2 -m 1G
> > > >
> > > > caused below changes:
> > > >
> > > [...]
> > > 
> > > > [    8.953192] BUG: sleeping function called from invalid context at mm/page_alloc.c:3746
> > > > [    8.956353] in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/0
> > > 
> > > I am observing similar BUG/backtrace even on ARM64 platform.
> > 
> > Does the (untested) patch below help?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit ccc0666e2049e5818c236e647cf20c552a7b053b
> > Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> > Date:   Tue Nov 29 11:06:05 2016 -0800
> > 
> >     rcu: Allow boot-time use of cond_resched_rcu_qs()
> >     
> >     The cond_resched_rcu_qs() macro is used to force RCU quiescent states into
> >     long-running in-kernel loops.  However, some of these loops can execute
> >     during early boot when interrupts are disabled, and during which time
> >     it is therefore illegal to enter the scheduler.  This commit therefore
> >     makes cond_resched_rcu_qs() be a no-op during early boot.
> >     
> >     Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> 
> This is not the problem with your "mm: Prevent __alloc_pages_nodemask()
> RCU CPU stall warnings", though. The main problem imho is that the
> allocator might be called from the atomic contexts (aka
> gfp_mask & ~__GFP_DIRECT_RECLAIM). Besides that, I think that any
> variant of cond_resched inside the allocator hot path
> __alloc_pages_nodemask is just wrong. If anything such a scheduling/RCU
> point should be added to the slow path. But as I've said earlier we
> already have these points in that path so new ones shouldn't be really
> necessary.
> 
> Could you drop this patch Paul, please?

Good point, dropped.

Boris's test results show that something else is needed, so I will review
his splats and see what else presents itself.

							Thanx, Paul
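
For illustration only, the two constraints discussed in this thread (no
rescheduling during early boot, and no rescheduling when the allocation
may not enter direct reclaim) can be modeled in a small userspace sketch.
This is not kernel code and not the dropped patch: every sketch_* name,
the SKETCH_GFP_DIRECT_RECLAIM value, and the counter standing in for a
cond_resched()-style call are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_DIRECT_RECLAIM; the value is illustrative. */
#define SKETCH_GFP_DIRECT_RECLAIM 0x1u

/* Assumption: false models early boot, before the scheduler may be entered. */
static bool sketch_scheduler_active;

/* Counts how often the modeled reschedule point was actually taken. */
static int sketch_resched_count;

/* Stand-in for a cond_resched()-style call; here it only counts. */
static void sketch_cond_resched(void)
{
	sketch_resched_count++;
}

/* Sleeping is allowed only after early boot and only for reclaim-capable masks. */
static bool sketch_may_sleep(unsigned int gfp_mask)
{
	return sketch_scheduler_active &&
	       (gfp_mask & SKETCH_GFP_DIRECT_RECLAIM) != 0;
}

/* Modeled slow path: the reschedule point lives here, not in the hot path. */
static void sketch_alloc_slow_path(unsigned int gfp_mask)
{
	if (sketch_may_sleep(gfp_mask))
		sketch_cond_resched();
}

/* Walks through the three cases and checks each with assert(). */
static void sketch_demo(void)
{
	sketch_resched_count = 0;

	/* Early boot: never reschedule, whatever the mask says. */
	sketch_scheduler_active = false;
	sketch_alloc_slow_path(SKETCH_GFP_DIRECT_RECLAIM);
	assert(sketch_resched_count == 0);

	/* After boot, atomic caller (no direct reclaim): still no reschedule. */
	sketch_scheduler_active = true;
	sketch_alloc_slow_path(0u);
	assert(sketch_resched_count == 0);

	/* After boot, caller may enter direct reclaim: reschedule is permitted. */
	sketch_alloc_slow_path(SKETCH_GFP_DIRECT_RECLAIM);
	assert(sketch_resched_count == 1);
}
```

Calling sketch_demo() from a test harness exercises all three cases. The
model only shows where such a check could sit; which existing slow-path
scheduling points already cover this is a question for the real allocator
code.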
