Message-ID: <CAKfTPtDFUBe+qD9z0YYb7yyup_mhdNNX+zopFwcnyh+G41viAw@mail.gmail.com>
Date:   Wed, 23 Jun 2021 19:27:07 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Sachin Sant <sachinp@...ux.vnet.ibm.com>
Cc:     Odin Ugedal <odin@...d.al>,
        Linux Next Mailing List <linux-next@...r.kernel.org>,
        linuxppc-dev@...ts.ozlabs.org,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [powerpc][next-20210621] WARNING at kernel/sched/fair.c:3277
 during boot

On Wed, 23 Jun 2021 at 18:55, Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Wed, 23 Jun 2021 at 18:46, Sachin Sant <sachinp@...ux.vnet.ibm.com> wrote:
> >
> >
> > > Ok. This becomes even weirder. Could you share your config file and more details about
> > > your setup?
> > >
> > > Have you applied the patch below ?
> > > https://lore.kernel.org/lkml/20210621174330.11258-1-vincent.guittot@linaro.org/
> > >
> > > Regarding the load_avg warning, I can see possible problem during attach. Could you add
> > > the patch below. The load_avg warning seems to happen during boot and sched_entity
> > > creation.
> > >
> >
> > Here is a summary of my testing.
> >
> > I have a POWER box with the PowerVM hypervisor. On this box I have a logical partition (LPAR), or guest
> > (allocated 32 CPUs and 90G of memory), running linux-next.
> >
> > I started with a clean slate.
> > Moved to linux-next 5.13.0-rc7-next-20210622 as base code.
> > Applied patch #1 from Vincent which contains changes to dequeue_load_avg()
> > Applied patch #2 from Vincent which contains changes to enqueue_load_avg()
> > Applied patch #3 from Vincent which contains changes to attach_entity_load_avg()
> > Applied patch #4 from https://lore.kernel.org/lkml/20210621174330.11258-1-vincent.guittot@linaro.org/
> >
> > With these changes applied I was still able to recreate the issue. I could see kernel warning
> > during boot.
> >
> > I then applied patch #5 from Odin which contains changes to update_cfs_rq_load_avg()
> >
> > With all the 5 patches applied I was able to boot the kernel without any warning messages.
> > I also ran scheduler related tests from ltp (./runltp -f sched) . All tests including cfs_bandwidth01
> > ran successfully. No kernel warnings were observed.
>
> ok so Odin's patch fixes the problem which highlights that we
> overestimate _sum or don't sync _avg and _sum correctly
>
> I'm going to look at this further

The problem is that "_avg * divider" assumes that all pending
contributions are non-null, whereas they can be null.
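
To make the overestimate concrete, here is a small numerical sketch. It is
a simplified model and not kernel code: it ignores decay across period
boundaries, looks at a util-style signal only, and the helper names
(advance_within_period, resynced_sum) and the starting value of util_sum
are made up for illustration. The divider formula is meant to mirror
get_pelt_divider() as I understand it (LOAD_AVG_MAX - 1024 + period_contrib).

```python
LOAD_AVG_MAX = 47742    # maximum value of the PELT geometric series
PERIOD_US = 1024        # length of one PELT period, in microseconds
CAPACITY_SHIFT = 10     # util_sum is scaled by 1 << 10


def pelt_divider(period_contrib):
    """Divider that turns a _sum into an _avg for the current window."""
    return LOAD_AVG_MAX - PERIOD_US + period_contrib


def advance_within_period(util_sum, period_contrib, delta_us, running):
    """Advance time by delta_us without crossing a period boundary.

    period_contrib always grows with elapsed time, but util_sum only
    grows when the entity was actually running (a non-null contribution).
    """
    assert period_contrib + delta_us < PERIOD_US
    if running:
        util_sum += delta_us << CAPACITY_SHIFT
    return util_sum, period_contrib + delta_us


def resynced_sum(util_avg, period_contrib):
    """The '_avg * divider' estimate of _sum."""
    return util_avg * pelt_divider(period_contrib)


# Hypothetical state right after a period boundary, when _avg was last
# computed from _sum (integer division, as in the kernel).
util_sum, period_contrib = 23_000_000, 0
util_avg = util_sum // pelt_divider(period_contrib)

# 500 us pass inside the same period, once idle and once running.
idle_sum, idle_contrib = advance_within_period(util_sum, period_contrib, 500, running=False)
busy_sum, busy_contrib = advance_within_period(util_sum, period_contrib, 500, running=True)

print("idle: tracked _sum =", idle_sum, " _avg * divider =", resynced_sum(util_avg, idle_contrib))
print("busy: tracked _sum =", busy_sum, " _avg * divider =", resynced_sum(util_avg, busy_contrib))
```

In the idle case the divider has grown while _sum has not, so
"_avg * divider" ends up larger than the tracked _sum. In the busy case the
estimate stays at or below the tracked _sum.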

Odin's patch is the right way to fix this. The other patches should not
be necessary for your problem.

>
> >
> > Have also attached .config in case it is useful. config has CONFIG_HZ_100=y
>
> Thanks, I will have a look
>
> >
> > Thanks
> > -Sachin
> >
