Message-ID: <CAO7JXPgq8V5yHM6F2+iXf4XJ9cyT30Hn4ot5b2k7srjsaPc3JQ@mail.gmail.com>
Date:   Mon, 15 May 2023 21:47:03 -0400
From:   Vineeth Remanan Pillai <vineeth@...byteword.org>
To:     luca abeni <luca.abeni@...tannapisa.it>
Cc:     Juri Lelli <juri.lelli@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

Hi Luca,

On Mon, May 15, 2023 at 4:06 AM luca abeni <luca.abeni@...tannapisa.it> wrote:

>
> this patch is giving me some headaches:
>
Sorry about that. I have also been stressing over how to get the
reclaiming right for the past couple of days ;-)

> Vineeth Pillai <vineeth@...byteword.org> wrote:
> [...]
> >   *   Uextra:         Extra bandwidth not reserved:
> > - *                   = Umax - \Sum(u_i / #cpus in the root domain)
> > + *                   = Umax - this_bw
>
> While I agree that this setting should be OK, it ends up with
>         dq = -Uact / Umax * dt
> which I remember I originally tried, and gave some issues
> (I do not remember the details, but I think if you try N
> identical reclaiming tasks, with N > M, the reclaimed time
> is not distributed equally among them?)
>
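The following short derivation shows why the Uextra change quoted above reduces to the rule Luca describes. It is a sketch that rests on three assumptions:

- The GRUB rule has the form dq = -(max{u, Umax - Uinact - Uextra} / Umax) dt. This matches the formula at the end of this message, with Umax in place of Umax_reclaim.
- Uinact = this_bw - Uact.
- The running task's own bandwidth u is already counted in Uact.

```latex
\begin{aligned}
U_{\text{extra}} &= U_{\max} - \text{this\_bw}, \qquad
U_{\text{inact}} = \text{this\_bw} - U_{\text{act}} \\
U_{\max} - U_{\text{inact}} - U_{\text{extra}}
  &= U_{\max} - (\text{this\_bw} - U_{\text{act}}) - (U_{\max} - \text{this\_bw})
   = U_{\text{act}} \\
dq &= -\frac{\max\{u,\; U_{\max} - U_{\text{inact}} - U_{\text{extra}}\}}{U_{\max}}\,dt
    = -\frac{\max\{u,\; U_{\text{act}}\}}{U_{\max}}\,dt
    = -\frac{U_{\text{act}}}{U_{\max}}\,dt \qquad (u \le U_{\text{act}})
\end{aligned}
```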
I have noticed that the reclaimed time is not distributed equally when
there are more tasks than available CPUs. However, it depended on where
each task was scheduled: within the same CPU, the distribution seemed
proportional, but the tasks migrated often, and the reclaimed bandwidth
differed depending on whether a task got a whole CPU for its runtime or
not. I thought that was acceptable, since it depended on where the task
landed.

Another problem I saw was CPU usage spiking above max_bw, sometimes
leading to a system hang. I thought that stopping reclaiming once
running_bw exceeds max_bw (in the 4th patch) fixed this, but when I ran
the tests long enough, I still saw the hang.

> I need to think a little bit more about this...
>
Thanks for looking into this. I have a rough idea of why, on SMP, tasks
with less bandwidth reclaim less when the number of tasks is smaller than
the number of CPUs, but I do not yet have a verifiable fix for it.

If patches 1 and 4 look good to you, we can drop patches 2 and 3 and fix
the SMP issue with varying bandwidths separately. Patch 4 would change a
bit once I remove 2 and 3, so that it uses the formula:
 "dq = -(max{u, (Umax_reclaim - Uinact - Uextra)} / Umax_reclaim) dt"

Thanks for your patience with all this brainstorming :-)

Thanks,
Vineeth
