Message-ID: <CAC6yHM4LON5ASooVa_eUaDYsN1W0HYTMX76yHDxf8Mff0mKqiA@mail.gmail.com>
Date: Fri, 29 Jul 2016 14:31:45 -0400
From: Francis Giraldeau <francis.giraldeau@...il.com>
To: Chris Metcalf <cmetcalf@...lanox.com>
Cc: Christoph Lameter <cl@...ux.com>,
Gilad Ben Yossef <giladb@...lanox.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
Andy Lutomirski <luto@...capital.net>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
linux-doc@...r.kernel.org, linux-api@...r.kernel.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: clocksource_watchdog causing scheduling of timers every second
(was [v13] support "task_isolation" mode)
I tested this patch on 4.7 and confirm that irq_work no longer occurs on
the isolated cpu. Thanks!
I don't know of any utility to test the task isolation feature, so I started
one:
https://github.com/giraldeau/taskisol
The script exp.sh runs taskisol under five different conditions, but some of
the behavior is not what I would expect.
At startup, it does:
- register a custom signal handler for SIGUSR1
- sched_setaffinity() on CPU 1, which is isolated
- mlockall(MCL_CURRENT) to prevent undesired page faults
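For reference, the three startup steps above amount to roughly the following.
This is a minimal sketch and not the actual taskisol source: the function
names are made up for illustration, and the CPU number is left to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for sched_setaffinity() and CPU_SET() */
#endif
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

/* Set by the handler so the main code can tell that SIGUSR1 arrived. */
volatile sig_atomic_t got_sigusr1;

/* Hypothetical handler: only records that the signal was delivered. */
void on_sigusr1(int signo)
{
    (void)signo;
    got_sigusr1 = 1;
}

/* Register the custom SIGUSR1 handler; returns 0 on success, -1 on error. */
int install_sigusr1_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Pin the calling thread to a single CPU (CPU 1 in the test described here). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Lock the currently mapped pages to avoid later page faults on them. */
int lock_current_memory(void)
{
    return mlockall(MCL_CURRENT);
}
```

A caller would invoke install_sigusr1_handler(), pin_to_cpu(1) and
lock_current_memory() in that order, checking each return value, before
issuing the prctl() call.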
The default strict mode is set with:
prctl(PR_SET_TASK_ISOLATION, PR_TASK_ISOLATION_ENABLE)
The write() syscall is then called. Based on the previous discussion, SIGKILL
should be sent at that point, but it is not. If we force a page fault instead
of calling write(), SIGKILL is correctly sent.
When a custom signal (SIGUSR1) is requested instead:
prctl(PR_SET_TASK_ISOLATION, PR_TASK_ISOLATION_USERSIG |
PR_TASK_ISOLATION_SET_SIG(SIGUSR1))
The signal is never delivered, neither when the syscall is issued nor when the
page fault occurs.
I can confirm that, if two taskisol instances are started on the same CPU, the
second one fails with "Resource temporarily unavailable" (EAGAIN), so that's
fine.
I can add more test cases depending on your comments, such as the TLB events
triggered by another thread on a non-isolated core. But maybe there is already
a test suite?
Francis
2016-07-27 15:58 GMT-04:00 Chris Metcalf <cmetcalf@...lanox.com>:
> On 7/27/2016 3:53 PM, Christoph Lameter wrote:
>>
>> On Wed, 27 Jul 2016, Chris Metcalf wrote:
>>
>>> Looks good. Did you omit the equivalent fix in clocksource_start_watchdog()
>>> on purpose? For now I just took your change, but tweaked it to add the
>>> equivalent diff with cpumask_first_and() there.
>>
>> Can the watchdog be started on an isolated cpu at all? I would expect that
>> the code would start a watchdog only on a housekeeping cpu.
>
>
> The code just starts the watchdog initially on the first online cpu.
> In principle you could have configured that as an isolated cpu, so
> without any change to that code, you'd interrupt that cpu.
>
> I guess another way to slice it would be to start the watchdog on the
> current core. But just using the same idiom as in clocksource_watchdog()
> seems cleanest to me.
>
> I added your patch to the series and pushed it up (along with adding your
> Tested-by to the x86 enablement commit). It's still based on 4.6 so I'll need
> to rebase it once the merge window closes.
>
>
> --
> Chris Metcalf, Mellanox Technologies
> http://www.mellanox.com
>