[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20210731004951.GA77573@fuller.cnet>
Date: Fri, 30 Jul 2021 21:49:51 -0300
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Nitesh Lal <nilal@...hat.com>
Cc: linux-kernel@...r.kernel.org,
Nicolas Saenz Julienne <nsaenzju@...hat.com>,
Frederic Weisbecker <frederic@...nel.org>,
Christoph Lameter <cl@...ux.com>,
Juri Lelli <juri.lelli@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Alex Belits <abelits@...its.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [patch 1/4] add basic task isolation prctl interface
On Fri, Jul 30, 2021 at 07:36:31PM -0400, Nitesh Lal wrote:
> On Fri, Jul 30, 2021 at 4:21 PM Marcelo Tosatti <mtosatti@...hat.com> wrote:
>
> > Add basic prctl task isolation interface, which allows
> > informing the kernel that application is executing
> > latency sensitive code (where interruptions are undesired).
> >
> > Interface is described by task_isolation.rst (added by this patch).
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti@...hat.com>
> >
> > Index: linux-2.6/Documentation/userspace-api/task_isolation.rst
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/Documentation/userspace-api/task_isolation.rst
> > @@ -0,0 +1,187 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Task isolation prctl interface
> > +===============================
> > +
> > +Certain types of applications benefit from running uninterrupted by
> > +background OS activities. Realtime systems and high-bandwidth networking
> > +applications with user-space drivers can fall into the category.
> > +
> > +
> > +To create a OS noise free environment for the application, this
> > +interface allows userspace to inform the kernel the start and
> > +end of the latency sensitive application section (with configurable
> > +system behaviour for that section).
> > +
> > +The prctl options are:
> > +
> > +
> > + - PR_ISOL_FEAT: Retrieve supported features.
> > + - PR_ISOL_GET: Retrieve task isolation parameters.
> > + - PR_ISOL_SET: Set task isolation parameters.
> > + - PR_ISOL_CTRL_GET: Retrieve task isolation state.
> > + - PR_ISOL_CTRL_SET: Set task isolation state (enable/disable task
> > isolation).
> > +
> >
>
> Didn't we decide to replace FEAT/FEATURES with MODE?
Searching for the definition of mode:
mode: one of a series of ways that a machine can be made to work
in manual/automatic mode.
mode: a particular way of doing something.
mode: a way of operating, living, or behaving.
So "mode" seems to fit the case where one case can be chosen
between different choices (exclusively).
Now for this case it seems a composition of things is what is
happening, because quiescing might be functional with both
"syscalls allowed" and "syscalls not allowed" modes
(in that case, "mode" makes more sense).
> > +The isolation parameters and state are not inherited by
> > +children created by fork(2) and clone(2). The setting is
> > +preserved across execve(2).
> > +
> > +The sequence of steps to enable task isolation are:
> > +
> > +1. Retrieve supported task isolation features (PR_ISOL_FEAT).
> > +
> > +2. Configure task isolation features (PR_ISOL_SET/PR_ISOL_GET).
> > +
> > +3. Activate or deactivate task isolation features
> > + (PR_ISOL_CTRL_GET/PR_ISOL_CTRL_SET).
> > +
> > +This interface is based on ideas and code from the
> > +task isolation patchset from Alex Belits:
> > +https://lwn.net/Articles/816298/
> > +
> > +--------------------
> > +Feature description
> > +--------------------
> > +
> > + - ``ISOL_F_QUIESCE``
> > +
> > + This feature allows quiescing select kernel activities on
> > + return from system calls.
> > +
> > +---------------------
> > +Interface description
> > +---------------------
> > +
> > +**PR_ISOL_FEAT**:
> > +
> > + Returns the supported features and feature
> > + capabilities, as a bitmask. Features and its capabilities
> > + are defined at include/uapi/linux/task_isolation.h::
> > +
> > + prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
> > +
> > + The 'feat' argument specifies whether to return
> > + supported features (if zero), or feature capabilities
> > + (if not zero). Possible non-zero values for 'feat' are:
> >
>
> By feature capabilities you mean the kernel activities (vmstat, tlb_flush)?
Not necessarily, but in the case of ISOL_F_QUIESCE, yes, the different
kernel activities that might interrupt the task.
Feature capabilities is a generic term. For example, one might add
ISOL_F_NOTIFY with ISOL_F_NOTIFY_SIGNAL capabilities.
or
ISOL_F_NOTIFY with ISOL_F_NOTIFY_EVENTFD capabilities.
or
ISOL_F_future_feature with ISOL_F_future_feature_capability.
> +
> > + - ``ISOL_F_QUIESCE``:
> > +
> > + If arg3 is zero, returns a bitmask containing
> > + which kernel activities are supported for quiescing.
> > +
> > + If arg3 is ISOL_F_QUIESCE_DEFMASK, returns
> > + default_quiesce_mask, a system-wide configurable.
> > + See description of default_quiesce_mask below.
> > +
> > +**PR_ISOL_GET**:
> > +
> > + Retrieve task isolation feature configuration.
> > + The general format is::
> > +
> > + prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
> > +
> > + Possible values for feat are:
> > +
> > + - ``ISOL_F_QUIESCE``:
> > +
> > + Returns a bitmask containing which kernel
> > + activities are enabled for quiescing.
> > +
> > +
> > +**PR_ISOL_SET**:
> > +
> > + Configures task isolation features. The general format is::
> > +
> > + prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
> > +
> > + The 'feat' argument specifies which feature to configure.
> > + Possible values for feat are:
> >
>
> We should be able to enable multiple features as well via this? Something
> like ISOL_F_QUIESCE|ISOL_F_BLOCK_INTERRUPTORS as you have mentioned in the
> last posting.
One probably would do it separately (PR_ISOL_SET configures each
feature separately):
ret = prctl(PR_ISOL_FEAT, 0, 0, 0, 0);
if (ret == -1) {
perror("prctl PR_ISOL_FEAT");
return EXIT_FAILURE;
}
if (!(ret & ISOL_F_BLOCK_INTERRUPTORS)) {
printf("ISOL_F_BLOCK_INTERRUPTORS feature unsupported, quitting\n");
return EXIT_FAILURE;
}
ret = prctl(PR_ISOL_SET, ISOL_F_BLOCK_INTERRUPTORS, params...);
if (ret == -1) {
perror("prctl PR_ISOL_SET");
return EXIT_FAILURE;
}
/* configure ISOL_F_QUIESCE, ISOL_F_NOTIFY,
* ISOL_F_future_feature... */
ctrl_set_mask = ISOL_F_QUIESCE|ISOL_F_BLOCK_INTERRUPTORS|
ISOL_F_NOTIFY|ISOL_F_future_feature;
/*
* activate isolation mode with the features
* as configured above
*/
ret = prctl(PR_ISOL_CTRL_SET, ctrl_set_mask, 0, 0, 0);
if (ret == -1) {
perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)");
return EXIT_FAILURE;
}
latency sensitive loop
> > +
> > + - ``ISOL_F_QUIESCE``:
> > +
> > + The 'arg3' argument is a bitmask specifying which
> > + kernel activities to quiesce. Possible bit sets are:
> > +
> > + - ``ISOL_F_QUIESCE_VMSTATS``
> > +
> > + VM statistics are maintained in per-CPU counters to
> > + improve performance. When a CPU modifies a VM statistic,
> > + this modification is kept in the per-CPU counter.
> > + Certain activities require a global count, which
> > + involves requesting each CPU to flush its local counters
> > + to the global VM counters.
> > +
> > + This flush is implemented via a workqueue item, which
> > + might schedule a workqueue on isolated CPUs.
> > +
> > + To avoid this interruption, task isolation can be
> > + configured to, upon return from system calls,
> > synchronize
> > + the per-CPU counters to global counters, thus avoiding
> > + the interruption.
> > +
> > + To ensure the application returns to userspace
> > + with no modified per-CPU counters, its necessary to
> > + use mlockall() in addition to this isolcpus flag.
> > +
> > +**PR_ISOL_CTRL_GET**:
> > +
> > + Retrieve task isolation control.
> > +
> > + prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
> > +
> > + Returns which isolation features are active.
> > +
> > +**PR_ISOL_CTRL_SET**:
> > +
> > + Activates/deactivates task isolation control.
> > +
> > + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > +
> > + The 'mask' argument specifies which features
> > + to activate (bit set) or deactivate (bit clear).
> > +
> > + For ISOL_F_QUIESCE, quiescing of background activities
> > + happens on return to userspace from the
> > + prctl(PR_ISOL_CTRL_SET) call, and on return from
> > + subsequent system calls.
> > +
> > + Quiescing can be adjusted (while active) by
> > + prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ...).
> >
>
> Why do we need this additional control? We should be able to enable or
> disable task isolation using the _GET_ and _SET_ calls, isn't it?
The distinction is so one is able to configure the features separately,
and then enter isolated mode with them activated.
> > +
> > +--------------------
> > +Default quiesce mask
> > +--------------------
> > +
> > +Applications can either explicitly specify individual
> > +background activities that should be quiesced, or
> > +obtain a system configurable value, which is to be
> > +configured by the system admin/mgmt system.
> > +
> > +/sys/kernel/task_isolation/available_quiesce lists, as
> > +one string per line, the activities which the kernel
> > +supports quiescing.
> >
>
> Probably replace 'quiesce' with 'quiesce_activities' because we are really
> controlling the kernel activities via this control and not the quiesce
> state/feature itself.
OK, makes sense.
> > +
> > +To configure the default quiesce mask, write a comma separated
> > +list of strings (from available_quiesce) to
> > +/sys/kernel/task_isolation/default_quiesce.
> > +
> > +echo > /sys/kernel/task_isolation/default_quiesce disables
> > +all quiescing via ISOL_F_QUIESCE_DEFMASK.
> > +
> > +Using ISOL_F_QUIESCE_DEFMASK allows for the application to
> > +take advantage of future quiescing capabilities without
> > +modification (provided default_quiesce is configured
> > +accordingly).
> >
>
> ISOL_F_QUIESCE_DEFMASK is really telling to quite all kernel
> activities including the one that is not currently supported or I am
> misinterpreting something?
Its telling to quiesce activities that are configured via
/sys/kernel/task_isolation/default_quiesce, including
ones that are not currently supported (in the future,
/sys/kernel/task_isolation/default_quiesce will have to contain the bit
for the new feature as 1).
So userspace can either:
quiesce_mask = value of /sys/kernel/task_isolation/default_quiesce
prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_mask, 0, 0);
(so that new features might be automatically enabled by
a sysadmin).
or
quiesce_mask = application choice of bits
prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_mask, 0, 0);
(so that new features might be automatically enabled by
a sysadmin).
Powered by blists - more mailing lists