Message-Id: <20200401100029.1445-4-john.mathew@unikie.com>
Date:   Wed,  1 Apr 2020 13:00:29 +0300
From:   John Mathew <john.mathew@...kie.com>
To:     linux-doc@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, corbet@....net, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, tsbogend@...ha.franken.de,
        lukas.bulwahn@...il.com, x86@...nel.org,
        linux-mips@...r.kernel.org, tglx@...utronix.de,
        mostafa.chamanara@...emark.com,
        John Mathew <john.mathew@...kie.com>
Subject: [RFC PATCH 3/3] docs: scheduler: Add introduction to scheduler context-switch

Add introductory documentation for:
 - context switching
 - x86 context switch
 - MIPS context switch

Suggested-by: Lukas Bulwahn <lukas.bulwahn@...il.com>
Co-developed-by: Mostafa Chamanara <mostafa.chamanara@...emark.com>
Signed-off-by: Mostafa Chamanara <mostafa.chamanara@...emark.com>
Signed-off-by: John Mathew <john.mathew@...kie.com>
---
 Documentation/scheduler/arch-specific.rst     |  3 +
 Documentation/scheduler/context-switching.rst | 71 +++++++++++++++++
 Documentation/scheduler/index.rst             |  1 +
 .../scheduler/mips-context-switch.rst         | 78 +++++++++++++++++++
 .../scheduler/x86-context-switch.rst          | 59 ++++++++++++++
 5 files changed, 212 insertions(+)
 create mode 100644 Documentation/scheduler/context-switching.rst
 create mode 100644 Documentation/scheduler/mips-context-switch.rst
 create mode 100644 Documentation/scheduler/x86-context-switch.rst

diff --git a/Documentation/scheduler/arch-specific.rst b/Documentation/scheduler/arch-specific.rst
index c9c34863d994..65dc393b605f 100644
--- a/Documentation/scheduler/arch-specific.rst
+++ b/Documentation/scheduler/arch-specific.rst
@@ -9,3 +9,6 @@ Architecture Specific Scheduler Implementation Differences
 
 .. toctree::
    :maxdepth: 2
+
+   x86-context-switch
+   mips-context-switch
diff --git a/Documentation/scheduler/context-switching.rst b/Documentation/scheduler/context-switching.rst
new file mode 100644
index 000000000000..dd4ff63b1e97
--- /dev/null
+++ b/Documentation/scheduler/context-switching.rst
@@ -0,0 +1,71 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+==========================
+Process context switching
+==========================
+
+Context Switching
+-----------------
+
+Context switching, the act of switching the CPU from one running task to
+another, is handled by the :c:func:`context_switch()` function defined in
+kernel/sched/core.c. It is called by __schedule() when a new task has been
+selected to run.
+
+The execution flow is as follows (a condensed code sketch follows this list):
+
+* Calls prepare_task_switch() to prepare both the previous and the next task
+  by storing or updating some values in their task_struct.
+
+* Calls the :c:macro:`arch_start_context_switch()` macro. This is a facility
+  for paravirtualized guests to batch the reload of page tables and other
+  process state with the actual context-switch code. By convention, only one
+  of the batched update (lazy) modes (CPU, MMU) should be active at any given
+  time, entry should never be nested, and entry and exit should always be
+  paired; this keeps the kernel code sane to maintain and reason about. In
+  this case the exit (the end of the context switch) is in
+  architecture-specific code, and so doesn't need a generic definition.
+
+* The next few steps consist of handling the transfer of real and anonymous
+  address spaces between the switching tasks. The four possible context
+  switches are:
+
+  - a kernel task switching to another kernel task,
+  - a user task switching to a kernel task,
+  - a kernel task switching to a user task,
+  - a user task switching to another user task.
+
+For a kernel task switching to another kernel task, :c:func:`enter_lazy_tlb()`
+is called. This is an architecture-specific hook to handle a context without
+an mm; architectures implement lazy tricks here to minimize TLB flushes.
+The active address space of the previous task is then borrowed (transferred)
+by the next task, and the active address space of the previous task is set
+to NULL.
+
+For a user task switching to a kernel task, the previous task has a real
+address space. This address space is pinned by calling :c:func:`mmgrab()`,
+which makes sure that it will not be freed even if the previous task exits.
+
+For a user task switching to another user task, the architecture-specific
+:c:func:`switch_mm_irqs_off()` or :c:func:`switch_mm()` function is called.
+Its main job is to switch the address space between the two user-space
+processes. This includes switching the page table pointers and ensuring that
+the new address space has a valid ASID.
+
+For a kernel task switching to a user task, the new user task's address space
+is switched in as above, the borrowed active address space of the kernel task
+is stashed so it can be dropped after the switch, and the active address space
+of the kernel task is set to NULL.
+
+* Next, the :c:func:`prepare_lock_switch()` function is called to perform a
+  lockdep release of the runqueue lock, handling the special case in the
+  scheduler where the runqueue lock will be released by the next task.
+
+* Then the architecture-specific implementation of the :c:func:`switch_to()`
+  function is called to switch the register state and the stack. This involves
+  saving and restoring stack information and the processor registers, and any
+  other architecture-specific state that must be managed and restored on a
+  per-process basis.
+
+* Finally, :c:func:`finish_task_switch()` is called after the context switch,
+  paired with the prepare_task_switch() call made before it. It reconciles the
+  locking set up by prepare_task_switch(), and performs any other
+  architecture-specific cleanup actions.
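+
+The flow above, condensed into code, looks roughly as follows. This is a
+hedged sketch of :c:func:`context_switch()` (kernel/sched/core.c, circa v5.x);
+error paths and several config-dependent details are omitted::
+
+    static struct rq *context_switch(struct rq *rq, struct task_struct *prev,
+                                     struct task_struct *next, struct rq_flags *rf)
+    {
+        prepare_task_switch(rq, prev, next);
+        arch_start_context_switch(prev);
+
+        if (!next->mm) {                       /* switching to a kernel task */
+            enter_lazy_tlb(prev->active_mm, next);
+            next->active_mm = prev->active_mm; /* borrow the address space */
+            if (prev->mm)                      /* coming from a user task */
+                mmgrab(prev->active_mm);       /* pin the address space */
+            else
+                prev->active_mm = NULL;
+        } else {                               /* switching to a user task */
+            switch_mm_irqs_off(prev->active_mm, next->mm, next);
+            if (!prev->mm) {                   /* coming from a kernel task */
+                rq->prev_mm = prev->active_mm; /* dropped after the switch */
+                prev->active_mm = NULL;
+            }
+        }
+
+        prepare_lock_switch(rq, next, rf);
+
+        switch_to(prev, next, prev);           /* registers and stack */
+        barrier();
+
+        return finish_task_switch(prev);
+    }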
diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 9772cf81fd96..289e06a68764 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -18,6 +18,7 @@ specific implementation differences.
 
     overview
     cfs-sched-overview
+    context-switching
     sched-features
     arch-specific.rst
     sched-debugging.rst
diff --git a/Documentation/scheduler/mips-context-switch.rst b/Documentation/scheduler/mips-context-switch.rst
new file mode 100644
index 000000000000..e917bbe1c104
--- /dev/null
+++ b/Documentation/scheduler/mips-context-switch.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+==============================================
+MIPS Architecture and Scheduler Implementation
+==============================================
+
+Multi-threading in MIPS CPUs
+-----------------------------
+
+The MIPS architecture defines four coprocessors:
+
+- CP0: supports the virtual memory system and exception handling.
+- CP1: reserved for the floating point coprocessor, the FPU.
+- CP2: available for specific implementations.
+- CP3: reserved for floating point operations in the release 1 implementation
+  of MIPS64.
+
+The MIPS32 and MIPS64 architectures provide support for optional components
+known as Modules or Application Specific Extensions. The MT module enables the
+architecture to support multi-threaded implementations, including support for
+virtual processors and lightweight thread contexts. Which MT features are
+implemented depends on the individual MIPS core. A virtual processing element
+(VPE) maintains a complete copy of the processor state as seen by the software
+system, including interrupts, the register set, and the MMU. This enables a
+single processor with two VPEs to appear to an SMP operating system as two
+separate cores. For example, two separate operating systems, such as Linux and
+an RTOS, can run one on each VPE.
+
+A lighter-weight version of the VPE, called a Thread Context (TC), enables
+threading at the user/application software level. A TC is the hardware state
+necessary to support a thread of execution, including a set of general purpose
+registers (GPRs), a program counter (PC), and some multiplier and coprocessor
+state. TCs share a common execution unit, and the MIPS ISA provides
+instructions to make use of them.
+
+The Quality of Service (QoS) block of the MT module allows the allocation of
+processor cycles to threads and sets relative thread priorities. This enables
+two thread prioritization mechanisms: the user can prioritize one thread over
+another, and can allocate a specific ratio of the cycles to specific threads.
+These mechanisms help to allocate bandwidth to a set of threads effectively.
+The QoS block improves system-level determinism and predictability, and can be
+replaced by more application-specific blocks.
+
+MIPS Context Switch
+-------------------
+
+Context-switch behavior specific to MIPS begins with the way the
+:c:macro:`switch_to()` macro is implemented. The main steps in the MIPS
+implementation of the macro are listed below; a condensed sketch of the macro
+follows the list.
+
+* Handle the FPU affinity management feature, which is enabled at build time
+  by :c:macro:`CONFIG_MIPS_MT_FPAFF`. The macro checks whether the FPU was
+  used in the most recent time slice; if it was not, the restriction that the
+  task must run on a CPU with an FPU is removed.
+* For the previous task, disable the FPU and clear the bit indicating that the
+  FPU was used in this quantum by the task.
+* If the FPU is enabled in the next task, check the FCSR for any unmasked
+  exceptions pending, clear them and send a signal.
+* If the MIPS DSP module is enabled, save the DSP context of the previous task
+  and restore the DSP context of the next task.
+* If coprocessor 2 is present, set its access allowed field.
+* If the coprocessor 2 access allowed field was set in the previous task,
+  clear it.
+* Clear the access allowed field of coprocessor 2.
+* Clear the LLbit on MIPS release 6 so that the eretnc instruction can be used
+  unconditionally when returning to userland in entry.S. LLbit is used to
+  specify operation for instructions that provide atomic read-modify-write:
+  it is set when a linked load occurs and is tested by the conditional store.
+  It is cleared when, during other CPU operation, a store to the location
+  would no longer be atomic; in particular, it is cleared by exception return
+  instructions. The eretnc instruction makes it possible to return from an
+  interrupt, exception, or error trap without clearing the LLbit.
+* Clear the global variable ll_bit used by the MIPS exception handler.
+* Write the thread pointer to the MIPS UserLocal register if the CPU supports
+  this feature. This register is not interpreted by hardware and can be used
+  to share data between privileged and unprivileged software.
+* If the hardware watchpoint feature is enabled at build time, the watchpoint
+  registers are restored from the next task.
+* Finally, the MIPS processor-specific implementation of the
+  :c:func:`resume()` function is called. It restores the registers of the
+  next task, including the stack pointer. The implementation is in assembly::
+
+    arch/mips/kernel/r4k_switch.S
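+
+Putting these steps together, a hedged sketch of the overall shape of the MIPS
+:c:macro:`switch_to()` macro (arch/mips/include/asm/switch_to.h) is shown
+below; the helpers and config-dependent parts vary between kernel versions::
+
+    #define switch_to(prev, next, last)                                    \
+    do {                                                                   \
+        __mips_mt_fpaff_switch_to(prev);  /* FPU affinity, if enabled */   \
+        lose_fpu_inatomic(1, prev);       /* prev gives up the FPU */      \
+        if (cpu_has_dsp) {                                                 \
+            __save_dsp(prev);             /* save prev's DSP context */    \
+            __restore_dsp(next);          /* restore next's DSP context */ \
+        }                                                                  \
+        /* coprocessor 2 access-allowed handling would go here */          \
+        clear_software_ll_bit();          /* global ll_bit variable */     \
+        if (cpu_has_userlocal)            /* thread pointer register */    \
+            write_c0_userlocal(task_thread_info(next)->tp_value);          \
+        __restore_watch(next);            /* hw watchpoint registers */    \
+        (last) = resume(prev, next, task_thread_info(next));               \
+    } while (0)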
diff --git a/Documentation/scheduler/x86-context-switch.rst b/Documentation/scheduler/x86-context-switch.rst
new file mode 100644
index 000000000000..ae7b2e09453a
--- /dev/null
+++ b/Documentation/scheduler/x86-context-switch.rst
@@ -0,0 +1,59 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+==================
+X86 Context Switch
+==================
+
+The x86 context-switching logic is as follows. After the address-space switch
+in the scheduler's :c:func:`context_switch()`, the x86 implementation of
+:c:macro:`switch_to()` is called. For the x86 architecture it is located
+at::
+
+    arch/x86/include/asm/switch_to.h
+
+Since v4.9, switch_to() has been broken into two parts: a
+:c:func:`prepare_switch_to()` macro and an assembly routine,
+:c:func:`__switch_to_asm()`; the former inline assembly portion has been
+moved to an actual assembly file::
+
+    arch/x86/entry/entry_64.S
+
+* There is still a C portion of the switch, reached via a jump in the middle
+  of the assembly code. Its source has been located in
+  arch/x86/kernel/process_64.c since v2.6.24. A hedged sketch of the
+  resulting switch_to() wrapper is shown below.
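+
+Taken together, the resulting :c:macro:`switch_to()` wrapper is small. A
+hedged sketch (after arch/x86/include/asm/switch_to.h, circa v4.9 and later;
+the exact arguments vary between versions) looks like::
+
+    #define switch_to(prev, next, last)                               \
+    do {                                                              \
+        prepare_switch_to(next);       /* probe the new stack */      \
+                                                                      \
+        /* the assembly routine switches stacks and returns prev */   \
+        ((last) = __switch_to_asm((prev), (next)));                   \
+    } while (0)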
+
+The main function of prepare_switch_to() is to handle the case where the
+kernel stack uses virtual memory (:c:macro:`CONFIG_VMAP_STACK`). This is
+configured at build time and is enabled in most modern distributions. The
+function accesses the new stack pointer to prevent a double fault: switching
+to a stack whose top-level paging entry is not present in the current MM would
+result in a page fault, which would be promoted to a double fault, and the
+result would be a panic. It is therefore necessary to probe the stack now so
+that vmalloc_fault can fix up the page tables.
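+
+A hedged sketch of :c:func:`prepare_switch_to()` under this configuration
+(per recent kernels; older versions also took a prev argument) might look
+like::
+
+    static inline void prepare_switch_to(struct task_struct *next)
+    {
+    #ifdef CONFIG_VMAP_STACK
+        /*
+         * Touch the next task's stack now, while it is still safe to take
+         * and fix up the resulting page fault, so that the top-level paging
+         * entry is populated before the actual stack switch happens.
+         */
+        READ_ONCE(*(unsigned char *)next->thread.sp);
+    #endif
+    }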
+
+The main steps of the assembly routine :c:func:`__switch_to_asm()` are:
+
+* Store the callee-saved registers on the old stack, which will be switched
+  away from.
+* Swap the stack pointers between the old and the new task.
+* Move the stack canary value to the current CPU's interrupt stack.
+* If the return trampoline is enabled, overwrite all entries in the RSB on
+  exiting a guest, to prevent malicious branch target predictions from
+  affecting the host kernel.
+* Restore all the registers previously pushed, in reverse order, from the new
+  stack.
+
+The main steps of the C function :c:func:`__switch_to()`, to which the
+assembly code jumps, are as follows (a hedged sketch follows this list):
+
+* Retrieve the thread :c:type:`struct thread_struct <thread_struct>` and FPU
+  :c:type:`struct fpu <fpu>` structs of the next and previous tasks.
+* Get the current CPU's TSS :c:type:`struct tss_struct <tss_struct>`.
+* Save the current FPU state while still running on the old task.
+* Store the FS and GS segment registers before changing the thread local
+  storage.
+* Reload the GDT for the new task's TLS.
+* Save the ES and DS segments of the previous task and load the same from the
+  next task.
+* Load the FS and GS segment registers.
+* Update the current task of the CPU.
+* Update the top-of-stack pointer for the CPU, for the entry trampoline.
+* Initialize the FPU state for the next task.
+* Set sp0 to point to the entry trampoline stack.
+* Call :c:func:`__switch_to_xtra()` to handle debug registers, I/O bitmaps and
+  speculation mitigations.
+* Write the task's CLOSid/RMID to IA32_PQR_MSR.
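+
+A condensed, hedged outline of :c:func:`__switch_to()` from
+arch/x86/kernel/process_64.c (helper names are approximate and vary between
+kernel versions) is::
+
+    __visible __notrace_funcgraph struct task_struct *
+    __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
+    {
+        struct thread_struct *prev = &prev_p->thread;
+        struct thread_struct *next = &next_p->thread;
+        struct fpu *prev_fpu = &prev->fpu;
+        struct fpu *next_fpu = &next->fpu;
+        int cpu = smp_processor_id();
+
+        switch_fpu_prepare(prev_fpu, cpu);    /* save prev's FPU state */
+
+        save_fsgs(prev_p);                    /* FS/GS before TLS changes */
+        load_TLS(next, cpu);                  /* reload GDT for next's TLS */
+
+        savesegment(es, prev->es);            /* save and reload ES and DS */
+        if (unlikely(next->es | prev->es))
+            loadsegment(es, next->es);
+        savesegment(ds, prev->ds);
+        if (unlikely(next->ds | prev->ds))
+            loadsegment(ds, next->ds);
+
+        x86_fsgsbase_load(prev, next);        /* load FS and GS bases */
+
+        this_cpu_write(current_task, next_p); /* the CPU's current task */
+        update_task_stack(next_p);            /* entry trampoline stack, sp0 */
+        switch_fpu_finish(next_fpu);          /* set up FPU for next */
+
+        if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
+            __switch_to_xtra(prev_p, next_p); /* debug regs, I/O bitmap, ... */
+
+        resctrl_sched_in();                   /* CLOSid/RMID -> IA32_PQR_MSR */
+
+        return prev_p;
+    }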
-- 
2.17.1
