Message-Id: <20230530214550.864894-1-rrendec@redhat.com>
Date:   Tue, 30 May 2023 17:45:45 -0400
From:   Radu Rendec <rrendec@...hat.com>
To:     linux-kernel@...r.kernel.org
Cc:     Marc Zyngier <maz@...nel.org>, Thomas Gleixner <tglx@...utronix.de>
Subject: [RFC PATCH 0/5] irq: sysfs interface improvements for SMP affinity control

This patch set implements new sysfs interfaces that facilitate SMP
affinity control of chained interrupts. It follows the guidelines in
https://lore.kernel.org/all/87fslr7ygl.wl-maz@kernel.org/ with slight
deviations, which are explained below.

The assumption is that irqbalance must be aware of the chained interrupt
topology regardless of how it is exposed to userspace, for the following
reasons:
- Interrupt counters are not updated for the parent interrupt. Counters
  must be read separately for each of the chained interrupts and summed
  up to assess the CPU usage impact of the group as a whole.
- The affinity setting is shared by all multiplexed interrupts (and the
  parent interrupt) and cannot be changed individually.
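To illustrate the first point, a balancer has to add up the children's counters itself, because the parent's counters do not reflect the load. The following is a minimal C sketch that assumes the per-CPU counts have already been parsed (e.g. from /proc/interrupts) into a hypothetical `struct irq_counts` with a fixed, hypothetical CPU count. It is not irqbalance's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count, for illustration only */

/* Hypothetical: one row of per-CPU counts, e.g. parsed from /proc/interrupts. */
struct irq_counts {
	unsigned int irq;
	uint64_t per_cpu[NR_CPUS_SKETCH];
};

/*
 * Sum the per-CPU counts of every interrupt chained to one parent.
 * The parent's own counters are deliberately ignored: they are not
 * updated for chained interrupts, so they do not reflect the load.
 */
void sum_chained_group(const struct irq_counts *children, size_t n,
		       uint64_t total[NR_CPUS_SKETCH])
{
	assert(children != NULL || n == 0);

	for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		total[cpu] = 0;

	for (size_t i = 0; i < n; i++)
		for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
			total[cpu] += children[i].per_cpu[cpu];
}

/* Total load of the whole group across all CPUs. */
uint64_t group_load(const uint64_t total[NR_CPUS_SKETCH])
{
	uint64_t sum = 0;

	for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += total[cpu];
	return sum;
}
```

The per-CPU totals show where the group's load currently lands. Since the whole group shares one affinity setting, the group, rather than any single child, is the unit a balancer would move.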

Since irqbalance must be aware of the topology anyway, it is easier to
move parts of the problem there and reduce the complexity of the kernel
changes.
- Instead of creating a new affinity interface for chained interrupts
  that has different semantics from the existing procfs interface (and
  changes the affinity of the parent interrupt in the case of muxed
  interrupts), it is easier to let irqbalance set the affinity of the
  parent interrupt by itself (since it already knows who the parent is).
- Tracking groups of interrupts in the kernel creates additional
  synchronization challenges that are otherwise unnecessary. The kernel
  already has a (struct irq_desc).parent_irq field that can be (re)used
  for this purpose (see below).
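As an illustration of letting irqbalance set the parent's affinity itself, here is a minimal userspace sketch. It assumes the new attribute lives at /sys/kernel/irq/<N>/smp_affinity_list, a path inferred from the title of patch 5/5 and not confirmed by this letter. It also assumes the attribute accepts the same CPU range-list syntax as the existing /proc/irq/<N>/smp_affinity_list (e.g. "0-3,8"). The function names are hypothetical, and masks are limited to 64 CPUs for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a CPU bitmask (bit N = CPU N, up to 64 CPUs in this sketch)
 * in range-list syntax, e.g. 0x10f -> "0-3,8".
 * Returns 0 on success, -1 if buf is too small.
 */
int cpumask_to_list(uint64_t mask, char *buf, size_t len)
{
	size_t used = 0;
	int cpu = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	while (cpu < 64) {
		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}
		/* Extend the run of consecutive set bits. */
		int first = cpu;
		while (cpu + 1 < 64 && (mask & (UINT64_C(1) << (cpu + 1))))
			cpu++;
		int n = (first == cpu)
			? snprintf(buf + used, len - used, "%s%d", used ? "," : "", first)
			: snprintf(buf + used, len - used, "%s%d-%d", used ? "," : "", first, cpu);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
		cpu++;
	}
	return 0;
}

/*
 * Write the affinity of a (parent) interrupt through the assumed sysfs
 * attribute. Returns 0 or a negative errno value.
 */
int set_irq_affinity_list(unsigned int irq, uint64_t mask)
{
	char path[64], list[256];
	FILE *f;
	int ret = 0;

	if (mask == 0 || cpumask_to_list(mask, list, sizeof(list)) != 0)
		return -EINVAL;

	snprintf(path, sizeof(path), "/sys/kernel/irq/%u/smp_affinity_list", irq);
	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fputs(list, f) == EOF)
		ret = -EIO;
	/* A rejected mask typically surfaces when the buffer is flushed. */
	if (fclose(f) == EOF && ret == 0)
		ret = -errno;
	return ret;
}
```

Using the list format rather than the hex mask keeps the written value readable and independent of how many CPUs the system has.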

Brief description of the patches in this set:
- Patch 1/5 makes the (struct irq_desc).parent_irq field available
  unconditionally. Until now, it has been used for IRQ resend and has
  depended on CONFIG_HARDIRQS_SW_RESEND, but it can be reused to track
  chained interrupt parents in the general case, without any changes to
  the existing IRQ chip drivers.
- Patch 2/5 is trivial and just exposes (struct irq_desc).parent_irq in
  debugfs.
- Patch 3/5 exposes the chained interrupt topology in sysfs in two ways:
  the muxed_irqs directory (as described in the original email thread)
  and the parent_irq symlink. From a userspace perspective, the two are
  redundant. However, the synchronization for the muxed_irqs directory
  is likely incomplete or broken, and it is not easy to fix.
- Patch 4/5 moves the SMP affinity write handlers from procfs code to
  generic code, so that they can be reused by a new sysfs interface.
- Patch 5/5 creates a sysfs interface for the affinity, with identical
  semantics to the existing procfs interface. The sole purpose is to
  allow userspace (irqbalance) to control the affinity of the parent
  interrupt, which is typically *not* visible in procfs.
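With the parent_irq symlink from patch 3/5, a tool can discover a chained interrupt's parent before writing the parent's affinity. The following is a minimal sketch under two assumptions that this letter does not confirm. First, the link lives at /sys/kernel/irq/<N>/parent_irq. Second, it points at the parent's directory, so the last component of its target is the parent's IRQ number (e.g. "../10"). The function names are hypothetical. The directory is a parameter so the logic can be tried against a scratch directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Parse the IRQ number from the last component of a link target. */
int irq_from_link_target(const char *target)
{
	const char *base = strrchr(target, '/');
	char *end;
	long v;

	base = base ? base + 1 : target;
	errno = 0;
	v = strtol(base, &end, 10);
	if (errno != 0 || end == base || *end != '\0' || v < 0 || v > INT_MAX)
		return -1;
	return (int)v;
}

/*
 * Return the parent IRQ recorded in <irq_dir>/parent_irq, or -1 if there
 * is no such link (the interrupt is not chained) or it cannot be parsed.
 */
int read_parent_irq_at(const char *irq_dir)
{
	char path[256], target[256];
	ssize_t len;
	int n;

	assert(irq_dir != NULL);
	n = snprintf(path, sizeof(path), "%s/parent_irq", irq_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	len = readlink(path, target, sizeof(target) - 1);
	if (len < 0)
		return -1;
	target[len] = '\0'; /* readlink() does not NUL-terminate */
	return irq_from_link_target(target);
}

/* Convenience wrapper for the assumed live sysfs location. */
int read_parent_irq(unsigned int irq)
{
	char dir[64];

	snprintf(dir, sizeof(dir), "/sys/kernel/irq/%u", irq);
	return read_parent_irq_at(dir);
}
```

Reading a single symlink per interrupt avoids depending on the muxed_irqs directory, whose synchronization the letter flags as likely incomplete.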

The only required change to existing chained IRQ chip drivers in order
to support the new affinity control is to call irq_set_parent() in their
.map domain op. If they use the newer hierarchical API, they should call
irq_set_parent() in their .alloc domain op instead. This doesn't affect
the existing procfs-based affinity interface in any way.

A few IRQ chip drivers already call irq_set_parent() in their .map
domain op to implement IRQ-resend. No change is required to those
drivers to support the new affinity control.

Last but not least, it turns out that hierarchical domains are entirely
out of the scope of these changes (unless chained interrupts are used
along the path). In the case of hierarchical domains, each interrupt in
the outermost domain has a *single* corresponding Linux virq (that is
mapped to each domain in the hierarchy). That makes it perfectly safe to
implement the .irq_set_affinity chip op as irq_chip_set_affinity_parent
and delegate affinity control to the parent chip/domain. This will *not*
suddenly change the affinity of a different interrupt behind anyone's
back simply because there cannot be another interrupt that shares the
same affinity setting.

Note: I still need to update the Documentation/ directory for the new
      sysfs interface, and I will address that in a future version.
      At this point, I just want to get feedback about the current
      approach.

Radu Rendec (5):
  irq: Always enable parent interrupt tracking
  irq: Show the parent chained interrupt in debugfs
  irq: Expose chained interrupt parents in sysfs
  irq: Move SMP affinity write handler out of proc.c
  irq: Add smp_affinity/list attributes to sysfs

 include/linux/irq.h     |   9 +-
 include/linux/irqdesc.h |   1 +
 kernel/irq/debugfs.c    |   1 +
 kernel/irq/internals.h  |  10 ++
 kernel/irq/irqdesc.c    | 206 +++++++++++++++++++++++++++++++++++++---
 kernel/irq/irqdomain.c  |  15 +++
 kernel/irq/manage.c     |  20 +++-
 kernel/irq/proc.c       |  72 +-------------
 8 files changed, 244 insertions(+), 90 deletions(-)

-- 
2.40.1
