Date:   Thu, 14 Sep 2017 13:21:47 +0200
From:   Juergen Gross <jgross@...e.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>
Cc:     Ingo Molnar <mingo@...nel.org>, Peter Anvin <hpa@...or.com>,
        Marc Zyngier <marc.zyngier@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Borislav Petkov <bp@...en8.de>, Chen Yu <yu.c.chen@...el.com>,
        Rui Zhang <rui.zhang@...el.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Len Brown <lenb@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Christoph Hellwig <hch@....de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Joerg Roedel <joro@...tes.org>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Tony Luck <tony.luck@...el.com>,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        Alok Kataria <akataria@...are.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Arjan van de Ven <arjan@...ux.intel.com>
Subject: Re: [patch 00/52] x86: Rework the vector management

On 13/09/17 23:29, Thomas Gleixner wrote:
> Sorry for the large CC list, but this is a major surgery.
> 
> The vector management in x86 including the surrounding code is a
> conglomerate of ancient bits and pieces which have been subject to
> 'modernization' and featuritis over the years. The most obscure parts are
> the vector allocation mechanics, the cleanup vector handling and the cpu
> hotplug machinery. Replacing these pieces of art was on my todo list for a
> long time.
> 
> Recent attempts to 'solve' CPU offline / hibernation issues which are
> partially caused by the current vector management implementation made me
> look for real. Further information in this thread:
> 
>     http://lkml.kernel.org/r/cover.1504235838.git.yu.c.chen@intel.com
> 
> Aside from drivers allocating gazillions of interrupts, there are quite a few
> things which can be addressed in the x86 vector management and in the core
> code.
> 
>   - Multi CPU affinities:
> 
>     A dubious property which is not available on all machines and causes
>     major complexity both in the allocator and the cleanup/hotplug
>     management. See:
> 
>        http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
> 
>   - Priority level spreading:
> 
>     An obscure and undocumented property which I think is sufficiently
>     argued to be not required in:
> 
>        http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
> 
>   - Allocation of vectors when interrupt descriptors are allocated.
> 
>     This is a historical implementation detail, which is not really
>     required when the vector allocation is delayed up to the point when
>     request_irq() is invoked. This might make request_irq() fail when the
>     vector space is exhausted, but drivers should handle request_irq()
>     failures anyway.
> 
>     The upside of changing this is that the active vector space becomes
>     smaller especially on hibernation/cpu offline when drivers shut down
>     queue interrupts of outgoing CPUs.
> 
>     Some of this is already addressed with the managed interrupt facility,
>     but that was bolted on top of the existing vector management because
>     proper integration was not possible at that point. I take the blame
>     for this, but the tradeoff of not doing it would have been more
>     broken driver boiler plate code all over the place. So I went for the
>     lesser of two evils.
> 
>   - Allocation of vectors in the wrong place
> 
>     Even for managed interrupts the vector allocation at descriptor
>     allocation happens in the wrong place and gets fixed after the fact
>     with a call to set_affinity(). For interrupts which are not remapped
>     this results in at least one interrupt on the wrong CPU before it is
>     migrated to the desired target.
> 
>   - Lack of instrumentation
>  
>     All of this is a black box which allows no insight into the actual
>     vector usage.
> 
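To illustrate the point above about delaying vector allocation until
request_irq(): a minimal user-space sketch, not code from the series. All
names here (irq_desc_sketch, request_irq_sketch, free_irq_sketch, NVEC) are
hypothetical, and returning -ENOSPC on exhaustion is an assumption. It only
shows that setting up a descriptor consumes no vector and that the request
step is where exhaustion becomes visible to the caller.

```c
#include <errno.h>

#define NVEC 2                      /* hypothetical, tiny vector space */

struct irq_desc_sketch {
	int vector;                 /* -1 means no vector assigned yet */
};

static unsigned int vectors_in_use; /* bit n set = vector n is taken */

/* Descriptor setup: deliberately does not touch the vector space. */
void desc_init_sketch(struct irq_desc_sketch *d)
{
	d->vector = -1;
}

/* The vector is only picked here, so this is where exhaustion shows up. */
int request_irq_sketch(struct irq_desc_sketch *d)
{
	if (d->vector >= 0)
		return 0;           /* already has a vector */

	for (int v = 0; v < NVEC; v++) {
		if (!(vectors_in_use & (1u << v))) {
			vectors_in_use |= 1u << v;
			d->vector = v;
			return 0;
		}
	}
	return -ENOSPC;             /* assumed error code for this sketch */
}

/* Releasing the interrupt hands the vector back to the pool. */
void free_irq_sketch(struct irq_desc_sketch *d)
{
	if (d->vector >= 0) {
		vectors_in_use &= ~(1u << d->vector);
		d->vector = -1;
	}
}
```

A caller in this sketch checks the return value of request_irq_sketch() and
backs off on failure, which is the behaviour drivers are expected to have
for request_irq() anyway.
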
> The series addresses these points and converts the x86 vector management to
> a bitmap based allocator which provides proper reservation management for
> 'managed interrupts' and best effort reservation for regular interrupts.
> The latter allows overcommitment, which 'fixes' some of the hotplug/hibernation
> problems in a clean way. It can't fix all of them depending on the driver
> involved.
> 
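The difference between the two reservation modes described above can be
sketched in a few lines of user-space C. This is not the allocator from the
series; the struct layout, the function names (vm_*) and the tiny NVEC are
assumptions made up for illustration. A managed reservation pins a concrete
bit so a later allocation is guaranteed, while a regular reservation only
bumps a counter and may therefore be overcommitted.

```c
#include <assert.h>
#include <stdint.h>

#define NVEC 4                   /* hypothetical, tiny vector space */

struct vec_map {
	uint32_t alloc;          /* bit set = vector currently in use */
	uint32_t managed;        /* bit set = vector pinned for managed irqs */
	unsigned int reserved;   /* best-effort reservations, may overcommit */
};

/* Return the lowest bit below NVEC that is clear in mask, or -1. */
static int first_zero(uint32_t mask)
{
	for (int i = 0; i < NVEC; i++)
		if (!(mask & (1u << i)))
			return i;
	return -1;
}

/* Managed reservation: pins a bit, so it can fail when space is full. */
int vm_reserve_managed(struct vec_map *m)
{
	int v = first_zero(m->alloc | m->managed);

	if (v >= 0)
		m->managed |= 1u << v;
	return v;
}

/* Managed allocation: only hands out bits that were pinned before. */
int vm_alloc_managed(struct vec_map *m)
{
	int v = first_zero(~m->managed | m->alloc);

	if (v >= 0)
		m->alloc |= 1u << v;
	return v;
}

/* Regular reservation: just a counter, never fails (overcommit). */
void vm_reserve(struct vec_map *m)
{
	m->reserved++;
}

/* Regular allocation: best effort, skips bits pinned for managed use. */
int vm_alloc(struct vec_map *m)
{
	int v = first_zero(m->alloc | m->managed);

	if (v < 0)
		return -1;
	m->alloc |= 1u << v;
	if (m->reserved)
		m->reserved--;
	return v;
}

/* Free a vector; a managed bit stays pinned for later reuse. */
void vm_free(struct vec_map *m, int v)
{
	assert(v >= 0 && v < NVEC);
	m->alloc &= ~(1u << v);
}
```

In this sketch any number of vm_reserve() calls succeed, and only vm_alloc()
can fail once the unpinned bits run out, whereas a successful
vm_reserve_managed() guarantees that one later vm_alloc_managed() succeeds.
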
> This rework is no excuse for driver writers to do exhaustive vector
> allocations instead of utilizing the managed interrupt infrastructure, but
> it addresses long standing issues in this code with the side effect of
> mitigating some of the driver oddities. The proper solution for multi queue
> management is 'managed interrupts', which have been proven in the block-mq
> work: they solve issues which are worked around in other drivers in
> creative ways with lots of copied code and often enough broken attempts to
> handle interrupt affinity and CPU hotplug problems.
> 
> The new bitmap allocator and the x86 vector management code are
> instrumented with tracepoints and the irq domain debugfs files allow deep
> insight into the vector allocation and reservations.
> 
> The patches work on machines with and without interrupt remapping and
> inside of KVM guests of various flavours, though I have no idea what I
> broke on the way with other hypervisors, posted interrupts etc. So I kindly
> ask for your support in testing and review.
> 
> The series applies on top of Linus' tree and is available as git branch:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/apic
> 
> Note that this branch is Linus' tree plus scheduler and x86 fixes which I
> required to do proper testing. They have outstanding pull requests and
> might be merged already when you read this.
> 
> Thanks,
> 
> 	tglx
> ---
>  arch/x86/include/asm/x2apic.h              |   49 -
>  b/arch/x86/Kconfig                         |    1 
>  b/arch/x86/include/asm/apic.h              |  255 +-----
>  b/arch/x86/include/asm/desc.h              |    2 
>  b/arch/x86/include/asm/hw_irq.h            |    6 
>  b/arch/x86/include/asm/io_apic.h           |    2 
>  b/arch/x86/include/asm/irq.h               |    4 
>  b/arch/x86/include/asm/irq_vectors.h       |    8 
>  b/arch/x86/include/asm/irqdomain.h         |    5 
>  b/arch/x86/include/asm/kvm_host.h          |    2 
>  b/arch/x86/include/asm/trace/irq_vectors.h |  244 ++++++
>  b/arch/x86/kernel/apic/Makefile            |    2 
>  b/arch/x86/kernel/apic/apic.c              |   38 -
>  b/arch/x86/kernel/apic/apic_common.c       |   46 +
>  b/arch/x86/kernel/apic/apic_flat_64.c      |   10 
>  b/arch/x86/kernel/apic/apic_noop.c         |   25 
>  b/arch/x86/kernel/apic/apic_numachip.c     |   12 
>  b/arch/x86/kernel/apic/bigsmp_32.c         |    8 
>  b/arch/x86/kernel/apic/htirq.c             |    5 
>  b/arch/x86/kernel/apic/io_apic.c           |   94 --
>  b/arch/x86/kernel/apic/msi.c               |    5 
>  b/arch/x86/kernel/apic/probe_32.c          |   29 
>  b/arch/x86/kernel/apic/vector.c            | 1090 +++++++++++++++++------------
>  b/arch/x86/kernel/apic/x2apic.h            |    9 
>  b/arch/x86/kernel/apic/x2apic_cluster.c    |  196 +----
>  b/arch/x86/kernel/apic/x2apic_phys.c       |   44 +
>  b/arch/x86/kernel/apic/x2apic_uv_x.c       |   17 
>  b/arch/x86/kernel/i8259.c                  |    1 
>  b/arch/x86/kernel/idt.c                    |   12 
>  b/arch/x86/kernel/irq.c                    |  101 --
>  b/arch/x86/kernel/irqinit.c                |    1 
>  b/arch/x86/kernel/setup.c                  |   12 
>  b/arch/x86/kernel/smpboot.c                |   14 
>  b/arch/x86/kernel/traps.c                  |    2 
>  b/arch/x86/kernel/vsmp_64.c                |   19 
>  b/arch/x86/platform/uv/uv_irq.c            |    5 
>  b/arch/x86/xen/apic.c                      |    6 
>  b/drivers/gpio/gpio-xgene-sb.c             |    7 
>  b/drivers/iommu/amd_iommu.c                |   44 -
>  b/drivers/iommu/intel_irq_remapping.c      |   43 -
>  b/drivers/irqchip/irq-gic-v3-its.c         |    5 
>  b/drivers/pinctrl/stm32/pinctrl-stm32.c    |    5 
>  b/include/linux/irq.h                      |   22 
>  b/include/linux/irqdesc.h                  |    1 
>  b/include/linux/irqdomain.h                |   14 
>  b/include/linux/msi.h                      |    5 
>  b/include/trace/events/irq_matrix.h        |  201 +++++
>  b/kernel/irq/Kconfig                       |    3 
>  b/kernel/irq/Makefile                      |    1 
>  b/kernel/irq/autoprobe.c                   |    2 
>  b/kernel/irq/chip.c                        |   37 
>  b/kernel/irq/debugfs.c                     |   12 
>  b/kernel/irq/internals.h                   |   19 
>  b/kernel/irq/irqdesc.c                     |    3 
>  b/kernel/irq/irqdomain.c                   |   43 -
>  b/kernel/irq/manage.c                      |   18 
>  b/kernel/irq/matrix.c                      |  443 +++++++++++
>  b/kernel/irq/msi.c                         |   32 
>  58 files changed, 2133 insertions(+), 1208 deletions(-)

Complete series tested with paravirt + xen enabled 64 bit kernel:

bare metal boot okay
boot as Xen dom0 okay
boot as Xen pv-domU okay
boot as Xen HVM-domU with PV-drivers okay
vCPU onlining/offlining in pv-domU okay

So you can add my:

Tested-by: Juergen Gross <jgross@...e.com>
Acked-by: Juergen Gross <jgross@...e.com>


Juergen
