Message-ID: <87czoacrfr.ffs@tglx>
Date:   Tue, 12 Oct 2021 15:18:16 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Mark Rutland <mark.rutland@....com>
Cc:     Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, Marc Zyngier <maz@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
        linux-mips@...r.kernel.org
Subject: Re: [GIT PULL] arm64 fixes for 5.15-rc5

Linus,

On Mon, Oct 11 2021 at 12:54, Linus Torvalds wrote:
> On Mon, Oct 11, 2021 at 3:47 AM Mark Rutland <mark.rutland@....com> wrote:
> And so the reason I really hate that patch is that it introduces a new
> "different architectures randomly and inexplicably do different
> things, and the generic behavior is very different on arm64 than it is
> elsewhere".
>
> That's just the worst kind of hack to me.
>
> And in this case, it's really *horribly* hard to see what the call
> chain is. It all ends up being actively obfuscated and obscured
> through that 'handle_arch_irq' function pointer, that is sometimes set
> through set_handle_irq(), and sometimes set directly.
>
> I really think that if the rule is "we can't do accounting in
> handle_domain_irq(), because it's too late for arm64", then the fix
> really should be to just not do that.
>
> Move the irq_enter()/irq_exit() to the callers - quite possibly far up
> the call chain to the root of it all, and just say "architecture code
> needs to do this in the low-level code before calling
> handle_arch_irq".

That's where it belongs. It's mandatory to have it there for NOHZ_FULL
to work correctly vs. instrumentation etc. I've pointed that out back
then after we fed the X86 entry code into the mincer and added noinstr
sections to keep tracers, BPF and kprobes away from it.

Looking at the architectures which "support" that by selecting
HAVE_CONTEXT_TRACKING:

arch/arm/Kconfig:	select HAVE_CONTEXT_TRACKING
arch/arm64/Kconfig:	select HAVE_CONTEXT_TRACKING
arch/csky/Kconfig:	select HAVE_CONTEXT_TRACKING
arch/mips/Kconfig:	select HAVE_CONTEXT_TRACKING
arch/powerpc/Kconfig:	select HAVE_CONTEXT_TRACKING		if PPC64
arch/riscv/Kconfig:	select HAVE_CONTEXT_TRACKING
arch/sparc/Kconfig:	select HAVE_CONTEXT_TRACKING
arch/x86/Kconfig:	select HAVE_CONTEXT_TRACKING		if X86_64

S390 and X86 are (mostly) complete and use the generic entry code. S390
does not even select HAVE_CONTEXT_TRACKING!

PPC64 has done quite some work to fix that, but it does not look complete yet.

Mark is working on ARM64.

There is some effort underway to convert MIPS over to generic entry.

The rest needs all the fundamental architecture side changes.

> Anyway, it _looks_ to me like the pattern is very simple:
>
> Step 1:
>  - remove irq_enter/irq_exit from handle_domain_irq(), move it to all callers.
>
> This clearly doesn't change anything at all, but also doesn't fix the
> problem you have. But it's easy to verify that the code is the same
> before-and-after.
>
> Step 2 is the pattern matching step:
>
>  - if the caller of handle_domain_irq() ends up being a function that
> is registered with set_handle_irq(), then we
>    (a) remove the irq_enter/irq_exit from it
>    (b) add it to the architectures that use handle_arch_irq.
>    (c) make sure that if there are other callers of it (not through
> handle_arch_irq) we move that irq_enter/irq_exit into them too
>
> I _suspect_ - but didn't check - that Step 2(c) doesn't actually
> exist. But who knows.

It only exists with chained handlers, but they do not need that at all
because:

        irq_enter()
        handle_arch_irq()
          handle_domain_irq()
            chained_handler()
              handle_domain_irq()

which is still the same interrupt context and not a nested interrupt.
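
The chain above can be illustrated with a minimal userspace sketch. All
mock_* names and the counters are hypothetical stand-ins made up for this
illustration, not the kernel's implementations. The sketch assumes the
accounting has already been moved out of handle_domain_irq() into the
low-level architecture entry, as proposed in the quoted steps:

```c
#include <assert.h>

/* Hypothetical userspace stand-ins, NOT kernel code. */
static int irq_enter_calls;  /* times interrupt accounting was entered */
static int irq_exit_calls;   /* times interrupt accounting was left */
static int in_irq;           /* 1 while inside the interrupt context */
static int handlers_run;     /* device handlers that actually ran */

static void mock_irq_enter(void)
{
    assert(!in_irq);         /* a second enter would mean a nested interrupt */
    in_irq = 1;
    irq_enter_calls++;
}

static void mock_irq_exit(void)
{
    assert(in_irq);
    in_irq = 0;
    irq_exit_calls++;
}

static void mock_device_handler(void)
{
    assert(in_irq);          /* runs inside the context set up at entry */
    handlers_run++;
}

static void mock_handle_domain_irq(int chained);

/* Chained handler: demultiplexes and dispatches the child interrupt. */
static void mock_chained_handler(void)
{
    mock_handle_domain_irq(0);
}

/* No enter/exit accounting in here; the caller's context is reused. */
static void mock_handle_domain_irq(int chained)
{
    assert(in_irq);
    if (chained)
        mock_chained_handler();
    else
        mock_device_handler();
}

/* Stand-in for the root handler installed through set_handle_irq(). */
static void mock_handle_arch_irq(void)
{
    mock_handle_domain_irq(1);
}

/* Low-level architecture entry: the only place doing the accounting. */
void mock_arch_entry(void)
{
    mock_irq_enter();
    mock_handle_arch_irq();
    mock_irq_exit();
}
```

Calling mock_arch_entry() once walks both levels of handle_domain_irq()
while entering and leaving the interrupt context exactly once, which is
why the chained case needs no irq_enter()/irq_exit() of its own.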

> It really looks like there is a very tight connection between "uses
> handle_domain_irq()" and "uses handle_arch_irq/set_handle_irq()". No?

Looks like it. That might conflict with the MIPS rework though. I don't
know how far that has progressed. Cc'ed the MIPS people.

Thanks,

        tglx
