lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 15 Dec 2020 21:12:26 +0100
From:   "Enrico Weigelt, metux IT consult" <lkml@...ux.net>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Michael Ellerman <mpe@...erman.id.au>,
        "Enrico Weigelt, metux IT consult" <info@...ux.net>,
        linux-kernel@...r.kernel.org
Cc:     James.Bottomley@...senPartnership.com, deller@....de,
        benh@...nel.crashing.org, paulus@...ba.org, jdike@...toit.com,
        richard@....at, anton.ivanov@...bridgegreys.com, mingo@...hat.com,
        bp@...en8.de, x86@...nel.org, hpa@...or.com,
        linux-parisc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        linux-s390@...r.kernel.org, linux-um@...ts.infradead.org
Subject: Re: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings

On 09.12.20 00:01, Thomas Gleixner wrote:

> There are a few situations why it is invoked or not:
> 
>   1) The original x86 usage is not longer using it because it complains
>      rightfully about a vector being raised which has no interrupt
>      descriptor associated to it. So the original reason for naming it
>      vector is gone long ago. It emits:
> 
>      pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
>                           __func__, smp_processor_id(), vector);
> 
>      Directly from the x86 C entry point without ever invoking that
>      function.  Pretty popular error message due to some AMD BIOS
>      wreckage. :)

Of course, the term "vector" should be replaced by something like
"irqnr" or "virq", but I didn't have name changes within scope - just
wanted to fix the printing of that number, as i've stupled over it while
working on something different and wondered why the number differed from
what I had expected, until I seen that it prints hex instead of decimal.

But if you prefer a more complete cleanup, I'll be happy to do it.

>   3) It's invoked from __handle_domain_irq() when the 'hwirq' which is
>      handed in by the caller does not resolve to a mapped Linux
>      interrupt which is pretty much the same as the x86 situation above
>      in #1, but it prints useless data.
> 
>      It prints 'irq' which is invalid but it does not print the really
>      interesting 'hwirq' which was handed in by the caller and did
>      not resolve.

I wouldn't say the irq-nr isn't interesting. In my particular case it
was quite what I've been looking for. But you're right, hwirq should
also be printed.

>      In this case the Linux irq number is uninteresting as it is known
>      to be invalid and simply is not mapped and therefore does not
>      exist.

In my case it came in from generic_handle_irq(), and in this case this
irq number (IMHO) has been valid, but nobody handled it, so it went to
ack_bad_irq.

Of course, if this function is meant as a fallback to ack some not
otherwise handled IRQ on the hw, the linux irq number indeed isn't quite
helpful (unless we expect that code to do a lookup to the hw irq).

... rethinking this further ... shouldn't we also pass in even more data
(eg. irq_desc, irqchip, ...), so this function can check which hw to
actually talk to ?

>   4) It's invoked from the dummy irq chip which is installed for a
>      couple of truly virtual interrupts where the invocation of
>      dummy_irq_chip::irq_ack() is indicating wreckage.
> 
>      In that case the Linux irq number is the thing which is printed.
> 
> So no. It's not just inconsistent it's in some places outright
> wrong. What we really want is:
> 
> ack_bad_irq(int hwirq, int virq)

is 'int' correct here ?

BTW: I also wonder why the virq is unsigned int, while hwirq (eg. in
struct irq_data) is unsigned long. shouldn't the virtual number space
be at least as big (or even bigger) than the hw one ?

 {
>         if (hwirq >= 0)
>            print_useful_info(hwirq);
>         if (virq > 0)
>            print_useful_info(virq);
>         arch_try_to_ack(hwirq, virq);
> }
>     
> for this to make sense. Just fixing the existing printk() to be less
> wrong is not really an improvement.

Okay, makes sense.

OTOH: since both callers (dummychip.c, handle.c) already dump out before
ack_bad_irq(), do we need to print out anything at all ?

I've also seen that many archs increase a counter (some use long, others
atomic_t) - should we also consolidate this in an arch-independent way
in handle.c (or does kstat_incr_irqs_this_cpu already do this) ?

--mtx

-- 
---
Hinweis: unverschlüsselte E-Mails können leicht abgehört und manipuliert
werden ! Für eine vertrauliche Kommunikation senden Sie bitte ihren
GPG/PGP-Schlüssel zu.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@...ux.net -- +49-151-27565287

Powered by blists - more mailing lists