linux-kernel - RE: Re: [PATCH for-next 00/32] spin lock usage optimization for SCSI drivers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <417fa728c3ff4418ac87e0d409b5a084@hisilicon.com>
Date:   Thu, 11 Feb 2021 11:01:39 +0000
From:   "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To:     Finn Thain <fthain@...egraphics.com.au>
CC:     tanxiaofei <tanxiaofei@...wei.com>,
        "jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
        "martin.petersen@...cle.com" <martin.petersen@...cle.com>,
        "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linuxarm@...neuler.org" <linuxarm@...neuler.org>,
        "linux-m68k@...r.kernel.org" <linux-m68k@...r.kernel.org>
Subject: RE: Re: [PATCH for-next 00/32] spin lock usage optimization for SCSI
 drivers

> >
> > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> >
> > > > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> > > >
> > > > > > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> > > > > >
> > > > > > > > There is no warning from m68k builds. That's because
> > > > > > > > arch_irqs_disabled() returns true when the IPL is non-zero.
> > > > > > >
> > > > > > > So for m68k, the case is arch_irqs_disabled() is true, but
> > > > > > > interrupts can still come?
> > > > > > >
> > > > > > > Then it seems it is very confusing. If prioritized interrupts
> > > > > > > can still come while arch_irqs_disabled() is true,
> > > > > >
> > > > > > Yes, on m68k CPUs, an IRQ having a priority level higher than the
> > > > > > present priority mask will get serviced.
> > > > > >
> > > > > > Non-Maskable Interrupt (NMI) is not subject to this rule and gets
> > > > > > serviced regardless.
> > > > > >
> > > > > > > how could spin_lock_irqsave() block the prioritized interrupts?
> > > > > >
> > > > > > It raises the the mask level to 7. Again, please see
> > > > > > arch/m68k/include/asm/irqflags.h
> > > > >
> > > > > Hi Finn,
> > > > > Thanks for your explanation again.
> > > > >
> > > > > TBH, that is why m68k is so confusing. irqs_disabled() on m68k
> > > > > should just reflect the status of all interrupts have been disabled
> > > > > except NMI.
> > > > >
> > > > > irqs_disabled() should be consistent with the calling of APIs such
> > > > > as local_irq_disable, local_irq_save, spin_lock_irqsave etc.
> > > > >
> > > >
> > > > When irqs_disabled() returns true, we cannot infer that
> > > > arch_local_irq_disable() was called. But I have not yet found driver
> > > > code or core kernel code attempting that inference.
> > > >
> > > > > >
> > > > > > > Isn't arch_irqs_disabled() a status reflection of irq disable
> > > > > > > API?
> > > > > > >
> > > > > >
> > > > > > Why not?
> > > > >
> > > > > If so, arch_irqs_disabled() should mean all interrupts have been
> > > > > masked except NMI as NMI is unmaskable.
> > > > >
> > > >
> > > > Can you support that claim with a reference to core kernel code or
> > > > documentation? (If some arch code agrees with you, that's neither here
> > > > nor there.)
> > >
> > > I think those links I share you have supported this. Just you don't
> > > believe :-)
> > >
> >
> > Your links show that the distinction between fast and slow handlers was
> > removed. Your links don't support your claim that "arch_irqs_disabled()
> > should mean all interrupts have been masked". Where is the code that makes
> > that inference? Where is the documentation that supports your claim?
> 
> (1)
> https://lwn.net/Articles/380931/
> Looking at all these worries, one might well wonder if a system which *disabled
> interrupts for all handlers* would function well at all. So it is interesting
> to note one thing: any system which has the lockdep locking checker enabled
> has been running all handlers that way for some years now. Many developers
> and testers run lockdep-enabled kernels, and they are available for some of
> the more adventurous distributions (Rawhide, for example) as well. So we
> have quite a bit of test coverage for this mode of operation already.
> 
> (2)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=b738a50a
> 
> "We run all handlers *with interrupts disabled* and expect them not to
> enable them. Warn when we catch one who does."
> 
> (3)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=e58aa3d2d0cc
> genirq: Run irq handlers *with interrupts disabled*
> 
> Running interrupt handlers with interrupts enabled can cause stack
> overflows. That has been observed with multiqueue NICs delivering all
> their interrupts to a single core. We might band aid that somehow by
> checking the interrupt stacks, but the real safe fix is to *run the irq
> handlers with interrupts disabled*.
> 
> 
> All these documents say we are running irq handler with interrupts
> disabled. but it seems you think high-prio interrupts don't belong
> to "interrupts" in those documents :-)
> 
> that is why we can't get agreement. I think "interrupts" mean
> all except NMI in these documents, but you insist high-prio IRQ
> is an exception.
> 
> >
> > > >
> > > > > >
> > > > > > Are all interrupts (including NMI) masked whenever
> > > > > > arch_irqs_disabled() returns true on your platforms?
> > > > >
> > > > > On my platform, once irqs_disabled() is true, all interrupts are
> > > > > masked except NMI. NMI just ignore spin_lock_irqsave or
> > > > > local_irq_disable.
> > > > >
> > > > > On ARM64, we also have high-priority interrupts, but they are
> > > > > running as PESUDO_NMI:
> > > > > https://lwn.net/Articles/755906/
> > > > >
> > > >
> > > > A glance at the ARM GIC specification suggests that your hardware
> > > > works much like 68000 hardware.
> > > >
> > > >    When enabled, a CPU interface takes the highest priority pending
> > > >    interrupt for its connected processor and determines whether the
> > > >    interrupt has sufficient priority for it to signal the interrupt
> > > >    request to the processor. [...]
> > > >
> > > >    When the processor acknowledges the interrupt at the CPU interface,
> > > >    the Distributor changes the status of the interrupt from pending to
> > > >    either active, or active and pending. At this point the CPU
> > > >    interface can signal another interrupt to the processor, to preempt
> > > >    interrupts that are active on the processor. If there is no pending
> > > >    interrupt with sufficient priority for signaling to the processor,
> > > >    the interface deasserts the interrupt request signal to the
> > > >    processor.
> > > >
> > > > https://developer.arm.com/documentation/ihi0048/b/
> > > >
> > > > Have you considered that Linux/arm might benefit if it could fully
> > > > exploit hardware features already available, such as the interrupt
> > > > priority masking feature in the GIC in existing arm systems?
> > >
> > > I guess no:-) there are only two levels: IRQ and NMI. Injecting a
> > > high-prio IRQ level between them makes no sense.
> > >
> > > To me, arm64's design is quite clear and has no any confusion.
> > >
> >
> > Are you saying that the ARM64 hardware design is confusing because it
> > implements a priority mask, and that's why you had to simplify it with a
> > pseudo-nmi scheme in software?
> 
> No, I was not saying this. I think both m68k and arm64 have good hardware
> design. Just Linux's implementation is running irq-handlers with interrupts
> disabled. So ARM64's pseudo-nmi is adapted to Linux better.
> 
> >
> > > >
> > > > > On m68k, it seems you mean：
> > > > > irq_disabled() is true, but high-priority interrupts can still come;
> > > > > local_irq_disable() can disable high-priority interrupts, and at that
> > > > > time, irq_disabled() is also true.
> > > > >
> > > > > TBH, this is wrong and confusing on m68k.
> > > > >
> > > >
> > > > Like you, I was surprised when I learned about it. But that doesn't mean
> > > > it's wrong. The fact that it works should tell you something.
> > > >
> > >
> > > The fact is that m68k lets arch_irq_disabled() return true to pretend
> > > all IRQs are disabled while high-priority IRQ is still open, thus "pass"
> > > all sanitizing check in genirq and kernel core.
> > >
> >
> > The fact is that m68k has arch_irq_disabled() return false when all IRQs
> > are enabled. So there is no bug.
> 
> But it has arch_irq_disabled() return true while some interrupts(not NMI)
> are still open.
> 
> >
> > > > Things could always be made simpler. But discarding features isn't
> > > > necessarily an improvement.
> > >
> > > This feature could be used by calling local_irq_enable_in_hardirq() in
> > > those IRQ handlers who hope high-priority interrupts to preempt it for a
> > > while.
> > >
> >
> > So, if one handler is sensitive to interrupt latency, all other handlers
> > should be modified? I don't think that's workable.
> 
> I think we just enable preempt_rt or force threaded_irq, and then improve
> the priority of the irq thread who is sensitive to latency. No need to
> touch all threads.
> 
> I also understand your point, we let one high-prio interrupt preempt
> low priority interrupt, then we don't need to change the whole system.
> But I think Linux prefers the method of threaded_irq or preempt_rt
> for this kind of problems.
> 
> >
> > In anycase, what you're describing is a completely different nested
> > interrupt scheme that would defeat the priority level mechanism that the
> > hardware provides us with.
> 
> Yes. Indeed.
> 
> >
> > > It shouldn't hide somewhere and make confusion.
> > >
> >
> > The problem is hiding so well that no-one has found it! I say it doesn't
> > exist.
> 
> Long long ago(before 2.6.38), we had a kernel supporting IRQF_DISABLED and
> nested interrupts were widely supported, but system also ran well in most
> cases. That means nested interrupts don't really matter in most cases.
> That is why m68k is also running well even though it is still nesting.
> 
> >
> > > On the other hand, those who care about realtime should use threaded IRQ
> > > and let IRQ threads preempt each other.
> > >
> >
> > Yes. And those threads also have priority levels.
> 
> Finn, I am not a m68k guy, would you help check if this could activate a
> warning on m68k. maybe we can discuss this question in genirq maillist from
> this warning if you are happy. Thanks very much.
> 
> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> index 7c9d6a2d7e90..b8ca27555c76 100644
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -32,6 +32,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
>   */
>  #define __irq_enter()                                  \
>         do {                                            \
> +               WARN_ONCE(in_hardirq() && irqs_disabled(), "nested
> interrupts\n"); \
>                 preempt_count_add(HARDIRQ_OFFSET);      \
>                 lockdep_hardirq_enter();                \
>                 account_hardirq_enter(current);         \
> @@ -44,6 +45,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
>   */
>  #define __irq_enter_raw()                              \
>         do {                                            \
> +               WARN_ONCE(in_hardirq() && irqs_disabled(), "nested
> interrupts\n"); \
>                 preempt_count_add(HARDIRQ_OFFSET);      \
>                 lockdep_hardirq_enter();                \
>         } while (0)
> 

I am not a m68k guy, here I can show you some code in cortex-m,
which supports full interrupt priority preemption in hardware,
but its arch code just disables it to satisfy Linux:

arch/arm/kernel/entry-header.S

.macro	v7m_exception_entry
	...

	@ Linux expects to have irqs off. Do it here before taking stack space
	cpsid	i

arch/arm/kernel/entry-v7m.S

__irq_entry:
	v7m_exception_entry

	@
	@ Invoke the IRQ handler
	@
	...
	@ routine called with r0 = irq number, r1 = struct pt_regs *
	bl	nvic_handle_irq

	pop	{lr}


Another example in arch/arc/kernel/entry-arcv2.S:

ENTRY(handle_interrupt)

	INTERRUPT_PROLOGUE

	# irq control APIs local_irq_save/restore/disable/enable fiddle with
	# global interrupt enable bits in STATUS32 (.IE for 1 prio, .E[] for 2 prio)
	# However a taken interrupt doesn't clear these bits. Thus irqs_disabled()
	# query in hard ISR path would return false (since .IE is set) which would
	# trips genirq interrupt handling asserts.
	#
	# So do a "soft" disable of interrutps here.
	#
	# Note this disable is only for consistent book-keeping as further interrupts
	# will be disabled anyways even w/o this. Hardware tracks active interrupts
	# seperately in AUX_IRQ_ACT.active and will not take new interrupts
	# unless this one returns (or higher prio becomes pending in 2-prio scheme)

	IRQ_DISABLE

	; icause is banked: one per priority level
	; so a higher prio interrupt taken here won't clobber prev prio icause
	lr  r0, [ICAUSE]
	mov   blink, ret_from_exception

	b.d  arch_do_IRQ
	mov r1, sp


Actually in m68k, I also saw its IRQ entry disabled interrupts by
' move	#0x2700,%sr		/* disable intrs */'

arch/m68k/include/asm/entry.h:

.macro SAVE_ALL_SYS
	move	#0x2700,%sr		/* disable intrs */
	btst	#5,%sp@(2)		/* from user? */
	bnes	6f			/* no, skip */
	movel	%sp,sw_usp		/* save user sp */
...

.macro SAVE_ALL_INT
	SAVE_ALL_SYS
	moveq	#-1,%d0			/* not system call entry */
	movel	%d0,%sp@(PT_OFF_ORIG_D0)
.endm

arch/m68k/kernel/entry.S:

/* This is the main interrupt handler for autovector interrupts */

ENTRY(auto_inthandler)
	SAVE_ALL_INT
	GET_CURRENT(%d0)
					|  put exception # in d0
	bfextu	%sp@(PT_OFF_FORMATVEC){#4,#10},%d0
	subw	#VEC_SPUR,%d0

	movel	%sp,%sp@-
	movel	%d0,%sp@-		|  put vector # on stack
auto_irqhandler_fixup = . + 2
	jsr	do_IRQ			|  process the IRQ
	addql	#8,%sp			|  pop parameters off stack
	jra	ret_from_exception

So my question is that " move	#0x2700,%sr" is actually disabling
all interrupts? And is m68k actually running irq handlers
with interrupts disabled?

Best Regards
Barry