Message-ID: <Pine.LNX.4.64.0811202035590.19962@ask.diku.dk>
Date:	Thu, 20 Nov 2008 20:48:42 +0100 (CET)
From:	Jesper Dangaard Brouer <hawk@...u.dk>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	David Miller <davem@...emloft.net>,
	Jesper Dangaard Brouer <jdb@...x.dk>,
	netdev <netdev@...r.kernel.org>, linux-kernel@...r.kernel.org,
	Robert Olsson <Robert.Olsson@...a.slu.se>
Subject: Regression: Bisected, IRQ and MSI allocations screwed without sparse
 irq

Hi Thomas Gleixner,

I have bisected a regression to your commit
3235e936c0cc3589309280b6f59e5096779adae3,
"x86: remove sparse irq from Kconfig".

It's actually not necessarily your fault, as your commit simply removes
the config option HAVE_SPARSE_IRQ.  This reveals the bug / regression
I'm exposed to.

I guess I should bisect again to find the exact faulty commit, but I'm
rather sick of bisecting at the moment, and thought you might have a
better idea of what's going wrong.  I would rather spend my time
performance tuning the multiqueue routing code...

[The regression]:

During my testing of the Sun Neptune based NICs, I get really good
performance on kernel 2.6.27 (900-1200kpps) compared to 2.6.28 (davem
git net-2.6).

The cause of this problem (tracked down together with Robert Olsson)
is that on 2.6.28 I have a lot fewer IRQs available.  It seems the max
is 34 IRQs.  Due to the reduced number of IRQs, the NIU driver cannot
get enough IRQs for the interfaces, and starts to use "IO-APIC" based
IRQs.

On kernel 2.6.28: My eth2 is using 10 IRQs, all "PCI-MSI-edge".  BUT
my eth3 is using a single "IO-APIC-fasteoi" IRQ, which is shared with
the usb driver.  That is my performance problem on 2.6.28.
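
A minimal sketch of how the per-interface IRQ breakdown described above
could be tallied from a /proc/interrupts-style listing.  The sample text
and the function name are hypothetical illustrations, not output from the
affected machine, and the parser assumes the 2.6-era column layout
(IRQ number, one count per CPU, chip type, then the action names).

```python
import re
from collections import defaultdict

# Hypothetical sample in the style of /proc/interrupts on a 2.6-era kernel.
# These numbers are made up for illustration; they are NOT from the report.
SAMPLE = """\
           CPU0       CPU1
 16:        120         30   IO-APIC-fasteoi   uhci_hcd:usb1, eth3
 50:       1000       2000   PCI-MSI-edge      eth2-rx-0
 51:       1500        500   PCI-MSI-edge      eth2-rx-1
 52:        700        900   PCI-MSI-edge      eth2-tx-0
NMI:          0          0   Non-maskable interrupts
"""


def irq_types_by_interface(text):
    """Return {interface: {chip type: number of IRQ lines}} for ethN devices."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row has one column per CPU
    result = defaultdict(lambda: defaultdict(int))
    for line in lines[1:]:
        fields = line.split()
        # Only numbered IRQ rows matter; skip NMI/LOC/ERR style summary rows.
        if not fields or not fields[0].rstrip(":").isdigit():
            continue
        if len(fields) < ncpus + 3:
            continue  # row has no chip type or no registered action
        chip = fields[1 + ncpus]
        actions = " ".join(fields[2 + ncpus:])
        # A shared IRQ lists several comma-separated actions on one row.
        for action in actions.split(","):
            match = re.match(r"\s*(eth\d+)", action)
            if match:
                result[match.group(1)][chip] += 1
    return {iface: dict(types) for iface, types in result.items()}


if __name__ == "__main__":
    for iface, types in sorted(irq_types_by_interface(SAMPLE).items()):
        print(iface, types)
```

On a live system the same function could be fed the contents of
/proc/interrupts instead of SAMPLE; an interface that shows only a single
"IO-APIC-fasteoi" line has fallen back from MSI in the way described for
eth3.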

[Other related bugs]:
  Unloading the "niu" driver gives a kernel BUG during deallocation
  of MSI interrupts. (See dmesg output below if interested)

(I have attached full bisect history)

Cheers,
   Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------


On Wed, 19 Nov 2008, David Miller wrote:
> From: Jesper Dangaard Brouer <hawk@...u.dk>
> Date: Wed, 19 Nov 2008 23:58:12 +0100 (CET)
>
>> Well that was not the real cause of the performance loss.  Because
>> on kernel 2.6.27 I get really good performance (900-1200kpps)
>> compared to 2.6.28 (git net-2.6).
>>
>> The cause of this problem (tracked down together with Robert Olsson)
>> is that on 2.6.28 I have a lot fewer IRQs available.  It seems max 34
>> IRQs.
>>
>> Due to the reduced number of IRQs the NIU driver cannot get enough IRQs
>> to the interfaces, and starts to use "IO-APIC" based IRQs.
>
> This is almost certainly related to the driver unload bug.
>
> I know you ran into unbuildable/unbootable kernels during a bisect,
> but you really need to track down this regression.


------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:632!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: ehci_hcd bnx2 uhci_hcd zlib_inflate serio_raw hpilo niu(-)

Pid: 3036, comm: rmmod Not tainted (2.6.27-bisect #5) ProLiant DL380 G5
EIP: 0060:[<c021ecac>] EFLAGS: 00010286 CPU: 2
EIP is at msi_free_irqs+0xdc/0xe0
EAX: f6b8f860 EBX: 00000030 ECX: f7156ba8 EDX: c0420500
ESI: f7156800 EDI: f7156ba8 EBP: f6431eb4 ESP: f6431ea8
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rmmod (pid: 3036, ti=f6430000 task=f70f9b20 task.ti=f6430000)
Stack:
  f7156800 f670c400 f7156800 f6431ebc c021ecb8 f6431ec8 c021ef41 f670c000
  f6431edc f809d3f8 f7156800 f80a1ed4 f80a1ed4 f6431ee8 c0219c29 f7156858
  f6431ef8 c026b0d4 f7156858 f7156914 f6431f0c c026b197 f80a1ea0 f80a1ed4
Call Trace:
  [<c021ecb8>] ? msix_free_all_irqs+0x8/0x10
  [<c021ef41>] ? pci_disable_msix+0x31/0x40
  [<f809d3f8>] ? niu_pci_remove_one+0x88/0x8a [niu]
  [<c0219c29>] ? pci_device_remove+0x19/0x40
  [<c026b0d4>] ? __device_release_driver+0x54/0x80
  [<c026b197>] ? driver_detach+0x97/0xa0
  [<c026a475>] ? bus_remove_driver+0x75/0xa0
  [<c026b609>] ? driver_unregister+0x39/0x40
  [<c0219e51>] ? pci_unregister_driver+0x21/0x80
  [<f809a0ad>] ? niu_exit+0xd/0x10 [niu]
  [<c0145d74>] ? sys_delete_module+0x114/0x1d0
  [<c016810a>] ? remove_vma+0x3a/0x50
  [<c0168c29>] ? do_munmap+0x189/0x1e0
  [<c0103229>] ? sysenter_do_call+0x12/0x21
  [<c0330000>] ? quirk_disable_msi+0x30/0x50
Code: b7 43 08 8b 53 1c c1 e0 04 01 d0 ba 01 00 00 00 83 c0 0c 89 10 3b 7b 14 75 aa 8b 43 1c e8 3d 92 ef ff eb a0 5b 31 c0 5e 5f 5d c3 <0f> 0b eb fe 55 89 e5 e8 18 ff ff ff 5d c3 8d b6 00 00 00 00 55
EIP: [<c021ecac>] msi_free_irqs+0xdc/0xe0 SS:ESP 0068:f6431ea8
---[ end trace f72de2e283920207 ]---

View attachment "bisect_IO-APIC.txt" of type "TEXT/plain" (32509 bytes)
