linux-kernel - Re: system stops sending IOs for long time

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <472792CC.7090009@anagramm.de>
Date:	Tue, 30 Oct 2007 21:23:40 +0100
From:	Clemens Koller <clemens.koller@...gramm.de>
To:	saeed bishara <saeed.bishara@...il.com>
CC:	linux-kernel@...r.kernel.org
Subject: Re: system stops sending IOs for long time

saeed bishara schrieb:
 > Hi,
 > I have a performance issue with my system that runs samba server, once
 > in a while (6 seconds) I see that the throughput falls almost to to
 > zero when doing read for large file. I'm running kernel 2.6.22.10. I
 > don't see this issue when using kernel 2.6.12.
 >
 > when observing the activity leds on my sata controller, I see that it
 > get off for about 1 second meaning no commands sent to the HDD for
 > that long time. I used the blktrace in order to have a look into the
 > IOs, and found that the system really stops sending commands for long
 > periods, here is the section of the blkparse that shows this problem:
 >   8,2    0    12747     5.550000000   328  D   R 108068834 + 512 [smbd]
 >   8,2    0    12748     5.550000000   328  D   R 108069346 + 512 [smbd]
 >   8,2    0    12749     5.550000000   328  D   R 108069858 + 512 [smbd]
 >   8,2    0    12750     5.550000000   328  D   R 108070370 + 512 [smbd]
 >   8,2    0    12751     5.550000000   328  U   N [smbd] 5
 >   8,2    0    12752     5.550000000     0  C   R 108068834 + 512 [0]
 >   8,2    0    12753     5.560000000     0  C   R 108069346 + 512 [0]
 >   8,2    0    12754     5.570000000     0  C   R 108069858 + 512 [0]
 >   8,2    0    12755     5.570000000     0  C   R 108070370 + 512 [0]
 >   8,2    0    12756     6.570000000     0  C   R 108066786 + 512 [0]
 >   8,2    0    12757     6.590000000   328  Q   R 108070882 + 8 [smbd]
 >   8,2    0    12758     6.590000000   328  G   R 108070882 + 8 [smbd]
 >   8,2    0    12759     6.590000000   328  P   N [smbd]
 >   8,2    0    12760     6.590000000   328  I   R 108070882 + 8 [smbd]
 >   8,2    0    12761     6.590000000   328  Q   R 108070890 + 8 [smbd]
 >   8,2    0    12762     6.590000000   328  M   R 108070890 + 8 [smbd]
 >
 > any idea how to fix this issue?


Not really.

I've had a similar looking problem with a md raid-1 set of harddisks
running 2.6.18. md0_raid1 took every 10 seconds all of the cpu blocking other
stuff on the machine for some other 2 secons doing no disk writes.

$ uname -a
Linux rio 2.6.22.9 #1 Mon Oct 1 19:48:54 CEST 2007 i686 pentium3 i386 GNU/Linux

$ lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 03)
00:04.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
00:04.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:04.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
00:04.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0b.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
00:0b.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
00:0b.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 51)
00:0c.0 Mass storage controller: Promise Technology, Inc. 20269 (rev 02)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G100 [Productiva] AGP (rev 01)

The raid is on the Promise PDC20269

$ mdadm --misc --detail /dev/md0
/dev/md0:
         Version : 00.90.03
   Creation Time : Sat Oct 29 20:51:41 2005
      Raid Level : raid1
      Array Size : 293049600 (279.47 GiB 300.08 GB)
   Used Dev Size : 293049600 (279.47 GiB 300.08 GB)
    Raid Devices : 2
   Total Devices : 2
Preferred Minor : 0
     Persistence : Superblock is persistent

     Update Time : Tue Oct 30 19:20:12 2007
           State : clean
  Active Devices : 2
Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0

            UUID : b2d7f60d:008fa0df:136207f1:ae0e7a9e
          Events : 0.432306

     Number   Major   Minor   RaidDevice State
        0      33        1        0      active sync   /dev/hde1
        1      34        1        1      active sync   /dev/hdg1

I didn't dare to report a bug against the old kernel. I moved to
2.6.22.9 and the problem disappeared...
Any ways to debug that (non-intrusive on a production machine)?

Regards,

Clemens Koller
__________________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Straße 45/1
Linhof Werksgelände
D-81379 München
Tel.089-741518-50
Fax 089-741518-19
http://www.anagramm-technology.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/