lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <4819E316.7000607@krogh.cc>
Date:	Thu, 01 May 2008 17:34:46 +0200
From:	Jesper Krogh <jesper@...gh.cc>
To:	linux-kernel@...r.kernel.org
Subject: Many open/close on same files yeilds "No such file or directory".

Hi list.

I have a "fairly" reproducible problem. When a program opens and closes
the same file many times, it eventually ends up with a "no such file or
directory". Test program that can reproduce the problem on my setup:

root@...t:~# cat test-file-c.c
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    unsigned long i=0;
    int fh;
    char *filename;

    filename=argv[1];

    while(1) {
       fh=open(filename, O_RDONLY);
       if (fh==-1) {
          printf("Failed to open %s\n", filename);
          printf("Open number: %ld\n",i);
          exit(10);
       }
       close(fh);
       i++;
    }

    exit(0);
}
root@...t:~# ./test-file-c /z/bio/databases/online/index/index-by-accno
Failed to open /z/bio/databases/online/index/index-by-accno
Open number: 61785000
root@...t:~# ./test-file-c /z/bio/databases/online/index/index-by-accno
Failed to open /z/bio/databases/online/index/index-by-accno
Open number: 120929685
(The problem is not isolate to a single file on the filesystem).


strace on the program reviel that the system indeed return a "No such
file or directory" to the program.

This is run on an Ubuntu Gutsy (vendor kernel): 2.6.22-14-server on an
4.5TB ext3 filesystem on an LVM volume. The volume was created on a
dapper (2 releases back) and has just followed with during upgrades.
I cannot reproduce it on other disks attached to the same server or on
other servers attached to similar disksystems.

The filesystem was taken offline yesterday for a forced fsck and it was
found to be clean.

The diskarray is a quite old Fibrenetix FX1200 with 12xPATA disk
in raid5 (with hotspare) exposed to the OS as 3 SCSI-disks of
2+2+0.5TB assembled with LVM afterwards. The SCSI-controller is a:
05:08.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X 
Fusion-MPT Dual Ultra320 SCSI (rev c1)

What suggestions do you have to solve this problem?

I'm about to mkfs.ext3 the volume and spool it back in from the backup,
but somehow I'm not convinced that it will solve the problem at all.
It may just be a hardware problem, but dmesg doesnt tell anything.

We actually got the problem from a perl-script, but this seems to be the
minimal program that reproduces the problem.

Jesper
-- 
Jesper


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ