Date:	Wed, 07 May 2008 22:51:59 +0200
From:	Jesper Krogh <jesper@...gh.cc>
To:	Ray Lee <ray-lk@...rabbit.org>
CC:	"Randy.Dunlap" <rdunlap@...otime.net>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: Many open/close on same files yeilds "No such file or directory".

Ray Lee wrote:
> On Mon, May 5, 2008 at 11:29 AM, Jesper Krogh <jesper@...gh.cc> wrote:
>>> I'd been meaning to ask what the topology was. External, eh? Are you
>>> sure the enclosure, cabling, and card/connectors are all good? Have
>>> you tried swapping out cables?
>>>
>>  It is a new SCSI controller, with a new cable and a new terminator on it. But
>>  (just enlighten me), if I had problems at this level I'd expect the
>>  serverlog to be full of SCSI/FS-related errors and not just a single
>>  syscall, that doesn't even touch the array due to caching, to be
>>  failing.
> 
> Borderline hardware does not always create logged errors.

Ok. I think this _really_ points to a kernel problem
(or just to some broken hardware from Sun, in multiple copies).

> If I understood you correctly earlier, identical hardware on another
> system does not show the error. That, quite honestly, rules out the
> software.

Now I've moved the data to fresh ext3 filesystems on an iSCSI-based
storage array, mounted the filesystems on another, similar server, and
I can still reproduce the problem.

Both servers have 16 cores. The problem wasn't there on a different
server with only 2 cores (or I didn't run into it).

The 3 setups above have been tested with both the 2.6.22-14-server and
2.6.24-17-server kernels (against the iSCSI volume).

More testing shows that I have 3 machines (all X4600, 16 cores/32GB
RAM) on which I can reproduce it, against different filesystems.

The more processes running on the system (accessing the FS volume), the
easier it seems to be to trigger the problem.
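
For illustration only, here is a minimal sketch of the kind of load described
above: several processes repeatedly opening and closing the same files and
counting how often open() fails with "No such file or directory" (ENOENT).
This is not the test actually used in this thread. The names hammer and
run_stress, the default worker count and the iteration count are all
assumptions made for the example, and it relies on the "fork" start method,
so it is Linux/Unix only.

```python
import errno
import multiprocessing
import os


def hammer(paths, iterations, counter):
    """Open and close every path `iterations` times; record ENOENT failures."""
    failures = 0
    for _ in range(iterations):
        for path in paths:
            try:
                fd = os.open(path, os.O_RDONLY)
            except OSError as exc:
                if exc.errno != errno.ENOENT:
                    raise  # only "No such file or directory" is of interest
                failures += 1
            else:
                os.close(fd)
    counter.value = failures


def run_stress(paths, workers=16, iterations=10000):
    """Run `workers` processes against the same paths; return total ENOENT count."""
    # Hypothetical helper: "fork" is chosen so the workers start cheaply and
    # share nothing but the per-worker result counters.
    ctx = multiprocessing.get_context("fork")
    counters = [ctx.Value("l", 0) for _ in range(workers)]
    procs = [
        ctx.Process(target=hammer, args=(paths, iterations, counter))
        for counter in counters
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return sum(counter.value for counter in counters)
```

Called as run_stress(list_of_existing_files, workers=16), a healthy system
should return 0, so any non-zero count for files that are known to exist
would correspond to the failure discussed here. Whether this particular
loop is enough to trigger it is not something the sketch can promise.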

> What's left, however unlikely, has to be the issue. And what's left is
> your scsi controller, the cable, and the external disk array.

Now I've removed all of them, and I still have the problem.

-- 
Jesper