Message-ID: <4822166F.50002@krogh.cc>
Date: Wed, 07 May 2008 22:51:59 +0200
From: Jesper Krogh <jesper@...gh.cc>
To: Ray Lee <ray-lk@...rabbit.org>
CC: "Randy.Dunlap" <rdunlap@...otime.net>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: Many open/close on same files yeilds "No such file or directory".
Ray Lee wrote:
> On Mon, May 5, 2008 at 11:29 AM, Jesper Krogh <jesper@...gh.cc> wrote:
>>> I'd been meaning to ask what the topology was. External, eh? Are you
>>> sure the enclosure, cabling, and card/connectors are all good? Have
>>> you tried swapping out cables?
>>>
>> It is new SCSI-controller, new cable and new terminator put onto it. But
>> (just enlighten me), if I had problems at this level I'd expect the
>> serverlog to be full of SCSI/FS-related errors and not just a single
>> syscall, that doesn't even touch the array due to caching, to be
>> failing.
>
> Borderline hardware does not always create logged errors.
OK. I think this _really_ points to a kernel problem
(or just some broken hardware from Sun in multiple copies).
> If I understood you correctly earlier, identical hardware on another
> system does not show the error. That, quite honestly, rules out the
> software.
Now I've moved the data to fresh ext3 filesystems on an iSCSI-based
storage array, mounted those filesystems on another, similar server, and
I can still reproduce the problem.
Both servers have 16 cores. The problem wasn't there on a different
server with only 2 cores (or I didn't run into it).
All 3 setups above have been tested with both 2.6.22-14-server and
2.6.24-17-server (against the iSCSI volume).
More testing shows that I have 3 machines (all X4600, 16 cores/32GB
RAM) on which I can reproduce it, against different filesystems.
The more processes running on the system (accessing the FS volume), the
easier it seems to get into the problem.
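A rough sketch of this kind of load, assuming Linux and Python 3. The
script, the names `hammer` and `run_stress`, and the worker/round
defaults are illustrative only, not the test used in this thread. Each
worker process repeatedly opens and closes the same set of files and
records any open() that fails, so an unexpected ENOENT on a file that
exists shows up in the result:

```python
import errno
import multiprocessing
import os
import sys


def hammer(args):
    """Open and close every path `rounds` times; return (path, errno) failures."""
    paths, rounds = args
    failures = []
    for _ in range(rounds):
        for path in paths:
            try:
                fd = os.open(path, os.O_RDONLY)
            except OSError as exc:
                # Record the failure instead of aborting, so counts are comparable.
                failures.append((path, exc.errno))
            else:
                os.close(fd)
    return failures


def run_stress(paths, workers=16, rounds=1000):
    """Run `workers` processes that all hammer the same paths concurrently."""
    # Assumption: the "fork" start method, which is only available on Unix-like systems.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(workers) as pool:
        per_worker = pool.map(hammer, [(paths, rounds)] * workers)
    # Flatten the per-worker lists into one list of failures.
    return [failure for failures in per_worker for failure in failures]


if __name__ == "__main__" and len(sys.argv) > 1:
    failed = run_stress(sys.argv[1:])
    enoent = [path for path, err in failed if err == errno.ENOENT]
    print(f"{len(failed)} failed opens, {len(enoent)} of them ENOENT")
```

Run it with the files to hit as arguments. On a healthy system every
open of an existing file should succeed, so any ENOENT reported for a
file that is known to exist is the symptom described above.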
> What's left, however unlikely, has to be the issue. And what's left is
> your scsi controller, the cable, and the external disk array.
Now I've removed all of them, and I still have the problem.
--
Jesper