Message-ID: <48FFD2E4.6080000@atmos.washington.edu>
Date: Wed, 22 Oct 2008 18:27:00 -0700
From: Harry Edmon <harry@...os.washington.edu>
To: Trond Myklebust <trond.myklebust@....uio.no>
CC: linux-kernel@...r.kernel.org
Subject: Re: SUNRPC problem with 2.6.26 and beyond - try again with response in correct place.

Trond Myklebust wrote:
> On Wed, 2008-10-22 at 15:55 -0700, Harry Edmon wrote:
>
>> Trond Myklebust wrote:
>>
>>> On Wed, 2008-10-22 at 08:35 -0700, Harry Edmon wrote:
>>>
>>>
>>>> I have a dual quad-core Xeon system running software
>>>> (http://www.unidata.ucar.edu/software/ldm) that relays and processes
>>>> weather data through RPC calls, keeping a queue of data in a
>>>> memory-mapped file. Up until 2.6.26 the system ran just fine (for example,
>>>> 2.6.25.17). But starting with 2.6.26 through 2.6.27.2 the system runs
>>>> into a problem after approximately 24 hours. The symptom is that the
>>>> processing slows down to a crawl. Using "top" I can see that the System
>>>> time is up over 90%, with almost no User and Wait time. If I stop and
>>>> restart the software, most of the time it gets better - but sometimes it
>>>> takes a reboot to fix the problem. I have an identical system that only
>>>> ingests and processes data from remote systems, and it does not
>>>> have this problem. I have tried a number of different kernel
>>>> configurations, but they all show the same problem.
>>>>
>>>> I suspect a problem with SUNRPC. I notice that there were a large
>>>> number of SUNRPC patches in 2.6.26. I am looking for suggestions on how
>>>> to pin down which patches are causing the problem. Are there ways to
>>>> figure out where in the kernel the time is being spent? I am willing to work
>>>> on isolating the problem, but I need some suggestions on the best way to
>>>> do it given the large number of SUNRPC patches in 2.6.26 and the fact
>>>> that each experiment takes a day.
>>>>
>>>>
>>> The kernel sunrpc interface is not exported to user land: the glibc code
>>> uses its own, entirely separate implementation of sunrpc.
>>>
>>> I therefore cannot see how your application's RPC calls can be affected
>>> by kernel sunrpc changes.
>>>
>>> Cheers
>>> Trond
>>>
>>>
>>>
>> Then how do you explain the large system time used with 2.6.26 and
>> beyond? Is it some other patch I should be looking at?
>>
>
> I'm not explaining it. I'm saying that nothing outside the kernel NFS
> and NLM code uses the kernel sunrpc implementation. Your userland RPC
> calls are using glibc's implementation of sunrpc. Those are unaffected
> by patches to the kernel sunrpc layer.
>
> If you are seeing a hang, then I suggest you start by using the strace
> utility to figure out which system call is actually involved.
>
> Cheers
> Trond
>
>
The problem is that it is not hanging. The processes are running
through a lot of system calls. It is just that the system time jumps
up to over 95% on all 8 processors with 2.6.26 and beyond. I never see
that with 2.6.25.17. I will try looking again and see if there are
certain calls that are taking a lot of time.
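
Because each experiment takes about a day, it may help to log when the system time starts climbing instead of relying on occasional looks at top. A minimal sketch in Python, assuming the /proc/stat column order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal); the function names below (parse_cpu_line, cpu_mode_percentages, watch) are hypothetical and are not part of LDM or of any kernel tool:

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat, as documented
# in proc(5). guest/guest_nice are deliberately omitted because the kernel
# already counts them inside user/nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return tick counters (USER_HZ units) from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        # "cpu" is the all-CPU total; per-CPU lines are "cpu0", "cpu1", ...
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Kernels that report fewer columns are padded with zeros.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_mode_percentages(before_text, after_text):
    """Share of elapsed CPU ticks spent in each mode between two snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots show no elapsed ticks")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_proc_stat(path="/proc/stat"):
    with open(path) as f:
        return f.read()


def watch(interval=60.0, samples=1440, path="/proc/stat"):
    """Log the system/user/iowait split every `interval` seconds."""
    prev = read_proc_stat(path)
    for _ in range(samples):
        time.sleep(interval)
        cur = read_proc_stat(path)
        pct = cpu_mode_percentages(prev, cur)
        print("%s system=%.1f%% user=%.1f%% iowait=%.1f%%" % (
            time.strftime("%Y-%m-%d %H:%M:%S"),
            pct["system"], pct["user"], pct["iowait"]))
        prev = cur
```

With its defaults, watch() prints one line per minute for 24 hours (60 s x 1440 samples); the timestamp of the first high system= reading could then indicate when to attach strace to the LDM processes, as suggested above.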
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/