Date: Wed, 22 Oct 2008 18:27:00 -0700
From: Harry Edmon <harry@...os.washington.edu>
To: Trond Myklebust <trond.myklebust@....uio.no>
CC: linux-kernel@...r.kernel.org
Subject: Re: SUNRPC problem with 2.6.26 and beyond - try again with response in correct place.

Trond Myklebust wrote:
> On Wed, 2008-10-22 at 15:55 -0700, Harry Edmon wrote:
>
>> Trond Myklebust wrote:
>>
>>> On Wed, 2008-10-22 at 08:35 -0700, Harry Edmon wrote:
>>>
>>>> I have a dual quad-core Xeon system running software
>>>> (http://www.unidata.ucar.edu/software/ldm) that relays and processes
>>>> weather data through RPC calls, keeping a queue of data in a memory
>>>> mapped file. Up until 2.6.26 the system has run just fine (for example
>>>> 2.6.25.17). But starting with 2.6.26 through 2.6.27.2 the system runs
>>>> into a problem after approximately 24 hours. The symptom is that the
>>>> processing slows down to a crawl. Using "top" I can see that the System
>>>> time is up over 90%, with almost no User and Wait time. If I stop and
>>>> restart the software, most of the time it gets better - but sometimes it
>>>> takes a reboot to fix the problem. I have an identical system that does
>>>> just processing and ingesting data from remote systems, and it does not
>>>> have this problem. I have tried a number of different kernel
>>>> configurations, but they all show the same problem.
>>>>
>>>> I suspect a problem with SUNRPC. I notice that there were a large
>>>> number of SUNRPC patches in 2.6.26. I am looking for suggestions on how
>>>> to pin down which patches are causing the problem. Are there ways to
>>>> figure out where in the kernel the time is being spent? I am willing to
>>>> work on isolating the problem, but I need some suggestions on the best way
>>>> to do it given the large number of SUNRPC patches in 2.6.26 and the fact
>>>> that each experiment takes a day.
>>>>
>>> The kernel sunrpc interface is not exported to user land: the glibc code
>>> uses its own, entirely separate implementation of sunrpc.
>>>
>>> I therefore cannot see how your application's RPC calls can be affected
>>> by kernel sunrpc changes.
>>>
>>> Cheers
>>> Trond
>>>
>> Then how do you explain the large system time used with 2.6.26 and
>> beyond? Is it some other patch I should be looking at?
>>
>
> I'm not explaining it. I'm saying that nothing outside the kernel NFS
> and NLM code uses the kernel sunrpc implementation. Your userland RPC
> calls are using glibc's implementation of sunrpc. Those are unaffected
> by patches to the kernel sunrpc layer.
>
> If you are seeing a hang, then I suggest you start by using the strace
> utility to figure out which system call is actually involved.
>
> Cheers
> Trond
>

The problem is that it is not hanging. The processes are running through
a lot of system calls. It is just that the system time jumps up to over
95% on all 8 processors with 2.6.26 and beyond. I never see that with
2.6.25.17. I will try looking again and see if there are certain calls
that are taking a lot of time.
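Because each experiment takes about a day, a timestamped log of per-CPU system time can show when the slowdown starts. The sketch below is not from the thread. It is a minimal illustration that assumes the standard Linux /proc/stat layout, where each `cpuN` line holds cumulative tick counts for user, nice, system, idle, iowait, irq, softirq and steal. The function names (`parse_proc_stat`, `system_share`, `format_shares`, `monitor`) and the 60-second default interval are hypothetical choices. Between two samples it computes the share of elapsed ticks each CPU spent in the `system` counter, which is roughly the figure top reports as system time.

```python
import time

# Hypothetical helpers for this sketch; assumes the Linux /proc/stat per-CPU
# line format. Counters are cumulative USER_HZ ticks. Only the first eight
# columns are used: later "guest" columns are already included in "user",
# and older kernels may report fewer columns.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Map each per-CPU line (cpu0, cpu1, ...) to a {field: ticks} dict."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr" or "ctxt".
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        ticks = [int(v) for v in parts[1:1 + len(FIELDS)]]
        cpus[parts[0]] = dict(zip(FIELDS, ticks))
    return cpus


def system_share(before, after):
    """Percent of elapsed ticks each CPU spent in the "system" counter."""
    shares = {}
    for cpu, old in before.items():
        new = after.get(cpu)
        if new is None:
            continue  # CPU went offline between the two samples
        elapsed = sum(new.values()) - sum(old.values())
        in_kernel = new["system"] - old["system"]
        shares[cpu] = 100.0 * in_kernel / elapsed if elapsed else 0.0
    return shares


def format_shares(shares):
    """One log field per CPU, ordered numerically (cpu2 before cpu10)."""
    ordered = sorted(shares.items(), key=lambda item: int(item[0][3:]))
    return " ".join(f"{cpu}={pct:.0f}%" for cpu, pct in ordered)


def read_proc_stat(path="/proc/stat"):
    with open(path) as f:
        return parse_proc_stat(f.read())


def monitor(interval=60.0, samples=None, path="/proc/stat"):
    """Print a timestamped per-CPU system-time line every `interval` seconds.

    Runs until interrupted when `samples` is None, so it can be left running
    for the ~24 hours the slowdown takes to appear.
    """
    prev = read_proc_stat(path)
    taken = 0
    while samples is None or taken < samples:
        time.sleep(interval)
        cur = read_proc_stat(path)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, format_shares(system_share(prev, cur)), flush=True)
        prev = cur
        taken += 1
```

For example, calling `monitor(interval=60)` and redirecting its output to a file gives a per-minute timeline of when system time first climbs toward the 90-95% described in the thread. It only shows that time is going into the kernel, not where; attributing it to particular system calls is still strace's job (for instance `strace -c -p PID`, which prints a per-syscall count and time summary when detached).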