Date:	Wed, 22 Oct 2008 19:37:46 -0400
From:	Trond Myklebust <trond.myklebust@....uio.no>
To:	Harry Edmon <harry@...os.washington.edu>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: SUNRPC problem with 2.6.26 and beyond - try again with
	response in correct place.

On Wed, 2008-10-22 at 15:55 -0700, Harry Edmon wrote:
> Trond Myklebust wrote:
> > On Wed, 2008-10-22 at 08:35 -0700, Harry Edmon wrote:
> >   
> >> I have a dual quad-core Xeon system running software 
> >> (http://www.unidata.ucar.edu/software/ldm) that relays and processes 
> >> weather data through RPC calls, keeping a queue of data in a memory 
> >> mapped file.  Up until 2.6.26 the system has run just fine (for example 
> >> 2.6.25.17).  But starting with 2.6.26 through 2.6.27.2 the system runs 
> >> into a problem after approximately 24 hours.  The symptom is that the 
> >> processing slows down to a crawl.  Using "top" I can see that the System 
> >> time is up over 90%, with almost no User and Wait time.  If I stop and 
> >> restart the software, most of the time it gets better - but sometimes it 
> >> takes a reboot to fix the problem.  I have an identical system that does 
> >> just processing and ingesting data from remote systems, and it does not 
> >> have this problem.  I have tried a number of different kernel 
> >> configurations, but they all show the same problem.
> >>
> >> I suspect a problem with SUNRPC.  I notice that there were a large 
> >> number of SUNRPC patches in 2.6.26.  I am looking for suggestions on how 
> >> to pin down which patches are causing the problem.  Are there ways to 
> >> figure out where in the kernel the time is being spent?  I am willing to work 
> >> on isolating the problem, but I need some suggestions on the best way to 
> >> do it given the large number of SUNRPC patches in 2.6.26 and the fact 
> >> that each experiment takes a day.
> >>     
> >
> > The kernel sunrpc interface is not exported to user land: the glibc code
> > uses its own, entirely separate implementation of sunrpc.
> >
> > I therefore cannot see how your application's RPC calls can be affected
> > by kernel sunrpc changes.
> >
> > Cheers
> >   Trond
> >
> >   
> Then how do you explain the large system time used with 2.6.26 and 
> beyond?  Is it some other patch I should be looking at?

I'm not explaining it. I'm saying that nothing outside the kernel NFS
and NLM code uses the kernel sunrpc implementation. Your userland RPC
calls are using glibc's implementation of sunrpc. Those are unaffected
by patches to the kernel sunrpc layer.

If you are seeing a hang, then I suggest you start by using the strace
utility to figure out which system call is actually involved.

Cheers
  Trond
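
[Editorial illustration, not part of the original message.] One way to act
on the strace suggestion above is to attach to the slow process with
`strace -c -f -p <pid>`, interrupt it after a while, and rank the system
calls in the summary table by time spent. The following is a minimal sketch
in Python, assuming the usual column layout of the `strace -c` summary. The
function name `parse_strace_summary` and the `SAMPLE` table are hypothetical
and made up for illustration; they are not measurements from the system
discussed in this thread.

```python
import re

# Made-up sample in the layout that `strace -c` prints (illustrative only).
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 91.50    1.830000         183     10000           poll
  6.00    0.120000          12     10000       250 read
  2.50    0.050000          10      5000           write
------ ----------- ----------- --------- --------- ----------------
100.00    2.000000                 25000       250 total
"""


def parse_strace_summary(text):
    """Return the per-syscall rows of an `strace -c` table, slowest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a percentage such as "91.50"; this skips the
        # header line, the dashed rulers, and blank lines.
        if len(fields) < 5 or not re.fullmatch(r"\d+\.\d+", fields[0]):
            continue
        # The final "total" line is a sum, not a system call.
        if fields[-1] == "total":
            continue
        rows.append({
            "syscall": fields[-1],        # last column, even if "errors" is empty
            "percent": float(fields[0]),
            "seconds": float(fields[1]),
            "calls": int(fields[3]),
        })
    return sorted(rows, key=lambda r: r["seconds"], reverse=True)


if __name__ == "__main__":
    for row in parse_strace_summary(SAMPLE):
        print(f'{row["syscall"]:<10} {row["seconds"]:>10.6f}s {row["calls"]:>8} calls')
```

The system call at the top of the ranking is the natural place to start
looking, since it shows where the process enters the kernel most expensively.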