
Message-ID: <1393471595.5519.22.camel@marge.simpson.net>
Date:	Thu, 27 Feb 2014 04:26:35 +0100
From:	Mike Galbraith <bitbucket@...ine.de>
To:	Ken Moffat <zarniwhoop@...world.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

On Thu, 2014-02-27 at 00:52 +0000, Ken Moffat wrote: 
> Hi,
> 
>  Short summary : on 3.13.5, rm -rf of an application source
> directory on an ext4 filesystem sometimes takes forever (probably
> isn't going anywhere), with one CPU pegged at all-but 100% utilization.
> 
>  I've nearly finished building a new system from source, to check
> various desktop packages in linuxfromscratch.  On this build, much of
> it is things I don't normally use and I needed to upgrade my
> buildscripts, so most of it was built in chroot using 3.10.32.  But
> late last night I booted the new system using 3.13.5 to finish the
> build.  This morning I discovered that rm -rf for the icedtea source
> directory was still running, and had taken over 5 hours of CPU time
> (one CPU seemed to be running at close to 100%, the others had dropped
> to their slowest frequency).  That script was running as root (yeah,
> but it's a new system) and it looks as if /etc/passwd~ had got
> trashed, because I could no longer su or login.  Not sure if that is
> related, at this stage it might just be a side-effect of my scripts.
> 
>  Booted another system, chrooted, fixed up passwords.  Started
> again after commenting out icedtea - I hadn't intended to build
> what was an old version, I'd just forgotten it was in this script -
> that's why I do things in userspace, not the kernel :-(
> 
>  Continued with remaining packages, but a couple of hours later I
> saw a similar "one CPU at 100%, rm -rf GConf source taking forever"
> problem.  Dumped all the processes with Alt-SysRQ-T [ huge log ] but
> at that point 'rm' was merely 'ready' so I doubt there is anything
> useful to see in the log.
> 
>  Built 3.13.4, booted to that.  So far, everything looks good - but
> I'm now building the _current_ version of icedtea, so if this isn't
> a new 3.13.5 problem I guess I'm fairly likely to see it tomorrow.
> 
>  Meanwhile, any suggestions about how I can debug this if I hit it
> again, please?

I would start with strace to see if a task is looping in userspace, then
move on to perf top -g -p <pid> (or perf record/report) to peek at what
it's up to in the kernel.  Once you have the where, trace_printk() is
the best thing since sliced bread (which ranks just below printk()).
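
As a rough first check before reaching for strace or perf, the per-task
CPU counters in /proc can hint at whether the spinning task is burning
time in user mode or in the kernel.  The sketch below is only an
illustration, not part of the advice above: it assumes a Linux /proc
layout in which fields 14 and 15 of /proc/<pid>/stat are utime and stime
in clock ticks, and the helper names (parse_cpu_ticks, read_cpu_ticks,
classify, sample) are made up for this example.

```python
import time


def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so cut at the last ')' before splitting the rest.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    return int(rest[11]), int(rest[12])


def read_cpu_ticks(pid):
    """Read the current (utime, stime) pair for a live process."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def classify(before, after):
    """Say where most CPU time went between two (utime, stime) snapshots."""
    user_delta = after[0] - before[0]
    sys_delta = after[1] - before[1]
    if user_delta + sys_delta == 0:
        return "idle"
    return "kernel" if sys_delta > user_delta else "user"


def sample(pid, interval=5.0):
    """Take two snapshots `interval` seconds apart and classify the delta."""
    before = read_cpu_ticks(pid)
    time.sleep(interval)
    return classify(before, read_cpu_ticks(pid))
```

Calling sample() with the pid of the stuck rm would return "kernel" if
system time dominates over the interval, which suggests going straight to
perf; "user" suggests strace or a userspace profiler is the better next
step.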

-Mike
