Date:	Wed, 7 Mar 2012 13:47:25 -0600
From:	Josh Hunt <joshhunt00@...il.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	axboe@...nel.dk, linux-kernel@...r.kernel.org, tj@...nel.org
Subject: Re: multi-second application stall in open()

On Wed, Mar 7, 2012 at 10:28 AM, Vivek Goyal <vgoyal@...hat.com> wrote:
> On Wed, Mar 07, 2012 at 07:43:42AM -0600, Josh Hunt wrote:
>> On Tue, Mar 6, 2012 at 3:56 PM, Josh Hunt <joshhunt00@...il.com> wrote:
>> > I am trying to diagnose a stall in an application when making a call
>> > to open() with O_TRUNC. The issue was originally seen on 2.6.38, and
>> > I've since reproduced it on 2.6.35. I believe the last kernel on which
>> > I did not see the stall was 2.6.32. The system is under low to moderate CPU
>> > and memory load when the issue occurs. The filesystem is ext2, but I
>> > mounted it as ext4 and still saw the issue.
>> >
>> > To me it appears that O_TRUNC needs to truncate pages in the page
>> > cache, but perhaps those pages are already out for writeback. This
>> > causes the open() to wait until that operation completes, at which
>> > point it can proceed. If that's true, then it looks like the writeback
>> > is sitting on the queue forever, waiting (tens of seconds) to be
>> > dispatched to the device.
>> >
>> > Here's some more info/data I collected:
>> > (Unfortunately the data here and attached are from separate runs so
>> > the #s will not match up file to file.)
>> >
>> > Snippet of an strace when it is hanging:
>> > 1329323286.589182 close(37)             = 0 <0.000010>
>> > 1329323286.589212 clock_gettime(0x5 /* CLOCK_??? */, {1329323286, 587579583}) = 0 <0.000007>
>> > 1329323286.589241 open("/usr/local/etc/abcd/known/test808", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 37 <76.841586>
>> > 1329323363.441841 fchmod(37, 0644)      = 0 <0.000012>
>> > 1329323363.441882 fcntl64(37, F_GETFL)  = 0x1 (flags O_WRONLY) <0.000011>
>> > 1329323363.441915 fcntl64(37, F_SETFL, O_WRONLY|O_NONBLOCK) = 0 <0.000010>
>> >
>> > To debug this more I enabled the hang check timer to dump stack when
>> > anything hung for longer than 10s. There I see the hanging
>> > application appearing to be stuck in sync_page(). See attached dmesg
>> > for traces.
>> >
>> > I've also captured "iostat -xm 1" while the application was running
>> > and have attached a snippet of that when the problem is occurring. I
>> > see 100% utilization for many seconds and then eventually the
>> > operations complete, but the await time is extremely high. To me it
>> > doesn't look like there is much IO going on that could be starving out
>> > requests.
>> >
>> > I have not been able to perform a full bisection to better understand
>> > what may be causing this, but I do see it on 2.6.38; it occurs a little
>> > less frequently on 2.6.35, and I no longer see the issue on
>> > 2.6.32. If I switch slice_idle to 0 or change to the deadline
>> > scheduler the issue appears to go away in both 2.6.35 and 38.
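
[Editor's note: the timing the strace excerpt above captures can be sketched as follows. This is a hypothetical illustration in Python, not the original C application; the path is a stand-in for the truncated `/usr/local/etc/abcd/known/test808`.]

```python
import os
import tempfile
import time

# Hypothetical stand-in for the file the report truncates
path = os.path.join(tempfile.mkdtemp(), "test808")

# Give the file some data so O_TRUNC actually has pages to throw away
with open(path, "wb") as f:
    f.write(b"x" * 4096)

# Time the open() with O_TRUNC, as the strace excerpt does
t0 = time.monotonic()
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
elapsed = time.monotonic() - t0
os.close(fd)

print("open(O_TRUNC) took %.6f s" % elapsed)
```

On a healthy system this completes in well under a millisecond; the strace above shows the same call taking 76.8 seconds.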
>
> Is it happening on the latest kernel (3.3-rc6)?
>
> Can you change the IO scheduler to deadline on sda and see if the issue
> still happens? (echo deadline > /sys/block/sda/queue/scheduler)
>
> The blktrace of cfq looks odd. I see that some IO (async writes) is being
> submitted, but CFQ did not dispatch it for a long time. Even though some
> unplugs came in, still nothing happened. Also, no completions are happening
> during that window. Not sure why CFQ refuses to dispatch the queued writes.
>
> Request added by flusher.
>
>  8,0    1    36926  5028.546000122  2846  A   W 20147012 + 8 <- (8,3) 3375152
>  8,0    1    36927  5028.546001798  2846  Q   W 20147012 + 8 [flush-8:0]
>  8,0    1    36928  5028.546009900  2846  G   W 20147012 + 8 [flush-8:0]
>  8,0    1    36929  5028.546014649  2846  I   W 20147012 + 8 (    4749) [flush-8:0]
>
> And this request is dispatched after 22 seconds.
>
>  8,0    1    37056  5050.117337221   162  D   W 20147012 + 16 (21571322572) [sync_supers]
>
>
> And it completes fairly fast.
>
>  8,0    0    36522  5050.117686149  9657  C   W 20147012 + 16 (  348928) [0]
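
[Editor's note: a sanity check on the quoted numbers. The value blkparse prints in parentheses on the D line appears to be the nanoseconds elapsed since the request's I (insert) event; the quoted timestamps reproduce it exactly.]

```python
from decimal import Decimal  # exact decimal arithmetic, no float rounding

# Timestamps quoted in the trace above (seconds since trace start)
insert_ts = Decimal("5028.546014649")    # I: request inserted into the queue
dispatch_ts = Decimal("5050.117337221")  # D: request dispatched to the device

delay_ns = int((dispatch_ts - insert_ts) * 10**9)
print(delay_ns)  # 21571322572 -- the value in parentheses on the D line
```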
>
> So I am not sure why CFQ would hold that request for so long when no
> other IO is happening.
>
> Please try the latest kernel and see if deadline has the same issue. If
> not, then we know CFQ is somehow involved. If it still happens on the
> latest kernels, can you try capturing blktrace again while you are
> experiencing the delays?
>
> Thanks
> Vivek

Vivek, thanks for the response. I did try deadline and do not see the
issue with it. I also do not see the problem when slice_idle is set to
0. Unfortunately I cannot run the testcase on 3.3-rc right now. I also
wanted to point out that there seems to be very little IO going on (at
least from what I can tell); please double-check me against the iostat
output I sent in the previous mail. I am wondering whether there are
simply not enough requests here to keep things running smoothly. Is
that a possibility?

Thanks
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
