Date:	Tue, 30 Mar 2010 21:01:08 -0700
From:	Jiaying Zhang <jiayingz@...gle.com>
To:	Steven Rostedt <srostedt@...hat.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Michael Rubin <mrubin@...gle.com>,
	David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: lockup in rb_get_reader_page

Thanks a lot for your quick reply!

On Tue, Mar 30, 2010 at 5:35 PM, Steven Rostedt <srostedt@...hat.com> wrote:
> Hi Jiaying,
>
>
> On Tue, 2010-03-30 at 16:27 -0700, Jiaying Zhang wrote:
>> Hi Steven,
>>
>> We recently saw some cpu lockups while running kernel tracing.
>> The problem started to happen after we synced up our ring buffer
>> code with the upstream lockless ring buffer change. It usually
>> took many hours and heavy trace load to hit this problem. When
>> the lockup happens, the problematic cpu seemed to be in an infinite
>> loop of trying to grab the head_page in rb_get_reader_page().
>>
>> We would like to check with you on whether this is a known issue.
>> If so, do we have a bug fix? If not, do you have any suggestions on
>> where we should check?
>
> I'm unaware of any problems with the ring buffer. I've been running the
> lockless version for over a year now, and hammering it with very
> intensive tracing.
>
> Now, you see this on the reader side. There have been a few recent fixes
> that could cause problems when we have multiple readers. Ftrace usage
> does not usually encounter multiple readers so I have not had issues.
> But Li Zefan had a stress test that did find and trigger the problems.
>
I looked at Li Zefan's patch, but I think it fixes a lockup issue in
ftrace. We are not using ftrace right now; we still use our own kernel
trace wrapper built on top of the ring buffer, and we have only one reader.
My guess is that this is more likely a race between the trace reader and
the trace writer.

> Are you using the latest ring buffer that is in Linus's tree? Are you
> resetting the ring buffer while reading it?
>
No. We don't allow resetting the buffer while there is an active reader.
I don't think we ever reset the ring buffer on the machines that hit
this problem.

> I guess I need to know more exactly what you are doing to understand the
> problem.
>
I am going to patch rb_get_reader_page() to print a debugging message
when it enters an infinite loop. I will keep you updated if I find
anything interesting.

Thanks a lot!

Jiaying

> Thanks,
>
> -- Steve
>
>
>