linux-kernel - Re: lockup in rb_get_reader_page

Message-ID: <q2o5df78e1d1003302101q585143aan32e8872fcdb05c8a@mail.gmail.com>
Date:	Tue, 30 Mar 2010 21:01:08 -0700
From:	Jiaying Zhang <jiayingz@...gle.com>
To:	Steven Rostedt <srostedt@...hat.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Michael Rubin <mrubin@...gle.com>,
	David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: lockup in rb_get_reader_page

Thanks a lot for your quick reply!

On Tue, Mar 30, 2010 at 5:35 PM, Steven Rostedt <srostedt@...hat.com> wrote:
> Hi Jiaying,
>
>
> On Tue, 2010-03-30 at 16:27 -0700, Jiaying Zhang wrote:
>> Hi Steven,
>>
>> We recently saw some cpu lockups while running kernel tracing.
>> The problem started to happen after we synced up our ring buffer
>> code with the upstream lockless ring buffer change. It usually
>> took many hours and heavy trace load to hit this problem. When
>> the lockup happens, the problematic cpu seemed to be in an infinite
>> loop of trying to grab the head_page in rb_get_reader_page().
>>
>> We would like to check with you on whether this is a known issue.
>> If so, do we have a bug fix? If not, do you have any suggestions on
>> where we should check?
>
> I'm unaware of any problems with the ring buffer. I've been running the
> lockless version for over a year now, and hammering it with very
> intensive tracing.
>
> Now, you see this on the reader side. There have been a few recent fixes
> for bugs that could cause problems when we have multiple readers. Ftrace
> usage does not usually involve multiple readers, so I have not had issues.
> But Li Zefan had a stress test that did find and trigger the problems.
>
I looked at Li Zefan's patch, but I think it fixes a lockup issue in
ftrace. We are not using ftrace right now; we still use our own kernel
trace wrapper built on top of the ring buffer, and we only use one reader.
So I guess it is more likely a race between the trace reader and the
trace writer.
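To make the suspected reader/writer race concrete, here is a toy model in C11 atomics. It is a deliberate simplification under stated assumptions, not the kernel's ring buffer code: the whole page list is collapsed into one atomic head index, and `ring_model`, `writer_overwrite`, `reader_try_claim` and `replay_race` are hypothetical names. The reader can only take the head page if a compare-and-swap still sees the head it loaded; a writer in overwrite mode that moves the head in between makes the swap fail and forces a retry, and a writer that did so on every attempt would keep the reader looping indefinitely.

```c
#include <stdatomic.h>

#define NR_PAGES 8 /* hypothetical number of pages in the toy ring */

/* Toy model: the page list is reduced to the index of the head (oldest
 * unread) page. This is NOT the kernel's data structure. */
struct ring_model {
	atomic_int head;
};

/* Writer in overwrite mode: the buffer is full, so the oldest page is
 * sacrificed and the head moves forward by one page. */
static void writer_overwrite(struct ring_model *r)
{
	int old = atomic_load(&r->head);
	int next;

	do {
		next = (old + 1) % NR_PAGES;
	} while (!atomic_compare_exchange_weak(&r->head, &old, next));
}

/* Reader: try once to take the page it observed as head. Succeeds only
 * if the head has not moved since it was loaded; returns the page index
 * on success and -1 if a writer got in first. */
static int reader_try_claim(struct ring_model *r, int observed_head)
{
	int expected = observed_head;
	int next = (observed_head + 1) % NR_PAGES;

	if (atomic_compare_exchange_strong(&r->head, &expected, next))
		return observed_head;
	return -1;
}

/* Deterministic single-threaded replay of one interleaving: on the first
 * racing_rounds attempts the writer moves the head between the reader's
 * load and its compare-and-swap. Returns how many attempts the reader
 * needed. If the writer raced on every attempt, this loop would never
 * terminate, which is the kind of endless reader loop reported here. */
static int replay_race(int racing_rounds)
{
	struct ring_model r;

	atomic_init(&r.head, 0);
	for (int attempt = 1;; attempt++) {
		int observed = atomic_load(&r.head);

		if (attempt <= racing_rounds)
			writer_overwrite(&r);
		if (reader_try_claim(&r, observed) >= 0)
			return attempt;
	}
}
```

For example, `replay_race(3)` needs four attempts: three lost races, then one clean swap. Heavy trace load makes lost races more frequent, which fits the observation that the lockup took many hours to reproduce.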

> Are you using the latest ring buffer that is in Linus's tree? Are you
> resetting the ring buffer while reading it?
>
No. We don't allow resetting the buffer while there is an active reader.
I don't think we ever reset the ring buffer on those machines that hit
this problem.
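One way to enforce the rule that the buffer is never reset while a reader is active is a small handshake between a reader count and a "resetting" flag. The sketch below is a hypothetical user-space illustration in C11 atomics, not the tracer's actual code; `struct trace_buf`, `reader_open`, `reader_close` and `trace_buf_reset` are invented names. Each side publishes its intent first and then checks the other side, so with the default sequentially consistent atomics a reset and a new reader can never both proceed.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-buffer state; the names are illustrative only. */
struct trace_buf {
	atomic_int active_readers;
	atomic_bool resetting;
};

static void trace_buf_init(struct trace_buf *b)
{
	atomic_init(&b->active_readers, 0);
	atomic_init(&b->resetting, false);
}

/* Register a reader; refuse while a reset is in progress. */
static int reader_open(struct trace_buf *b)
{
	atomic_fetch_add(&b->active_readers, 1); /* publish intent first */
	if (atomic_load(&b->resetting)) {        /* then check the other side */
		atomic_fetch_sub(&b->active_readers, 1);
		return -EBUSY;
	}
	return 0;
}

static void reader_close(struct trace_buf *b)
{
	atomic_fetch_sub(&b->active_readers, 1);
}

/* Reset only when no reader is active; returns -EBUSY otherwise. */
static int trace_buf_reset(struct trace_buf *b)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&b->resetting, &expected, true))
		return -EBUSY; /* another reset is already running */
	if (atomic_load(&b->active_readers) > 0) {
		atomic_store(&b->resetting, false);
		return -EBUSY;
	}
	/* ... discard buffered events here (omitted in this sketch) ... */
	atomic_store(&b->resetting, false);
	return 0;
}
```

If a reader and a reset arrive at the same moment, both may back off with `-EBUSY`; the caller simply retries, which is preferable to a reset pulling pages out from under a live reader.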

> I guess I need to know more exactly what you are doing to understand the
> problem.
>
I am going to patch rb_get_reader_page() to print some debugging
messages when it enters an infinite loop. I will keep you updated
if I find any interesting info.
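The debugging patch described above could take the shape of a bounded retry loop that counts iterations and reports once the count looks pathological. The sketch below is a user-space stand-in, with `fprintf` playing the role that `printk` or a `WARN_ON_ONCE()`-style check would play in the kernel; `retry_with_watchdog`, `fake_try_grab` and the thresholds are hypothetical and are not taken from `rb_get_reader_page()`.

```c
#include <stdbool.h>
#include <stdio.h>

/* One attempt at an operation that normally succeeds within a few tries
 * (for example, taking the head page). Returns true on success. */
typedef bool (*try_once_fn)(void *ctx);

/* Retry try_once() until it succeeds. Print one diagnostic when the loop
 * reaches warn_after failed attempts and give up after give_up_after
 * attempts. Returns the number of attempts used, or -1 if it gave up. */
static long retry_with_watchdog(const char *where, try_once_fn try_once,
				void *ctx, long warn_after, long give_up_after)
{
	for (long attempt = 1; attempt <= give_up_after; attempt++) {
		if (try_once(ctx))
			return attempt;
		if (attempt == warn_after)
			fprintf(stderr, "%s: still looping after %ld attempts\n",
				where, attempt);
	}
	fprintf(stderr, "%s: giving up after %ld attempts\n",
		where, give_up_after);
	return -1;
}

/* Hypothetical stand-in for the real step: fails a fixed number of times. */
struct fake_grab {
	long failures_left;
};

static bool fake_try_grab(void *ctx)
{
	struct fake_grab *g = ctx;

	if (g->failures_left > 0) {
		g->failures_left--;
		return false;
	}
	return true;
}
```

For pure diagnosis, the give-up branch can be dropped so the loop keeps its original behavior and only gains a log message; printing once rather than on every iteration keeps a runaway loop from flooding the log.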

Thanks a lot!

Jiaying

> Thanks,
>
> -- Steve
>
>
>