linux-kernel - Re: Tracing: rb_head_page_deactivate() caught in an infinite loop


Message-ID: <20200701220316.1baf0a50@oasis.local.home>
Date:   Wed, 1 Jul 2020 22:03:16 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     rananta@...eaurora.org
Cc:     mingo@...hat.com, psodagud@...eaurora.org,
        linux-kernel@...r.kernel.org
Subject: Re: Tracing: rb_head_page_deactivate() caught in an infinite loop

On Wed, 01 Jul 2020 10:07:06 -0700
rananta@...eaurora.org wrote:

> Hi Steven and Mingo,
> 

Hi Raghavendra,


> While trying to adjust the buffer size
> (echo <size> > /sys/kernel/debug/tracing/buffer_size_kb), we see that
> the kernel gets caught up in an infinite loop while traversing the
> "cpu_buffer->pages" list in rb_head_page_deactivate().
> 
> Looks like the last node of the list could be uninitialized, thus 
> leading to infinite traversal. From the data that we captured:
> 000|rb_head_page_deactivate(inline)
>      |  cpu_buffer = 0xFFFFFF8000671600 = kernel_size_le_lo32+0xFFFFFF652F6EE600 -> (
> ...
>      |    pages = 0xFFFFFF80A909D980 = kernel_size_le_lo32+0xFFFFFF65D811A980 -> (
>      |      next = 0xFFFFFF80A909D200 = kernel_size_le_lo32+0xFFFFFF65D811A200 -> (
>      |        next = 0xFFFFFF80A909D580 = kernel_size_le_lo32+0xFFFFFF65D811A580 -> (
>      |          next = 0xFFFFFF8138D1CD00 = kernel_size_le_lo32+0xFFFFFF6667D99D00 -> (
>      |            next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
>      |              next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
>      |                next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
>      |                  next = 0xFFFFFF80006716F0 = kernel_size_le_lo32+0xFFFFFF652F6EE6F0,
> 
> Wanted to check with you if there's any scenario that could lead us into 
> this state.
> 
> Test details:
> -- Arch: arm64
> -- Kernel version 5.4.30; running on Android
> -- Test case: Running the following set of commands across reboot will 
> lead us to the scenario
> 
>    atrace --async_start -z -c -b 120000 sched audio irq idle freq
>    < Run any workload here >
>    atrace --async_dump -z -c -b 1200000 sched audio irq idle freq > mytrace.trace
>    atrace --async_stop > /dev/null
>    echo 150000 > /sys/kernel/debug/tracing/buffer_size_kb
>    echo 200000 > /sys/kernel/debug/tracing/buffer_size_kb
>    reboot
> 
> Repeating the above lines across reboots would reproduce the issue.
> The "atrace" or "echo" would just get stuck while resizing the buffer 
> size.

What do you mean by repeating across reboots? That if it doesn't happen
on a given boot it won't ever happen, but after a reboot it may happen
again?

> I'll try to reproduce the issue without atrace as well, but wondering 
> what could be the reason for leading us to this state.

I haven't used arm lately, and I'm unfamiliar with atrace. So I don't
really know what is going on. If you can reproduce this with just a
shell script accessing the ftrace files, that would be much more useful.
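
As a rough starting point, a script along these lines might do. It is
only a minimal sketch: the sizes mirror the two echo commands in the
report, while TRACING_DIR, SIZES_KB, ITERATIONS and resize_loop are
illustrative names, and it does not enable any trace events or run a
workload, which may turn out to be needed to trigger the problem.

```shell
#!/bin/sh
# Hypothetical reproducer sketch: resize the ftrace ring buffer repeatedly
# using only the buffer_size_kb control file (no atrace involved).
# All variable and function names here are illustrative assumptions.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"
SIZES_KB="${SIZES_KB:-150000 200000}"   # sizes taken from the report
ITERATIONS="${ITERATIONS:-10}"          # arbitrary repeat count

resize_loop() {
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        for kb in $SIZES_KB; do
            echo "iteration $i: buffer_size_kb <- $kb"
            # A write that never returns would match the reported hang.
            echo "$kb" > "$TRACING_DIR/buffer_size_kb" || return 1
        done
        i=$((i + 1))
    done
}

# Resizing only happens when explicitly requested: "sh repro.sh run"
if [ "${1:-}" = "run" ]; then
    resize_loop
fi
```

Run it as root with "run" as the first argument. The last line printed
before a hang shows which resize got stuck.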

Thanks,

-- Steve