linux-kernel - Re: [PATCH] tracing/ring-buffer: Remove integrity check at end of iter read

Message-ID: <04ba3f0f-6ca8-b8ff-3fd8-2ccc63482e04@quicinc.com>
Date:   Tue, 14 Feb 2023 17:44:31 +0530
From:   Mukesh Ojha <quic_mojha@...cinc.com>
To:     Steven Rostedt <rostedt@...dmis.org>
CC:     <zhengyejian1@...wei.com>, <linux-kernel@...r.kernel.org>,
        <linux-trace-kernel@...r.kernel.org>, <wanghai38@...wei.com>
Subject: Re: [PATCH] tracing/ring-buffer: Remove integrity check at end of
 iter read



On 2/11/2023 10:07 PM, Steven Rostedt wrote:
> On Fri, 10 Feb 2023 20:52:36 +0530
> Mukesh Ojha <quic_mojha@...cinc.com> wrote:
> 
>>>> 		return -1;
>>>>
>>>> 	list_for_each_entry_safe(bpage, tmp, head, list) {
>>>
>>> I'd like to know if there is a case that "head" happens to be a
>>> "reader_page", and the ring buffer is not exactly being traversed?
> 
> No, the way it works is that the reader page is found by searching for the
> head pointer, and then it is set when swapped. Basically, the pseudo code
> is:
> 
>    reader->next = head_page | HEAD_FLAG
>    val = head_page->prev->next
>    val &= ~FLAGS
>    val |= HEAD_FLAG
>    cmpxchg(head_page->prev->next, val, reader)
> 
> The HEAD_FLAG is always on the pointer that points to the head page that
> gets swapped. This will never point to the reader page, as that would mean
> the writer has access to it.
> 
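
To make the cmpxchg step in the pseudo code above concrete, here is a minimal userspace sketch using C11 atomics. It is not the kernel implementation: struct page, link_target(), replace_head(), and the flag values are all hypothetical names chosen for illustration, and only the final compare-and-exchange is modelled, not the setup of reader->next.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HEAD_FLAG ((uintptr_t)1)	/* assumed: flag kept in bit 0 of the link */
#define FLAG_MASK ((uintptr_t)3)	/* assumed: two low bits reserved for flags */

/* Hypothetical stand-in for a buffer page; only the 'next' link is modelled. */
struct page {
	_Atomic uintptr_t next;
};

/* Strip the flag bits to recover the plain pointer stored in a link. */
static struct page *link_target(uintptr_t link)
{
	return (struct page *)(link & ~FLAG_MASK);
}

/*
 * Try to make prev->next point at 'reader' instead of 'head'.
 * This succeeds only if prev->next currently holds 'head' with
 * HEAD_FLAG set; any other value leaves the link untouched.
 */
static bool replace_head(struct page *prev, struct page *head,
			 struct page *reader)
{
	uintptr_t expected = (uintptr_t)head | HEAD_FLAG;

	return atomic_compare_exchange_strong(&prev->next, &expected,
					      (uintptr_t)reader);
}
```

In this sketch the exchange fails whenever the link being replaced does not carry HEAD_FLAG, which is the property described above: the flag sits on the pointer to the page that is about to be swapped out.
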
>>
>> In my issue, I see the call stack below, and it seems to be spinning inside
>> rb_list_head_clear() because cpu_buffer->pages has a duplicate entry in the list.
>>
>> -00 |rb_list_head_clear(inline)
>> -00 |rb_head_page_deactivate(inline)
>> -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200)
>> -01 |atomic_try_cmpxchg_acquire(inline)
>> -01 |queued_spin_lock(inline)
>> -01 |do_raw_spin_lock_flags(inline)
>> -01 |__raw_spin_lock_irqsave(inline)
>> -01 |_raw_spin_lock_irqsave(inline)
>> -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780)
>> -02 |cpumask_next(inline)
>> -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00)
>> -03 |__fput(file = 0xFFFFFF8A53A63F00)
>> -04 |____fput(work = ?)
>> -05 |_raw_spin_unlock_irq(inline)
>> -05 |task_work_run()
>> -06 |tracehook_notify_resume(inline)
>> -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868)
>> -07 |prepare_exit_to_user_mode(inline)
>> -07 |exit_to_user_mode(inline)
>> -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0)
>> -08 |el0t_64_sync_handler(regs = ?)
>> -09 |el0t_64_sync(asm)
>>
>> ...
>> ..
>> ffffff80359eeb00 --> Duplicate entry
>> ffffff80359ee300
>> ffffff80359ee180
>> ffffff80359eeec0
>> ffffff80359eec00
>> ffffff80359ee800 -- Tail page
>> ffffff80359eedc0 -- Head page
>> ffffff80359ee640
>> ffffff80359ee080
>> ffffff80359ee700
>> ffffff80359ee7c0
>> ffffff80359eed80
>> ffffff80359ee900
>> ffffff80359ee9c0
>> ffffff80359eea00
>> ffffff80359eea80
>> ffffff80359eec80
>> ffffff80359ee240
>> ffffff80359ee6c0
>> ffffff80359ee0c0
>> ffffff80359ee8c0
>> ffffff80359ee940
>> ffffff80359eee00
>> ffffff80359ee000
>> ffffff80359eeb00 ---> Duplicate entry
> 
> So this is a separate issue where the ring buffer is corrupted?

It looks to be a different issue. There I also see a similar call stack
through tracing_release(), but in that case it loops forever in the
deactivate call because of the list corruption.
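
As a rough illustration of why a duplicate entry can make such a walk spin, here is a minimal userspace sketch, assuming the corruption leaves the next pointers looping without passing through the entry the walk started from. struct node and list_walk_returns_to_head() are hypothetical names, not kernel code; the sketch only shows that a step bound turns the endless loop into a detectable failure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct node {
	struct node *next;
};

/*
 * Walk a circular list starting after 'head'. Returns true if the walk
 * gets back to 'head' after visiting at most 'max_nodes' other entries.
 * Returns false if it hits NULL or exceeds the bound, which suggests
 * the 'next' pointers form a loop that bypasses 'head'.
 */
static bool list_walk_returns_to_head(const struct node *head,
				      size_t max_nodes)
{
	const struct node *pos = head->next;
	size_t steps = 0;

	while (pos != head) {
		if (pos == NULL || ++steps > max_nodes)
			return false;
		pos = pos->next;
	}
	return true;
}
```

An unbounded walk that terminates only when it reaches its starting entry would never finish on such a list, whereas the bounded version gives up once it has visited more entries than the buffer is expected to hold.
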

I have not yet been able to root-cause where the corruption happens, as
it has reproduced only once; I will need to look into it further.

For this issue, I have posted at
https://lore.kernel.org/lkml/1676376403-16462-1-git-send-email-quic_mojha@quicinc.com/

-Mukesh

> 
> -- Steve