[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <04ba3f0f-6ca8-b8ff-3fd8-2ccc63482e04@quicinc.com>
Date: Tue, 14 Feb 2023 17:44:31 +0530
From: Mukesh Ojha <quic_mojha@...cinc.com>
To: Steven Rostedt <rostedt@...dmis.org>
CC: <zhengyejian1@...wei.com>, <linux-kernel@...r.kernel.org>,
<linux-trace-kernel@...r.kernel.org>, <wanghai38@...wei.com>
Subject: Re: [PATCH] tracing/ring-buffer: Remove integrity check at end of
iter read
On 2/11/2023 10:07 PM, Steven Rostedt wrote:
> On Fri, 10 Feb 2023 20:52:36 +0530
> Mukesh Ojha <quic_mojha@...cinc.com> wrote:
>
>>>> return -1;
>>>>
>>>> list_for_each_entry_safe(bpage, tmp, head, list) {
>>>
>>> I'd like to know if there is a case that "head" happens to be a
>>> "reader_page", and the ring buffer is not exactly being traversed?
>
> No, the way it works is that the reader page is found by searching for the
> head pointer, and then it is set when swapped. Basically, the pseudo code
> is:
>
> reader->next = head_page | HEAD_FLAG
> val = head_page->prev->next
> val &= ~FLAGS
> val |= HEAD_FLAG
> cmpxchg(head_page->prev->next, val, reader)
>
> The HEAD_FLAG is always on the pointer that points to the head page that
> gets swapped. This will never point to the reader page, as that would mean
> the writer has access to it.
>
>>
>> In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as
>> cpu_buffer->pages has duplicate entry in the list.
>>
>> -00 |rb_list_head_clear(inline)
>> -00 |rb_head_page_deactivate(inline)
>> -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200)
>> -01 |atomic_try_cmpxchg_acquire(inline)
>> -01 |queued_spin_lock(inline)
>> -01 |do_raw_spin_lock_flags(inline)
>> -01 |__raw_spin_lock_irqsave(inline)
>> -01 |_raw_spin_lock_irqsave(inline)
>> -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780)
>> -02 |cpumask_next(inline)
>> -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00)
>> -03 |__fput(file = 0xFFFFFF8A53A63F00)
>> -04 |____fput(work = ?)
>> -05 |_raw_spin_unlock_irq(inline)
>> -05 |task_work_run()
>> -06 |tracehook_notify_resume(inline)
>> -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868)
>> -07 |prepare_exit_to_user_mode(inline)
>> -07 |exit_to_user_mode(inline)
>> -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0)
>> -08 |el0t_64_sync_handler(regs = ?)
>> -09 |el0t_64_sync(asm)
>>
>> ...
>> ..
>> ffffff80359eeb00 --> Duplicate entry
>> ffffff80359ee300
>> ffffff80359ee180
>> ffffff80359eeec0
>> ffffff80359eec00
>> ffffff80359ee800 -- Tail page
>> ffffff80359eedc0 -- Head page
>> ffffff80359ee640
>> ffffff80359ee080
>> ffffff80359ee700
>> ffffff80359ee7c0
>> ffffff80359eed80
>> ffffff80359ee900
>> ffffff80359ee9c0
>> ffffff80359eea00
>> ffffff80359eea80
>> ffffff80359eec80
>> ffffff80359ee240
>> ffffff80359ee6c0
>> ffffff80359ee0c0
>> ffffff80359ee8c0
>> ffffff80359ee940
>> ffffff80359eee00
>> ffffff80359ee000
>> ffffff80359eeb00 ---> Duplicate entry
>
> So this is a separate issue where the ring buffer is corrupted?
It looks to be different issue and there also i see similar call stack
of tracing_release() but in that issue it is looping forever in
deactivate call due to list corruption.
I am not yet able to root cause a place of corruption as it is
reproduced only once, will need to check more on this.
For this issue, i have posted at
https://lore.kernel.org/lkml/1676376403-16462-1-git-send-email-quic_mojha@quicinc.com/
-Mukesh
>
> -- Steve
Powered by blists - more mailing lists