lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <50F482D1.4090301@gmail.com>
Date:	Mon, 14 Jan 2013 14:12:33 -0800
From:	David Daney <ddaney.cavm@...il.com>
To:	Alan Cooper <alcooperx@...il.com>
CC:	ralf@...ux-mips.org, linux-mips@...ux-mips.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mips: function tracer: Fix broken function tracing

On 01/14/2013 01:10 PM, Alan Cooper wrote:
> I already tried using "adddiu sp, sp, 8" and it caused the kernel to
> randomly crash. After many hours of debugging the reason occurred to
> me while in bed in the middle of the night. The problem is that if we
> get an interrupt between the add 8 and the add -8 instructions, we
> trash the existing stack.
>
> The problem with the 2 nop approach is that there are a series of
> subroutines used to write each nop and these nested subroutines are
> traceable.

This seems like a bug.  The low-level code used to do code patching 
probably should be CFLAGS_REMOVE_file.o = -pg


> This means on the second call to these subroutines they
> execute with only one nop and crash. I could  write  some new code
> that wrote the 2 nops at once, but (now that I understand
> "stop_machine") with the branch likely solution we should be able to
> stop using "stop_machine" when we write nops to the 20-30 thousand
> Linux functions. It looks like other platforms have stopped using
> stop_machine.

I don't particularly object to the 'branch likely solution', but I think 
the failures of the other approaches indicates underlying bugs in the 
tracing code.  Those bugs should probably be fixed.

David Daney


>
> Al
>
> On Fri, Jan 11, 2013 at 12:01 PM, David Daney <ddaney.cavm@...il.com> wrote:
>> On 01/11/2013 06:33 AM, Al Cooper wrote:
>>>
>>> Function tracing is currently broken for all 32 bit MIPS platforms.
>>> When tracing is enabled, the kernel immediately hangs on boot.
>>> This is a result of commit b732d439cb43336cd6d7e804ecb2c81193ef63b0
>>> that changes the kernel/trace/Kconfig file so that is no longer
>>> forces FRAME_POINTER when FUNCTION_TRACING is enabled.
>>>
>>> MIPS frame pointers are generally considered to be useless because
>>> they cannot be used to unwind the stack. Unfortunately the MIPS
>>> function tracing code has bugs that are masked by the use of frame
>>> pointers. This commit fixes the bugs so that MIPS frame pointers do
>>> not need to be enabled.
>>>
>>> The bugs are a result of the odd calling sequence used to call the trace
>>> routine. This calling sequence is inserted into every tracable function
>>> when the tracing CONFIG option is enabled. This sequence is generated
>>> for 32bit MIPS platforms by the compiler via the "-pg" flag.
>>> Part of the sequence is "addiu sp,sp,-8" in the delay slot after every
>>> call to the trace routine "_mcount" (some legacy thing where 2 arguments
>>> used to be pushed on the stack). The _mcount routine is expected to
>>> adjust the sp by +8 before returning.
>>>
>>> One of the bugs is that when tracing is disabled for a function, the
>>> "jalr _mcount" instruction is replaced with a nop, but the
>>> "addiu sp,sp,-8" is still executed and the stack pointer is left
>>> trashed. When frame pointers are enabled the problem is masked
>>> because any access to the stack is done through the frame
>>> pointer and the stack pointer is restored from the frame pointer when
>>> the function returns. This patch uses a branch likely instruction
>>> "bltzl zero, f1" instead of "nop" to disable the call because this
>>> instruction will not execute the "addiu sp,sp,-8" instruction in
>>> the delay slot. The other possible solution would be to nop out both
>>> the jalr and the "addiu sp,sp,-8", but this would need to be interrupt
>>> and SMP safe and would be much more intrusive.
>>
>>
>> I thought all CPUs were in stop_machine() when the modifications were done,
>> so that there is no issue with multi-word instruction patching.
>>
>> Am I wrong about this?
>>
>> So really I think you can do two NOP just as easily.
>>
>> The only reason I bring this up is that I am not sure all 32-bit CPUs
>> implement the 'Likely' branch variants. Also there may be an affect on the
>> branch predictor.
>>
>> A third possibility would be to replace the JALR with 'ADDIU SP,SP,8' That
>> way the following adjustment would be canceled out.  This would work well on
>> single-issue CPUs, but the instructions may not be able to dual-issue on a
>> multi issue machine due to data dependencies.
>>
>> David Daney
>>
>>
>>>
>>> A few other bugs were fixed where the _mcount routine itself did not
>>> always fix the sp on return.
>>>
>>
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ