linux-kernel - Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64

Message-ID: <5721FF35.6090602@huawei.com>
Date:	Thu, 28 Apr 2016 20:16:53 +0800
From:	"Zhangjian (Bamvor)" <bamvor.zhangjian@...wei.com>
To:	Andrew Pinski <pinskia@...il.com>
CC:	Yury Norov <ynorov@...iumnetworks.com>,
	Arnd Bergmann <arnd@...db.de>,
	Catalin Marinas <catalin.marinas@....com>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	"Kapoor, Prasun" <Prasun.Kapoor@...iumnetworks.com>,
	Andreas Schwab <schwab@...e.de>,
	"Nathan Lynch" <Nathan_Lynch@...tor.com>,
	Alexander Graf <agraf@...e.de>,
	"Alexey Klimov" <klimov.linux@...il.com>,
	Mark Brown <broonie@...nel.org>,
	"Joseph S. Myers" <joseph@...esourcery.com>,
	<christoph.muellner@...obroma-systems.com>,
	<linux-doc@...r.kernel.org>,
	Linux-Arch <linux-arch@...r.kernel.org>,
	linux-s390 <linux-s390@...r.kernel.org>,
	Hanjun Guo <guohanjun@...wei.com>,
	GCC Mailing List <gcc@....gnu.org>,
	"Zhangjian (Bamvor)" <bamvor.zhangjian@...wei.com>
Subject: Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64 - LTP results

Hi, Andrew

On 2016/4/28 5:15, Andrew Pinski wrote:
> On Wed, Apr 27, 2016 at 12:30 AM, Andrew Pinski <pinskia@...il.com> wrote:
>> On Fri, Apr 22, 2016 at 8:37 PM, Zhangjian (Bamvor)
>> <bamvor.zhangjian@...wei.com> wrote:
>>> Hi, Yury
>>>
>>>
>>> On 2016/4/6 6:44, Yury Norov wrote:
>>>>
>>>> There are about 20 failing tests out of 782 in the lite scenario.
>>>> float_bessel
>>>> float_exp_log
>>>> float_iperb
>>>> float_power
>>>> float_trigo
>>>> pipeio_1
>>>> pipeio_3
>>>> pipeio_5
>>>> pipeio_8
>>>> abort01
>>>> clone02
>>>> kill11
>>>> mmap16
>>>> open12
>>>> pause01
>>>> rename11
>>>> rmdir02
>>>> umount2_01
>>>> umount2_02
>>>> umount2_03
>>>> utime06
>>>> mtest06
>>>>
>>>> The list is rough because some tests do not fail every time.
>>>>
>>>> Tests abort01 and kill11 fail for lp64 too, so maybe there's
>>>> a reason unrelated to ilp32 itself.
>>>>
>>>> float_xxx tests fail because they call unwind() from signal context,
>>>> and GCC for ilp32 has a problem with it, as Andrew said.
>>>
>>> Is there any progress on this issue? When we talk about unwind
>>> functions, do you mean the functions in libgcc?
>>>
>>> We encountered another issue (an abort, not a segfault) in a test that
>>> calls pthread_cancel(). The test code is in the attachment. Here is the
>>> backtrace:
>>
>> Yes, this is a known issue.  I have a GCC patch to fix
>> this.  Basically REG_VALUE_IN_UNWIND_CONTEXT needs to be defined while
>> building libgcc to support the correct unwind information.
>> I will be posting a GCC patch to fix this tomorrow.  This was a bug
>> even in the original set of ilp32 patches.  I only finally was able to
>> sit down and fix it today.
>
> Here is the link to the GCC patch which I said I was going to submit today:
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01726.html
It works for me. Both the float_xxx tests in LTP and my pthread_cancel
testcase now pass.

Regards

Bamvor

>
> Thanks,
> Andrew
>
>>
>>
>> Thanks,
>> Andrew
>>
>>>
>>> ```
>>> Program received signal SIGABRT, Aborted.
>>> [Switching to Thread 0xf77ee330 (LWP 2958)]
>>> 0x000000000040f5bc in raise (sig=sig@...ry=6)
>>>      at ../sysdeps/unix/sysv/linux/raise.c:55
>>> 55      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>> (gdb) bt
>>> #0  0x000000000040f5bc in raise (sig=sig@...ry=6)
>>>      at ../sysdeps/unix/sysv/linux/raise.c:55
>>> #1  0x000000000040f884 in abort () at abort.c:89
>>>
>>> #2  0x00000000004073b4 in uw_update_context_1 (
>>>      context=context@...ry=0xf77ec820, fs=fs@...ry=0xf77ebec8)
>>> at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1430
>>>
>>> #3  0x00000000004078c0 in uw_update_context
>>> (context=context@...ry=0xf77ec820,
>>>      fs=fs@...ry=0xf77ebec8)
>>>     at
>>> /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1506
>>> #4  0x0000000000407a9c in uw_advance_context (fs=0xf77ebec8,
>>>      context=0xf77ec820)
>>>      at
>>> /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind-dw2.c:1529
>>> #5  _Unwind_ForcedUnwind_Phase2 (exc=exc@...ry=0xf77ee580,
>>>      context=context@...ry=0xf77ec820)
>>>      at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind.inc:185
>>> #6  0x0000000000408228 in _Unwind_ForcedUnwind (exc=0xf77ee580,
>>>      stop=stop@...ry=0x405440 <unwind_stop>, stop_argument=0xf77eddd8)
>>>      at /home/GCC-Build/p660/p660_build_dir/src/gcc-4.9/libgcc/unwind.inc:207
>>> #7  0x00000000004055c4 in __pthread_unwind (buf=<optimized out>)
>>>      at unwind.c:126
>>> #8  0x00000000004050b4 in __do_cancel () at ./pthreadP.h:283
>>> #9  sigcancel_handler (sig=<optimized out>, si=<optimized out>,
>>>      ctx=<optimized out>) at nptl-init.c:225
>>> ---Type <return> to continue, or q <return> to quit---
>>> #10 <signal handler called>
>>>
>>> #11 0x0000000000000000 in ?? ()
>>>
>>> #12 0x0000000000423084 in __select (nfds=-66661, readfds=<optimized out>,
>>>      writefds=<optimized out>, exceptfds=<optimized out>, timeout=0x0)
>>>      at ../sysdeps/unix/sysv/linux/generic/select.c:45
>>> #13 0x0000000000400604 in TEST_TaskDelay (
>>>      uiMillSecs=<error reading variable: can't compute CFA for this frame>)
>>>      at test-cancel.c:18
>>> #14 0x0000000000400680 in printids (
>>>      s=<error reading variable: can't compute CFA for this frame>)
>>>      at test-cancel.c:38
>>> #15 0x00000000004006d0 in thr_fn (
>>>      arg=<error reading variable: can't compute CFA for this frame>)
>>>      at test-cancel.c:49
>>> #16 0x0000000000401b28 in start_thread (arg=0x4a3000) at
>>> pthread_create.c:335
>>> #17 0x0000000000401b28 in start_thread (arg=0x4a3000) at
>>> pthread_create.c:335
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> ```
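For readers without the attachment, the shape of such a test can be sketched as below. This is a reconstruction from the frame names in the backtrace (a thread sleeping in select() that gets cancelled), not the actual test-cancel.c; the names task_delay_ms, worker and run_cancel_test are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical stand-in for TEST_TaskDelay() seen in the backtrace:
 * sleep for ms milliseconds by calling select() with no descriptors. */
static void task_delay_ms(unsigned int ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    /* select() is a cancellation point, so the thread can be
     * cancelled while it sleeps here. */
    select(0, NULL, NULL, NULL, &tv);
}

/* Worker thread: sleeps in a loop until it is cancelled. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;)
        task_delay_ms(100);
    return NULL;
}

/* Start the worker, cancel it, and check that it ended through
 * cancellation. Returns 0 on success, -1 on any failure. */
static int run_cancel_test(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    task_delay_ms(50);          /* give the worker time to reach select() */
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED ? 0 : -1;
}
```

Calling run_cancel_test() from main() and building with -pthread should be enough to exercise the forced-unwind path on a toolchain where cancellation is implemented through the libgcc unwinder.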
>>>
>>> The abort is raised by the following code:
>>> ```
>>> static void
>>> uw_update_context_1 (struct _Unwind_Context *context, _Unwind_FrameState
>>> *fs)
>>> {
>>> //...
>>>    /* Compute this frame's CFA.  */
>>>    switch (fs->regs.cfa_how)
>>>      {
>>>      case CFA_REG_OFFSET:
>>>        cfa = _Unwind_GetPtr (&orig_context, fs->regs.cfa_reg);
>>>        cfa += fs->regs.cfa_offset;
>>>        break;
>>>
>>>      case CFA_EXP:
>>>        {
>>>          const unsigned char *exp = fs->regs.cfa_exp;
>>>          _uleb128_t len;
>>>
>>>          exp = read_uleb128 (exp, &len);
>>>          cfa = (void *) (_Unwind_Ptr)
>>>            execute_stack_op (exp, exp + len, &orig_context, 0);
>>>          break;
>>>        }
>>>
>>>      default:
>>>        gcc_unreachable ();
>>>      }
>>>    context->cfa = cfa;
>>> //...
>>> }
>>> ```
>>>
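The CFA_EXP branch quoted above begins by reading a ULEB128-encoded length before evaluating the expression. As background only, here is a minimal standalone sketch of ULEB128 decoding. It is not libgcc's read_uleb128(); the name decode_uleb128 and the bounds-checked signature are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one ULEB128 value from the bytes in [p, end).
 * Each byte carries 7 payload bits, least significant group first;
 * the top bit is set on every byte except the last.
 * On success stores the value in *out and returns a pointer to the
 * first byte after the encoding. Returns NULL if the input ends
 * before the final byte or the encoding is longer than ten bytes. */
static const unsigned char *
decode_uleb128(const unsigned char *p, const unsigned char *end,
               uint64_t *out)
{
    uint64_t value = 0;
    unsigned int shift = 0;

    while (p < end && shift < 64) {
        unsigned char byte = *p++;

        value |= (uint64_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            *out = value;
            return p;
        }
        shift += 7;
    }
    return NULL;
}
```

For example, the three bytes 0xE5 0x8E 0x26 decode to 624485.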
>>> Any suggestion is appreciated.
>>>
>>> CC gcc mailing list. Sorry if it is off topic.
>>>
>>> Regards
>>>
>>> Bamvor
>>>
>>>
>>>
>>>
>>>> pipeio_x tests are very unstable and may fail randomly. I strongly
>>>> suspect race conditions, as they all work like a charm if pinned to a
>>>> single CPU with taskset. A race is probably the reason for clone02 too,
>>>> though I'm not sure whether the race is in the kernel, glibc or the test itself.
>>>>
>>>> But I know for sure that pause01 fails due to test design:
>>>>          if (setitimer(ITIMER_REAL, &it, NULL)) // For 1000us
>>>>                  tst_brkm(TBROK | TERRNO, NULL, "setitimer() failed");
>>>>
>>>>          TEST(pause());
>>>>
>>>> As the setitimer() and pause() calls are not atomic, the alarm may arrive
>>>> before pause() is called and be silently consumed by the handler. The
>>>> subsequent pause() call then hangs the test forever. I already reported
>>>> this to the LTP list.
>>>>
>>>> open12, rename11, rmdir02, mmap16, mtest06 - all call the mkfs tool, and it
>>>> returns an error code. I haven't investigated it much yet.
>>>>
>>>> umount2_0x, utime06 - cannot reproduce outside the scenario; even when run
>>>> in an infinite loop, they work fine.
>>>>
>>>> Full test log is attached.
>>>>
>>>> Yury
>>>>
>>>