lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20200627122332.ki2otaiw3v7wndbl@wittgenstein>
Date:   Sat, 27 Jun 2020 14:23:32 +0200
From:   Christian Brauner <christian.brauner@...ntu.com>
To:     kernel test robot <rong.a.chen@...el.com>
Cc:     linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        linux-m68k@...ts.linux-m68k.org, x86@...nel.org, lkp@...ts.01.org
Subject: Re: [fork] 11689456e6: ltp.clone302.fail

On Sat, Jun 27, 2020 at 04:27:48PM +0800, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: 11689456e6df828b7917a558a36212e68fa9aa69 ("[PATCH 01/17] fork: fold legacy_clone_args_valid() into _do_fork()")
> url: https://github.com/0day-ci/linux/commits/Christian-Brauner/arch-remove-do_fork-and-HAVE_COPY_THREAD_TLS/20200623-080105
> base: https://git.kernel.org/cgit/linux/kernel/git/davem/sparc.git master
> 
> in testcase: ltp
> with following parameters:
> 
> 	disk: 1HDD
> 	fs: ext4
> 	test: syscalls_part1
> 
> test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features.
> test-url: http://linux-test-project.github.io/
> 
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <rong.a.chen@...el.com>
> 
> 
> <<<test_start>>>
> tag=clone302 stime=1593153327
> cmdline="clone302"
> contacts=""
> analysis=exit
> <<<test_output>>>
> tst_buffers.c:55: INFO: Test is using guarded buffers
> tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
> clone302.c:92: PASS: invalid args: clone3() failed as expected: EFAULT (14)
> clone302.c:92: PASS: zero size: clone3() failed as expected: EINVAL (22)
> clone302.c:92: PASS: short size: clone3() failed as expected: EINVAL (22)
> clone302.c:92: PASS: extra size: clone3() failed as expected: EFAULT (14)
> clone302.c:92: PASS: sighand-no-VM: clone3() failed as expected: EINVAL (22)
> clone302.c:92: PASS: thread-no-sighand: clone3() failed as expected: EINVAL (22)
> clone302.c:92: PASS: fs-newns: clone3() failed as expected: EINVAL (22)
> clone302.c:92: PASS: invalid pidfd: clone3() failed as expected: EFAULT (14)
> clone302.c:92: PASS: invalid childtid: clone3() failed as expected: EFAULT (14)
> clone302.c:88: FAIL: invalid parenttid: clone3() should fail with EFAULT: EINVAL (22)

In short, this is a change in failure behavior for clone3() I did expect
and am willing to risk. Here's why in the short form:
- clone3() is extremely new
- this failed before
- setting both CLONE_PIDFD and CLONE_PARENT_SETTID is extremely rare
  (haven't seen it in any codebases I know that use clone3())
- setting both CLONE_PIDFD and CLONE_PARENT_SETTID __and__ pointing them
  to the same adress doesn't work
  (haven't seen it in any codebases I know that use clone3() but see
  some more notes on that below)
- the change makes a special case go away and simplifies multiple
  call-sites

So a few notes about the test. I did stare at it for a while and was
confused why you expect EFAULT to be returned when CLONE_PARENT_SETTID
is set to an invalid memory address. Because that doesn't make sense.
When the parent tid is written to the memory location for
CLONE_PARENT_SETTID we're past the point of no return of process
creation, i.e. the return value from put_user() isn't checked and can't
be checked anymore so you'd never receive EFAULT for a bogus parent_tid
memory address. This is not something new. This has been the case since
the introduction of pid namespaces and specifically since commit
30e49c263e36 ("pid namespaces: allow cloning of new namespace").

But then it dawned on me. You're setting CLONE_PIDFD |
CLONE_PARENT_SETTID and you're pointing:
- args->parent_tid	= <invalid-address>
- args->pidfd		= NULL
so the EFAULT you've seen so far in your test-suite has never been for
CLONE_PARENT_SETTID but for CLONE_PIDFD since that value is written
before the point of no return and consequently put_user() is checked and
the EFAULT is surfaced. So independent of that issue here you might want
to adapt that test so it really tests what you want. :) (And maybe it's
worth documenting on the manpage for clone{3}() that failures for
CLONE_PARENT_SETTID and CLONE_CHILD_SETTID are not seen.)

(Also, note that for some reason, args->pidfd and pargs->parent_tid
must've ended up pointing to the same address in your test-suite. So my
guess is that args->pidfd pointed to garbage which got turned into a
useable address after tst_get_bad_addr() returned &invalid_address.
Maybe I'm missing something subtle though.)

Christian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ