linux-kernel - Re: [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values wrong in VM?

Message-ID: <55CCB510.3060807@redhat.com>
Date:	Thu, 13 Aug 2015 17:17:36 +0200
From:	Denys Vlasenko <dvlasenk@...hat.com>
To:	David Drysdale <drysdale@...gle.com>,
	Kees Cook <keescook@...omium.org>,
	Andy Lutomirski <luto@...capital.net>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Will Drewry <wad@...omium.org>, Ingo Molnar <mingo@...nel.org>
CC:	Alok Kataria <akataria@...are.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Borislav Petkov <bp@...en8.de>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>, Oleg Nesterov <oleg@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>, X86 ML <x86@...nel.org>
Subject: Re: [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values
 wrong in VM?

On 08/13/2015 10:30 AM, David Drysdale wrote:
> Hi folks,
> 
> I've got an odd regression with the v4.2 rc kernel, and I wondered if anyone
> else could reproduce it.
> 
> The problem occurs with a seccomp-bpf filter program that's set up to return
> an errno value -- an errno of 1 is always returned instead of what's in the
> filter, plus other oddities (selftest output below).
> 
> The problem seems to need a combination of circumstances to occur:
> 
>  - The seccomp-bpf userspace program needs to be 32-bit, running against a
>    64-bit kernel -- I'm testing with seccomp_bpf from
>    tools/testing/selftests/seccomp/, built via 'CFLAGS=-m32 make'.

Does it work correctly when built as 64-bit program?

> 
>  - The kernel needs to be running as a VM guest -- it occurs inside my
>    VMware Fusion host, but not if I run on bare metal.  Kees tells me he
>    cannot repro with a kvm guest though.
> 
> Bisecting indicates that the commit that induces the problem is
> 3f5159a9221f19b0, "x86/asm/entry/32: Update -ENOSYS handling to match the
> 64-bit logic", included in all the v4.2-rc* candidates.
> 
> Apologies if I've just got something odd with my local setup, but the
> bisection was unequivocal enough that I thought it worth reporting...
> 
> Thanks,
> David
> 
> 
> seccomp_bpf failure outputs:
> 
> seccomp_bpf.c:533:global.ERRNO_valid:Expected 7 (7) ==
> (*__errno_location ()) (1)

Test source code:

TEST(ERRNO_valid)
{
        struct sock_filter filter[] = {
                BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
                        offsetof(struct seccomp_data, nr)),
                BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
                BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | E2BIG),
                BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
                .len = (unsigned short)ARRAY_SIZE(filter),
                .filter = filter,
        };
        long ret;
        pid_t parent = getppid();

        ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        ASSERT_EQ(0, ret);

        ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
        ASSERT_EQ(0, ret);

        EXPECT_EQ(parent, syscall(__NR_getppid));
        EXPECT_EQ(-1, read(0, NULL, 0));
        EXPECT_EQ(E2BIG, errno);
}

The last EXPECT expects 7 (E2BIG) but sees 1.
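
For reference, here is a minimal sketch of how the 32-bit value returned by the filter splits into an action and a data field. The SKETCH_* names are hypothetical local copies of the constants I believe are in <linux/seccomp.h> as of v4.2 (SECCOMP_RET_ERRNO, SECCOMP_RET_ACTION, SECCOMP_RET_DATA), redefined so this builds without kernel headers; the helper functions are mine, not kernel API.

```c
#include <stdint.h>

/* Assumed values, mirrored from the uapi header under local names. */
#define SKETCH_RET_ERRNO   0x00050000U  /* SECCOMP_RET_ERRNO */
#define SKETCH_RET_ACTION  0x7fff0000U  /* SECCOMP_RET_ACTION mask */
#define SKETCH_RET_DATA    0x0000ffffU  /* SECCOMP_RET_DATA mask */
#define SKETCH_E2BIG       7            /* E2BIG on Linux */

/* Upper bits select what seccomp does with the syscall. */
uint32_t sketch_action(uint32_t filter_ret)
{
        return filter_ret & SKETCH_RET_ACTION;
}

/* Lower 16 bits carry the payload; for RET_ERRNO this is the errno. */
uint32_t sketch_data(uint32_t filter_ret)
{
        return filter_ret & SKETCH_RET_DATA;
}
```

With the filter above, sketch_data(SKETCH_RET_ERRNO | SKETCH_E2BIG) yields 7, which is what the test expects to see in errno.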


I'm trying to see how that happens.
The SECCOMP_RET_ERRNO action is processed as follows:


static u32 __seccomp_phase1_filter(int this_syscall, struct seccomp_data *sd)
{
...
        case SECCOMP_RET_ERRNO:
                /* Set low-order bits as an errno, capped at MAX_ERRNO. */
                if (data > MAX_ERRNO)
                        data = MAX_ERRNO;
                syscall_set_return_value(current, task_pt_regs(current),
                                         -data, 0);
                goto skip;
...
skip:
        audit_seccomp(this_syscall, 0, action);
        return SECCOMP_PHASE1_SKIP;  // "the syscall should not be invoked"
}
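
The cap-and-negate step in that case can be sketched in isolation. This is a userspace illustration only: sketch_errno_to_ax() is a hypothetical helper, and SKETCH_MAX_ERRNO assumes the kernel's MAX_ERRNO is 4095.

```c
/* Assumed to match the kernel's MAX_ERRNO. */
#define SKETCH_MAX_ERRNO 4095U

/*
 * Turn the filter's data field into the value that would be stored in
 * pt_regs->ax: clamp to the largest valid errno, then negate.
 */
long sketch_errno_to_ax(unsigned int data)
{
        if (data > SKETCH_MAX_ERRNO)
                data = SKETCH_MAX_ERRNO;
        return -(long)data;
}
```

For data == E2BIG (7) this gives -7; anything above 4095 is clamped, so 0xffff gives -4095.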

The above is called (via seccomp_phase1()) from:

unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
{
...
        if (work & _TIF_SECCOMP) {
...
                ret = seccomp_phase1(&sd);
                if (ret == SECCOMP_PHASE1_SKIP) {
                        regs->orig_ax = -1;
                        ret = 0;
                }
		...
        }
        /* Do our best to finish without phase 2. */
        if (work == 0)
                return ret;  /* seccomp and/or nohz only (ret == 0 here) */
#ifdef CONFIG_AUDITSYSCALL
        if (work == _TIF_SYSCALL_AUDIT) {
                /*
                 * If there is no more work to be done except auditing,
                 * then audit in phase 1.  Phase 2 always audits, so, if
                 * we audit here, then we can't go on to phase 2.
                 */
                do_audit_syscall_entry(regs, arch);
                return 0;
        }
#endif
        return 1;  /* Something is enabled that we can't handle in phase 1 */
}
...
long syscall_trace_enter(struct pt_regs *regs)
{
        u32 arch = is_ia32_task() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
        unsigned long phase1_result = syscall_trace_enter_phase1(regs, arch);

        if (phase1_result == 0)
                return regs->orig_ax;
        else
                return syscall_trace_enter_phase2(regs, arch, phase1_result);
}


End result should be:
pt_regs->ax = -E2BIG (via syscall_set_return_value())
pt_regs->orig_ax = -1 ("skip syscall")
and syscall_trace_enter_phase1() usually returns with 0,
meaning "re-execute syscall at once, no phase2 needed".

This, in turn, is called from .S files, and when it returns there,
execution loops back to syscall dispatch.

Because orig_ax == -1, syscall dispatch should skip calling the syscall handler.
So the -E2BIG stored in pt_regs->ax should survive and be returned...
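
To spell out the expectation, here is a toy userspace model of it. It is not the real entry code: struct sketch_regs, SKETCH_NR_SYSCALLS, sketch_errno and all the sketch_*() functions are hypothetical stand-ins, and the "handler" is just a store of 0.

```c
/* Only the two registers that matter for this discussion. */
struct sketch_regs {
        long ax;       /* return value slot */
        long orig_ax;  /* syscall number, -1 means "skip" */
};

#define SKETCH_NR_SYSCALLS 400UL  /* arbitrary table size for the model */

int sketch_errno;  /* stand-in for the C library's errno */

/* What seccomp phase 1 is expected to leave behind for RET_ERRNO. */
void sketch_phase1_skip(struct sketch_regs *regs, int err)
{
        regs->ax = -(long)err;
        regs->orig_ax = -1;
}

/* Dispatch: an out-of-range number must leave regs->ax untouched. */
long sketch_dispatch(struct sketch_regs *regs)
{
        if ((unsigned long)regs->orig_ax < SKETCH_NR_SYSCALLS)
                regs->ax = 0;  /* stand-in for calling the real handler */
        return regs->ax;
}

/* Usual libc convention: values in [-4095, -1] become errno, return -1. */
long sketch_libc_return(long ax)
{
        if ((unsigned long)ax >= (unsigned long)-4095L) {
                sketch_errno = (int)-ax;
                return -1;
        }
        return ax;
}
```

In this model, sketch_phase1_skip(&regs, 7) followed by sketch_dispatch() and sketch_libc_return() ends with a return of -1 and sketch_errno == 7. The failing test sees 1 instead, so something on the real path must be overwriting ax after phase 1.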