linux-kernel - Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <878t35s2sn.fsf@xmission.com>
Date:   Wed, 10 Oct 2018 20:14:00 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Sean Christopherson <sean.j.christopherson@...el.com>
Cc:     kernel test robot <rong.a.chen@...el.com>,
        linux-kernel@...r.kernel.org, LKP <lkp@...org>
Subject: Re: [LKP] 4ce5f9c9e7 [ 1.323881] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:1031 kmalloc_slab

ebiederm@...ssion.com (Eric W. Biederman) writes:

> Sean Christopherson <sean.j.christopherson@...el.com> writes:
>
>> On Wed, Oct 10, 2018 at 05:06:52PM -0500, Eric W. Biederman wrote:
>>> ebiederm@...ssion.com (Eric W. Biederman) writes:
>>> 
>>> > So I am flummoxed.  I am reading through the code and I don't see
>>> > anything that could trigger this, and when I ran the supplied reproducer
>>> > it did not reproduce for me.
>>> 
>>> Even more so.  With my tool chain the line that reports the failing
>>> address is impossible.
>>> 
>>> [   73.034423] RIP: 0010:copy_siginfo_from_user+0x4d/0xd0
>>> 
>>> With the supplied configureation my tool chain only has 0x30 bytes for
>>> all of copy_siginfo_from_user.  So I can't even begin to guess where
>>> in that function things are failing.
>>> 
>>> Any additional information that you can provide would be a real help
>>> in tracking down this strange failure.
>>
>> I don't have the exact toolchain, but I was able to get somewhat close
>> and may have found a smoking gun.  0x4d in my build is in the general
>> vicinity of "sig_sicodes[sig].limit" in known_siginfo_layout().  This
>> lines up with the register state from the log, e.g. RDI=0500104d8,
>> which is the mask generated by sig_specific_sicodes.  From what I can
>> tell, @sig is never bounds checked.  If the compiler generated an AND
>> instruction to compare against sig_specific_sicodes then that could
>> resolve true with any arbitrary value that happened to collide with
>> sig_specific_sicodes and result in an out-of-bounds access to
>> @sig_sicodes.  siginfo_layout() for example explicitly checks @sig
>> before indexing @sig_sicode, e.g. "sig < ARRAY_SIZE(sig_sicodes)".
>>
>> Maybe this?
>
> But sig is bounds checked.  Even better sig is checked to see if it
> is one of the values in the array.
>
>> From include/linux/signal.h
>
> #define SIG_SPECIFIC_SICODES_MASK (\
> 	rt_sigmask(SIGILL)    |  rt_sigmask(SIGFPE)    | \
> 	rt_sigmask(SIGSEGV)   |  rt_sigmask(SIGBUS)    | \
> 	rt_sigmask(SIGTRAP)   |  rt_sigmask(SIGCHLD)   | \
> 	rt_sigmask(SIGPOLL)   |  rt_sigmask(SIGSYS)    | \
> 	SIGEMT_MASK                                    )
>
> #define siginmask(sig, mask) \
> 	((sig) < SIGRTMIN && (rt_sigmask(sig) & (mask)))
>
> #define sig_specific_sicodes(sig)	siginmask(sig, SIG_SPECIFIC_SICODES_MASK)
>
>
>
> Hmm.  I wonder if something is passing in a negative signal number.
> There is not a bounds check for that.  A sufficiently large signal
> number might be the problem here.  Yes.  I can get an oops with
> a sufficiently large negative signal number.
>
> The code will later call valid_signal in check_permissions and
> that will cause the system call to fail, so the issue is just that
> the signal number is not being validated early enough.
>
> On the output path (copy_siginfo_to_user and copy_siginfo_to_user32) the
> signal number should be validated before it ever reaches userspace
> which is why I expect trinity never triggered anything.
>
> There is copy_siginfo_from_user32 and that does call siginfo_layout with
> a possibly negative signal number.  Which has the same potential issues.
>
> So I am going to go with the fix below.  That fixes things in my testing
> and by being unsigned should fix keep negative numbers from being a
> problem.

Sean thank you very much for putting me on the right path to track this
failing test down.

Eric