lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51430925-8046-7066-84ed-2ff0db835347@linaro.org>
Date:   Mon, 9 Jan 2023 09:42:30 +0100
From:   Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>
To:     Trond Myklebust <trondmy@...merspace.com>
Cc:     Anna Schumaker <Anna.Schumaker@...app.com>,
        linux-nfs <linux-nfs@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>
Subject: Re: Regression: NULL pointer dereference after NFS_V4_2_READ_PLUS
 (commit 7fd461c47)

On 09/01/2023 09:14, Krzysztof Kozlowski wrote:
> On 08/01/2023 18:09, Trond Myklebust wrote:
>> Hi Krzysztof,
>>
>>> On Jan 8, 2023, at 08:25, Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org> wrote:
>>>
>>> [You don't often get email from krzysztof.kozlowski@...aro.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification]
>>>
>>> On 07/01/2023 16:44, Krzysztof Kozlowski wrote:
>>>> Hi,
>>>>
>>>> Bisect identified commit 7fd461c47c6c ("NFSv4.2: Change the default
>>>> KConfig value for READ_PLUS") as one leading to NULL pointer exception
>>>> when mounting NFS root on NFSv4 client:
>>>>
>>>> [   25.739003] systemd[1]: Set hostname to <odroidhc1>.
>>>> [   25.771714] systemd[1]: Failed to bump fs.file-max, ignoring: Invalid
>>>> argument
>>>> [   26.199478] 8<--- cut here ---
>>>> [   26.201366] Unable to handle kernel NULL pointer dereference at
>>>> virtual address 00000004
>>>> ...
>>>> [   26.555522]  mmiocpy from xdr_inline_decode+0xec/0x16c
>>>> [   26.560628]  xdr_inline_decode from nfs4_xdr_dec_read_plus+0x178/0x358
>>>> [   26.567130]  nfs4_xdr_dec_read_plus from call_decode+0x204/0x304
>>>>
>>>> Full OOPS attached. Full log available here:
>>>> https://krzk.eu/#/builders/21/builds/3901/steps/15/logs/serial0
>>>>
>>>> Disabling NFS_V4_2_READ_PLUS fixes the issue, so obviously the commit is
>>>> not the cause, but rather making it default caused the regression.
>>>>
>>>> I did not make the bisect yet which commit introduced it, if every
>>>> config includes NFS_V4_2_READ_PLUS.
>>>
>>> When every kernel is built with NFS_V4_2_READ_PLUS, bisect pointed to:
>>> d3b00a802c84 ("NFS: Replace the READ_PLUS decoding code")
>>>
>>> commit d3b00a802c845a6021148ce2e669b5a0b5729959
>>> Author: Anna Schumaker <Anna.Schumaker@...app.com>
>>> Date:   Thu Jul 21 14:21:34 2022 -0400
>>>
>>>    NFS: Replace the READ_PLUS decoding code
>>>
>>>    We now take a 2-step process that allows us to place data and hole
>>>    segments directly at their final position in the xdr_stream without
>>>    needing to do a bunch of redundant copies to expand holes. Due to the
>>>    variable lengths of each segment, the xdr metadata might cross page
>>>    boundaries which I account for by setting a small scratch buffer so
>>>    xdr_inline_decode() won't fail.
>>>
>>>    Signed-off-by: Anna Schumaker <Anna.Schumaker@...app.com>
>>>    Signed-off-by: Trond Myklebust <trond.myklebust@...merspace.com>
>>>
>>> With a trace:
>>> [   25.898462] systemd[1]: Set hostname to <odroidhc1>.
>>> [   25.933746] systemd[1]: Failed to bump fs.file-max, ignoring: Invalid
>>> argument
>>> [   25.986237] random: crng init done
>>> [   26.264564] 8<--- cut here ---
>>> [   26.266823] Unable to handle kernel NULL pointer dereference at
>>> virtual address 00000fe8
>>> ...
>>> [   26.597263]  nfs4_xdr_dec_read_plus from call_decode+0x204/0x304
>>> [   26.603222]  call_decode from __rpc_execute+0xd0/0x890
>>> [   26.608328]  __rpc_execute from rpc_async_schedule+0x1c/0x34
>>> [   26.613960]  rpc_async_schedule from process_one_work+0x294/0x790
>>> [   26.620030]  process_one_work from worker_thread+0x54/0x518
>>> [   26.625570]  worker_thread from kthread+0xf4/0x128
>>> [   26.630336]  kthread from ret_from_fork+0x14/0x2c
>>>
>>
>> Is this test being run against a 6.2-rc2 server, or is it an older server platform? We know there were bugs in older server implementations, so the question is whether this might be a problem with handling a bad/corrupt RPC reply from the server, or whether it is happening against code that is supposed to have been fixed?
> 
> I would say that buggy server should not cause NULL pointer dereferences
> on the client. Otherwise this is a perfect recipe for a rogue server in
> the network to start crashing clients and running exploits... Imagine a
> compromised machine (through some other means) in a local company
> network running now a server with NFS share "HR salary data" or "HR
> planned layoffs", where unsuspected people in that network access it
> leading to exploit of NFS code on their side...
> 
> Server is Raspberry Pi 3 kernel: 5.10.92-2-rpi-legacy-ARCH
> 
> Which points that it is not latest stable, so anyway I need to update.

I updated the kernel to 5.15.84-3-rpi-ARCH which is pretty close to
latest stable and I can reproduce the issue. Therefore:
1. It is visible on two stable (one new, one old) kernels on the server,
2. Buggy or rogue server should not cause NULL pointer on remote devices...

Best regards,
Krzysztof

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ