Message-ID: <5428f192-1537-fa03-8e9c-4a8322772546@quicinc.com>
Date:   Wed, 16 Mar 2022 19:49:38 +0530
From:   Charan Teja Kalla <quic_charante@...cinc.com>
To:     Minchan Kim <minchan@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>
CC:     <surenb@...gle.com>, <vbabka@...e.cz>, <rientjes@...gle.com>,
        <sfr@...b.auug.org.au>, <edgararriaga@...gle.com>,
        <nadav.amit@...il.com>, <mhocko@...e.com>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>,
        "# 5 . 10+" <stable@...r.kernel.org>
Subject: Re: [PATCH V2,2/2] mm: madvise: skip unmapped vma holes passed to
 process_madvise

Thanks Andrew and Minchan.

On 3/16/2022 7:13 AM, Minchan Kim wrote:
> On Tue, Mar 15, 2022 at 04:48:07PM -0700, Andrew Morton wrote:
>> On Tue, 15 Mar 2022 15:58:28 -0700 Minchan Kim <minchan@...nel.org> wrote:
>>
>>> On Fri, Mar 11, 2022 at 08:59:06PM +0530, Charan Teja Kalla wrote:
>>>> The process_madvise() system call is expected to skip holes in vma
>>>> passed through 'struct iovec' vector list. But do_madvise, which
>>>> process_madvise() calls for each vma, returns ENOMEM in case of unmapped
>>>> holes, despite the VMA is processed.
>>>> Thus process_madvise() should treat ENOMEM as expected and consider the
>>>> VMA passed to as processed and continue processing other vma's in the
>>>> vector list. Returning -ENOMEM to user, despite the VMA is processed,
>>>> will be unable to figure out where to start the next madvise.
>>>> Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
>>>> Cc: <stable@...r.kernel.org> # 5.10+
>>>
>>> Hmm, not sure whether it's stable material since it changes semantic of
>>> API. It would be better to change the semantic from 5.19 with man page
>>> update to specify the change.
>>
>> It's a very desirable change and it makes the code match the manpage
>> and it's cc:stable.  I think we should just absorb any transitory
>> damage which this causes people.  I doubt if there will be much - if
>> anyone was affected by this they would have already told us that it's
>> broken?
> 
> 
> process_madvise fails to return exact processed bytes at several cases
> if it encounters the error, such as, -EINVAL, -EINTR, -ENOMEM in the
> middle of processing vmas. And now we are trying to make exception for
> change for only hole?
I think EINTR will never be returned in the middle of processing VMAs
for the behaviours supported by process_madvise().

It can return EINTR when:
-------------------------
1) PTRACE_MODE_READ is being checked in mm_access(), where it waits on
task->signal->exec_update_lock. EINTR returned from here guarantees
that process_madvise() didn't even start processing.
https://elixir.bootlin.com/linux/v5.16.14/source/mm/madvise.c#L1264 -->
https://elixir.bootlin.com/linux/v5.16.14/source/kernel/fork.c#L1318

2) process_madvise() has started processing VMAs but the required
behaviour on a VMA needs mmap_write_lock_killable(), from where EINTR is
returned. The behaviours currently supported by process_madvise(),
MADV_COLD, MADV_PAGEOUT and MADV_WILLNEED, just need the read lock here.
https://elixir.bootlin.com/linux/v5.16.14/source/mm/madvise.c#L1164
 **Thus I think there is no way EINTR can be returned by
process_madvise() in the middle of processing.** No?

for EINVAL:
-----------
The only case I can think of where EINVAL can be returned in the
middle of processing is when the given range contains VMAs with a hole
in between and one of the VMAs contains pages that fail the
can_madv_lru_vma() condition.
So, it's a limitation that this returns -EINVAL though some bytes are
processed.
	OR
Since some of the bytes passed are still invalid, is it valid to
return -EINVAL here, leaving the user to check the address range sent?

for ENOMEM:
----------
Though the complete range is processed, it still returns ENOMEM. IMO,
this shouldn't be treated as an error, which is what the patch targets.
Then there is the limitation case that you mentioned below, where it
returns positive processed bytes even though it didn't process anything
if it couldn't find any vma for the first iteration in madvise_walk_vmas.
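
To make the ENOMEM case concrete, here is a small userspace model of the
loop. It is only a sketch and not the kernel code: model_vma, model_iov,
model_do_madvise(), model_process_madvise() and the skip_holes flag are
names made up for illustration, and the model only reproduces return
values (the real walk also advises the mapped parts around a hole).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a mapped range [start, end). */
struct model_vma { unsigned long start, end; };

/* Hypothetical stand-in for one 'struct iovec' entry. */
struct model_iov { unsigned long base; size_t len; };

/*
 * Model of do_madvise()'s return value: 0 if [start, start + len) is
 * fully mapped, -ENOMEM if any part of it is an unmapped hole.
 * Assumes vmas[] is sorted by address and non-overlapping.
 */
static int model_do_madvise(const struct model_vma *vmas, size_t nr,
                            unsigned long start, size_t len)
{
    unsigned long end = start + len;
    unsigned long cur = start;

    for (size_t i = 0; i < nr && cur < end; i++) {
        if (vmas[i].end <= cur)
            continue;           /* VMA lies entirely below the cursor */
        if (vmas[i].start > cur)
            return -ENOMEM;     /* hole between cursor and this VMA */
        cur = vmas[i].end;      /* covered up to the end of this VMA */
    }
    return cur < end ? -ENOMEM : 0;
}

/*
 * Model of the iovec loop. With skip_holes == 0 the first -ENOMEM stops
 * the loop and is returned to the caller. With skip_holes != 0, -ENOMEM
 * is treated as "entry processed" and the loop continues.
 */
long model_process_madvise(const struct model_vma *vmas, size_t nr_vmas,
                           const struct model_iov *iov, size_t nr_iov,
                           int skip_holes)
{
    size_t done = 0;
    int ret = 0;

    for (size_t i = 0; i < nr_iov; i++) {
        ret = model_do_madvise(vmas, nr_vmas, iov[i].base, iov[i].len);
        if (ret < 0 && !(skip_holes && ret == -ENOMEM))
            break;              /* hard error: stop processing */
        ret = 0;                /* a skipped hole counts as success */
        done += iov[i].len;     /* whole entry is counted, holes too */
    }
    return ret < 0 ? ret : (long)done;
}
```

With VMAs at [0x1000, 0x3000) and [0x5000, 0x6000), a vector whose
second entry spans the hole fails with -ENOMEM when skip_holes is 0 and
reports the full vector length when it is 1. An entry that hits no VMA
at all is also counted in full, which is the limitation mentioned above.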

I think the above limitations with EINVAL and ENOMEM arise because we
rely on do_madvise(), which the madvise() call uses to process a single
VMA. When a 'struct iovec' vector processing interface is given in a
system call, the caller expects the system call to return the correct
number of bytes processed, to help the user take the correct decisions.
Please correct me if I am wrong here.

So, should we add a new function, say do_process_madvise(), which takes
care of the above limitations? Or are there any alternative suggestions?
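
As a rough illustration of the accounting such a function could do,
here is a userspace model that counts only the bytes that overlap a
mapped VMA. Everything in it is an assumption made for the sketch: the
names, the 'advisable' flag standing in for a can_madv_lru_vma() style
check, and the choice to return the bytes processed so far (or -EINVAL
if there are none) when an unsuitable VMA is hit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical VMA model: mapped range [start, end) plus a flag that
 * says whether the advice may be applied to it. */
struct model_vma { unsigned long start, end; int advisable; };

/* Hypothetical stand-in for one 'struct iovec' entry. */
struct model_iov { unsigned long base; size_t len; };

/*
 * Model of a byte-exact do_process_madvise(): holes contribute nothing,
 * and an unsuitable VMA ends the walk. Returns the number of bytes
 * actually covered by advisable VMAs, or -EINVAL if an unsuitable VMA
 * is hit before any byte was processed.
 * Assumes vmas[] is sorted by address and non-overlapping.
 */
long model_do_process_madvise(const struct model_vma *vmas, size_t nr_vmas,
                              const struct model_iov *iov, size_t nr_iov)
{
    size_t done = 0;

    for (size_t i = 0; i < nr_iov; i++) {
        unsigned long start = iov[i].base;
        unsigned long end = start + iov[i].len;

        for (size_t j = 0; j < nr_vmas; j++) {
            /* Intersection of the iovec entry with this VMA. */
            unsigned long lo = vmas[j].start > start ? vmas[j].start : start;
            unsigned long hi = vmas[j].end < end ? vmas[j].end : end;

            if (lo >= hi)
                continue;       /* no overlap with this VMA */
            if (!vmas[j].advisable)
                return done ? (long)done : -EINVAL;
            done += hi - lo;    /* count only bytes inside the VMA */
        }
    }
    return (long)done;
}
```

In this model a range that spans a hole reports only the mapped bytes,
and a range that hits no VMA at all reports 0 rather than a positive
count.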

> IMO, it's worth to note in man page.
> 

Or is the current patch for just ENOMEM sufficient here, so that we
just have to update the man page?

> In addition, this change returns positive processes bytes even though
> it didn't process anything if it couldn't find any vma for the first
> iteration in madvise_walk_vmas.

Thanks,
Charan
