Message-ID: <YijaP7cC6Sclxc29@google.com>
Date:   Wed, 9 Mar 2022 08:47:59 -0800
From:   Minchan Kim <minchan@...nel.org>
To:     Charan Teja Kalla <quic_charante@...cinc.com>
Cc:     akpm@...ux-foundation.org, yuehaibing@...wei.com,
        sfr@...b.auug.org.au, rientjes@...gle.com, edgararriaga@...gle.com,
        mhocko@...e.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: madvise: return correct bytes advised with
 process_madvise

On Wed, Mar 09, 2022 at 10:57:59AM +0530, Charan Teja Kalla wrote:
> The process_madvise() system call returns error even after processing
> some VMA's passed in the 'struct iovec' vector list which leaves the
> user confused to know where to restart the advise next. It is also
> against this syscall man page[1] documentation where it mentions that
> "return value may be less than the total number of requested bytes, if
> an error occurred after some iovec elements were already processed.".
> 
> Consider a user passed 10 VMA's in the 'struct iovec' vector list of
> which 9 are processed but one. Then it just returns the error caused on
> that failed VMA despite the first 9 VMA's processed, leaving the user
> confused about on which VMA it is failed. Returning the number of bytes
> processed here can help the user to know which VMA it is failed on and
> thus can retry/skip the advise on that VMA.
> 
> [1]https://man7.org/linux/man-pages/man2/process_madvise.2.html.
> 
> Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
> Signed-off-by: Charan Teja Kalla <quic_charante@...cinc.com>
> ---
>  mm/madvise.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 38d0f51..d3b49b3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1426,15 +1426,21 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>  
>  	while (iov_iter_count(&iter)) {
>  		iovec = iov_iter_iovec(&iter);
> +		/*
> +		 * Even when [start, end) passed to do_madvise covers
> +		 * some unmapped addresses, it continues processing with
> +		 * returning ENOMEM at the end. Thus consider the range
> +		 * as processed when do_madvise() returns ENOMEM.
> +		 * This makes process_madvise() never returns ENOMEM.
> +		 */

It looks like this patch does two things. First, it returns the number
of bytes processed instead of an error when an error occurs partway
through. Second, it keeps working on the remaining VMAs when
do_madvise() returns -ENOMEM because of an unmapped hole.

The first change totally makes sense to me (that's exactly what I wanted
to do but somehow missed), so it should go to the stable tree. However,
the second change might be arguable, so it would be great if you split
the patch.
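
To illustrate why returning the processed byte count helps userspace, here is a minimal sketch of how a caller might map that count back to the iovec element where the advice stopped. The helper name first_unprocessed_iov() is hypothetical and is not part of the patch or of any library. It assumes the kernel only ever counts whole iovec elements, which matches the loop in the quoted diff (iov_iter_advance() is called with the full iov_len).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: given the iovec array that was passed to
 * process_madvise() and the byte count it returned, find the index of
 * the first element that was not fully processed.
 * Returns n when every element was covered.
 */
static size_t first_unprocessed_iov(const struct iovec *vec, size_t n,
				    size_t processed)
{
	size_t i;

	assert(vec != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (processed < vec[i].iov_len)
			break;		/* element i is where the advice stopped */
		processed -= vec[i].iov_len;
	}
	return i;
}
```

A caller could then retry or skip from vec[i] onward, passing &vec[i] and n - i to the next process_madvise() call.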

>  		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
>  					iovec.iov_len, behavior);
> -		if (ret < 0)
> +		if (ret < 0 && ret != -ENOMEM)
>  			break;
>  		iov_iter_advance(&iter, iovec.iov_len);
>  	}
>  
> -	if (ret == 0)
> -		ret = total_len - iov_iter_count(&iter);
> +	ret = (total_len - iov_iter_count(&iter)) ? : ret;
>  
>  release_mm:
>  	mmput(mm);
> -- 
> 2.7.4
> 
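
For readers unfamiliar with the GNU "a ?: b" extension used in the last hunk, the following is a stand-alone model of that return-value choice written in portable C. The function pick_return() and its parameters are hypothetical names for illustration only: total_len stands for the total requested bytes, remaining for iov_iter_count(&iter), and err for the last value of ret.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical model of:
 *	ret = (total_len - iov_iter_count(&iter)) ? : ret;
 * If any bytes were processed, report that count; otherwise fall back
 * to the error (or 0) left in ret by the loop.
 */
static ssize_t pick_return(size_t total_len, size_t remaining, ssize_t err)
{
	size_t done;

	assert(remaining <= total_len);

	done = total_len - remaining;
	return done ? (ssize_t)done : err;
}
```

Under this model an error is only reported when nothing at all was processed, for example when the very first iovec element fails.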
