lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 25 Jun 2024 23:33:34 +0800
From: "Ma, Yu" <yu.ma@...el.com>
To: Jan Kara <jack@...e.cz>
Cc: viro@...iv.linux.org.uk, brauner@...nel.org, mjguzik@...il.com,
 edumazet@...gle.com, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, pan.deng@...el.com, tianyou.li@...el.com,
 tim.c.chen@...el.com, tim.c.chen@...ux.intel.com, yu.ma@...el.com
Subject: Re: [PATCH v2 1/3] fs/file.c: add fast path in alloc_fd()


On 6/25/2024 8:53 PM, Jan Kara wrote:
> On Tue 25-06-24 13:52:57, Jan Kara wrote:
>> On Sat 22-06-24 11:49:02, Yu Ma wrote:
>>> There is available fd in the lower 64 bits of open_fds bitmap for most cases
>>> when we look for an available fd slot. Skip 2-levels searching via
>>> find_next_zero_bit() for this common fast path.
>>>
>>> Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
>>> free slot is available there, as:
>>> (1) The fd allocation algorithm would always allocate fd from small to large.
>>> Lower bits in open_fds bitmap would be used much more frequently than higher
>>> bits.
>>> (2) After fdt is expanded (the bitmap size doubled for each time of expansion),
>>> it would never be shrunk. The search size increases but there are few open fds
>>> available here.
>>> (3) find_next_zero_bit() itself has a fast path inside to speed up searching
>>> when size<=64.
>>>
>>> Besides, "!start" is added to fast path condition to ensure the allocated fd is
>>> greater than start (i.e. >=0), given alloc_fd() is only called in two scenarios:
>>> (1) Allocating a new fd (the most common usage scenario) via
>>> get_unused_fd_flags() to find fd start from bit 0 in fdt (i.e. start==0).
>>> (2) Duplicating a fd (less common usage) via dup_fd() to find a fd start from
>>> old_fd's index in fdt, which is only called by syscall fcntl.
>>>
>>> With the fast path added in alloc_fd(), pts/blogbench-1.1.0 read is improved
>>> by 17% and write by 9% on Intel ICX 160 cores configuration with v6.10-rc4.
>>>
>>> Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
>>> Signed-off-by: Yu Ma <yu.ma@...el.com>
>>> ---
>>>   fs/file.c | 35 +++++++++++++++++++++--------------
>>>   1 file changed, 21 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/fs/file.c b/fs/file.c
>>> index a3b72aa64f11..50e900a47107 100644
>>> --- a/fs/file.c
>>> +++ b/fs/file.c
>>> @@ -515,28 +515,35 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
>>>   	if (fd < files->next_fd)
>>>   		fd = files->next_fd;
>>>   
>>> -	if (fd < fdt->max_fds)
>>> +	error = -EMFILE;
>>> +	if (likely(fd < fdt->max_fds)) {
>>> +		if (~fdt->open_fds[0] && !start) {
>>> +			fd = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, fd);
>> So I don't think this is quite correct. If files->next_fd is set, we could
>> end up calling find_next_zero_bit() starting from quite high offset causing
>> a regression? Also because we don't expand in this case, we could cause access
>> beyond end of fdtable?
> OK, I've misunderstood the next_fd logic. next_fd is the lowest 0-bit in
> the open_fds bitmap so if next_fd is big, the ~fdt->open_fds[0] should
> be false. As such the above condition could be rewritten as:
>
> 		if (!start && files->next_fd < BITS_PER_LONG)
>
> to avoid loading the first bitmap long if we know it is full? Or we could
> maybe go as far as:
>
> 		if (!start && fd < BITS_PER_LONG && !test_bit(fd, fdt->open_fds))
> 			goto fastreturn;
>
> because AFAIU this should work in exactly the same cases as your code?
>
> 								Honza

Thanks Honza for the good concern and suggestions here, while both above 
conditions are not enough to ensure that there is available fd in the 
first 64 bits of open_fds. As next_fd just means there is no available 
fd before next_fd, just imagine that fd from 0 to 66 are already 
occupied, now fd=3 is returned back, then next_fd would be set as 3 per 
fd recycling logic (i.e. in __put_unused_fd()), next time when 
alloc_fd() being called, it would return fd=3 to the caller and set 
next_fd=4. Then next time when alloc_fd() being called again, 
next_fd==4, but actually it's already been occupied. So 
find_next_zero_bit() is needed to find the real 0 bit anyway. The 
conditions should either be like it is in patch or if (!start && 
!test_bit(0, fdt->full_fds_bits)), the latter should also have the 
bitmap loading cost, but another point is that a bit in full_fds_bits 
represents 64 bits in open_fds, no matter fd >64 or not, full_fds_bits 
should be loaded any way, maybe we can modify the condition to use 
full_fds_bits ?

>>> +			goto fastreturn;
>>> +		}
>>>   		fd = find_next_fd(fdt, fd);
>>> +	}
>>> +
>>> +	if (unlikely(fd >= fdt->max_fds)) {
>>> +		error = expand_files(files, fd);
>>> +		if (error < 0)
>>> +			goto out;
>>> +		/*
>>> +		 * If we needed to expand the fs array we
>>> +		 * might have blocked - try again.
>>> +		 */
>>> +		if (error)
>>> +			goto repeat;
>>> +	}
>>>   
>>> +fastreturn:
>>>   	/*
>>>   	 * N.B. For clone tasks sharing a files structure, this test
>>>   	 * will limit the total number of files that can be opened.
>>>   	 */
>>> -	error = -EMFILE;
>>> -	if (fd >= end)
>>> +	if (unlikely(fd >= end))
>>>   		goto out;
>>>   
>>> -	error = expand_files(files, fd);
>>> -	if (error < 0)
>>> -		goto out;
>>> -
>>> -	/*
>>> -	 * If we needed to expand the fs array we
>>> -	 * might have blocked - try again.
>>> -	 */
>>> -	if (error)
>>> -		goto repeat;
>>> -
>>>   	if (start <= files->next_fd)
>>>   		files->next_fd = fd + 1;
>>>   
>>> -- 
>>> 2.43.0
>>>
>> -- 
>> Jan Kara <jack@...e.com>
>> SUSE Labs, CR
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ