[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87msuv37u4.fsf@cloudflare.com>
Date: Thu, 30 Nov 2023 10:29:33 +0100
From: Jakub Sitnicki <jakub@...udflare.com>
To: David Laight <David.Laight@...LAB.COM>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, 'Jakub Kicinski'
<kuba@...nel.org>, "David S. Miller" <davem@...emloft.net>, Stephen
Hemminger <stephen@...workplumber.org>, Eric Dumazet
<edumazet@...gle.com>, 'David Ahern' <dsahern@...nel.org>, Paolo Abeni
<pabeni@...hat.com>
Subject: Re: [PATCH net-next] ipv4: Use READ/WRITE_ONCE() for IP
local_port_range
On Thu, Nov 30, 2023 at 09:04 AM GMT, David Laight wrote:
> From: Jakub Sitnicki
>> Sent: 29 November 2023 20:17
>>
>> Hey David,
>>
>> On Wed, Nov 29, 2023 at 07:26 PM GMT, David Laight wrote:
>> > Commit 227b60f5102cd added a seqlock to ensure that the low and high
>> > port numbers were always updated together.
>> > This is overkill because the two 16bit port numbers can be held in
>> > a u32 and read/written in a single instruction.
>> >
>> > More recently 91d0b78c5177f added support for finer per-socket limits.
>> > The user-supplied value is 'high << 16 | low' but they are held
>> > separately and the socket options protected by the socket lock.
>> >
>> > Use a u32 containing 'high << 16 | low' for both the 'net' and 'sk'
>> > fields and use READ_ONCE()/WRITE_ONCE() to ensure both values are
>> > always updated together.
>> >
>> > Change (the now trival) inet_get_local_port_range() to a static inline
>> > to optimise the calling code.
>> > (In particular avoiding returning integers by reference.)
>> >
>> > Signed-off-by: David Laight <david.laight@...lab.com>
>> > ---
>>
>> Regarding the per-socket changes - we don't expect contention on sock
>> lock between inet_stream_connect / __inet_bind, where we grab it and
>> eventually call inet_sk_get_local_port_range, and sockopt handlers, do
>> we?
>>
>> The motivation is not super clear for me for that of the changes.
>
> The locking in the getsockopt() code is actually quite horrid.
> Look at the conditionals for the rntl lock.
> It is conditionally acquired based on a function that sets a flag,
> but released based on the exit path from the switch statement.
>
> But there are only two options that need the rtnl lock and the socket
> lock, and two trivial ones (including this one) that need the socket
> lock.
> So the code can be simplified by moving the locking into the case branches.
> With only 2 such cases the overhead will be minimal (compared the to
> setsockopt() case where a lot of options need locking.
>
> This is all part of a big patchset I'm trying to write that converts
> all the setsockopt code to take an 'unsigned int optlen' parameter
> and return the length to pass back to the caller.
> So the copy_to_user() of the updated length is done by the syscall
> stub rather than inside every separate function (and sometimes in
> multiple places in a function).
>
> After all, if the copy fails EFAULT the application is broken.
> It really doesn't matter if any side effects have happened.
> If you get a fault reading from a pipe the data is lost.
OK, so you're trying to refactor the setsockopt handler. Now it makes
more sense what is driving this work. Thanks for the context.
[...]
Powered by blists - more mailing lists