[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6953ec3b-6c48-954e-f3db-63450a5ab886@ya.ru>
Date: Thu, 26 Jan 2023 00:09:08 +0300
From: Kirill Tkhai <tkhai@...ru>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com,
kuniyu@...zon.com, gorcunov@...il.com
Subject: Re: [PATCH net-next] unix: Guarantee sk_state relevance in case of it
was assigned by a task on other cpu
On 25.01.2023 04:35, Jakub Kicinski wrote:
> On Mon, 23 Jan 2023 01:21:20 +0300 Kirill Tkhai wrote:
>> Since in this situation unix_accept() is called chronologically later, such
>> behavior is not obvious and it is wrong.
>
> Noob question, perhaps - how do we establish the ordering ?
> CPU1 knows that listen() has succeed without a barrier ?
1)There are a many combinations with third task involved:
[CPU0:Task0] [CPU1:Task1] [CPU2:Task2]
listen(sk)
kernel:
sk_diag_fill(sk)
rep->udiag_state = TCP_LISTEN
return_from_syscall
userspace:
mutex_lock()
shared_mem_var = rep->udiag_state
mutex_unlock()
userspace:
mutex_lock()
if (shared_mem_var == TCP_LISTEN)
accept(sk); /* -> fail, since sk_state is not visible */
mutex_unlock()
In this situation Task2 definitely knows Task0's listen() has succeed, but there is no a possibility
to guarantee its accept() won't fail. Despite there are appropriate barriers in mutex_lock() and mutex_unlock(),
there is no a possibility to add a barrier on CPU1 to make Task0's store visible on CPU2.
2)My understanding is chronologically later accept() mustn't miss sk_state.
Otherwise, kernel says that ordering between internal syscalls data
is userspace duty, which is wrong. Userspace knows nothing about internal
kernel data.
It's not prohibited to call accept() for a socket obtained via pidfd_getfd()
by a side application. Why doesn't the code guarantee, that accept() see
actual sk_state?
[CPU0:Task0] [CPU1:Task1]
listen(sk)
sk = pidfd_getfd()
accept(sk) /* -> fail */
3)Such possible situations in log file also look strange:
[CPU0:Task0] [CPU1:Task1]
listen()
get_time(&time1)
write_log("listening at ...", time1)
get_time(&time2)
sk = accept()
if (sk < 0)
write_log("accept failed at ...", time2)
In case of there is no kernel ordering, we may see in their logs something
like the below:
Task1.log
"listening at time1"
Task2.log
"accept failed at time2"
and time2 > time1.
Kirill
Powered by blists - more mailing lists