Message-ID: <12c30542-27bf-4e63-b4dc-1c9193863062@shopee.com>
Date: Tue, 27 Aug 2024 16:46:03 +0800
From: Haifeng Xu <haifeng.xu@...pee.com>
To: Bernd Schubert <bernd.schubert@...tmail.fm>,
Miklos Szeredi <miklos@...redi.hu>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] fuse: do not generate interrupt requests for fatal signals
On 2024/8/22 01:38, Bernd Schubert wrote:
>
>
> On 6/15/24 14:19, Haifeng Xu wrote:
>>
>>
>> On 2024/6/14 18:01, Miklos Szeredi wrote:
>>> On Thu, 13 Jun 2024 at 12:44, Haifeng Xu <haifeng.xu@...pee.com> wrote:
>>>
>>>> So why doesn't the client get woken up?
>>>
>>> Need to find out what the server (lxcfs) is doing. Can you do a
>>> strace of lxcfs to see the communication on the fuse device?
>>
>> OK, I used strace to trace one of the server threads. The output
>> can be seen in the attachment.
>>
>> FD: 6 DIR /run/lxcfs/controllers/sys/fs/cgroup/
>> FD: 7 CHR /dev/fuse
>
> I had missed that there is an strace output.
> Would it be possible for you to describe your issue with all the
> details you have? There is a timeout patch now, and it would probably solve your issue:
>
> https://lore.kernel.org/all/20240813232241.2369855-1-joannelkoong@gmail.com/T/
>
>
> but Miklos is asking for a motivation. From the point of view that the fuse server could
> abort requests itself, Miklos is absolutely right (the product I'm actually working
> on has that...). And one could even add a timeout mechanism to libfuse.
> But the question is to understand your main issue and whether a request
> timeout would be needed.
>
> In general, it would be helpful if you could provide everything you know, already
> with the initial patch.
> Later on you posted that you use LXCFS, but personally I don't know anything about
> it. So it would be good to describe where that actually runs and what you do to trigger
> the issue, etc. Details...
In our production environment, this issue has happened several times, but we don't know why the lxcfs server
didn't send a reply to the client (which had received SIGKILL). So if the fuse server can't abort the request, the client
thread will hang forever.
>
> Thanks,
> Bernd