Message-ID: <20240402173235.GK11187@unreal>
Date: Tue, 2 Apr 2024 20:32:35 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
	Neal Cardwell <ncardwell@...gle.com>,
	Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Subject: Re: ICMP_PARAMETERPROB and ICMP_TIME_EXCEEDED during connect

On Tue, Apr 02, 2024 at 10:17:16PM +0800, Jason Xing wrote:
> On Tue, Apr 2, 2024 at 9:32 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Tue, Apr 2, 2024 at 3:21 PM Leon Romanovsky <leon@...nel.org> wrote:
> > >
> > > On Wed, Mar 27, 2024 at 02:05:17PM +0100, Eric Dumazet wrote:
> > > > On Wed, Mar 27, 2024 at 12:55 AM Jakub Kicinski <kuba@...nel.org> wrote:
> > > > >
> > > > > On Tue, 26 Mar 2024 23:03:26 +0100 Neal Cardwell wrote:
> > > > > > On Tue, Mar 26, 2024 at 9:34 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > > > > >
> > > > > > > Hi!
> > > > > > >
> > > > > > > I got a report from a user surprised/displeased that ICMP_TIME_EXCEEDED
> > > > > > > breaks connect(), while TCP RFCs say it shouldn't. Even pointing a
> > > > > > > finger at Linux, RFC5461:
> > > > > > >
> > > > > > >    A number of TCP implementations have modified their reaction to all
> > > > > > >    ICMP soft errors and treat them as hard errors when they are received
> > > > > > >    for connections in the SYN-SENT or SYN-RECEIVED states.  For example,
> > > > > > >    this workaround has been implemented in the Linux kernel since
> > > > > > >    version 2.0.0 (released in 1996) [Linux].  However, it should be
> > > > > > >    noted that this change violates section 4.2.3.9 of [RFC1122], which
> > > > > > >    states that these ICMP error messages indicate soft error conditions
> > > > > > >    and that, therefore, TCP MUST NOT abort the corresponding connection.
> > > > > > >
> > > > > > > Is there any reason we continue with this behavior or is it just that
> > > > > > > nobody ever sent a patch?
> > > > > >
> > > > > > Back in November of 2023 Eric did merge a patch to bring the
> > > > > > processing in line with section 4.2.3.9 of [RFC1122]:
> > > > > >
> > > > > > 0a8de364ff7a tcp: no longer abort SYN_SENT when receiving some ICMP
> > > > > >
> > > > > > However, the fixed behavior did not meet some expectations of Vagrant
> > > > > > (see the netdev thread "Bug report connect to VM with Vagrant"), so
> > > > > > for now it got reverted:
> > > > > >
> > > > > > b59db45d7eba tcp: Revert no longer abort SYN_SENT when receiving some ICMP
> > > > > >
> > > > > > I think the hope was to root-cause the Vagrant issue, fix Vagrant's
> > > > > > assumptions, then resubmit Eric's commit. Eric mentioned on Jan 8,
> > > > > > 2024: "We will submit the patch again for 6.9, once we get to the root
> > > > > > cause." But I don't think anyone has had time to do that yet.
> > > > >
> > > > > Ah.
> > > > >
> > > > > Thank you!!
> > > >
> > > > For the record, Leon Romanovsky brought this issue directly to Linus
> > > > Torvalds, stating that I broke things.
> > >
> > > Just to make it clear, Linus was involved after we had made no progress for
> > > more than one month after the "Bug report connect to VM with Vagrant" thread
> > > was started, with the merge window approaching.
> > > https://lore.kernel.org/netdev/MN2PR12MB44863139E562A59329E89DBEB982A@MN2PR12MB4486.namprd12.prod.outlook.com/
> > >
> > > Despite the long-standing netdev patch flow (apply fast -> revert fast), this
> > > patch was treated differently.
> >
> > I was waiting for input from you. I think you only waited for "revert first".
> >
> > >
> > > >
> > > > It took weeks before Shachar did some debugging, and as I recall it
> > > > reached no conclusion.
> > >
> > > Shachar didn't do the debugging; she didn't write the bisected patch.
> > > She is a verification engineer who was ready to run ANY tests and try
> > > ANY debug patch you wanted.
> > >
> > > >
> > > > This kind of stuff makes me not very eager to work on this point.
> > > >
> > >
> > > OK, so it is not important at the end.
> >
> > I certainly do not want to waste time arguing with you on a valid
> > patch, which happens to break some buggy user space.
> >
> > Apparently some people think RFC are not important.
> 
> RFC is important.
> 
> Honestly, I have read those threads over and over again. Since she provided
> some tcpdump logs that do not include any ICMP, my question is still the
> same as Eric's: why is this breakage related to this patch at all???
> I'm lost. It really doesn't make sense...

It was an unfortunate outcome of moving the discussion to private email.
https://lore.kernel.org/netdev/CANn89i+e2TcvSU1EgrVZRUoEmZ5NDauXd3=kEkjpsGjmaypHOw@mail.gmail.com/

> 
> If someone is able to more easily reproduce this issue, I'm happy to help debug.
> 
> Thanks,
> Jason
> 
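
For reference, the rule at the center of this thread can be sketched as a small decision function. This is a minimal, illustrative Python model, not the kernel's implementation and not the contents of commit 0a8de364ff7a. It classifies ICMPv4 errors as soft or hard following the text of RFC 1122 section 4.2.3.9, and it contrasts that rule with the workaround RFC 5461 attributes to Linux, in which soft errors are treated as hard during SYN-SENT and SYN-RECEIVED. The names `TcpState`, `is_soft_error`, `should_abort`, and `legacy_syn_behavior` are hypothetical and chosen for illustration only.

```python
from enum import Enum


class TcpState(Enum):
    """Hypothetical subset of TCP states relevant to this discussion."""
    SYN_SENT = "SYN-SENT"
    SYN_RECEIVED = "SYN-RECEIVED"
    ESTABLISHED = "ESTABLISHED"


# ICMPv4 type numbers as assigned in RFC 792.
ICMP_DEST_UNREACH = 3
ICMP_TIME_EXCEEDED = 11
ICMP_PARAMETERPROB = 12


def is_soft_error(icmp_type: int, icmp_code: int) -> bool:
    """Classify an ICMP error per the text of RFC 1122, section 4.2.3.9.

    Raises ValueError for messages that section does not cover.
    """
    if icmp_type == ICMP_DEST_UNREACH:
        if icmp_code in (0, 1, 5):
            # net unreachable, host unreachable, source route failed: soft
            return True
        if icmp_code in (2, 3, 4):
            # protocol unreachable, port unreachable, fragmentation needed: hard.
            # Note: modern stacks use code 4 for Path MTU Discovery (RFC 1191)
            # instead of aborting; this sketch follows RFC 1122 literally.
            return False
    elif icmp_type == ICMP_TIME_EXCEEDED and icmp_code in (0, 1):
        # Handled "the same way as Destination Unreachable codes 0, 1, 5".
        return True
    elif icmp_type == ICMP_PARAMETERPROB:
        # Also handled like Destination Unreachable codes 0, 1, 5.
        return True
    raise ValueError(f"ICMP type {icmp_type} code {icmp_code} not covered by 4.2.3.9")


def should_abort(state: TcpState, icmp_type: int, icmp_code: int,
                 legacy_syn_behavior: bool = False) -> bool:
    """Return True if the connection would be aborted in this model.

    legacy_syn_behavior=True models the workaround described in RFC 5461:
    soft errors are treated as hard while the handshake is in progress.
    """
    if not is_soft_error(icmp_type, icmp_code):
        return True  # hard error: RFC 1122 says TCP SHOULD abort
    if legacy_syn_behavior and state in (TcpState.SYN_SENT, TcpState.SYN_RECEIVED):
        return True  # soft error promoted to hard during the handshake
    return False  # soft error: RFC 1122 says TCP MUST NOT abort


if __name__ == "__main__":
    for legacy in (False, True):
        verdict = should_abort(TcpState.SYN_SENT, ICMP_TIME_EXCEEDED, 0, legacy)
        print(f"legacy={legacy}: abort connect() on TIME_EXCEEDED -> {verdict}")
```

Under this model, an ICMP_TIME_EXCEEDED or ICMP_PARAMETERPROB arriving during connect() aborts the attempt only when `legacy_syn_behavior` is set, which is the behavior difference the reverted patch was meant to remove.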
