The individual owning this blog works for Oracle in Germany. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
Thursday, December 12. 2013
Not everything in the world is brand-new, and sometimes your brand-new server has to communicate with a really old TCP/IP-enabled device. As such devices are often really limited in resources, they sometimes don't obey some of the best practices. This article describes an issue arising out of this and how to circumvent it.
Description of the problem

It all started with a customer calling: "Hey, in Solaris 10 a connection between an LDOM and such a TCP/IP device works like a charm. In a Solaris 11 LDOM it stopped working." At first I thought "WTF", but the solution is quite trivial.
The customer had the following tcpdump (which I have obfuscated for obvious reasons):
There are two important pieces of information in this tcpdump that are relevant for the explanation:
Why are both pieces of information relevant? In Solaris 11 a number of security mechanisms are activated by default. One of them checks the TCP window size. In order to protect the system against certain kinds of denial-of-service attacks, the networking stack looks at the setup of the connection for a pattern that is usually only used by such attacks. Almost all TCP/IP stacks that really want to do some work, and not just eat away your resources, will request a TCP receive window significantly larger than the maximum segment size (four times, for example), because you get a steadier throughput with such a configuration. It's a strong "suggestion" (as in: do it this way!) to size the receive window as an integer multiple of the maximum segment size. It's really a good best practice, and all sane TCP/IP stacks do it this way. Denial-of-service attacks, however, are based on the attacked target running out of resources faster than the attacker. So an attacker works with small receive windows, as the receiver has to allocate memory in the size of the receive window for each TCP/IP connection.
So there is a simple check: you look for a situation that is unlikely to occur in normal operation, like a client connecting with a receive window smaller than the maximum segment size, as this would have several negative impacts on performance. So: when the maximum segment size is larger than the window size in the third step of the handshake, the system simply ignores that step. From the perspective of the server there is no connection, so it doesn't have to allocate resources. In addition, a timer is set that expires such a connection attempt earlier from the respective tables in the operating system.
On the other side, the client believes there is a connection (the server has answered the initial SYN with a SYN/ACK) and doesn't know that the server doesn't consider this connection completely initiated. So it's normal that the client starts to send data, but it will never receive an ACK for it.
The described mechanism helps a lot against denial-of-service attacks. However, it's based on the assumption that the communication partner obeys the best practices, and for 99.99% of all devices this is a correct assumption. But not all TCP/IP stacks do this ... some have to diverge from these best practices in order to enable a device to communicate via TCP/IP despite having only minimal resources.
The example above is such a case. The client is a device with minimal resources: as shown by the above tcpdump, the client specifies a maximum segment size of 1460, but a receive window of only 560. Obviously the connection attempt fails the test described before, and thus the connection will not be established. Solaris 10 didn't have the described check, so it's quite obvious why the TCP connection worked with Solaris 10 but not with Solaris 11.
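The logic of the check can be condensed into a few lines. This is an illustration of the rule as described above, not the actual Solaris 11 source:

```python
def accept_final_ack(mss: int, rwnd: int) -> bool:
    """Model of the sanity check described above: the final ACK of the
    three-way handshake is ignored when the client's advertised receive
    window is smaller than the negotiated maximum segment size."""
    return rwnd >= mss

# Values from the tcpdump above: MSS 1460, receive window 560.
print(accept_final_ack(1460, 560))       # the embedded device: False
print(accept_final_ack(1460, 4 * 1460))  # a well-behaved client: True
```

With the device's values the check fails, so the server silently drops the third step of the handshake.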
Workaround

So you are in this situation: your device can't communicate. But especially with older hardware there is often no way to change the client, and even when you can, you often don't want to, because you don't want to introduce even a single small software change. For this reason, there is a way to get around the check on the server side. It's connected to a second check made by the TCP/IP stack. Even with larger segment sizes, some TCP stacks will not set a larger TCP receive window, because every TCP connection implies an allocation of memory for the received data in the size of the receive window. Such requests would fail the check as well. To get around this, Solaris 11 checks for an additional value called
So the solution for this customer was quite easy. Right after executing the command
So far I have seen the necessity to set this parameter only with really, really old TCP/IP stacks (old in the sense of something from the last century) or TCP/IP stacks of embedded systems.
Demonstration

I wanted to check the solution after the customer told me "Everything is working again" and see the issue with my own eyes. However, I didn't have the opportunity to look at the real hardware, so I simulated the issue by writing a small script. I needed tighter control of what the client was sending to the server, so I generated the TCP packets by hand. For a different project I'm playing around with Scapy, a toolset in Python to generate packets the raw way, circumventing the TCP/IP stack. As I wrote on Facebook a few days ago: working with it to create TCP communication feels like bitbanging an I2C bus with GPIO pins.
The script was called
The script takes a single command line parameter: the size of the window the script will use for the connection. It essentially implements a client for the ECHO service provided by the inetd of a Solaris system.
Before you can use this script, there is an important prerequisite. The script totally circumvents the TCP/IP stack of the client. From the perspective of the client's OS this TCP/IP connection doesn't exist. When a SYN/ACK packet for a TCP/IP connection that doesn't exist arrives at the client, a security mechanism of the OS may kick in: the client sends a RST packet to terminate this connection to the server, and nothing will work. You have to suppress these RST packets. I was using a notebook with Ubuntu as a client, thus I used the following
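The Scapy script itself isn't reproduced here. As a rough substitute sketch (not the author's script, and a different technique), a client asking for a small receive window can be approximated with plain stdlib sockets by shrinking the receive buffer before connecting. The function name and parameters are my own; kernels may round the buffer value up and window scaling changes what appears on the wire, so this only approximates the hand-crafted packets:

```python
import socket

def echo_client(host: str, port: int, rcvbuf: int, payload: bytes) -> bytes:
    """Connect to an ECHO-style service with a deliberately small receive
    buffer, send a payload and return the echoed bytes. Because this uses
    the kernel's own TCP stack, no RST suppression is needed here."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before connect() so it can influence the window
    # advertised during the handshake.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    s.connect((host, port))
    s.sendall(payload)
    data = b""
    while len(data) < len(payload):
        chunk = s.recv(len(payload) - len(data))
        if not chunk:
            break
        data += chunk
    s.close()
    return data
```

A call like `echo_client("server", 7, 560, b"hello")` mimics the device's small-window behaviour against the inetd ECHO port, though only Scapy gives you exact control over the window field.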
So, when I run this script as
However, if you start it as
It gets a little bit more obvious if you comment out all lines in the script after the first
The connection stays in SYN_RECV as shown by
The reason is obvious when you keep the state diagram of TCP in mind. It switches to
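The relevant part of the server-side state diagram can be modeled in a few lines (a sketch of just the passive-open path, not the full RFC 793 machine): a SYN moves the listener to SYN_RECV, and only the final ACK of the handshake moves it on to ESTABLISHED. If that ACK is discarded by the window-size check, the connection simply stays in SYN_RECV, which is exactly what netstat shows:

```python
# Passive-open transitions relevant here (event names are my own labels).
SERVER_TRANSITIONS = {
    ("LISTEN", "rcv SYN"): "SYN_RECV",
    ("SYN_RECV", "rcv ACK"): "ESTABLISHED",
}

def next_state(state: str, event: str) -> str:
    # Discarded or unknown segments leave the state unchanged.
    return SERVER_TRANSITIONS.get((state, event), state)

state = next_state("LISTEN", "rcv SYN")     # -> SYN_RECV
state = next_state(state, "discarded ACK")  # ACK dropped by the check
print(state)                                # SYN_RECV
```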
However, when you use the
When I start the script with
It pretty much looks like the connection made with a window of 8192. Changing the value of
Posted by Joerg Moellenkamp in English, Solaris at 21:13 | Comment (1)