Tue, 17 Feb 2009


A simple close() on a TCP connection sometimes isn't quite that simple. A TCP close often takes three steps, very similar to the connection setup. When one side of the connection is done talking but the other is not, the shutdown takes four steps: the peer acknowledges the FIN right away and sends its own FIN later. The period in between is known as half-close. The simplified three-step flow:

  A                  B
    ------ FIN ---->
    <-- FIN + ACK --
    ------ ACK ---->

Afterwards 'A' is required to keep the socket in TIME_WAIT, as the final acknowledgement could be lost. In that case 'B' would resend its FIN+ACK, which has to be acknowledged again. This state must be maintained for 2 * MSL (twice the maximum time an IP packet can exist on the network). Most implementations use an MSL of 30 seconds up to 2 minutes, resulting in a TIME_WAIT state lasting one to four minutes.

Developers need to care about this due to another implementation detail:
Most network stacks don't allow a port in TIME_WAIT to be reused by default; a bind() to it fails with EADDRINUSE. This isn't a problem for clients as they tend to use random ephemeral ports. The server, however, might need to be restarted while a closed connection is still in TIME_WAIT.
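
The restart problem can be reproduced on the loopback interface. The following is a minimal sketch, not a definitive test: the function name is made up for illustration, and the outcome depends on the operating system (on Linux the second bind() is expected to fail with EADDRINUSE, other stacks may behave differently).

```python
import socket


def rebind_errno_after_server_close():
    """Return None if the port could be bound again, else the errno of the failed bind()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _ = listener.accept()

    conn.close()       # server sends the first FIN, so the server side gets TIME_WAIT
    client.recv(1)     # returns b"" once the server's FIN has arrived
    client.close()     # client answers with its own FIN
    listener.close()   # simulate the server process going away

    # The "restarted" server: same port, no socket options set.
    restarted = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        restarted.bind(("127.0.0.1", port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        restarted.close()
```

Calling rebind_errno_after_server_close() and comparing the result with errno.EADDRINUSE shows whether the local stack refuses the rebind.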

There are several workarounds:

The first solution is to have the client close the connection. Only the side which closes the connection first (i.e. sends out the first FIN) has to deal with TIME_WAIT. A simple solution, but usually not sufficient: even if the protocol doesn't require the server to close the connection, it may still want to if a client misbehaves.
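
On the server side, "let the client close" boils down to reading until end-of-file before calling close(). A small sketch of that idea, with a hypothetical helper name and no handling of misbehaving clients:

```python
import socket


def serve_until_client_closes(conn):
    """Read from conn until the peer closes, then close our side; return the bytes read."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:       # b"" means the peer has sent its FIN first
            break
        chunks.append(data)
    conn.close()           # passive close: TIME_WAIT stays with the peer
    return b"".join(chunks)
```

A real server would add a timeout here, which is exactly the case where it ends up closing first anyway.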

A second possible solution is to use SO_LINGER. A quote from man 3 socket:

		Lingers on a close() if data is present.  This  option  controls
		the  action  taken  when  unsent  messages queue on a socket and
		close() is performed.  If SO_LINGER is  set,  the  system  shall
		block  the process during close() until it can transmit the data
		or until the time expires. If SO_LINGER is  not  specified,  and
		close()  is  issued,  the  system handles the call in a way that
		allows the process to continue  as  quickly  as  possible.  This
		option   takes   a   linger   structure,   as   defined  in  the
		<sys/socket.h> header, to specify the state of  the  option  and
		linger interval.

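The last sentence is the interesting bit. SO_LINGER doesn't shorten TIME_WAIT as such, but enabling it with a linger interval of 0 makes close() abort the connection on most stacks: unsent data is discarded and a RST is sent instead of a FIN, so the socket never enters TIME_WAIT. The downside is that the peer sees a connection reset rather than a clean shutdown, and any further packets (e.g. a FIN+ACK) will trigger a RST response instead of the usual ACK.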
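
For illustration, this is roughly how the linger structure is passed from Python. It is a sketch under one loud assumption: struct linger is laid out as two C ints (l_onoff, l_linger), which holds on Linux and the BSDs but not on Windows. The helper names are made up. With l_onoff=1 and l_linger=0, most stacks answer close() with a RST and skip TIME_WAIT.

```python
import socket
import struct

# Assumption: struct linger { int l_onoff; int l_linger; } as on Linux/BSD.
LINGER_FORMAT = "ii"


def enable_abortive_close(sock):
    """Turn SO_LINGER on with a linger interval of 0 seconds."""
    linger = struct.pack(LINGER_FORMAT, 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)


def linger_settings(sock):
    """Return the current (l_onoff, l_linger) pair of sock."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)
```

The option only changes what a later close() does; it has to be set on the connected socket before closing it.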

The recommended solution is to set SO_REUSEADDR.

		Specifies  that  the rules used in validating addresses supplied
		to bind() should allow reuse of local addresses, if this is sup‐
		ported by the protocol.  This option takes an int value. This is
		a Boolean option.

This will simply allow the port to be bound again. The only limitation is that a connection (the combination of source and destination IP and port) which is still in TIME_WAIT can't be reused until it leaves that state. This shouldn't pose any problems, as clients will reconnect from different source ports.
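
A typical listener setup with the option looks something like this. It is a sketch; make_listener and its parameters are names picked for the example, not part of any API.

```python
import socket


def make_listener(host, port, backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before bind(), since the address check happens at bind time.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

Setting the option on every start matters: on Linux, as far as I know, the old socket lingering in TIME_WAIT must have had it set as well for the new bind() to go through.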

Interestingly SO_REUSEADDR has caused a few bugs in different operating systems.

The first is quite old and affected a number of systems including Linux. If a process opened a port with SO_REUSEADDR and bound to INADDR_ANY, another process could bind to the same port on a specific interface. This allowed the second process to steal traffic destined for the original process. This was fixed a while ago.

The second is documented by Microsoft. Summarized: it allows a Windows program to intercept traffic meant for a different application by opening the same port with SO_REUSEADDR set. Note that the first application doesn't need to have it set for this to work. Also note the sentence 'No special privileges are required to use this option.' The recommended way to avoid this problem appears to be to use the SO_EXCLUSIVEADDRUSE option (available from Windows 2000 onwards). Unfortunately this requires the user to be a member of the 'Administrators' security group on Windows 2000 and XP.
In other words, Microsoft is counting on application developers to fix an operating system bug and even then it took them two releases to get the workaround usable.
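
Portable code therefore tends to pick the option per platform. A hedged sketch: Python's socket module only defines SO_EXCLUSIVEADDRUSE on Windows, so its presence is used as the switch here; apply_bind_policy is a hypothetical helper.

```python
import socket


def apply_bind_policy(sock):
    """Set the bind option suited to this platform; return the option name used."""
    exclusive = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    if exclusive is not None:
        # Windows: refuse to share the port with any other socket.
        sock.setsockopt(socket.SOL_SOCKET, exclusive, 1)
        return "SO_EXCLUSIVEADDRUSE"
    # Elsewhere: allow rebinding while old connections sit in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return "SO_REUSEADDR"
```

Either way the option has to be applied before bind().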

posted at: 22:27