We ran into some odd socket bind behavior on Linux last week, and I wanted to share it here in the hope of saving someone else some time.
As some of you might know, Java’s Socket class (like ServerSocket) has a reuseAddress property that, according to the Javadoc, does the following:
When a TCP connection is closed the connection may remain in a timeout state for a period of time after the connection is closed (typically known as the TIME_WAIT state or 2MSL wait state). For applications using a well known socket address or port it may not be possible to bind a socket to the required SocketAddress if there is a connection in the timeout state involving the socket address or port. Enabling SO_REUSEADDR prior to binding the socket using bind(SocketAddress) allows the socket to be bound even though a previous connection is in a timeout state.
In simpler terms, when reuseAddress is enabled on the server socket, the server can bind to its port even if a connection on that port is still in the TIME_WAIT state. This comes up when clients are connected and the server goes down for whatever reason. The server’s end of each of those connections typically ends up in TIME_WAIT, because the server side is the one that closes them first. When you restart the server, it cannot bind to the port until an OS-specific timeout expires. Server restarts can therefore take a while, hence the need for the reuseAddress property.
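Here is a minimal sketch of enabling the option on a server socket. The class and method names are made up for illustration, and port 8080 is only an example. The important detail is that the option has to be set on an unbound ServerSocket. The ServerSocket(int port) constructor binds immediately, so enabling reuseAddress afterwards is too late. The no-argument constructor creates an unbound socket, which lets you enable the option and then call bind(SocketAddress) yourself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddressBind {

    /** Creates an unbound ServerSocket, enables SO_REUSEADDR, then binds it to the port. */
    static ServerSocket openWithReuse(int port) throws IOException {
        ServerSocket server = new ServerSocket(); // no-arg constructor: the socket starts unbound
        try {
            server.setReuseAddress(true);         // must happen before bind() to take effect
            server.bind(new InetSocketAddress(port));
            return server;
        } catch (IOException e) {
            server.close();                       // don't leak the socket if bind fails
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // 8080 is only an example; pass a port as the first argument, or 0 for any free port.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;
        try (ServerSocket server = openWithReuse(port)) {
            System.out.println("Listening on port " + server.getLocalPort()
                    + " with reuseAddress=" + server.getReuseAddress());
        }
    }
}
```

The ServerSocket Javadoc also notes that the initial value of SO_REUSEADDR for a new ServerSocket is not defined. That is another reason to set it explicitly rather than rely on the platform default.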
LCDS also has a “reuse-address-enabled” property that takes advantage of this socket option, but we recently realized that it didn’t work as expected on Linux. Even with “reuse-address-enabled” set to true, server restarts on Linux would fail with bind exceptions, and we didn’t know why until a coworker, Alex Glosband, pointed out the following from the Linux man page for socket:
Linux will only allow port re-use with the SO_REUSEADDR option when this option was set both in the previous program that performed a bind() to the port and in the program that wants to re-use the port. This differs from some implementations (e.g., FreeBSD) where only the later program needs to set the SO_REUSEADDR option. Typically this difference is invisible, since, for example, a server program is designed to always set this option.
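To watch this rule in action, here is a rough reproduction sketch (all names are hypothetical). It runs a short-lived server with SO_REUSEADDR either on or off and lets the server side close a connection first, so that its end normally goes into TIME_WAIT. It then tries to bind the same port again with the option off and on. The sketch assumes TCP over the loopback interface and that a short sleep is enough for the kernel to finish the close handshake. Results depend on the OS and JDK, so the program only prints what it observes. Going by the man page text above, on Linux the rebind with the option on should only succeed when the first server also had it set.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TimeWaitRebind {
    private static final InetAddress LOOPBACK = InetAddress.getLoopbackAddress();

    /** Tries to bind a new listener to LOOPBACK:port with the given SO_REUSEADDR setting. */
    static boolean tryBind(int port, boolean reuse) {
        try (ServerSocket s = new ServerSocket()) {
            s.setReuseAddress(reuse);
            s.bind(new InetSocketAddress(LOOPBACK, port));
            return true;
        } catch (IOException e) {
            return false; // typically a BindException: "Address already in use"
        }
    }

    /**
     * One short server "lifetime": listen with the given SO_REUSEADDR setting, accept one
     * client, and close the server's end first (an active close), which normally leaves
     * that end of the connection in TIME_WAIT. Returns the port that was used.
     */
    static int runServerOnce(boolean reuse) throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket()) {
            server.setReuseAddress(reuse);
            server.bind(new InetSocketAddress(LOOPBACK, 0)); // 0 = let the OS pick a free port
            int port = server.getLocalPort();
            try (Socket client = new Socket(LOOPBACK, port);
                 Socket accepted = server.accept()) {
                accepted.close();               // server side sends FIN first
                client.getInputStream().read(); // returns -1 once that FIN arrives
            } // the client socket closes here, finishing the shutdown handshake
            Thread.sleep(100); // assumption: enough time for the kernel to settle on loopback
            return port;
        } // the listening socket closes here, as it would when the server process exits
    }

    public static void main(String[] args) throws Exception {
        for (boolean firstReuse : new boolean[] {false, true}) {
            int port = runServerOnce(firstReuse);
            boolean withoutReuse = tryBind(port, false);
            boolean withReuse = tryBind(port, true);
            System.out.println("first server reuse=" + firstReuse
                    + " -> rebind reuse=false: " + withoutReuse
                    + ", rebind reuse=true: " + withReuse);
        }
    }
}
```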
In LCDS, when the “reuse-address-enabled” property was true, LCDS first tried to bind to the port with reuse-address=false. If that failed, it logged that it couldn’t bind to the port, set reuse-address=true, and tried to bind again. This works fine on Windows or Mac, but not on Linux, because Linux expects every bind to the port to have reuse-address=true for address reuse to work. On a clean start, the first bind (with reuse-address=false) succeeds, so the listening socket of the previous server process never had the option set. After a crash and restart, even the retry with reuse-address=true fails, exactly as the man page describes. In the end, the fix was trivial: we changed the logic to use reuse-address=true on the first bind attempt on Linux and stopped logging the first bind failure.
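Below is a sketch of the before-and-after bind logic in plain Java. This is not the actual LCDS code: ListenerBinder, bindOriginal, bindFixed, and the log message are hypothetical. The fixed version also simplifies things by setting the option on the first bind on every platform rather than only on Linux.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.logging.Logger;

public class ListenerBinder {
    private static final Logger LOG = Logger.getLogger(ListenerBinder.class.getName());

    /** Creates an unbound ServerSocket, applies the SO_REUSEADDR setting, then binds it. */
    private static ServerSocket bind(int port, boolean reuse) throws IOException {
        ServerSocket s = new ServerSocket();
        try {
            s.setReuseAddress(reuse);
            s.bind(new InetSocketAddress(port));
            return s;
        } catch (IOException e) {
            s.close(); // don't leak the socket when bind fails
            throw e;
        }
    }

    /** Original logic as described: bind without reuse first, then log and retry with reuse. */
    static ServerSocket bindOriginal(int port, boolean reuseAddressEnabled) throws IOException {
        try {
            // On a clean start this succeeds, so the listener never has SO_REUSEADDR set.
            return bind(port, false);
        } catch (IOException e) {
            if (!reuseAddressEnabled) {
                throw e;
            }
            LOG.info("Could not bind to port " + port + ", retrying with reuse-address=true");
            // On Linux this retry can still fail: connections left in TIME_WAIT by a
            // previous listener without SO_REUSEADDR still block the port.
            return bind(port, true);
        }
    }

    /** Fixed logic: when reuse is enabled, set it on the very first bind attempt. */
    static ServerSocket bindFixed(int port, boolean reuseAddressEnabled) throws IOException {
        return bind(port, reuseAddressEnabled);
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 0; // 0 = any free port
        try (ServerSocket server = bindFixed(port, true)) {
            System.out.println("Bound port " + server.getLocalPort()
                    + " with reuseAddress=" + server.getReuseAddress());
        }
    }
}
```

The comments in bindOriginal mark where the Linux problem comes from. Because the first bind succeeds on a clean start, the listener whose connections later sit in TIME_WAIT never had SO_REUSEADDR set, so the retry after a restart cannot help. bindFixed avoids that by never creating a listener without the option when reuse is enabled.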