Recently I helped move a customer to a new datacenter. What was new for the application support guys was that in the new DC their app and its interconnecting systems were separated by a real FW. I say “real” FW because the app guys will say: we have always been traversing firewalls, so why doesn’t my app work now? The difference in this case was that the DC FW did stateful inspection, whereas the old environment used router ACLs.
The problem that arose was intermittent connectivity failures, which started showing up in various app logs and user tickets. With some packet captures and FW log analysis it became clear that long-running idle (no packets) TCP sessions were getting dropped by the FW after 1 hour of inactivity. The failure looked intermittent from the app guys’ perspective because it depended on how much user activity there was. I should also add that once we started looking, we found that not all connection drops by the FW were noticed by the apps; some apps just automatically created new connections and used more and more resources in the process. Check your FW log.
This scenario, connection drops by the FW, is a common problem. There are a couple of possible fixes. One is to increase the connection timeout parameter on the FW from 1 hour (the default on most) to something like 4 hours, or whatever your app and its activity require. This effectively keeps the TCP connection nailed up for 4 hours after the last packet seen by the FW. It isn’t a good solution, since it means more memory usage on the FW because the table of known connections isn’t trimmed as often, and there is also a security issue (DoS I think) with increasing the timeout. I keep the FW connection timeout adjustment as a last resort. A better fix is to enable keepalives on the connection.
A keepalive is an empty packet sent at a regular interval when there is no app activity. The empty packet resets the TCP session idle timer on the FW, so the session stays alive (not dropped) even though there is no actual application traffic. Now there are two ways of generating a keepalive packet: the app can do it in its own code, or the app can ask the OS TCP code to send the keepalive packets on the app’s behalf using the keepalive socket option, SO_KEEPALIVE. This is easy to see in the PuTTY app, as it supports both methods.
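To make the second method concrete, here is a minimal Python sketch of an app asking the OS to send keepalives for it. The function name make_keepalive_socket is made up for illustration; your app or its vendor will have its own place where the socket gets created, and may expose this as a config setting instead.

```python
import socket


def make_keepalive_socket() -> socket.socket:
    """Create a TCP socket with OS-level keepalives switched on.

    Hypothetical helper for illustration only; it does not connect anywhere.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the OS TCP stack to send keepalive probes while the connection is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_keepalive_socket()
# Read the option back; a non-zero value means keepalives are enabled.
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```

Note that this only switches keepalives on. How often they are sent is still decided by the OS settings.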
But wait, there is more. When you configure your app to use SO_KEEPALIVE, the interval comes from the OS config, and this is typically 2 hours by default. So the first keepalive packet is sent after two hours of idle time, but by then the FW has already dropped the connection. You need to change the interval to something shorter than the FW timeout, like every 15 minutes. The 2 hour default goes back to RFC 1122 (1989), a time when bandwidth was very limited and these keepalive packets were considered unnecessary traffic.
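Besides changing the OS-wide setting (on Linux that is the net.ipv4.tcp_keepalive_time sysctl, 7200 seconds by default), many platforms let an app override the timing per socket. The sketch below assumes a platform where Python exposes TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT (Linux does); options that are missing are simply skipped. The function name tune_keepalive and the 60 second / 5 probe values are my own illustrative choices, only the 15 minute idle time comes from the text above.

```python
import socket


def tune_keepalive(sock, idle_s=900, interval_s=60, probes=5):
    """Enable keepalives and shorten their timing on one socket.

    idle_s:     idle seconds before the first probe (900 = 15 minutes)
    interval_s: seconds between unanswered probes (illustrative value)
    probes:     unanswered probes before the OS gives up (illustrative value)
    Returns a dict of the per-socket options that were actually applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in wanted:
        # Not every OS exposes every option, so look each one up defensively.
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
    return applied


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = tune_keepalive(sock)
sock.close()
```

Whatever values you pick, the idle time has to be comfortably below the FW idle timeout (1 hour in this story), otherwise the first probe still arrives too late.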
Bottom line: tell the app guys to enable keepalives. Also tell them to search the app vendor’s KB articles for “keepalive” and “firewall” before coming back with the “it’s not supported” answer. This is a common issue and most vendors have customers that experience it.