10
xonya
6y

The most crazy issue I've fixed was caused by a TCP behavior which I didn't know, called the "half-closed connection".
There was a third-party application installed on a production server which called a LDAP server for retrieving users information. During the day we had several users using the application and all worked fine. During the night, when the application was not accessed, something happened and the first call to the application in the morning was stuck for about 5 minutes before returning a response. I tried to reproduce the issue in a testing environment without success. Then I discovered that the application and the LDAP server were located on two different networks, with a firewall between them. And firewalls sometimes drop old connections. For this reason network applications usually implement a keep-alive mechanism. Well, the default LDAP Java libraries don't set the keep-alive on their connections. So, I found a library called "libdontdie", which force the keep-alive on the connections. I installed the library on the server, loaded it at the startup and the weird stuck behavior in the morning disappeared.

Comments
  • 3
    That must be really satisfying, knowing that it wasn't some dumb error but serious business!
  • 1
    That's a good one
Add Comment