Saturday, September 15, 2007

SSH/NFS/Networking Issues on the Blades

Thanks to our resident (well, long distance) detective, Cyrus, the mystery seems to have been solved.

It turns out that the issue on the blades was caused by them getting DHCP leases from Clarkson. It’s been an issue all along, but we didn’t notice it because of our 5-minute lease time: it would only affect us from the time Clarkson’s lease renewed (about once a day) until Righteous’ lease renewed, at most 5 minutes later. When we changed our lease to be longer than Clarkson’s, ours became the one that only applied for the period between renewals.

The solution? Adding a prepend statement to dhclient.conf to make sure our DNS server (Righteous) is always listed first in resolv.conf. It turns out that this issue will affect anything that uses DHCP to get an address from Clarkson. The reason we weren’t getting the error on the virtual machines was that they were either statically configured or, in the case of the new ones, networking hadn’t been restarted since the configuration files were modified.
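
For anyone who wants to see roughly what that looks like, here is a minimal sketch of the dhclient.conf entry. The address is only a placeholder (192.0.2.1 is reserved for documentation), not Righteous’ real IP, and the file’s location is an assumption that depends on your distribution.

```
# dhclient.conf (commonly /etc/dhclient.conf or /etc/dhcp/dhclient.conf; varies by distro)
# 192.0.2.1 is a placeholder: substitute the address of Righteous.
# "prepend" puts this server ahead of whatever name servers the DHCP lease supplies.
prepend domain-name-servers 192.0.2.1;
```

Once networking is restarted and the lease is renewed, the first nameserver line in resolv.conf should be the prepended address, with any servers handed out by Clarkson listed after it.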

So far, so good. If this wasn’t the only issue, we should know in the next 24 hours.

Edit (Saturday Noon) – Still no issues; looking good so far.

  1. Good to hear that this issue has been fixed. Are there any caveats for blade owners? Do we need to restart networking or anything, for example?

    Comment by Todd Deshane — September 15, 2007 @ 9:30 pm

  2. The only thing is that any machine with DHCP on both networks should have “prepend domain-name-servers” added to its dhclient.conf and networking restarted if it seems to be experiencing the same symptoms (SSH connection errors).
    The issue will only occur when SSH is configured to only allow connections on the server room network, which would be the most secure way to handle things. It won’t have any negative side effects on other machines (just make sure Righteous is the primary nameserver).

    Comment by Zach — September 17, 2007 @ 1:02 am

