So I switched from a Linksys BEFSR41 (for firewalling and NAT) combined with a really old Apple AirPort (for the laptops) to a D-Link DI-614+. It was on sale as an open-box item at Best Buy, and it replaces two devices with one, so it was a good deal. Configuring the AirPort from a Linux PC wasn't always the most intuitive task, and I'm also in constant danger of running out of plugs. Plus, the D-Link does port address translation, a feature I'd missed since abandoning my old diskless Linux firewall box. Long story short (Ha! Too late!), I now have one little, cool-running, silver box that handles all my networking stuff, wired or otherwise.
When I made the switch, I noticed a performance improvement, which was a pleasant surprise. The configuration options on the D-Link are numerous and web-based: more reason to be happy. I also noticed that SSH sessions were timing out after about 15 minutes of "inactivity". That made me a little upset, and very nearly rendered moot all the other good points. I tend to open a lot of SSH sessions to various remote hosts and let them sit around all day so they're close at hand. When I'm done for the day, I close them all. Having the D-Link close them for me (ungracefully, too; I'd often have to go and kill all the processes from my previous session when I logged back in) was less than optimal. It was really annoying, in fact.
One reason I like working from home so much is that having everything open like this for hours on end means I don't "lose my place" the way I would if I worked all day, went home, and started back up. I can leave everything going while I eat or whatever, and I'm right where I left off at all times (that's also why I like Opera and its tabbed browsing). I have a command history for every window, each one is on the right host, etc. I could use regular old job control or screen, and in fact I did for a little while. But one of my hosting providers doesn't have screen (or at least didn't use to), and using jobs for everything can be cumbersome. Besides, ttys are cheap, and Konsole has tabbed windows. I like tabbed windows.
After digging around, I found that the D-Links have an idle TCP timeout of 15 minutes, and there's no apparent way to change it. The support page for a router like mine (minus the wireless capability) has an interesting entry in the changelog for the latest firmware upgrade: "Added timeout (7500 sec.) for SSH and Telnet ports". That number caught my eye, because of this:
[wee@hostname wee]$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
See, I had tried setting SSH to send TCP keepalive packets (I set it with: echo 'KeepAlive yes' >> ~/.ssh/config), but it wasn't working; I was still getting dropped. That's because OpenSSH only switches keepalives on. The kernel decides when to send them, and by default it waits 7,200 seconds of idle time before the first probe, while the D-Link was timing out "dead" sessions after 900 seconds. (The 7,500-second timeout in that firmware sits just above the 7,200-second default, which is presumably no coincidence.) You can find other solutions to this problem, but they're clunky at best. The real fix is to set the timeout on the router itself.
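A quick way to see that mismatch on a given host is to compare the kernel's keepalive idle time with the router's idle timeout. This is a minimal sketch, assuming Linux (it reads /proc/sys/net/ipv4/tcp_keepalive_time) and taking the D-Link's timeout to be 900 seconds; read_keepalive_time and keepalive_beats_router are hypothetical helper names, not part of any tool.

```python
from pathlib import Path

# Idle timeout assumed for the D-Link DI-614+ (about 15 minutes), based on the
# behaviour described above; adjust it for other routers.
ROUTER_IDLE_TIMEOUT = 900

# Where Linux exposes the system-wide keepalive idle time, in seconds.
KEEPALIVE_PATH = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_keepalive_time(path: Path = KEEPALIVE_PATH) -> int:
    """Return the idle seconds before the kernel sends its first keepalive probe."""
    return int(path.read_text().strip())


def keepalive_beats_router(keepalive_time: int,
                           router_timeout: int = ROUTER_IDLE_TIMEOUT) -> bool:
    """True if the first probe goes out before the router forgets the idle session."""
    return keepalive_time < router_timeout


if __name__ == "__main__":
    if KEEPALIVE_PATH.exists():
        current = read_keepalive_time()
        verdict = "OK" if keepalive_beats_router(current) else "too long"
        print(f"tcp_keepalive_time={current}s vs router {ROUTER_IDLE_TIMEOUT}s: {verdict}")
    else:
        print(f"{KEEPALIVE_PATH} not found (not Linux?)")
```

With the stock 7,200-second value this reports "too long"; once the value is set to 600, it should report "OK".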
The added timeout exists only in the firmware for the 604 model, not the 614+. Unfortunately, I can't use the 604 firmware: mine has wireless capability, so it needs the 614+ firmware (it's not as if the only difference between the models were the number of ports). I did think about it, though. The final solution for me is to set tcp_keepalive_time to something below 900. Easily done:
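[wee@hostname wee]$ echo 600 | sudo tee /proc/sys/net/ipv4/tcp_keepalive_time
(The more obvious "sudo echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time" fails with "Permission denied", because the redirection is performed by your own unprivileged shell, not by sudo. Piping into "sudo tee" does the write as root; "sudo sysctl -w net.ipv4.tcp_keepalive_time=600" works too.)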
I used ten minutes just because I wasn't totally sure about the D-Link's 15-minute rule. And while this works, the bummer is that I need to do this on every host from which I want to use SSH (or anything else that might benefit from not timing out). The right way to solve this is a setting in the D-Link. I've written to them about it, with no response as of yet. I don't mind setting this on the two or three internal hosts I use for remote logins, but something about it bugs me. While it's nice to have the timeout "problem" gone, there's just something about a symptomatic solution that annoys me on a deeper level.
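For programs you write yourself, there's a narrower alternative to changing the system-wide value: turn keepalive on for just the sockets that need it and give them their own timing. This is a minimal Python sketch, assuming per-socket options such as Linux's TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are available; enable_keepalive and its default values are hypothetical, with 600 seconds chosen to match the setting above. It won't help OpenSSH itself, which, as far as I know, only switches keepalive on and leaves the timing to the system-wide setting.

```python
import socket

# Hypothetical defaults: 600 s mirrors the sysctl value above and stays under
# the D-Link's roughly 900-second idle timeout.
IDLE_SECONDS = 600      # idle time before the first keepalive probe
INTERVAL_SECONDS = 60   # gap between probes that go unanswered
PROBE_COUNT = 5         # unanswered probes before the kernel drops the connection


def enable_keepalive(sock: socket.socket,
                     idle: int = IDLE_SECONDS,
                     interval: int = INTERVAL_SECONDS,
                     count: int = PROBE_COUNT) -> None:
    """Enable TCP keepalive on one socket, overriding system-wide timing where supported."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timing options are not available on every platform
    # (TCP_KEEPIDLE is the Linux name); where missing, system defaults apply.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

Usage would be something like sock = socket.create_connection((host, port)) followed by enable_keepalive(sock), where host and port are whatever you're connecting to. The advantage over the sysctl is that nothing else on the machine changes behaviour; the drawback is that it only covers code you control.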
Oh, one more note: if you set tcp_keepalive_time, remember to add a line like the one above to /etc/rc.d/rc.local or some such, because the setting reverts to the default when you reboot.
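On many Linux distributions there's also /etc/sysctl.conf, which is read at boot; "sudo sysctl -p" reloads it immediately. A one-line fragment, assuming your distribution uses that file:

```
# /etc/sysctl.conf: keepalive idle time in seconds, kept below the router's 900
net.ipv4.tcp_keepalive_time = 600
```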
I have the exact same problem with the DI-641+ and it's a shame they haven't fixed this. Thanks for the info.
Instead of messing with TCP keepalive, I added ClientAliveInterval and ClientAliveCountMax with suitable values to my sshd config. It adds some traffic, but the keepalives travel over SSH's encrypted channel instead of as bare TCP keepalives. There are several systems I can't add this to, so I'm SOL there except by hopping through a system that I do control. Argh!
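For reference, the server-side settings look something like this in /etc/ssh/sshd_config. The values used above weren't given; 300 seconds and 3 are assumptions, picked to stay well under the D-Link's roughly 900-second timeout. sshd has to be restarted or reloaded to pick up the change.

```
# /etc/ssh/sshd_config (server side)
# After 300 seconds with no data from the client, send a message through the encrypted channel
ClientAliveInterval 300
# Disconnect the client after 3 such messages go unanswered
ClientAliveCountMax 3
```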
I sure wish they'd fix the hard timeout or add an option to tune it in the web admin pages.
Posted by po at July 24, 2003 1:56 PM
Err, my DI-614+. Silly fingers.
Posted by po at July 24, 2003 2:00 PM
On FreeBSD (and probably OpenBSD and NetBSD), the workaround is to add net.inet.tcp.keepidle=600000 to /etc/sysctl.conf. The number is in milliseconds, so 600000 is 10 minutes. You can run 'sysctl net.inet.tcp.keepidle=600000' from the command line, as root, to make it take effect immediately. Works like a charm.
Posted by mjb at September 10, 2003 10:31 PM
Hmm, I tried changing my net.inet.tcp.keepidle setting on Mac OS X (basically FreeBSD underneath), and I still get timeouts.
I've tried this with SSH keepalive on and off, and nothing seems to do it. Am I missing something obvious? Thanks for any help...
Posted by TG at January 16, 2004 4:11 PM
It could be worse. My Netgear MR814v2 at home marks NAT sessions idle after *less than 3 minutes* of inactivity and starts dropping inbound traffic.
Usually outbound traffic "fixes" it, unless a packet from the server got dropped, causing the server to time out the connection.
I.e. this:
$ sleep 180; echo "Wakeup!"
will never seem to wake up (the "Wakeup!" never reaches my terminal) unless I come back within about 4 minutes and hit Enter or something to kick the Netgear.
I have to set ssh keepalives to every 60 seconds if I want to keep any connection where the server might work quietly and then send output.
So if you're having trouble with this type of timeout, you may have to turn your keepalive interval *WAY* down into the realm of the absurd.
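The comment doesn't say which keepalive mechanism was used. One option on the client side, in OpenSSH versions that support it, is ServerAliveInterval in ~/.ssh/config: the client sends its own keepalive through the encrypted channel, so it also works against servers whose sshd you can't configure. The 60-second value comes from the comment above; ServerAliveCountMax 3 is an assumption (it is also OpenSSH's default).

```
# ~/.ssh/config (client side)
Host *
    # Send a keepalive through the encrypted channel after 60 seconds of silence
    ServerAliveInterval 60
    # Give up on the connection after 3 unanswered keepalives
    ServerAliveCountMax 3
```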