Re: Losing Communications with the Mount
I will forward this to our software engineer.
From: alex <groups@...>
Sent: Thu, May 6, 2021 5:30 pm
Subject: Re: [ap-gto] Losing Communications with the Mount
Ok, I opened up the GTOCP4 yesterday and the daughter board seems to be seated fine. I put it back and switched the Ethernet cable to a brand new, professionally made 15’ cable (i.e., I didn’t put the connectors on myself), and changed the port on the switch it was plugged into. I rebooted the switch as well in case it was in some weird state. I also hooked up the GTOCP4 to the Eagle 2 directly via USB and configured that as the backup port. The primary connection was configured to be the Ethernet connection using TCP and a 500 ms timeout.
The switch and AP (and all my networking infrastructure) are Ubiquiti UniFi gear (a prosumer/SOHO brand), so there are no cross-brand incompatibilities in my network infrastructure. After the initial failures, I configured the router’s DHCP server to assign a fixed IP address to the mount instead of a dynamic one. I’m pretty obsessive about managing my IP address space and am fairly certain there aren’t other devices colliding with it. The switch is fairly recent, a UniFi US-8-60W, and is a fully managed smart switch. I suppose I could configure a separate VLAN for the observatory and put the mount and the Eagle 2 on it as the only hosts, to make sure there is no interference from other devices on the network, though that seems like overkill.
Last night the Ethernet connection failed again, but I didn’t notice right away because this time APCC successfully failed over to the USB connection, so that worked great. I had a perpetual ping running once a second the whole night, and it showed response times typically between 2 and 9 milliseconds, with occasional 30-50 ms ones and a few 1-2 second ones here and there. Right before the connection failed, the last few pings had 2-7 ms response times, then all the ping requests started timing out. These timeouts have been non-stop for the last 12 hours or so.
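For anyone wanting to pin down exactly when an outage like this begins in a long ping log, here is a minimal Python sketch. It assumes the ping results have already been parsed into a list with one entry per ping: the round-trip time in milliseconds, or `None` for a timeout. The function names and the `min_run` threshold are illustrative choices, not anything from APCC or the mount.

```python
from typing import Optional, Sequence


def find_outage_start(rtts_ms: Sequence[Optional[float]],
                      min_run: int = 10) -> Optional[int]:
    """Return the index of the first ping in the first run of at least
    `min_run` consecutive timeouts (None entries), or None if there is none.

    `min_run` is an arbitrary threshold used to ignore isolated lost pings.
    """
    run_start = None
    for i, rtt in enumerate(rtts_ms):
        if rtt is None:
            if run_start is None:
                run_start = i          # a new run of timeouts begins here
            if i - run_start + 1 >= min_run:
                return run_start       # run is long enough to call an outage
        else:
            run_start = None           # a reply arrived, so the run is broken
    return None


def summarize(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Basic statistics over one ping log."""
    replies = [r for r in rtts_ms if r is not None]
    return {
        "sent": len(rtts_ms),
        "lost": len(rtts_ms) - len(replies),
        "max_ms": max(replies) if replies else None,
        # replies that took a second or longer, like the 1-2 second ones above
        "slow": sum(1 for r in replies if r >= 1000.0),
    }
```

With one ping per second, the returned index is also the number of seconds from the start of the log to the onset of the outage, which can then be lined up against the APCC log or the switch logs.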
While communications were failing, the UniFi controller software didn’t show any abnormal packet loss on the port. I SSHed directly into the switch and poked around the internal logs, and didn’t see anything fishy. I tried changing the port the Ethernet cable was plugged into, and power cycling the switch in case the problem was some bad state the switch was in. Neither woke up the TCP connection. The only thing that fixes it is power cycling the GTOCP4, so to me this seems to be a problem with the state the GTOCP4 is in. If there were some persistent ongoing problem with the network infrastructure, then a reboot of the mount wouldn’t fix the problem.
I’m still mystified as to what’s going on with that Ethernet connection. I’m a software engineer and have been programming IP networks professionally for over 30 years, and I’ve never seen behavior like this. While I haven’t formally done IT/ops work (I program back end web services nowadays), I’ve set up plenty of IP networking equipment over the years.
I could see a transient communications problem starting things off, but that wouldn’t explain the inability to ping until the GTOCP4 is power cycled, which magically fixes everything. It appears that some problem occurs, and the GTOCP4 goes into a mode where its network is down and only a power cycle resets it. I’ve never encountered any device with this behavior.
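One way to tell "the whole network stack is down" apart from "only the TCP service stopped answering" is to look at how a connection attempt fails. Below is a minimal Python sketch using the same 500 ms timeout as the primary connection. The host and port in the usage comment are placeholders, since the mount’s actual address and TCP port aren’t given here.

```python
import socket


def probe_tcp(host: str, port: int, timeout_s: float = 0.5) -> str:
    """Attempt one TCP connection and classify the result.

    "open"    : the handshake completed, so the service is listening
    "refused" : the host answered with a reset, so its IP stack is alive
                but nothing is listening on that port
    "timeout" : no answer within timeout_s (host down, stack hung,
                or packets being dropped somewhere)
    "error"   : any other socket error (e.g. no route to host)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


# Hypothetical usage; replace the address and port with the mount's own:
#   print(probe_tcp("192.0.2.10", 12345))
```

A "timeout" result together with failing pings would be consistent with the interface or IP stack on the device being down, while "refused" would point at the listening service alone.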
What’s the OS on this thing? It has a reasonably capable ARM processor. Is it running Linux or some embedded OS? Is it possible to SSH into it and poke around, check some logs, and run something like ifconfig, netstat, etc.? While the USB connection seems to be working well, I still want to track down what’s going on with the Ethernet connection. It’s been my experience that wired Ethernet connections are pretty rock solid, assuming you avoid problems like long distances or interference from electrical wiring. My cable isn’t near anything like that.
I may try plugging my Mac directly into the same switch and see what Wireshark shows, if anything. It’s been a few years since I’ve messed with Wireshark.