Re: Losing Communications with the Mount


Dale Ghent
 

"Interference from other devices"? If you're thinking of it as something like radio interference, switched ethernet doesn't work that way. VLANs won't save you from any particular layer <=2 issue anyhow. Since you say the pings stop regardless of whether it has a DHCP or static address, it seems like a total IP stack wedge of some sort, or the OS is inexplicably downing its interfaces.

It wasn't clear if you tried connecting via USB/serial immediately after this network failure, without rebooting the CP4. If the network craps out but you can still subsequently connect via USB and control the mount, that might help narrow down where things are going wrong.

Wireshark does run on Windows. If you're able to reproduce this issue with some reliability, you can run it on the box that's connecting to the CP4 via the ASCOM driver and go through your normal usage/reproduction paces while Wireshark is running and dumping packets to a pcap file. If the issue resurfaces, the pcap file can be inspected for the packet flow of the entire session.

I would set Wireshark to capture only ethernet frames that involve the CP4's ethernet MAC address. This is so that the pcap file doesn't grow to an unwieldy size and contains only the traffic source/destination that we're interested in. There's no guarantee that this would help elucidate what's going on - we could very well see a normal, healthy session transpire and then the CP4 suddenly stops responding. Then again, we might see something or some abnormal pattern or packet contents that points to something. I can help look at the pcap file if you'd like.
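A minimal sketch of that kind of capture, assuming Wireshark's `dumpcap` command-line tool is installed and on the PATH. The MAC address, interface name, and output file name are placeholders, not values from this thread, and the helper function is hypothetical. `ether host` is standard pcap capture-filter syntax, and the ring-buffer options cap how much disk the capture can use.

```python
def build_capture_command(mac, interface, outfile, max_kb=100_000, max_files=5):
    """Build a dumpcap command that keeps only frames to or from one MAC address.

    All arguments are placeholders to be replaced with real values.
    """
    return [
        "dumpcap",
        "-i", interface,             # capture interface name or index
        "-f", f"ether host {mac}",   # capture filter: frames with this MAC as source or destination
        "-b", f"filesize:{max_kb}",  # start a new file after this many kB
        "-b", f"files:{max_files}",  # ring buffer: keep at most this many files
        "-w", outfile,               # base name for the output file(s)
    ]


if __name__ == "__main__":
    # Placeholder MAC and interface; substitute the CP4's real values.
    cmd = build_capture_command("00:11:22:33:44:55", "Ethernet", "cp4.pcapng")
    # Pass this list to subprocess.run(cmd) to start capturing; stop with Ctrl+C.
    print(cmd)
```

Filtering at capture time rather than at display time is what keeps the file small, since frames that don't match are never written to disk.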

On May 6, 2021, at 18:30, alex <groups@ranous.com> wrote:

Ok, I opened up the GTOCP4 yesterday and the daughter board seems to be seated fine. I put it back and switched the ethernet cable to a brand new professionally made 15’ cable (i.e., I didn’t put the connectors on), and changed the port on the switch it was plugged into. I rebooted the switch as well in case it was in some weird state. I also hooked up the GTOCP4 to the eagle 2 directly via USB and configured it as the backup port. The primary connection was configured to be the ethernet connection using TCP and a 500ms timeout.

The switch and AP (and all my networking infrastructure) are Ubiquiti UniFi stuff (a prosumer/SOHO brand), so there's no cross-brand incompatibility in my network infrastructure. After the initial failures, I configured the router’s DHCP server to assign a fixed IP address to the mount instead of a dynamic one. I’m pretty obsessive about managing my IP address space and am fairly certain there aren’t other devices colliding. The switch is fairly recent, a UniFi US-8-60W, and is a fully managed smart switch. I suppose I could configure a separate VLAN for the observatory and put the mount and the eagle 2 on it as the only hosts to make sure there isn’t interference from other devices on the network, though that seems like overkill.

Last night the ethernet connection failed again, but I didn’t notice right away this time as APCC successfully failed over to the USB connection, so that worked great. I had a perpetual ping repeating once a second the whole night, and it showed response times typically between 2 and 9 milliseconds, though there were occasionally some 30-50ms ones, and a few 1-2 second ones here and there. Right before the connection failed, the last few pings had 2-7ms ping times, then all the ping requests started timing out. These timeouts have been non-stop for the last 12 hours or so.
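To pin down the moment the replies stopped, a timestamped ping log can be scanned for the start of the final unbroken run of timeouts. This is a minimal sketch, assuming the log has already been parsed into (timestamp, round-trip ms) pairs with `None` marking a timeout. The function name and the sample data are made up for illustration and are not taken from the actual log.

```python
from datetime import datetime, timedelta


def find_final_outage(samples):
    """Return the timestamp of the first timeout in the trailing run of timeouts.

    samples: list of (timestamp, rtt_ms) in time order; rtt_ms is None for a timeout.
    Returns None if the log does not end in a timeout.
    """
    start = None
    # Walk backwards from the end until the most recent successful reply.
    for timestamp, rtt_ms in reversed(samples):
        if rtt_ms is not None:
            break
        start = timestamp
    return start


if __name__ == "__main__":
    t0 = datetime(2000, 1, 1, 0, 0, 0)  # arbitrary illustrative start time
    one_second = timedelta(seconds=1)
    # Illustrative data only: one isolated timeout, then a permanent outage.
    rtts = [4.0, 7.0, None, 2.0, None, None]
    samples = [(t0 + i * one_second, rtt) for i, rtt in enumerate(rtts)]
    print(find_final_outage(samples))
```

That timestamp can then be lined up against a packet capture or the switch logs to see what was on the wire just before the mount went quiet.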

While communications were failing, the UniFi controller software didn’t show any abnormal packet loss on the port, and I sshed directly into the switch and poked around the internal logs, and didn’t see anything fishy. I tried changing the port the ethernet was plugged into, and power cycling the switch to see if the problem was some bad state the switch was in. Neither woke up the TCP connection. The only thing that fixes it is power cycling the GTOCP4, so to me this seems to be a problem with the state the GTOCP4 is in. If there was some persistent ongoing problem with the network infrastructure, then a reboot of the mount wouldn’t fix the problem.

I’m still mystified as to what’s going on with that ethernet connection. I’m a software engineer and have been programming IP networks professionally for over 30 years, and I’ve never seen behavior like this. While I haven’t formally done IT/Ops stuff (I program back end web services nowadays), I’ve set up plenty of IP networking equipment over the years.

I could see a transient communications problem starting things off, but it wouldn’t explain the inability to ping until the GTOCP4 is power cycled, which magically fixes everything. It appears that some problem occurs, and the GTOCP4 goes into a mode where the network is down and only a power cycle resets it. I’ve never encountered any device with this behavior.

What’s the OS on this thing? It has a reasonably capable ARM processor. Is it running Linux or some embedded OS? Is it possible to SSH into this thing and poke around, check some logs, do something like an ifconfig, netstat, etc.? While the USB connection seems to be working well, I still want to track down what’s going on with the ethernet connection. It’s been my experience that wired ethernet connections are pretty rock solid, assuming you avoid problems like long distances or interference from electrical wiring. My cable isn’t near anything like that.

I may try plugging my Mac directly into the same switch and see what Wireshark shows, if anything. It’s been a few years since I’ve messed with it.

Alex
