Re: Losing Communications with the Mount
Sébastien Doré
Ha ha! You forgot LAYER-10: the GOD layer (aka the man who keeps reminding everybody he knows all there is to know about everything)...
Just FYI.
From: main@ap-gto.groups.io <main@ap-gto.groups.io> on behalf of Christopher Erickson <christopher.k.erickson@...>
Sent: May 6, 2021 20:31 To: main@ap-gto.groups.io <main@ap-gto.groups.io> Subject: Re: [ap-gto] Losing Communications with the Mount
(The following is a general comment and not intended as a specific reply to anyone on this thread.)
The classic OSI networking model is as follows:
Layer 1 - Physical (cables, connectors, voltages, signals, carriers, etc.)
Layer 2 - Data Link (Ethernet Frames, Tokens, etc.)
Layer 3 - Network (IP Packets, others)
Layers 4/5 - Transport/Session (TCP, UDP, XTP, etc.)
Layer 6 - Presentation (decryption/encryption, etc.)
Layer 7 - Application (organization & delivery to/from the OS)
For troubleshooting purposes over the years, I have added three more important Layers to the OSI model as well.
Layer 0 - The funding. NOTHING is going to ever work right at any other Layer if this Layer isn't stable and solid.
Layer 8 - THE USER. "Yea, it was definitely a Layer-8 issue! Excessive operator head-space!"
Layer 9 - THE POLITICS. "Ugh. This is a Layer-9 problem and will NEVER get fixed!"
Seriously though, Layer 7 reports to the OS (computer operating system) and the user app interacts with the OS to get the networking services. Layer 7 is NOT the user app. Just FYI.
I hope this helps (and entertains!)
|
|
Re: Losing Communications with the Mount
Sébastien Doré
>>>Certain kinds of broadcast storms are known to disrupt LAN communications with certain ARM processors via comm buffer overflows, depending on the status of a multitude of various internal variables.
Broadcast storms? Come on! If it were a broadcast storm, no other device on his network would be able to communicate. Switch CPUs would saturate at 100% usage over time, to the point that they couldn’t forward any traffic at all.
Also, there are two main causes of broadcast storms: loops at the MAC layer (user error) and DoS attack attempts. A device like a CP4 can’t create a loop when added to a network unless it acts as a gateway, somehow forwarding data received on its Ethernet port directly to its WiFi interface (or the other way around), which I highly doubt is the case (that would be a risky design for such a device).
Then, if you suspect a denial-of-service attempt, the last thing you want to do is remove any firewall you have in your network. Next, you’ll want to segment it using VLANs... And I’m sure Alex already knows all about this given his former experience in IT.
|
|
Re: Losing Communications with the Mount
Sébastien Doré
Strictly speaking, interference is entirely possible over an Ethernet CATx cable. For all intents and purposes, such a cable acts exactly like a radio transmission line operating at hundreds and even thousands of MHz. That said, it is true that these cables are used in a very effective, self-protecting way (differential signaling over twisted pairs, which greatly reduces common-mode noise), to the extent that from the receiver's standpoint (your computer) it seems like only discrete bits are coming off those wires. Reality is different: the signal is always analog in nature (unless you are talking about qubit-based quantum computers, which really only have two possible measured states, but I'll leave that fascinating subject to someone more knowledgeable than me).
What is more common in standard computer network cabling is near-end crosstalk (which could be defined as a type of self-interference). It occurs when the twisted pairs are untwisted over too long a length, relative to the signal wavelength, before they enter the crimped RJ-45 connector. At this location, the cable's immunity is weakened and it is more prone to picking up RF noise, including from the neighboring data pairs in the cable. That's why well-crimped connectors are important.
Sébastien
|
|
Re: AP1100/CP4 and NINA
Dale Ghent
NINA doesn't necessarily know what is CW up and what is CW down, and it certainly doesn't think in those terms. There is no programmatic way an ASCOM driver can transmit this state to an app.
I'm trying to picture how you're managing to do this conditional flip given what I recall about SGPro's flip options, which aren't fancy at all. Is this a new thing you want to try or is this something you already have working in SGPro?
Either way, you can maybe do this on a per-target basis in NINA 1.11's advanced sequencer. For the targets that you start in a CW-up orientation, you can simply omit the Meridian Flip trigger and the mount will sail through the meridian without flipping, simply because the trigger that manages the flip is missing.
When I referred to the APCC-enforced limits (be they horizon or meridian), NINA does not know of these limits as, again, there is no programmatic way for an ASCOM driver to describe these to an app such as NINA. If you have the limit set to park or stop tracking, NINA will not have any idea why this is happening and you'll probably end up with a hung sequence. However, you can probably recover from this predicament on a multi-target sequence by bounding the halted sequence with a time or a target altitude, so that it eventually moves on to the next target, which could have an unpark or set tracking rate command to get the mount moving again.
On May 6, 2021, at 13:47, Michael 'Mikey' Mangieri <mjmangieri@xcalrockets.net> wrote:
|
|
Re: Strange guiding errors with AP1200
Rolando,
Thanks for the prompt response. The 6.4 minutes is close to the 379 seconds I saw. I was taking 10-second guiding exposures, so the problem could have occurred any time during that interval. I have the lube kit and instructions, and will follow your advice. I'll also take a look at the YouTube videos on lubricating and adjusting the gears.
Allen
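As a quick sanity check on the numbers in this post, here is a minimal Python sketch comparing the 6.4-minute worm period with the 379-second spacing seen in the guide log. The one-exposure tolerance is an assumption based on the 10-second guide exposures mentioned above, not an Astro-Physics rule, and the function name is only illustrative.

```python
WORM_PERIOD_S = 6.4 * 60      # worm period quoted by Roland, converted to seconds
OBSERVED_PERIOD_S = 379.0     # spacing between error spikes in Allen's guide log
GUIDE_EXPOSURE_S = 10.0       # guide exposure length; used here as the timing tolerance (assumption)


def matches_worm_period(observed_s, worm_s, tolerance_s):
    """Return True if the observed spacing agrees with the worm period within the tolerance."""
    return abs(observed_s - worm_s) <= tolerance_s


if __name__ == "__main__":
    difference = WORM_PERIOD_S - OBSERVED_PERIOD_S
    print(f"worm period: {WORM_PERIOD_S:.0f} s, observed: {OBSERVED_PERIOD_S:.0f} s")
    print(f"difference: {difference:.0f} s")
    print("consistent:", matches_worm_period(OBSERVED_PERIOD_S, WORM_PERIOD_S, GUIDE_EXPOSURE_S))
```

The two figures differ by about 5 seconds, which is less than one guide exposure, so the log is consistent with a once-per-worm-turn event.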
|
|
Re: Losing Communications with the Mount
When I commented about possible comm buffer overflows, I was speaking from the perspective of an embedded (controller, microcontroller) programmer, not as a LAN/WAN administrator or IP wonk. Non-programmers have no access to or awareness of comm buffers and such and don't need to worry about them. Embedded programmers do. Many types of DoS (Denial of Service) attacks are specifically crafted to exploit various hardware and software buffer limitations and overflow scenarios.
Observatory Engineer
Summit Kinetics
Waikoloa, Hawaii
On Thu, May 6, 2021 at 3:18 PM Dale Ghent <daleg@...> wrote:
|
|
Re: Losing Communications with the Mount
Dale Ghent
On May 6, 2021, at 20:16, alex <groups@ranous.com> wrote:
Well, to get the Mac to even see the packets between the Eagle and CP4, you'll first need to configure the switch to mirror the traffic on the Eagle's and CP4's switch ports to the port that the Mac is plugged into. I'm sure the UniFi controller will let you do this (I use UniFi gear as well, but only for wireless APs, so I don't know what the control panel for their switches looks like).
The Eagle just runs a Windows OS, so you can at least try running Wireshark on it to see how it goes before going through the rigamarole of setting up port mirroring for a third host.
Whichever way you set it up, you'll want the MAC addresses of the Eagle and the CP4. You can quickly get the CP4's MAC address by looking at the ARP cache on the Eagle. Pop open a PowerShell window and run:
    arp -a
This prints the Eagle's view of the network's IP-to-MAC address mapping. If you don't see your CP4 in there, ping it first, then run the arp command again, and it should show up.
The Eagle's own MAC address won't show up in this output. To get that, run the following from a PowerShell prompt:
    ipconfig /all
Find the Ethernet interface you're interested in and note its MAC address, which is given in the Physical Address field.
Once you have the CP4's and Eagle's MAC addresses, you can run Wireshark and use the following display filter:
    eth.addr==aa-aa-aa-aa-aa-aa and eth.addr==bb-bb-bb-bb-bb-bb
aa-aa-aa... and bb-bb-bb... are the MAC addresses of your Eagle and CP4. It's just a boolean filter, so the order in which they are specified doesn't matter. Do note that it is a double equals sign, per the norm for boolean notation.
At that point, go about your normal work and wait. If and when the freeze-up happens, you can save the capture by Ctrl-A'ing all the packets in the display area and going to File > Export Specified Packets. Save it in the default pcapng format and then someone can do something with that info.
/dale
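The lookup steps in this post can be sketched in Python, assuming the usual three-column layout of Windows `arp -a` output (Internet Address, Physical Address, Type). The IP and MAC addresses in the sample are made up, and `parse_arp_table` and `display_filter` are illustrative names, not part of Wireshark or any library.

```python
import re

# One entry line of Windows `arp -a` output, e.g.
#   192.168.1.50          00-11-22-33-44-55     dynamic
ARP_LINE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})\s+\w+\s*$"
)


def parse_arp_table(text):
    """Return {ip: mac} for every entry line found in `arp -a` output."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.match(line)
        if match:
            ip, mac = match.groups()
            table[ip] = mac.lower()
    return table


def display_filter(mac_a, mac_b):
    """Build the two-address Wireshark display filter described in the post."""
    return f"eth.addr=={mac_a} and eth.addr=={mac_b}"


# Hypothetical output; real addresses will differ.
SAMPLE = """
Interface: 192.168.1.20 --- 0x4
  Internet Address      Physical Address      Type
  192.168.1.1           AA-AA-AA-00-00-01     dynamic
  192.168.1.50          BB-BB-BB-BB-BB-BB     dynamic
  192.168.1.255         ff-ff-ff-ff-ff-ff     static
"""

if __name__ == "__main__":
    arp = parse_arp_table(SAMPLE)
    cp4_mac = arp["192.168.1.50"]        # assumed CP4 address
    eagle_mac = "aa-aa-aa-aa-aa-aa"      # would come from `ipconfig /all` on the Eagle
    print(display_filter(eagle_mac, cp4_mac))
```

To use it with real data, paste the actual `arp -a` output in place of `SAMPLE` and look up the CP4's IP address.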
|
|
Re: Spikes in Dec
Sébastien Doré
Thank you (again), Brian.
Will forward my PHD2 logs of the entire last session privately, along with what Howard pointed out to me.
Sébastien
From: main@ap-gto.groups.io <main@ap-gto.groups.io> on behalf of Brian Valente <bvalente@...>
Sent: May 6, 2021 17:58 To: main@ap-gto.groups.io <main@ap-gto.groups.io> Subject: Re: [ap-gto] Spikes in Dec
You want to enable DEC compensation here
This adjusts the guiding based on your sky position, which doesn't change (unless you are going to say you have a space telescope)
>>>I realize this is more a question for the PHD forum, (sorry about that) but I thought it could be of general interest here as well if you have the answer at hand.
that's fine - happy to help. Maybe you can shoot me some of your logs direct via email and I can review them.
I'm not sure I saw what Howard mentioned, but guiding with high quality encoders is a relatively new thing, so anything is possible. lowpass2 was introduced only a few years ago iirc
Brian
|
|
Re: Losing Communications with the Mount
Dale Ghent
On May 6, 2021, at 19:50, Christopher Erickson <christopher.k.erickson@gmail.com> wrote:
What on earth are you even talking about? Comm buffer overflow? Broadcast storms due to certain ARM processors? This makes -zero- sense.
- Someone who actually works on OS IP stack code
|
|
Re: ADATRI on Losmandy HD Tripod
KHursh
No, you would need the flat plate that goes on the tripod: LT2APM. You bolt the ADATRI to that.
|
|
Re: Losing Communications with the Mount
(The following is a general comment and not intended as a specific reply to anyone on this thread.)
The classic OSI networking model is as follows:
Layer 1 - Physical (cables, connectors, voltages, signals, carriers, etc.)
Layer 2 - Data Link (Ethernet Frames, Tokens, etc.)
Layer 3 - Network (IP Packets, others)
Layers 4/5 - Transport/Session (TCP, UDP, XTP, etc.)
Layer 6 - Presentation (decryption/encryption, etc.)
Layer 7 - Application (organization & delivery to/from the OS)
For troubleshooting purposes over the years, I have added three more important Layers to the OSI model as well.
Layer 0 - The funding. NOTHING is going to ever work right at any other Layer if this Layer isn't stable and solid.
Layer 8 - THE USER. "Yea, it was definitely a Layer-8 issue! Excessive operator head-space!"
Layer 9 - THE POLITICS. "Ugh. This is a Layer-9 problem and will NEVER get fixed!"
Seriously though, Layer 7 reports to the OS (computer operating system) and the user app interacts with the OS to get the networking services. Layer 7 is NOT the user app. Just FYI.
I hope this helps (and entertains!)
On Thu, May 6, 2021, 12:35 PM Seb@stro <sebastiendore1@...> wrote:
|
|
Re: Losing Communications with the Mount
Alex
On Thu, May 6, 2021 at 04:33 PM, Dale Ghent wrote:
>>>"Interference from other devices"? If you're thinking of interference like it's radio interference, switched ethernet doesn't work that way. VLANs won't save you from any particular layer <=2 issue anyhow. Since you say the pings stop regardless whether it has a DHCP or static address, it seems like a total IP stack wedge of some sort, or the OS is inexplicably downing its interfaces.
By interference, I was talking about some rogue device on my network trying to grab the same IP address. Putting the observatory on its own private network would eliminate that possibility, though I still think that's overkill. My gut is telling me that something is going wrong with the mount's IP stack.
>>>It wasn't clear if you tried connecting via USB/serial immediately after this network failure, without rebooting the CP4. If the network craps out but you can still subsequently connect via USB and control the mount, that might help narrow down where things are going wrong.
I configured APCC to use the USB connection as backup if the primary connection failed. It failed over to the USB connection without me even noticing. It was only the next day that I noticed it had switched over, and by then my repeated pings had been failing non-stop for something like 12 hours. So clearly the CP4 wasn't completely hosed, just its IP stack.
>>>Wireshark does run on Windows. If you're able to reproduce this issue with some reliability, you can run that on the box that's connecting to the CP4 via the ASCOM driver and have it run through your normal uses/reproduction paces while Wireshark is running and dumping packets to a pcap file. If the issue resurfaces, the pcap file can be inspected for the packet flow of the entire session.
My Eagle 2 computer is a NUC with some specialized hardware to distribute and control power connections, and is somewhat "fragile" to configuration changes. It took me some time to work out some driver issues to get everything to work, so I'm loath to touch it at this point. Besides, I already have Wireshark configured on my Mac. Now I just need to remember how to use it.
Alex
|
|
Re: Losing Communications with the Mount
Roland Christen
It makes a difference which version of software is in that mount. The version number was not mentioned.
Rolando
-----Original Message-----
From: Christopher Erickson <christopher.k.erickson@...> To: main@ap-gto.groups.io Sent: Thu, May 6, 2021 6:50 pm Subject: Re: [ap-gto] Losing Communications with the Mount
Sounds like you really have it narrowed down to the CP4 firmware running in your controller.
Work closely with AP and I am confident they will get it resolved for you.
Certain kinds of broadcast storms are known to disrupt LAN communications with certain ARM processors via comm buffer overflows, depending on the status of a multitude of various internal variables.
On Thu, May 6, 2021, 12:33 PM alex <groups@...> wrote:
Oh, I forgot to mention that I also tried using the mount's WiFi connection, and it had similar issues, so the ethernet connection doesn't seem to be at fault.
-- Roland Christen Astro-Physics
|
|
Re: Losing Communications with the Mount
Sounds like you really have it narrowed down to the CP4 firmware running in your controller.
Work closely with AP and I am confident they will get it resolved for you.
Certain kinds of broadcast storms are known to disrupt LAN communications with certain ARM processors via comm buffer overflows, depending on the status of a multitude of various internal variables.
On Thu, May 6, 2021, 12:33 PM alex <groups@...> wrote:
Oh, I forgot to mention that I also tried using the mount's WiFi connection, and it had similar issues, so the ethernet connection doesn't seem to be at fault.
|
|
Re: Losing Communications with the Mount
Donald Gaines
Thanks Christopher,
I’m going to put a serial port card into my computer and use the serial cable that comes with the mount. Thanks to all of you for all your help. There are some seriously smart people on this forum.
Thanks again,
Don Gaines
On Thursday, May 6, 2021, Christopher Erickson <christopher.k.erickson@...> wrote:
|
|
Re: Losing Communications with the Mount
Dale Ghent
"Interference from other devices"? If you're thinking of interference like it's radio interference, switched ethernet doesn't work that way. VLANs won't save you from any particular layer <=2 issue anyhow. Since you say the pings stop regardless whether it has a DHCP or static address, it seems like a total IP stack wedge of some sort, or the OS is inexplicably downing its interfaces.
It wasn't clear if you tried connecting via USB/serial immediately after this network failure, without rebooting the CP4. If the network craps out but you can still subsequently connect via USB and control the mount, that might help narrow down where things are going wrong.
Wireshark does run on Windows. If you're able to reproduce this issue with some reliability, you can run that on the box that's connecting to the CP4 via the ASCOM driver and have it run through your normal uses/reproduction paces while Wireshark is running and dumping packets to a pcap file. If the issue resurfaces, the pcap file can be inspected for the packet flow of the entire session.
I would set Wireshark to capture only ethernet frames that involve the CP4's ethernet MAC address. This is so that the pcap file doesn't grow to an unwieldy size and contains only the traffic source/destination that we're interested in.
There's no guarantee that this would help elucidate what's going on - we could very well see a normal, healthy session transpire and then the CP4 suddenly stops responding. Then again, we might see some abnormal pattern or packet contents that points to something. I can help look at the pcap file if you'd like.
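A minimal sketch of that capture setup, assuming the tshark command-line tool that ships with Wireshark is installed. The code only builds the command and does not run it; the interface name, MAC address, and output file name are placeholders, and the helper names are illustrative. The capture filter uses the standard pcap `ether host` syntax.

```python
import re


def normalize_mac(mac):
    """Return a MAC address as lowercase, colon-separated hex pairs."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    digits = digits.lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def capture_command(interface, cp4_mac, outfile):
    """Build a tshark argument list that records only frames to or from one MAC."""
    capture_filter = f"ether host {normalize_mac(cp4_mac)}"
    # -i: capture interface, -f: capture filter, -w: write raw packets to a file
    return ["tshark", "-i", interface, "-f", capture_filter, "-w", outfile]


if __name__ == "__main__":
    # "Ethernet", the MAC, and the file name are placeholders for illustration.
    command = capture_command("Ethernet", "BB-BB-BB-BB-BB-BB", "cp4.pcapng")
    print(" ".join(f'"{arg}"' if " " in arg else arg for arg in command))
```

Filtering at capture time, rather than with a display filter afterwards, is what keeps the file small during a long overnight run.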
On May 6, 2021, at 18:30, alex <groups@ranous.com> wrote:
|
|
Re: Losing Communications with the Mount
Roland Christen
I will forward this to our software engineer.
Roland
-----Original Message-----
From: alex <groups@...> To: main@ap-gto.groups.io Sent: Thu, May 6, 2021 5:30 pm Subject: Re: [ap-gto] Losing Communications with the Mount
Ok, I opened up the GTOCP4 yesterday and the daughter board seems to be seated fine. I put it back and switched the ethernet cable to a brand new, professionally made 15’ cable (i.e., I didn’t put the connectors on myself), and changed the port on the switch it was plugged into. I rebooted the switch as well in case it was in some weird state. I also hooked up the GTOCP4 to the Eagle 2 directly via USB and configured it as the backup port. The primary connection was configured to be the ethernet connection using TCP and a 500ms timeout.
The switch and AP (and all my networking infrastructure) is Ubiquiti UniFi stuff (a prosumer/SOHO brand), so there’s no cross-brand incompatibility in my network infrastructure. After the initial failures, I configured the router’s DHCP server to assign a fixed IP address to the mount instead of a dynamic one. I’m pretty obsessive about managing my IP address space and am fairly certain there aren’t other devices colliding. The switch is fairly recent, a UniFi US-8-60W, and is a fully managed smart switch. I suppose I could configure a separate VLAN for the observatory and put the mount and the Eagle 2 on it as the only hosts, to make sure there’s no interference from other devices on the network, though that seems like overkill.
Last night the ethernet connection failed again, but I didn’t notice right away this time, as APCC successfully failed over to the USB connection, so that worked great. I had a perpetual ping repeating once a second the whole night, and it showed response times typically between 2 and 9 milliseconds, with occasional 30-50ms ones and a few 1-2 second ones here and there. Right before the connection failed, the last few pings had 2-7ms ping times, then all the ping requests started timing out. These timeouts have been non-stop for the last 12 hours or so.
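For anyone wanting to pin down when the timeouts began in a ping log like this one, here is a rough Python sketch. It assumes Windows-style ping output ("Reply from ... time=4ms" and "Request timed out."); other platforms print different lines. The sample log and function names are invented for illustration.

```python
import re

# Round-trip time in a Windows ping reply, e.g. "time=4ms" or "time<1ms".
REPLY_TIME = re.compile(r"time[=<](\d+)\s*ms")


def parse_ping_log(lines):
    """Return one entry per probe: round-trip time in ms, or None for a timeout."""
    samples = []
    for line in lines:
        match = REPLY_TIME.search(line)
        if match:
            samples.append(int(match.group(1)))
        elif "timed out" in line.lower():
            samples.append(None)
    return samples


def final_outage_start(samples):
    """Index of the first probe in the trailing run of timeouts, or None if there is none."""
    index = len(samples)
    while index > 0 and samples[index - 1] is None:
        index -= 1
    return index if index < len(samples) else None


# Invented sample; a real overnight log would have tens of thousands of lines.
LOG = [
    "Reply from 192.168.1.50: bytes=32 time=4ms TTL=64",
    "Reply from 192.168.1.50: bytes=32 time=35ms TTL=64",
    "Request timed out.",
    "Reply from 192.168.1.50: bytes=32 time=7ms TTL=64",
    "Request timed out.",
    "Request timed out.",
]

if __name__ == "__main__":
    samples = parse_ping_log(LOG)
    replies = [s for s in samples if s is not None]
    print(f"{len(replies)} replies, slowest {max(replies)} ms")
    print("final outage starts at probe", final_outage_start(samples))
```

With one probe per second, the probe index is roughly the number of seconds since the ping was started, which gives a timestamp to line up against a packet capture or the APCC log.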
While communications were failing, the UniFi controller software didn’t show any abnormal packet loss on the port. I SSHed directly into the switch and poked around the internal logs, and didn’t see anything fishy. I tried changing the port the ethernet was plugged into, and power cycling the switch to see if the problem was some bad state the switch was in. Neither woke up the TCP connection. The only thing that fixes it is power cycling the GTOCP4, so to me this seems to be a problem with the state the GTOCP4 is in. If there was some persistent ongoing problem with the network infrastructure, then a reboot of the mount wouldn’t fix the problem.
I’m still mystified as to what’s going on with that ethernet connection. I’m a software engineer and have been programming IP networks professionally for over 30 years, and I’ve never seen behavior like this. Although I haven’t formally done IT/ops work (I program back-end web services nowadays), I’ve set up plenty of IP networking equipment over the years.
I could see a transient communications problem starting things off, but it wouldn’t explain the inability to ping until the GTOCP4 is power cycled, which magically fixes everything. It appears that some problem occurs, and the GTOCP4 goes into a mode where the network is down and only a power cycle resets it. I’ve never encountered any device with this behavior.
What’s the OS on this thing? It has a reasonably capable ARM processor. Is it running Linux or some embedded OS? Is it possible to SSH into this thing and poke around, check some logs, do something like an ifconfig, netstat, etc.? While the USB connection seems to be working well, I still want to track down what’s going on with the ethernet connection. It’s been my experience that wired ethernet connections are pretty rock solid, assuming you avoid problems like long distances or interference from electrical wiring. My cable isn’t near anything like that.
I may try plugging my Mac directly into the same switch and see what Wireshark shows, if anything. It’s been a few years since I’ve messed with it.
Alex
-- Roland Christen Astro-Physics
|
|
Re: Losing Communications with the Mount
Sébastien Doré
Christopher,
As you surely know, much of all Internet communication is TCP-based nowadays and works, arguably, 365 days a year and almost 24 hours a day (think two- and three-nines networks). Not sure what 30 years of experience with TCP has to do with that, or with any communication reliability figure for that matter.
PingPlotter does indeed look like a valuable tool (you even convinced me to download and try it).
With regard to the old CLI vs GUI debate, I don't think this forum is the right place to argue it, but I will only add that network command line tools are way less prone to bugs than any GUI tools. Moreover, you don't even have to download, install or update anything; they come built into OSes, even in 2021. There must be a good reason for that... My guess is that if you happen to be cut off from the internet because of a communication problem, you have to rely on something to get back on your feet. It might be a good thing to learn the ropes of simple command line tools, which anyone here - as you also stated - has the ability to grasp. I'd recommend anyone learn basic command line tools well before trying out any graphical wireshark/pcap/tcpdump analyzer. Well, it seems like you got me started after all. 😉 Sorry about that, everybody. At least now you know where I stand...
As for your other comments, I agree it's good practice to look at the plumbing (lower layers) first, but really solving most common communication problems beyond the obvious ones (definitively, not "just make it work" for some time until the next software update) also requires looking at least at the TCP/IP layers these days, especially if you don't have an infinite budget to replace everything through trial and error and/or an infinite amount of time to spend on it (I'm sure most of us would rather spend this precious time under clear skies with their latest AP equipment).
That is also exactly what PingPlotter does BTW: it compiles and presents statistics from - you stated it - ping and tracert requests, which use a Layer 3 protocol (ICMP) running in the background.
Clear skies,
Sébastien
From: main@ap-gto.groups.io <main@ap-gto.groups.io> on behalf of Christopher Erickson <christopher.k.erickson@...>
Sent: May 6, 2021 16:24 To: main@ap-gto.groups.io <main@ap-gto.groups.io> Subject: Re: [ap-gto] Losing Communications with the Mount
My experience with TCP comes from 30 years of telecommunications and robotics engineering. My primary concerns are much more with Layer-1 of the OSI model (cables, connectors) and Layer-2 (Ethernet frames), not Layer-3 (IP packets) or Layer-4 (TCP/UDP.)
OSI Layers 1 & 2 are VERY opaque to the average user so consequently they are usually ignored when troubleshooting. I think this is typically a mistake. Sort of like looking for your car keys under a nice streetlight instead of next to your car, where you dropped them. PingPlotter is a very graphical, visual troubleshooting tool that has a free version. It is PROFOUNDLY better and more intuitive than using the DOS prompt command line Ping command. PingPlotter also incorporates a very nice, visual, graphical, dynamic traceroute. Download it and try it out. You won't go back to the nasty DOS prompt command line ever again, unless forced to on a strange machine.
I agree Wireshark is a complicated tool. I already stated that. However, I believe that the typical AP mount owner is more qualified than the average person to gain benefit from it, given some time. I would add that starting with PingPlotter instead of Wireshark would be good.
It could be bad to have a firewall or router in between the CP4/5 and the observatory PC. If there is one, it might have LAN packet filtering capabilities, which I would disable, if I could.
|
|
Re: Losing Communications with the Mount
Alex
Oh, I forgot to mention that I also tried using the mount's WiFi connection, and it had similar issues, so the ethernet connection doesn't seem to be at fault.
Alex
|
|
Re: Strange guiding errors with AP1200
Roland Christen
The RA worm turns once every 6.4 minutes and so if there was a piece of dirt embedded on the worm teeth, there would be a jump in that time frame. If it repeats forever, then I would suspect a damaged worm. If it happened just once in the same place in the sky I would suspect a damaged tooth on the main worm wheel.
There could also be a piece of dirt embedded in the final spur gear that's attached to the end of the worm. That can be easily cleaned, but that would show up for every 6.4 minute cycle and would not go away.
The fact that it showed up in Dec is probably due to a bad calibration run. Dec axis probably did not move, but the guide software is interpreting part of the RA error as a Dec error.
I would remove the RA gearbox and clean all the grease off the worm, the main worm wheel, and all the transfer gears inside the gearbox. Then re-grease everything.
Rolando
-----Original Message-----
From: Allen Gilchrist via groups.io <gilchrist.allen@...> To: main@ap-gto.groups.io Sent: Thu, May 6, 2021 4:44 pm Subject: [ap-gto] Strange guiding errors with AP1200
Has anyone else seen anything like this before? During an imaging session with my AP1200, while guiding with an STi on a 400 mm f.l. 80 mm refractor, I noticed an occasional really large guiding error. It took a few cycles to bring the guidestar back to the center of the guide window, and then all was OK until it happened again. I started an autoguider log file, and found that the process repeated every 379 sec. This suggested some problem in the RA drive system, but there were corresponding spikes in the Dec. error log as well. In fact these errors were larger than those in RA. It almost looked like something was binding the RA drive every 379 seconds and then, when the drive slipped free, there was an impact on the Dec axis.
Interestingly, after recording five of these events in the autoguider log, the problem vanished. I continued observing for about another hour and a half but the problem did not return. I've attached a couple of plots, one showing the problem, and the second one after the problem went away. The plot scale is in pixels, and each pixel is 3.82 arcseconds. Any ideas?
Allen
-- Roland Christen Astro-Physics
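The 3.82 arcseconds per pixel figure in Allen's message can be cross-checked with the usual small-angle plate-scale formula. This sketch assumes the STi guider has 7.4 micron pixels, which is not stated in the post; the 400 mm focal length is from the message.

```python
ARCSEC_PER_RADIAN = 206264.806  # number of arcseconds in one radian


def pixel_scale_arcsec(pixel_size_um, focal_length_mm):
    """Angular size of one pixel in arcseconds (small-angle approximation)."""
    pixel_size_mm = pixel_size_um * 1e-3
    return ARCSEC_PER_RADIAN * pixel_size_mm / focal_length_mm


if __name__ == "__main__":
    # 7.4 um is an assumed pixel size for the STi; 400 mm is the guide scope focal length.
    scale = pixel_scale_arcsec(7.4, 400.0)
    print(f"{scale:.2f} arcsec/pixel")
    # Converting a guide-log excursion from pixels to arcseconds:
    print(f"a 2-pixel spike is about {2 * scale:.1f} arcsec")
```

Under that assumption the result rounds to 3.82 arcseconds per pixel, matching the plot scale quoted in the message.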
|
|