Services & Software
UISP, Unifi, Zabbix, VPN, and the other services hosted on or for the Mesh
- Alerting and monitoring system
- Bookstack wiki software
- Connection Troubleshooting
- Grafana Monitoring
- OSPF Data API
- Overview
- Phone System and Configuration
- Call Termination: Google Voice and Callcentric
- Callcentric Inbound/Call Treatments
- Callcentric IVR Setup
- Security (outdated)
- Slack Support Follow Up Bot
- Software services list
- Wiki (Bookstack)
- Zabbix
- Website
- MeshDB
Alerting and monitoring system
Out of date
Existing systems
- #monitoring-unms/UISP
- Grafana/Prometheus
- public, setup 4 years ago: https://stats.nycmesh.net
- Mesh only, Omni's etc: http://10.70.90.82:3000/dashboards
- support report generator
Zabbix
- IP: http://10.70.73.58/
- Details: Runs on Quincy's server, connected to Beta Slack
Requirements
- Must:
- alert Slack team within 5 minutes when key infrastructure goes offline
- Should:
- be easy to update for new equipment
- be easy to configure to notify new volunteers
- be easy to deploy
- be reliable
- be configurable through a version-controlled config to enable easy updates
- be editable by multiple volunteers
Questions
- Major
- What key metrics should we alert based on?
- Minor
- frequency? ~1 point/hour
Proposed software
Next Steps
Log
- prompted by this Slack discussion on Grand St. outage
- added Zabbix server during Hack night
Bookstack wiki software
Adding attachments
Only logged in Editors can add attachments.
Click the paper clip icon on the right menu in edit mode:
Once the file is uploaded click the link button to add a link to the current page:
Bookstack wiki tips
- Enable keyboard shortcuts (ex. press "e" to edit page):
- click profile dropdown on upper right
- click shortcuts
- click checkbox to enable shortcuts
How to create pages
To create a page:
Connection Troubleshooting
Restart Network If Ping Google.Com && 8.8.8.8 Fails 4 Times
Wireless networks have a bit of a reputation for instability. Modern hardware has fixed most hardware problems, but there is work that needs to be done to make the firmware reliable. You can do this with "watchdog" scripts. I haven't had to reboot a router that is running our watchdog script.
Our firmware image (based on qMp) comes with a "bmx6health" script that checks whether the mesh software is running correctly and restarts it if necessary. This script by default runs once per day. I've found it better to run this every 5 minutes. You can do this by editing the crontab-
ssh into the router and in the terminal-
crontab -e
This opens a vi editor and you can change or add different scripts to run at different times. (The vi commands you need are "i" to insert, "esc" to stop editing, and ":x" to save and eXit.)
For some nodes, their main purpose is to be an internet gateway. To ensure that they always try to be online, you can add a watchdog script that pings a known website and calls "network restart" if it fails. These kinds of scripts often ping 8.8.8.8, which is one of Google's public DNS servers.
I've discovered 3 ways to recover a qMp mesh router that has functioning wifi but has lost internet: "network restart", "bmx6 restart", and restarting dnsmasq with "killall dnsmasq; dnsmasq start". Sometimes the DNS forwarder, dnsmasq, will stop working correctly, letting you ping some things and not others. dnsmasq will then forward bad DNS info to the other routers too, so it needs to be fixed quickly! "killall dnsmasq; dnsmasq start" will fix it.
gwck is a qMp utility that is restarted after network restart.
Another problem I've had occasionally is that the wifi will lose connections. Even though the radio is on and the router lights are normal you can't connect. I've written a simple script to restart wifi if both the ad-hoc and access point interfaces have no connections. It is a bit of a hack since the interface may be ok, but since nothing is connected via wifi it doesn't hurt too much to restart it. I've also found that a network restart is necessary to make the wifi stable.
By default wlan0 is the ad-hoc interface that is used to mesh the routers and wlan0ap is the access point. This script checks to see the number of wireless interfaces so it works with dual-band routers and routers that are only ad-hoc or ap.
I'm using "Signal: unknown" to show there is no connection. It seems to work reliably. You could also try iwinfo wlan0 assoclist.
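The "iwinfo wlan0 assoclist" alternative mentioned above could be sketched roughly as follows. This is not part of mesh-watchdog: countStations is a hypothetical helper, and it assumes each associated station appears in the assoclist output as a line beginning with its MAC address.

```shell
#!/bin/sh
# Hypothetical helper (not part of mesh-watchdog): count associated stations
# in "iwinfo <iface> assoclist" output read from stdin.
# Assumption: each station entry starts with its MAC address at the start of a line.
countStations()
{
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    grep -cE '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}' || true
}

# Example on a router (wlan0 is the default ad-hoc interface):
#   iwinfo wlan0 assoclist | countStations
```

A count of 0 on every interface would then play the same role as "Signal: unknown" does in the script.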
"sleep 5" is usual between "wifi down" and "wifi up". I've found it not necessary when there are no connections, but I'll leave it there in case.
You can download the watchdog here
in the terminal-
vi /root/mesh-watchdog.sh
and paste this:
#!/bin/sh
# mesh-watchdog v1.1.1, NYC Mesh, Brian Hall

# take all wifi interfaces down and bring them back up
restartWifi()
{
    wifi down
    sleep 5
    wifi up
}

# restart networking, then the services that depend on it
restartNetwork()
{
    /etc/init.d/network restart
    if /etc/init.d/gwck enabled; then
        /etc/init.d/gwck restart
    fi
    /etc/init.d/bmx6 restart
    sleep 4
    killall dnsmasq
    /etc/init.d/dnsmasq start
}

# gets date-time from log and exits if recently run. date-time is first two words of last line
exitIfRecentRestart()
{
    if [ -e $LOG ]; then
        set -- `tail -1 $LOG`
        LASTRUN=`date --date="$1 $2" +%s`
        if [ "$?" = "0" ]; then
            # don't run for 1200s (20 minutes)
            NEXTRUN=$(($LASTRUN + 1200))
            NOW=`date +%s`
            es=$(($NOW - $LASTRUN))
            printf "time since last restartNetwork: "
            printf '%dd %dh:%dm:%ds\n' $(($es/86400)) $(($es%86400/3600)) $(($es%3600/60)) $(($es%60))
            if [ $NOW -lt $NEXTRUN ]; then
                echo "waiting $(($NEXTRUN - $NOW)) seconds, use option -f to force"
                exit 1
            else
                echo "run tests-"
            fi
        else
            echo "invalid date from log, run tests-"
        fi
    else
        echo "no log, run tests-"
    fi
}

LOG="/tmp/log/mesh-watchdog.log"
FORCE=0

# handle command line options
if [ "$1" = "-n" ]; then
    echo "restartNetwork"
    restartNetwork
    exit 1
elif [ "$1" = "-f" ]; then
    echo "force tests-"
    FORCE=1
elif [ "$1" = "-w" ]; then
    echo "restartWifi"
    restartWifi
    exit 1
elif [ "$1" = "-b" ]; then
    echo "restart wifi, wait, restart network"
    # "sleep", not "wait": wait would look for a background job instead of pausing
    restartWifi; sleep 60; restartNetwork
    exit 1
elif [ "$1" != "" ]; then
    echo -e "Usage: `basename $0` [OPTION]\n\nTests wifi and internet connections and restarts if necessary (default)\n\n\t-f\tforce test\n\t-n\trestart network\n\t-w\trestart wifi\n\t-b\trestart both wifi and network\n\t-h\toptions\n"
    exit 1
fi

if [ $FORCE != 1 ]; then
    exitIfRecentRestart
fi

DATE=`date +%Y-%m-%d\ %H:%M:%S`
IWINFO=`iwinfo`
# find lines containing "ESSID"|get name (first word)|replace return with ","
WI=`echo "$IWINFO" | grep ESSID | grep -Eo '^[^ ]+' | sed ':a;N;$!ba;s/\n/, /g'`
# count the number of wlan interfaces, and number of wlans with 'no signal'
WLAN=`echo "$WI" | wc -w`
NOSIGNAL=`echo "$IWINFO" | grep 'Signal: unknown' | wc -l`

if [ $WLAN -eq 0 ]; then
    echo "no wlan interfaces, wifi is probably disabled"
elif [ $WLAN -eq $NOSIGNAL ]; then
    # all wlan interfaces are down, so restart wifi
    echo "$DATE restart wifi- wlans:$WLAN no-signal:$NOSIGNAL interfaces:$WI" | tee -a $LOG
    restartWifi
    sleep 60
    restartNetwork
    exit 1
else
    echo "wifi:ok wlans:$WLAN no-signal:$NOSIGNAL interfaces:$WI"
fi

# restart network if ping google.com && 8.8.8.8 fails 4 times
count=1
while [ "$count" -le 4 ]
do
    if /bin/ping -c 1 google.com >/dev/null && /bin/ping -c 1 8.8.8.8 >/dev/null; then
        echo "wan:ok ping-count:$count"
        exit 0
    fi
    # portable increment (works in shells without "let")
    count=$((count + 1))
done

echo "$DATE network restart" | tee -a $LOG
restartNetwork
Make it executable-
chmod +x /root/mesh-watchdog.sh
Afterwards, add the following entry with crontab -e
* * * * * /root/mesh-watchdog.sh
It can run once a minute as it detects whether a network restart has just occurred and will wait 20 minutes before restarting again. I added the 20 minute delay so the router is still functional without an internet gateway.
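The 20 minute hold-off comes down to simple epoch arithmetic. A minimal standalone sketch of that calculation follows; secondsUntilNextRun is an illustrative name, not a function in the script, and the 1200 second default mirrors the value used in exitIfRecentRestart.

```shell
#!/bin/sh
# Illustrative helper (not in mesh-watchdog): given the epoch time of the last
# logged restart and the current epoch time, print how many seconds remain
# before another restart is allowed (0 means tests may run now).
# Optional third argument overrides the 1200 s (20 minute) delay.
secondsUntilNextRun()
{
    last=$1
    now=$2
    delay=${3:-1200}
    next=$((last + delay))
    if [ "$now" -lt "$next" ]; then
        echo $((next - now))
    else
        echo 0
    fi
}

# Example: last restart at t=1000, now t=1300 -> 900 seconds still to wait
#   secondsUntilNextRun 1000 1300
```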
Thanks to Nitin for help with the wifi problem and Zach for help with dnsmasq.
Email me if you have any questions or suggestions.
Grafana Monitoring
Services
Grafana
- URL: http://10.70.90.82:3000/
- Contents: Dashboards
Prometheus
- URL: http://10.70.90.82:9090
- Contents: Main hub data
- Datasource name: Prometheus
Updating SNMP Data Scraper
- Enable SNMP on the device
- log in to RouterOS with the device IP
- in webfig go to IP>SNMP and enable
- save
- in quick set note the router ID
- Update Prometheus config
ssh root@10.70.90.82
nano /opt/prometheus-2.39.1.linux-amd64/prometheus.yml
- add IP for device at the end and use router ID from above in the comments
- save the file
systemctl restart prometheus.service
- if anything is unclear, feel free to look at the command history with the "history" command
- Update Grafana
- Go to a Grafana page where you want to add the new panel, ex. http://10.70.90.82:3000/d/EfHFIMWSz/nostrand-5283?orgId=1
- log in with standard password
- duplicate panel
- right click panel header > more > duplicate
- rename using router id above
- replace the IP
- not all devices have the same metrics, so you may have to select a different one
- Make sure to save the dashboard!
- If everything worked you should see data
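For the "Update Prometheus config" step above, one optional safeguard is to validate the edited file before restarting the service, so a typo does not take monitoring down. The sketch below uses "promtool check config", which ships in the Prometheus release tarball. That promtool sits in the same /opt directory on this server is an assumption, and checkAndRestart is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: only restart Prometheus if the edited config passes validation.
PROM_DIR="/opt/prometheus-2.39.1.linux-amd64"    # directory from the steps above
# Assumption: promtool is located next to the prometheus binary
PROMTOOL="${PROMTOOL:-$PROM_DIR/promtool}"
RESTART_CMD="${RESTART_CMD:-systemctl restart prometheus.service}"

checkAndRestart()
{
    config="${1:-$PROM_DIR/prometheus.yml}"
    if "$PROMTOOL" check config "$config"; then
        $RESTART_CMD
    else
        echo "config check failed, not restarting" >&2
        return 1
    fi
}

# Usage on the server, after saving prometheus.yml:
#   checkAndRestart
```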
Prometheus-1
- URL: http://10.70.76.98:9090
- Contents: Omni data
- Datasource name: Prometheus-1
OSPF Data API
Need OSPF link database data without running an OSPF node such as bird? Look no further than the OSPF JSON API:
> curl http://api.andrew.mesh/api/v1/ospf/linkdb
{"areas": {"0.0.0.0": {"routers": ... }}}
Data Available
The full state of the entire OSPF network on the mesh is available via this endpoint. The format is as follows:
{
  "areas": {
    "0.0.0.0": {
      "routers": {
        "<router_id>": {
          "links": {
            "router": [
              {"id": "<other router id>", "metric": <integer link cost>},
              {"id": "<other router id>", "metric": <integer link cost>, "via": "<another router id>"},
              ...
            ],
            "external": [
              {"id": "<external CIDR>", "metric": <integer link cost>},
              ...
            ],
            "stubnet": [
              {"id": "<stubnet CIDR>", "metric": <integer link cost>},
              ...
            ],
            "network": [
              {"id": "<network CIDR>", "metric": <integer link cost>},
              {"id": "<network CIDR>", "metric2": <integer link cost>},
              ...
            ]
          }
        },
        ...
      },
      "networks": {
        "<network CIDR>": {
          "dr": "<router id>",
          "routers": [
            "<router id>",
            ...
          ]
        },
        ...
      }
    }
  },
  "updated": <integer epoch timestamp>
}
For each router, you can see the OSPF links it is advertising, and which type of link they are. Some links have a metric2 value instead of a metric value. This represents a semantically meaningful difference in that router's configuration and the OSPF behavior for that node, but one that is beyond the scope of this document to explain.
Update Frequency
The server refreshes the JSON data blob once per minute. See the updated field in the top-level JSON object to confirm data freshness.
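A freshness check along these lines can be sketched in shell. linkdbAge is a hypothetical helper, not part of the API. It pulls the updated value out with grep rather than a real JSON parser, which is crude but avoids extra dependencies and assumes the key appears only once, as in the format above.

```shell
#!/bin/sh
# Hypothetical helper: read the linkdb JSON on stdin and print the age of the
# data in seconds, based on the top-level "updated" epoch timestamp.
# Optional first argument: the current epoch time (defaults to "date +%s").
linkdbAge()
{
    now=${1:-$(date +%s)}
    # crude extraction: find '"updated": <digits>' and keep only the digits
    updated=$(grep -o '"updated": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$' | head -n 1)
    if [ -z "$updated" ]; then
        echo "no updated field found" >&2
        return 1
    fi
    echo $((now - updated))
}

# Usage from a machine on the mesh:
#   curl -s http://api.andrew.mesh/api/v1/ospf/linkdb | linkdbAge
```

Since the blob is refreshed once per minute, an age well above 60 seconds would suggest the data is stale.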
Authentication
None. This data is publicly available to any OSPF node, so no authentication is needed when accessing from the mesh private IP space.
Source Code?
Contact
The OSPF data API is maintained by Andrew Dickinson. Reach out to @Andrew Dickinson on slack for questions and comments. I'd love to see what you build.
Overview
This page intends to list the services "hosted" on NYC Mesh and available directly to NYC Mesh members. Some may be available only to NYC Mesh members, while others may also be available from the Internet via a public IP address (or through public DNS).
There are different types of services. Some are network-specific or meant for devices, such as DNS or NTP; others are more people-oriented, such as an email server or video chat server.
If you do host a service that you would like to make available to the Mesh Community please let us know so we can add it here.
You can also discuss services on our slack channel #mesh-services
Network services
Public services
- NYC Mesh Meet by @Zach
- ExcellentFiles by @Eric Zhu.
It is a free file host hosted on sn3. Anyone can get 10G of free storage. It can support around ~25 users for now.
"I choose Nextcloud because it is very user friendly, and there is a nice mobile app, and desktop sync app. I have also enabled contacts + calendar sync. I use it myself coz i want to rely on other services less; to be more autonomous :)"
- Mastodon on @Daniel Heredia's server at SN3, open to all.
- NYC Building KML Tool by @Daniel Heredia, takes two addresses and uses NYC DCP and DOB databases to create a KML line between the rooftops to determine LoS (code).
Projects
Services that are in development...
- Support Bot on Slack to automatically respond to #support channel inquiries
- Chat app by @George on slack
Phone System and Configuration
Call routing and automation flow spanning two third-party services for Voice-over-IP (VoIP) operation, from the public switched telephone network (PSTN) to virtual and physical endpoints.
Call Termination: Google Voice and Callcentric
Background
In early 2022 the 833-NYC-MESH number was parked at NumberBarn until further research could be done on how to implement the number. In the meantime the Google for Nonprofits platform was used to get access to Google Voice for a preliminary IVR/auto-attendant solution for routing calls to particular Google Workspace users, with email to Slack for voicemail.
Google Voice does not support porting-in toll-free numbers for terminating calls, and thus an external provider must be used in conjunction with the 833-NYC-MESH number. Currently, Callcentric has been chosen as our SIP provider due to widespread adoption, low rates, and positive testimonials, though the features it provides can all be replicated using a SIP Trunk->IP PBX architecture.
Routing
The 833-NYC-MESH number is ported into Callcentric for inbound termination and outbound origination. Currently, the Call Treatments feature forwards all incoming calls to the Google Voice auto-attendant "Hotline - Main", though the beta IVR setup through Callcentric's internal portal is partially configured and can be switched over at any time for testing.
Pricing
As part of a rigorous cost-benefit analysis of the plans available through Callcentric versus the cost of expanding the Google Voice service further, multiple scenarios were drawn up to compare the price for a fixed number of minutes per month.
This Google Sheet has the pricing for three scenarios: pay-as-you-go inbound and outbound calls (which results in a double charge, because the forward to the Google Voice Hotline Root is billed as well as the inbound call to the toll-free number); the 500-minute package for outbound calls (which eliminates the double charge for forwarding to Google Voice and for regular outbound calls); and eliminating the forward to Google Voice altogether and handling calls entirely internally.
Based on findings after testing the plans and billing behavior in Callcentric, it was determined that the 500-minute package makes the most sense for preserving Google Voice auto-attendant integration while saving a small amount of funds. Switching entirely to Callcentric for IVR handling is the most cost-effective option, but it has the drawback that SIP endpoints must be manually configured for all users (though those endpoints are more flexibly configured than Google Voice Users).
Callcentric Cost Calculator - Google Sheets
Callcentric Inbound/Call Treatments
Call Treatments in this context are practically synonymous with Inbound Routes, which may be more commonly seen in VoIP configuration software. NYC Mesh owns two direct-inbound-dialing (DID) numbers registered with Callcentric: toll-free DID 833-NYC-MESH (18336926374) and local DID 13475147546. For more information about how calls are routed through the Callcentric portal, see this separate page. This page outlines the options, both currently used and unused, that are or can be used in routing NYC Mesh calls.
Interface
"Treatments"
- 833 to GV: Catches calls coming from the toll-free DID and forwards to the Google Voice "Hotline - Main" auto-attendant.
- 347 Ring Group: Catches calls coming from the local DID and does a simultaneous ring between the Mesh Room desk-phone and the voice@nycmesh.net Google Voice User, which is responsible for voicemail.
- 833 to IVR: Future setup to catch calls coming from the toll-free DID and forwards to the internal IVR: 1 - Main - Language.
- 347 IVR: Testing route to catch calls coming from the local DID and forwards to the internal IVR: 1 - Main - Language. This is not for production use, but to avoid incurring costs when configuring the Callcentric IVR through the toll-free rate plan.
Parameters
The Callcentric-provided documentation can be found here.
The main nuance with the currently-implemented setup is the use of "Ring for (seconds)". In all Treatments and IVRs, any SIP extensions are set to time out at 20 seconds of ringing, while the forward to the Google Voice endpoints is set to 60 seconds of ringing. This is to allow the Google Voice endpoint to connect the call to its voicemail for now.
Callcentric IVR Setup
Callcentric has a built-in IVR utility that allows for practically infinite permutations of menus, auto-attendant scripts, and scriptable interactive call flows using Call Treatments. The flow itself can be found here; this page only contains the scripts for the uploaded recordings used as announcement and menu audio files. Each script was read by NaturalReader software using the "Guy Online (Natural) (Free)" voice at x1 speed, and recorded into a WAV file with Audacity software using the Windows WASAPI loopback device. The WAV files can be easily uploaded through the portal as described below.
Setup
On the left, you can add and configure IVRs and their menu trees which follow the structure and naming convention listed in the below section. On the right, MP3 or WAV audio files below 1Mb can be uploaded to be used within the IVRs. There is a built-in validator to ensure there are audio files in the mandatory places for calls to be handled correctly.
When adding an IVR or clicking "modify" on an existing IVR, the Edit IVR screen will open. On the left, the Announcement Audio selection is for audio files to be played only once when entering the IVR, and the Menu Audio selection is for audio files to be played after the Announcement Audio, and repeatedly after User error events such as timeout or invalid entry. The audio played on such errors can be controlled in the User error audio selection, which currently only plays a built-in female voice "Sorry".
On the right, there are multiple options to route calls based on user entry, between direct transfers to extensions, sending to other IVRs, or connecting to other menus through a transfer. Depending on the setting of Repeat on error, after the error limit is reached the call will terminate.
Call Tree Key
Audio file names are based on the menus where they are used, either as a menu option or as an announcement. Files that begin with 0 refer to common elements shared among multiple root hotlines. Items in red are options that are either planned but not implemented or ideas pending discussion.
IVRs in the 0 zone:
- 0. a. iv. Common - English - Org Info
Non-Default Parameters: Timeout: 0 sec
Comment: with no Last Route setting configured, the call just drops per the documentation. It would be nice to send this back "up" a menu, but unfortunately it doesn't appear that there is any option that allows you to select the previous IVR menu.
Hotline Roots:
- Main - Language (Root Hotline)
a. Main - English - Menu
Non-Default Parameters: Repeat on error: 3, User error audio: Sorry
i. To Get Connected: Simultaneous ringing to Mesh Room and Marco/VM
ii. Tech Support: Single forward to Marco/VM
iii. Buildings Projects Fiber: Single forward to Mesh Room
iv. Org Info: Special forward to IVR 0.a.iv - Org Info
b. Grand - Spanish - Menu (doesn't exist yet!)
c. Grand - Chinese - Menu (doesn't exist yet!)
- Grand - Language (Root Hotline)
a. Grand - English - Menu
Non-Default Parameters: Repeat on error: 3, User error audio: Sorry
i. To Get Connected: Simultaneous ringing to Mesh Room and Marco/VM
ii. Tech Support: Single forward to Marco/VM
iv. Org Info: Special forward to IVR 0.a.iv - Org Info
b. Grand - Spanish - Menu (doesn't exist yet!)
c. Grand - Chinese - Menu (doesn't exist yet!)
Text-to-Speech Audio Files and Scripts
Comment: The pound keys are not truly configurable in Callcentric, and despite the script advising its use to repeat the menu, it triggers the User error audio and subsequently the Last route if pressed after the third failure, which disconnects the call.
0a - Root - Language - English
To continue in English, press 1.
0ai - Root - English - Get Connected
To get connected to the mesh, press 1.
0aii - Root - English - Tech Support
For technical support, press 2.
0aiv - Root - Org Info
For more information about our organization, press 4.
0aiv - Root - Org Info - Info
NYC Mesh is a community network offering fast, affordable, and fair access to the Internet for all New Yorkers. By joining NYC Mesh, you can access the Internet while helping your neighbors get better and more accessible internet access. NYC Mesh is a neutral network and we do not monitor, collect, or store any user data or content.
For more information about our community network, visit our website at n y c mesh dot net, and find a list of frequently asked questions and answers at n y c mesh dot net slash f a q.
1 - Main - Language - Thank You
Thank you for calling NYC Mesh.
1aiii - Main - English - Buildings Projects Fiber
For buildings, projects, and fiber installs, press 3.
2 - Grand - Thank You
Thank you for calling NYC Mesh at Grand Street Guild.
Incomplete Recordings
The Callcentric call handling only has IVRs in English. The entry points for other languages would be formatted along the lines of the below:
0b - Root - Language - Spanish
0c - Root - Language - Chinese
Security (outdated)
Security
The goal of this document is to provide the most useful information for anyone interested in the security of the network. If there is missing information that would help understand and improve our network, please reach out to contact@nycmesh.net or join our slack.
We are actively looking for ways to improve the security, resiliency, and ease-of-use of the network to help the widest range of use cases. If you have ideas on how to improve anything, please join our slack.
Our current threat landscape is most concerned with in-mesh security - once traffic is routed over an IXP, provider gateway, or peer, it's equivalent to what people are used to.
In-mesh threats include:
- DoS by announcement of bogus routes
- MiTM attacks on SSL servers using letsencrypt (should be alleviated by multiroute verification if we interconnect in more places)
- Visibility of who you talk to when using unencrypted HTTP, DNS queries, SNI, etc for someone along the route chain
Data
- We do not keep logs of anything in-mesh. However anyone along the route chain could view unencrypted data or metadata (just like any ISP can).
- The organizers of NYC Mesh can see a spreadsheet of signup information volunteered by participants on the join nycmesh page (name, email, phone, address; all but email are optional)
- We create a map using map-nodes, from the above spreadsheet
Wifi
A typical home install creates two wireless networks - one open 802.11 access point (with a captive portal), and one WPA2 encrypted upstream gateway. You can change the open access point to be encrypted if you wish.
DNS
The default setup routes .mesh tld DNS requests to 10.10.10.10, which is anycast. Multiple people are running our knot-dns setup available on github (including supernode 1 at 10.10.10.11), but a malicious actor that is closer could take advantage of this.
Slack Support Follow Up Bot
Features
- ticket created on 1st support thread
- subject includes “follow-up-bot: ”
- if no slack response: nag every 48 hr, up to 3 times, then reopen the osticket and send an email nag with auto-re-close
Problems to solve
- member has issue that gets forgotten about after reporting on slack thread
- support thread is never responded to by a volunteer
- atypical support threads
- volunteer message to many people
- Is this a community announcement?
- if no or no response, then run support bot
Complication
- slack threads are not structured causing false positives, identical treatment for different types of threads
- someone responds out of thread
Programming
- need database
- ignore multiple threads
Process
- content matches goes to funnel
- automated follow up after 48 hours
- is issue resolved?
- if no response after 3 cycles then reopen ticket
- reopen in OS ticket and send message and recloses
- false positives
- if yes, then say thank you and do nothing (maybe record analytics somewhere)
- stop nagging
- if no
- should stay in slack
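The process above can be read as a small decision function. The sketch below is one possible interpretation, not an existing implementation: nextAction, its arguments, and the action names it prints are all hypothetical, and the 48 hour and 3 cycle values come from the notes on this page.

```shell
#!/bin/sh
# Hypothetical decision helper for the follow-up bot.
# Arguments:
#   $1  member's answer to "is issue resolved?": yes, no, or none (no response)
#   $2  hours since the bot's last message in the thread
#   $3  number of nags already sent
# Prints the next action as an illustrative keyword.
nextAction()
{
    resolved=$1
    hours=$2
    nags=$3
    case "$resolved" in
        yes) echo "thank"; return ;;          # say thank you, stop nagging
        no)  echo "keep-in-slack"; return ;;  # unresolved issues stay in slack
    esac
    if [ "$hours" -lt 48 ]; then
        echo "wait"
    elif [ "$nags" -lt 3 ]; then
        echo "nag"
    else
        echo "reopen-ticket"                  # reopen in osTicket, message, re-close
    fi
}

# Example: no reply, 50 hours since the last message, 1 nag sent so far
#   nextAction none 50 1
```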
Diagram
out of date
Software services list
Incomplete list - add your service!
Name | Purpose | Link | Active? | Maintained by |
Supportbot | Help diagnose support issues | | Yes | Andrew + Andy |
Grafana Private | | http://10.70.90.82:3000/dashboards | | Olivier + Andy |
Millimeter Outages | | On Grafana Private (dashboard link) | | Andy Baumgartner |
Grafana Public | | https://stats.nycmesh.net | | Zach |
Mastodon | Self hosted Twitter alternative | | Yes | Daniel |
Wiki | Evergreen docs, etc. | | Yes | Andy Baumgartner |
UISP | Manage Ubiquiti devices | https://uisp.mesh.nycmesh.net | Yes | Olivier |
OSPF Data API | Access OSPF Link DB data without running an OSPF node | http://api.andrew.mesh.nycmesh.net/api/v1/ospf/linkdb | Yes | Andrew Dickinson |
OS Ticket | Support and install tickets | https://support.nycmesh.net/scp/login.php | | Jason |
Node Explorer | Shows OSPF Graph | http://node-explorer.andrew.mesh.nycmesh.net/explorer | Yes | Andrew Dickinson |
Node Impact Analyzer | Show downstream nodes affected by outage | http://outage-analyzer.andrew.mesh.nycmesh.net/ | Yes | Andrew Dickinson |
Contacts Map | Shows emails of nodes associated with a hub | http://10.70.178.21:5000/ | Yes | Andy Baumgartner |
Uptime Kuma | Monitor Mesh services uptime and alert in slack | http://10.70.178.21:3001/ | Yes | Andy Baumgartner |
Status Page | Mesh Status Page | http://status.mesh.nycmesh.net | Yes | Willard + Andy + Lydon |
Zabbix | Metrics and Alerting | http://zabbix.mesh.nycmesh.net | Yes | Willard |
UISP2Zabbix | UISP -> Zabbix Broker | zabbix.mesh.nycmesh.net | Yes | Willard |
OSPF2Zabbix | OSPF Device Enroller and Noise Report Generator | zabbix.mesh.nycmesh.net | Yes | Willard |
Mesh DB | Database to hold mesh network information | https://db.grandsvc.mesh.nycmesh.net/ | Yes | Willard + Andrew |
Wiki (Bookstack)
Bookstack is the user-friendly wiki software that the NYC Mesh Wiki is built on.
MVP Wiki Launch Features
Before the Wiki can be more broadly used (and possibly replace docs.nycmesh.net) we must add some key features:
Required
- Simple introduction
- Ensure everything a new volunteer/stakeholder would benefit from seeing is linked in the introduction book.
- Hardware page to show extent of documentation and another "book".
- Support for old users
- Old to new mapping
- 1. sitemap mapping of old page to new?
- 2. update all docs pages with a link to the wiki equivalent.
- Page redirects on docs
- Full duplicate of docs.nycmesh.net content - to avoid information fragmentation, we should migrate a full copy of all existing docs content
- Merge any dupe Docs and Wiki pages (ex. hubs?)
- Replace high-level links to docs (nycmesh.net header "Docs" replaced with "Wiki")
Nice to have
- Mesh hosting - Currently on AWS, but we could move to Mesh hosting fairly easily.
- Mesh LAN based user creation - Ideally a member on the mesh should be allowed to create an editor account without involving admins. This is possible with a custom theme but needs more development. A disabled in-development theme file is included in the themes directory.
Zabbix
Zabbix lives at http://zabbix.mesh.nycmesh.net
Zabbix is used primarily for historical data collection and Slack alerting. There are a handful of dashboards configured for a few devices, but for the most part, the rest of its configuration is unused.
Data Collection
Zabbix is fed through the following sources:
- Data gathered via SNMP from various OSPF devices (mainly OmniTiks) discovered through OSPF2Zabbix
- Data forwarded from the UISP API by UISP2Zabbix
Custom Templates
We have a variety of custom templates, some of which were set up manually at one point, the rest either auto-generated or managed by one of the above tools.
Alerting
The main purpose of Zabbix is Alerting. Alerting can be found in the #zabbix-alerts channel. Alerts need to be tuned to what we really care about, such as the antennas on the larger links.
To make a trigger show up in Slack, add the slack tag to it.
The trigger can be any severity level. By default, many triggers are straight-up disabled. Alerting is, unfortunately, a manual process. We're still figuring out what is important and what isn't.
Weekly reports of noisy triggers are published in #zabbix-reports, where the top 20 noisiest triggers are aggregated. This can help us identify problems over time.
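To illustrate what a "top 20 noisiest triggers" aggregation involves, here is a rough sketch. It is not how the actual report is generated. It assumes a plain list of fired trigger names, one per line, and topNoisy is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read trigger names (one per fired event, one per line)
# on stdin and print "<count><TAB><trigger name>" for the noisiest triggers,
# most frequent first. Optional first argument: how many to show (default 20).
topNoisy()
{
    tab=$(printf '\t')
    awk '{ count[$0]++ } END { for (t in count) print count[t] "\t" t }' |
        sort -t "$tab" -k1,1nr -k2,2 |
        head -n "${1:-20}"
}

# Example with a made-up events file:
#   topNoisy 20 < fired-triggers.txt
```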
Todos:
- There is a plan to use certain triggers to automatically switch over links. For example, we'd like to disable the AF60xr on Vernon and use a backup link when it rains.
- (Willard): I was working on a service to generate Zabbix templates from MIB files using the Zabbix API. I'd like to tailor it towards specific Ubiquiti devices and use it + the UISP API to discover compatible antennas and use the SNMP data to enrich our DataLink data.
- (Willard): Expand UISP2Zabbix to cover more than just DataLinks. It would be cool to get all kinds of data out of it and into Zabbix for analysis
- (Willard): Problem heatmap. If I could overlay problems on top of Andrew's Node Explorer, we could see problem areas within the mesh.
- (Willard): Integrate Grafana with Zabbix. I know this is possible, the question is what's the best way to do this? And, if we're primarily doing this for UISP, then why not build something that integrates with UISP? (Couldn't be that hard to just query UISP's database directly, right?)
More Info
For (outdated-ish) information on how this was set up, including how Slack alerting was configured, refer to this doc: https://docs.google.com/document/d/1mJI8DWe882P6GCEGdT0xazxwrrCQZD7qEBcsDEjDU7Q/edit?usp=sharing
Website
https://www.nycmesh.net/
Website Update Ideas
Add your website update ideas!
Branding
- Clean but playful design: Incorporate hand drawn graphics using standard Mesh color palette
- Photo integration: show our physical network
- Clear graphics for most important topics (also good for non English speakers)
Consistency
- Avoid repeating information (e.g., the $290 install fee is mentioned on a number of pages, sometimes referred to as a donation and sometimes as an equipment cost). Use links to an authoritative page instead, so updates will be reflected everywhere.
Interactive interface
Users could benefit from interactive design
- pre-join/new member presentation
- troubleshooting flowchart
- install-team sign-up
Media Ideas
- [add your photo, graphic, video, etc. ideas here!]
MeshDB
MeshDB Schema Design
Background
MeshDB is an under-development software application with the goal of replacing the New Node Responses Google Sheet (the spreadsheet) as the source of truth for NYC Mesh member, install, geolocation, device, and connection information via a proper SQL database. It is built in the Django ORM, using Python Model objects to represent underlying database schema structures. The schema used for development up to this point is unable to faithfully represent some edge cases that occur at atypical NYC Mesh sites. In this document, we propose a modified schema and explain each edge case, detailing how the edge case will be represented under the proposed schema.
The Schema (Simplified)
The following diagram depicts the proposed schema, showing the relationships between models (SQL tables), and some key attributes of each model. For clarity, non-essential attributes are omitted (see appendix A for a comprehensive diagram).
We propose the following models:
- Member - Represents a single NYC Mesh Member (even if they have moved between multiple addresses and therefore have multiple installs, or "own" multiple active installs). Tracks their name, email address, and other contact details.
- Install - Represents the deployment (or potential deployment) of NYC Mesh connectivity to a single household. This most closely maps to the concept of a row in the spreadsheet. Tracks the unit number of the household, which member lives there, and which building the unit is located in. It is keyed by install number, which corresponds to the row number on the spreadsheet. With foreign keys to Member, Building, and Device, it acts as the central model, tying the entire schema together. Many objects have a status field, but the Install status field maps most closely onto the status tracked in the spreadsheet today. Completed Installs have a foreign key to the Device model (via_device), which keeps track of the device they use to connect to the mesh.
- Building - Represents a location in NYC identified by a single street address (house number and street name). In the case of large physical structures with more than one street address, we will store one Building object for each address that we have received Install requests for. Buildings track a primary network number, to represent the way the site is referred to colloquially. In the case that a building has more than one network number, the primary network number will be set to the one volunteers designate as the “primary” (usually the first assigned, busiest router, etc.)
- Device - Represents a networking device (router, AP, P2P antenna, etc.). Most closely corresponds to a “dot” on the map. Not comprehensive of all devices on the mesh, only those that need a map dot. For big hub sites, this may be only the core router. Contains a mandatory field for “network number” (NN), which will be set to the NN of the device, or of the “first hop” router used by this device (for devices like APs which have no NN assigned). It contains optional lat/lon override fields, which can be used to refine the exact location of this device (e.g. for map display). When no lat/lon are provided for a device, it is assumed to reside at the lat/lon of the building it is associated with (via the Install model). Devices can optionally track which install delivers them power, via a powered_by_install foreign key to the Install model, which tells us which unit has the PoE injector.
- Sector - A special type of device (using Django Model Inheritance to inherit all fields from device) which adds additional fields related to the display of sector coverage information on the map (azimuth, width, and radius)
- Link - A connection between devices, which represents a cable or wireless link, whether directly between the devices or via other antennas not represented with their own device objects
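As a rough illustration of how these models relate, here is a minimal sketch using plain Python dataclasses rather than real Django models, so that it runs without a Django project. Apart from via_device, powered_by_install, and the azimuth/width/radius fields named above, the field names (primary_nn, lat_override, lon_override, and so on) and the device_location helper are assumptions made for illustration, not the actual MeshDB code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    email: str


@dataclass
class Building:
    street_address: str
    latitude: float
    longitude: float
    primary_nn: Optional[int] = None  # hypothetical field name


@dataclass
class Device:
    network_number: int  # mandatory: own NN, or the first-hop router's NN
    lat_override: Optional[float] = None  # hypothetical field name
    lon_override: Optional[float] = None  # hypothetical field name
    powered_by_install: Optional["Install"] = None


@dataclass
class Sector(Device):
    # Inherits every Device field and adds map-display fields
    azimuth: float = 0.0
    width: float = 0.0
    radius: float = 0.0


@dataclass
class Install:
    install_number: int
    status: str
    member: Member
    building: Building
    unit: Optional[str] = None
    via_device: Optional[Device] = None


@dataclass
class Link:
    from_device: Device
    to_device: Device


def device_location(device, installs):
    """Return (lat, lon): the override if set, else the building of an
    install that connects via this device, else None."""
    if device.lat_override is not None and device.lon_override is not None:
        return (device.lat_override, device.lon_override)
    for install in installs:
        if install.via_device is device:
            return (install.building.latitude, install.building.longitude)
    return None


# Example with made-up data (not a real member, address, or NN)
building = Building("123 Example St", 40.7, -73.9, primary_nn=100)
omni = Device(network_number=100)
install = Install(100, "Active", Member("Example Member", "member@example.com"),
                  building, unit="2A", via_device=omni)
print(device_location(omni, [install]))
```

In the real application these would be Django Model classes with ForeignKey fields; the sketch only mirrors the relationships and the lat/lon fallback described in the list above.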
Example 1 - NN492 - Typical Multi-Tenant Install
In this simple example, we have two tenants in a single building with a single address, both connected via cables directly to an omni on their shared roof. They are connected to the rest of the mesh via an LBE to Saratoga. The database tables for this scenario look like this:
Example 2 - NN 4734 - Cross-Building Installs
In this example, members in 3 adjacent buildings, each with their own address, are connected via a single omni, with cable runs across the roofs directly to the members’ apartments. They are connected to the rest of the mesh via an mant 802.11 sector at 4507. The database tables for this scenario look like this:
Example 3 - 7th Street (NN 731) - Multiple Omnis on one building
In this example, we have one regular tenant in a single building with a single address. However, there is also a rooftop office with its own omni, connected wirelessly to the primary one. They are connected to the rest of the mesh via a GBELR to Grand. The database tables for this scenario look like this:
Example 4 - Vernon (NN 5916) - Courtyard APs
In this example, we have a core hub site in a single building with a single address. However, there are many Access Points (APs) on light poles in the building’s courtyard. These light poles are unquestionably associated with the same building/address as the core router of this hub, but need to be shown separately on the map.
In this scenario, we treat the light poles as if they are “apartments” in the Vernon building. Each gets its own install #, and, imagining a tenant living in the light pole, we say that this imaginary install is “connected via” a device object representing the AP. The network number for these APs is set to 5916, reflecting their first hop router (and the fact that they are not themselves assigned NNs). Links between the courtyard APs and the core router are included so that they are rendered on the map.
The database tables for this scenario look like this:
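As a rough sketch of those rows in Python: the NN 5916 comes from the description above, but the install numbers, device names, unit labels, and coordinates below are hypothetical placeholders, not the real Vernon records, and the dictionary layout is an assumption rather than the MeshDB table format.

```python
CORE_NN = 5916  # Vernon's NN, from the example above

BUILDING_LATLON = (40.0, -74.0)  # placeholder, not Vernon's real coordinates

# Device rows: the courtyard APs reuse the core router's NN (their first hop)
# and carry lat/lon overrides so that each one gets its own map dot.
devices = {
    "core-router": {"network_number": CORE_NN, "lat": None, "lon": None},
    "courtyard-ap-1": {"network_number": CORE_NN, "lat": 40.0001, "lon": -74.0001},
    "courtyard-ap-2": {"network_number": CORE_NN, "lat": 40.0002, "lon": -74.0002},
}

# Install rows: each light pole is treated like an "apartment" in the building
# and is "connected via" the device object representing its AP.
installs = [
    {"install_number": 900001, "unit": "Light pole 1", "via_device": "courtyard-ap-1"},
    {"install_number": 900002, "unit": "Light pole 2", "via_device": "courtyard-ap-2"},
]

# Link rows: included so the AP-to-core connections are drawn on the map.
links = [
    {"from_device": "courtyard-ap-1", "to_device": "core-router"},
    {"from_device": "courtyard-ap-2", "to_device": "core-router"},
]


def map_dot(device_name):
    """Return (lat, lon) for a device, falling back to the building's location."""
    device = devices[device_name]
    if device["lat"] is not None and device["lon"] is not None:
        return (device["lat"], device["lon"])
    return BUILDING_LATLON


for name in devices:
    print(name, devices[name]["network_number"], map_dot(name))
```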
Example 5 - Prospect Heights (NN 3461) - Multiple NNs for one building
In this example, we have a core hub site in a single building with a single address. The primary NN, 3461, also serves a member’s apartment as install #3461. However, another apartment could not, for practical reasons, be connected via a cable, and instead had to be connected via an antenna in its window to a sector on the roof. This antenna needed an NN for configen and naming, and so this building received multiple NNs.
The database tables for this scenario look like this:
Example 6 - Jefferson (NN 3606) - Multiple NNs for multiple buildings
In this example, we have a building with 4 addresses and 3 omnis on the roof, each with its own network number. There is no clean mapping between NNs and addresses, since each omni serves installs in multiple buildings. The omni of the primary NN, 3606, provides the uplink to Hex House (NN 1417).
The database tables for this scenario look like this:
Appendix A - Full Schema Diagram
The following is a complete schema diagram, showing all fields. Additions relative to the current implementation are shown in yellow, and removed fields are shown in red.
How to onboard applications to MeshDB
Adding a new user for an application
Make a new user specifically for the application, not just the author of the application. For example, if Andy is creating an application to measure member distance to LinkNYC kiosks, don't create a user called AndyB; create a user called AndyB-LinkNYCKioskTool. For the password, enter something secure, such as a random password generated by your browser. There is no need to save this password, since we will use a token to authenticate this user.
Save the user, and then click on the username in the Users list to add the necessary permissions directly on the user object. Do not add the user to any groups. Do not grant the user Staff or Superuser permissions.
Use the arrows or double-click to select permissions from the list of all possible permissions the application could be granted. Most applications do not need change/delete/add permissions. In this example, we grant Andy's tool "view" access to the Install, Building, and Member tables. Save the changes you've made to the user object.
Adding an API token
Follow the instructions under "Adding a new user for an application" above. Then select "Add" next to Tokens. Select the user you just created in the dropdown provided.
Save the new token, then send it to the author of the application. For more information on using this token to query the API, see the API docs here: https://db.grandsvc.mesh.nycmesh.net/api-docs/swagger/
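For application authors, a minimal sketch of attaching the token to a request with Python's standard library might look like the following. The endpoint path /api/v1/installs/ and the "Token" authorization scheme are assumptions made for illustration; check the Swagger docs linked above for the real paths and the expected Authorization header format. The example only builds the request and does not send it.

```python
import urllib.request

API_BASE = "https://db.grandsvc.mesh.nycmesh.net"


def build_meshdb_request(path, token, scheme="Token"):
    """Build a GET request carrying the API token in the Authorization header.

    The scheme keyword ("Token") is an assumption; confirm it in the API docs.
    """
    request = urllib.request.Request(API_BASE + path)
    request.add_header("Authorization", f"{scheme} {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical endpoint path and placeholder token
request = build_meshdb_request("/api/v1/installs/", "example-token")
print(request.full_url)
print(request.get_header("Authorization"))

# To actually send it (requires network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     data = response.read()
```

Keeping the token out of source control (for example, reading it from an environment variable) is usually a good idea, since it grants whatever permissions were assigned to the application's user.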
Adding a new webhook recipient
Follow the instructions under "Adding a new user for an application" above. You may use the same "User" object for both tokens and webhooks if they are for the same application.
Select the "Add" button next to Webhook Targets, then use the magnifying glass icon to select the user you created for this application. Enter the target URL for the notification delivery (will be provided by the application owner). This URL will receive an HTTP POST request every time the selected event is fired.
Select the appropriate event in the dropdown based on the event the application needs to receive, and save. If the application needs to receive more than one event type, add a separate webhook target for each event type it needs to receive.
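On the application side, the target URL needs something listening for those HTTP POST requests. Below is a minimal sketch of a receiver using only Python's standard library. It assumes the notification body is JSON, which is not confirmed here, and the {"event": "example"} payload in the local smoke test is made up; the real payload format should come from the MeshDB API docs.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received_events = []  # parsed payloads, in arrival order


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)  # assumes a JSON body
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        received_events.append(payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_receiver(port=0):
    """Start the receiver in a background thread (port 0 picks a free port)."""
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Local smoke test: post a made-up payload to our own receiver
server = start_receiver()
connection = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
connection.request(
    "POST",
    "/",
    body=json.dumps({"event": "example"}).encode(),
    headers={"Content-Type": "application/json"},
)
status = connection.getresponse().status
connection.close()
server.shutdown()
server.server_close()
print(status, received_events)
```

A production receiver would sit behind HTTPS and a proper web framework; this only shows the shape of the exchange.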