NYC Mesh OSPF Routing Methodology

This guide gives an overview of the OSPF methodology and topology currently deployed at NYC Mesh. If you are looking for specific configuration instructions, please refer to the OSPF Configuration Guide page.

Positives and negatives

OSPF is an interesting choice as an in-neighborhood routing protocol because of its ease of setup (automatic convergence, no ASNs to coordinate) and its ubiquity: nearly every device, cheap or expensive, commercial or open, supports it. These two positives alone make OSPF worth considering.

On the downside, it was not specifically designed for an ad-hoc mesh, it trusts its neighbors blindly, and it has very few tunables. There are also a few technical challenges, such as the lack of link-local addressing, only advertising connected networks (not summaries), and some problematic defaults on various platforms.

Many of these challenges can be overcome by taking some care to make good choices for options when setting up a network.

OSPF Selection

NYC Mesh has chosen OSPF as its standard mesh routing protocol. This may be a controversial choice, as _most_ mesh networks in Europe use custom or encrypted mesh routing protocols. We have chosen this path because:

NYC Mesh utilizes a wide range of hardware with differing capacities and weather resiliency characteristics. Because the network is volunteer-driven and operated, it's important that it be resilient, but also easy to maintain and scale. OSPFv2 Point-to-Multipoint allows us to modify routing tables and plan for expansion without overly-complicated configuration planning.

Important Note: making changes to default OSPF costs can have unintended effects, up to and including network-wide outages, and frequently requires modifying Hub configurations. Testing and learning should not be done in production environments. Before making changes to NYC Mesh routing configurations, discuss in our Slack #architecture channel and/or applicable hub channel and check the NYC Mesh Node Explorer tool for current routing information.


Designing the basic architecture

To standardize across the network, each router has a Mesh Bridge interface in the OSPF area with a default cost of 10 to all adjacent neighbors. This ensures symmetry in link costs on both ends of the link, keeping bi-directional traffic on the same path. For each "hop" toward an internet exit, a router incurs its link cost to transit to the next hop. By calculating the lowest total cost to an internet exit, the local router sends its traffic to an Internet exit in (usually) the most efficient manner, automatically.

Example: Node path to internet exit with all default costs

Node A > 10 > Hub > 10 > Supernode > 1 > Public Internet

In the above example, the Node incurs cost 10+10+1=21 to exit to the Public Internet. Unless a lower cost exit becomes available, this will be the preferred route for all internet traffic to and from Node A.

Now that we've standardized route costs, we need to design priority and redundancy to take advantage of nodes clustered around each other while preferring higher-capacity links.

The WDS bridge: ensuring Hub-and-spoke routes are preferred over WDS routes

NYC Mesh uses Omnitik wireless routers at almost all member nodes; they automatically connect to each other over WDS, providing numerous backup routes in case of hardware failure or network changes. However, these connections are often slower and less reliable than the point-to-point and point-to-multipoint connections in our Hub-and-Spoke model. To account for this, we put the Omnitik<>Omnitik WDS links on a separate "WDS Bridge" on every Omnitik router, with a default cost of 100.
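For illustration only, here is a minimal RouterOS v6 sketch of how these two bridge costs could be expressed as OSPF interface settings. The bridge names mesh and wds are assumed from the bridge filter output shown later in this guide; the canonical settings live in the OSPF Configuration Guide.

# Sketch only -- bridge names assumed; see the OSPF Configuration Guide
# for the standard NYC Mesh configuration
/routing ospf interface
add interface=mesh network-type=point-to-multipoint cost=10
add interface=wds network-type=point-to-multipoint cost=100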

Example: Node preferring Mesh Bridge over "shorter" WDS links

Node A > 100 (WDS) > Node B > 10 > Supernode X > 1 > Public Internet 
Node A > 10 > Microhub > 10 > Hub > 10 > Supernode Y > 1 > Public Internet

In this example, Node A prefers to exit via Supernode Y as the cost it incurs is 10+10+10+1=31, versus 100+10+1=111 via Supernode X. If we did not have higher WDS costs, Node A would instead prefer the shorter link to Supernode X, but would very likely experience poorer performance.

For more details on the hybrid Hub-and-Spoke + Mesh model we deploy, see the Mesh page.

Example: Prospect Lefferts Garden

[Figure 1]

In Figure 1, we see many nodes (marked as red dots) clustered around 2 Microhubs (marked as blue dots) in Prospect Lefferts Garden, as well as multiple exit routes to the north. While most of the nodes will automatically find the best exit, there are some that may have equal costs through multiple exits. To mitigate this, we set preferred routes (via hardware like SXTs, or software with virtual wireless interfaces) on the Mesh Bridge, as illustrated by the green lines in Figure 2. This ensures each node selects its fastest and most stable route to send and receive internet traffic.

[Figure 2]

We can see this in action on the Omnitik: 10.69.45.7 is on the "Mesh" bridge interface, meaning it incurs cost 10 to transit. All other adjacent routers are on the "WDS" bridge interface, and incur cost 100 to transit. This setup ensures the local node prefers the 4507 Microhub as its exit route, but also has backup routes in case 4507 goes offline or one of its upstream links is broken.
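To verify this on a router yourself, the standard RouterOS v6 print commands below show which interface each adjacency was formed on and the cost configured on that interface (a hedged sketch; exact output columns vary by RouterOS version):

# List OSPF adjacencies and the interface each neighbor was learned on
/routing ospf neighbor print
# List OSPF-enabled interfaces with their configured costs (e.g. mesh=10, wds=100)
/routing ospf interface print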

By implementing this architecture across all routers on the network, we now have high resiliency to outages, scalability, and minimal configuration effort.

Bridge Filters

Before we go further, it's important to mention bridge filters. These are required to make OSPF work properly, and they also ensure members within the same building are isolated from each other. There are 3 filters applied to standard Omnitik configurations.

[admin@nycmesh-xxxx-omni] > interface bridge filter print
Flags: X - disabled, I - invalid, D - dynamic 
 0   chain=forward action=drop in-bridge=mesh log=no log-prefix=""
 1   chain=forward action=drop in-bridge=wds
 2   chain=forward action=drop in-interface=wlan2 

It is critically important that these filters stay enabled on each router so that individual routers don't "bridge" connected nodes and Hubs, creating unintended routing paths. Further explanation and examples of bridging scenarios are covered below.
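If the filters have been removed outright (rather than just disabled), they can be recreated to match the standard rules printed above. The following is a sketch derived from that output; double-check against the OSPF Configuration Guide before applying it to a production router.

# Recreate the three standard drop rules shown above
/interface bridge filter
add chain=forward action=drop in-bridge=mesh
add chain=forward action=drop in-bridge=wds
add chain=forward action=drop in-interface=wlan2
# If the rules exist but are disabled (flag X), re-enable them instead
/interface bridge filter enable [find disabled=yes]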

Sidebar: The case against OSPF automation and summarization

Given the scale of the problem and continued growth, we must consider an important question: why not implement automation to dynamically adjust OSPF costs based on link quality, and/or utilize summarization and redistribution to simplify planning?

As mentioned above, our network design is meant in part to balance the following three goals:

Further, NYC Mesh has no CEO, directors, or employees, and the board intentionally does not have decision-making authority over non-financial/legal matters; as documented in the NYC Mesh Commons License, the design, planning, maintenance and support of NYC Mesh is done solely by community members and volunteers. While we do have highly-skilled volunteer network engineers, the day-to-day maintenance and monitoring of the network is done by members with varied skill levels; we generally prefer easy-to-maintain solutions over highly customized configurations requiring extensive knowledge and training.

Finally, as we primarily rely on member donations to maintain and expand the network, we generally avoid high-end enterprise-grade hardware or software requiring recurring subscription fees and support contracts to minimize operational expenses. As NYC Mesh continues to grow, we may need to adopt more robust and dynamic routing and load-balancing techniques, and will look to our community to collectively decide on the path forward.


Scaling out the Hub-and-Spoke model

This baseline architecture works great in individual neighborhoods and on relatively linear routes, but with over a thousand nodes connecting to 60+ Hubs and links crisscrossing New York City, some planning and manual intervention is required to ensure stability and speed for all connected members.

[Figure 3]

In the Bed-Stuy, Bushwick, Ridgewood, and Crown Heights neighborhoods shown above in Figure 3, we have over a dozen Hubs serving hundreds of members. Efficient routing and redundancy across multiple wireless links requires additional route cost options between 10 and 100.

In an effort to minimize single points of failure in our network (Hubs having only one exit route) and to provide dedicated backup routes when weather impacts high-frequency links, we deploy redundant links in a "triangle schema" so that each Hub has multiple low-cost routes to exit. To see this deployed, let's remove the nodes from the map above and focus on the Hubs.

[Figure 4]

As we can see in Figure 4, most Hubs have 2 or more exit routes so that an outage of an individual link or Hub will not isolate any other Hub. Additional routes leading off Figure 4 allow multiple exits from both Vernon and Hex House, as well as other lower-capacity links through smaller Microhubs and nodes.

[Figure 5]

In Figure 5, we observe a similar trend as we move southwest towards Prospect Park and Supernode 3 at Industry City.


Load-balancing across varied hardware 

NYC Mesh uses a broad range of purchased and donated Ubiquiti and Mikrotik hardware with varying capacity, capabilities, and rain fade resilience, and members are allowed to extend the network at will pursuant to the Network Commons License. Because our OSPF link costs are static and do not automatically increase or decrease based on link quality, limiting ourselves to just two options for link cost will quickly cause issues as the network grows. Here are just a few use cases to consider:

To meet these goals, we need to set up custom link costs on backup routes as well as between high-traffic Hubs.

Example: Microhubs between Major Hubs

Our Vernon and Prospect Heights Hubs collectively carry more than 80% of NYC Mesh network traffic in Brooklyn. By design, each Hub's primary exit to the public Internet is through a different Supernode (Vernon through Supernode 10 in Manhattan, and Prospect Heights through Supernode 3 in Industry City). To provide redundancy between their exits, a dedicated 60GHz link (in teal) is deployed between the two, but Vernon and Prospect Heights also have more preferable secondary links (illustrated further below in Figure 7). This requires the link to have a slightly higher cost (in this case, 15) so that each Hub prefers its other backup routes in case of a primary exit link outage.
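As an illustration, a custom cost like this is applied to the OSPF interface entry for the link, and it must be set identically on the routers at both ends to keep costs symmetrical. The sketch below uses a hypothetical interface name; check the router's actual interface list for the 60GHz link before changing anything.

# Hedged sketch: raise the cost of the Vernon <> Prospect Heights backhaul
# so other backup routes stay preferred. "vlan-phn-60ghz" is a hypothetical
# interface name, not the production one.
/routing ospf interface
add interface=vlan-phn-60ghz cost=15
# (use "set" instead of "add" if an OSPF interface entry already exists)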

To make matters more complicated, Microhubs in between Vernon and Prospect Heights connect to both Hubs to provide their own redundancy, as illustrated in Figure 6.

[Figure 6] Note: nodes and sector coverage have been omitted

To ensure each Microhub prefers the fastest route, and to keep them from bridging Vernon and Prospect Heights with additive link costs lower than the Vernon <> Prospect Heights 60GHz link, we need to manually set the backup links to higher costs.

Reminder: before setting up secondary links, double-check that the appropriate Bridge Filters are enabled on the local router. A misconfigured or disabled Mesh Bridge Filter will result in 0-cost links between neighboring nodes and Hubs, risking major network congestion or outages.

Note that there is no firm methodology or formula for calculating optimal custom link costs in this model, though backup links are generally set between 20 and 80 depending on upstream impacts. Sufficient buffer should be allocated between primary and secondary routes to allow expansion and updates with minimal changes required to upstream OSPF costs or routes.

The NYC Mesh Node Explorer tool generates live and historical mappings of nodes and link costs, allowing us to quickly determine primary exit routes and costs, and includes an outage simulator that is extremely helpful in validating the configuration of secondary routes.

You can also determine the current exit cost of any Mikrotik router running RouterOS v6 with the following command:

[admin@nycmesh-xxxx-core] > routing ospf route print where dst-address=0.0.0.0/0
 # DST-ADDRESS        STATE          COST          GATEWAY         INTERFACE     
 0 0.0.0.0/0          ext-1          20            10.70.253.xx    bond1.1010    

Note: identifying information has been removed from this terminal export

When selecting a custom link cost that may bridge segments of the network, the following factors should be taken into account:


Planning for Outages

Ok, to summarize, we've done the following:

To determine route costs to set on primary and backup routes, we need to understand more about the wireless hardware used in the Mesh. The table below details characteristics of the common equipment currently in use on the Mesh.

Brand    | Antenna                             | Band          | Advertised Capacity*      | Typical Capacity* | Rain Resilience | Preferred Link Distance**
-------- | ----------------------------------- | ------------- | ------------------------- | ----------------- | --------------- | --------------------------
Ubiquiti | Litebeam 5AC Gen2 / Litebeam LR     | 5GHz          | 225 Mbps+ (40 MHz width)  | 75-175 Mbps       | Extremely High  | < 3.5km (PtP), < 2km (PtMP)
Ubiquiti | airFiber 5XHD / LTU Long-Range      | 5GHz          | 425 Mbps+ (80 MHz width)  | 125-300 Mbps      | Extremely High  | < 5km (PtP), < 3km (PtMP)
Ubiquiti | airFiber 24                         | 24GHz         | 750 Mbps (100 MHz width)  | 750 Mbps          | Very High       | < 5km (PtP)
Ubiquiti | airFiber 60 LR / Wave LR            | 60GHz         | 1 Gbps (1080 MHz width)   | 1 Gbps            | Medium          | < 3.5km (PtP), < 2km (PtMP)
Ubiquiti | airFiber 60 XR                      | 60GHz         | 2.7 Gbps (2160 MHz width) | 2.7 Gbps          | Low/Medium      | < 5km (PtP)
Mikrotik | SXTsq 5AC                           | 5GHz          | 200 Mbps+ (40 MHz width)  | 75-100 Mbps       | Extremely High  | < 1.5km (PtP), < 500m (PtMP)
Mikrotik | LHG 5AC                             | 5GHz          | 200 Mbps+ (40 MHz width)  | 75-125 Mbps       | Extremely High  | < 3.5km (PtP), < 1.5km (PtMP)
Mikrotik | LHG 60                              | 60GHz         | 1 Gbps (1080 MHz width)   | 300-600 Mbps      | Low/Medium      | < 1.5km (PtP), < 500m (PtMP)
Siklu    | Etherhaul Kilo 8010 (licensed band) | 70GHz / 80GHz | 10 Gbps (2160 MHz width)  | 10 Gbps           | High            | < 5km (PtP)

*  Capacity listed is single-direction speed; Typical Capacity indicates observed performance in New York City
** Preferred Link Distance is a subjective estimate of maximum distance in dense urban areas before performance is significantly degraded

As we can see, decisions on route priority depend on the capacity of individual links, as well as link distance (for rain resiliency) and the number of hops (for latency) to an internet exit. That's a lot of factors to consider! Let's see what this looks like in the real world.

Determining Primary and Backup Costs

To see how these factors are taken into account when planning for real-world deployments, let's return to Brooklyn.

[Figure 7.1] Note: some additional links and hubs omitted for clarity; listed link speeds are production single-direction actuals; distances between Hubs are not to scale

Figure 7.1 illustrates links between larger Hubs in Brooklyn, with notations indicating deployed hardware, directional link capacity and physical distance. Supernodes in green serve as Internet Exits; all other Hubs are marked in blue. As mentioned earlier in this article, all OSPF link costs are symmetrical to ensure consistent bi-directional traffic flow and ease of configuration.

Because OSPF will always prefer the lowest-cost route to an exit, it's easier to work from the outside in (i.e., starting with the Supernodes adjacent to the Internet exits and working our way deeper into the network). To begin planning our link costs, let's start by identifying the Public Internet exits:

[Figure 7.2]

With the Public Internet exits and costs identified in red in Figure 7.2, the network will operate mostly ok with default costs of 10 (noted in grey) for all other links... until we experience outages, whether from heavy rain or rare hardware failure. 

To optimize the network, we'll need to make some changes to some of our primary link costs.

With these changes in place, Vernon now prioritizes traffic correctly over its highest-capacity link, and has a dedicated high-capacity secondary failover link in case of hardware failure or a Grand outage. Additionally, Saratoga will now also follow Vernon's exit to Grand.

[Figure 7.3]

We now have traffic moving more efficiently, and can operate with plenty of overhead for traffic spikes and growth. However, if we experience heavy rainfall or suffer outages, we may still have problems. We'd also like to make sure we don't have to redesign the entire OSPF schema from the ground up every time a new Hub is stood up. We'll work to address that now:

Let's see how our network looks in Figure 7.4 now that we've put these changes in place.

[Figure 7.4]

We now have efficient routing in place with high-resiliency backups for rain as well as high-capacity backups for hardware failures and outages!

If you've made it this far, you should have a good understanding of how to safely manage routes and costs on the Mesh, and be able to plan for new nodes and Hubs armed with a better understanding of traffic flow across the network.

To learn how to configure OSPF routes on our Mikrotik routers, see the OSPF Configuration Guide to get started.


Appendix: Brooklyn Hub OSPF Costs

Note: the routing details in this document are accurate as of April 11th, 2024. Before making production routing changes, check the NYC Mesh Node Explorer tool and discuss in our Slack #architecture channel.

1340 - Saratoga

3461 - Prospect Heights

5151- President

5916 - Vernon

[Figure]

