NYC Mesh OSPF Routing Methodology

Positives and negatives

OSPF is an interesting choice as an in-neighborhood routing protocol because of its ease of setup (auto convergence, no ASNs), and how ubiquitous it is -- nearly every cheap and expensive commercial and open device supports it. These two positives alone make OSPF worth considering.

On the down-side, it is not specifically designed for an adhoc mesh, it trusts blindly, and has very few tuneables. Additionally, there are a few technical challenges such as the lack of link-local address use, only advertising connected networks (not summaries), and some common defaults on various platforms.  

Many of these challenges can be overcome by taking some care to make good choices for options when setting up a network.

OSPF Selection

NYC Mesh has chosen to use OSPF as the standard mesh routing protocol of choice. This may be a controversial choice, as _most_ mesh networks in Europe are using custom mesh routing protocols, or encrypted routing protocols. We have chosen this path because:

NYC Mesh utilizes a wide range of hardware with differing capacities and weather resiliency characteristics. Being volunteer-driven and operated, it's important that the network be resilient, but also easy to maintain and scale. OSPFv2 Point-to-Multipoint allows us to modify routing tables and plan for expansion without overly-complicated configuration planning.

Important Note: making changes to default OSPF costs can have unintended effects, up to and including network-wide outages, and frequently requires modifying Hub configurations. Testing and learning should not be done in production environments. Before making changes to NYC Mesh routing configurations, discuss in our Slack #architecture channel and/or applicable hub channel and check the NYC Mesh Node Explorer tool for current routing information.

Designing the basic architecture

To standardize across the network, each router has a Mesh Bridge Interface on the OSPF Area with default cost of 10 to all adjacent neighbors. This ensures symmetry in link costs on both ends of the link, keeping bi-directional traffic following the same path. For each "hop" to an internet exit, each router incurs its link cost to transit to the next hop. By calculating the lowest cost to an internet exit, the local router sends its traffic on an Internet exit in (usually) the most efficient manner, automatically.

Example: Node path to internet exit with all default costs

Node A > 10 Hub > 10 > Supernode > 1 > Public Internet

In the above example, the Node incurs cost 10+10+1=21 to exit to the Public Internet. Unless a lower cost exit becomes available, this will be the preferred route for all internet traffic to and from Node A.

Now that we've standardized route costs, we need to design priority and redundancy to take advantage of nodes clustered around each other while preferring higher-capacity links.

The WDS bridge: ensuring Hub-and-spoke routes are preferred over WDS routes

NYC Mesh uses Omnitik wireless routers at almost all member nodes to automatically connect to each other, providing numerous backup routes in case of hardware failure or network changes, but these connections are often slower and less reliable than point-to-point and point-to-multipoint connections in our Hub-and-Spoke model. To account for this, we put the Omnitik<>Omnitik WDS links on a separate "WDS Bridge" on every Omnitik router with default cost of 100

Example: Node preferring Mesh Bridge over "shorter" WDS links

Node A > 100 (WDS) > Node B > 10 > Supernode X > 1 > Public Internet 
Node A10 > Microhub > 10 > Hub > 10 >  Supernode Y > 1 > Public Internet

In this example, Node A prefers to exit via Supernode Y as the cost it incurs is 10+10+10+1=31, versus 100+10+1=111 via Supernode X. If we did not have higher WDS costs, Node A would instead prefer the shorter link to Supernode X, but would very likely experience poorer performance.

For more details on the hybrid Hub-and-Spoke + Mesh model we deploy, see the Mesh page.

Example: Prospect Lefferts Garden


In Figure 1, we see many nodes (marked as red dots) clustered around 2 Microhubs (marked as blue dots) in Prospect Lefferts Garden, as well as multiple exit routes to the north. While most of the nodes will automatically find the best exit, there are some that may have equal costs through multiple exits. To mitigate this, we set preferred routes (via hardware like SXTs, or software with virtual wireless interfaces) on the Mesh Bridge, as illustrated by the green lines in Figure 2. This ensures each node selects its fastest and most stable route to send and receive internet traffic.


We can see this in action on the Omnitik: is on the the "Mesh" bridge interface, meaning it incurs cost 10 to transit. All other adjacent routers are on the "WDS" bridge interface, and incur cost 100 to transit. This setup ensures the local node prefers the 4507 Microhub as its exit route, but also has backup routes in case 4507 goes offline or one of its upstream links is broken.

By implementing this architecture across all routers on the network, we now have high resiliency to outages, scalability, and minimal configuration effort.

Bridge Filters

Before we go further, its important to mention bridge filters. These are required to make OSPF work properly and also ensure members within the same building are isolated from each other. There are 3 filters applied to standard Omnitik configurations.

[admin@nycmesh-xxxx-omni] > interface bridge filter print
Flags: X - disabled, I - invalid, D - dynamic 
 0   chain=forward action=drop in-bridge=mesh log=no log-prefix=""
 1   chain=forward action=drop in-bridge=wds
 2   chain=forward action=drop in-interface=wlan2 

It is critically important to ensure these filters stay enabled on each router to ensure individual routers don't "bridge" connected nodes and Hubs, leading to unintended routing paths. Further explanation and examples of bridging scenarios are covered below.

Sidebar: The case against OSPF automation and summarization

Given the scale of the problem and continued growth, we must consider an important question: why not implement automation to dynamically adjust OSPF costs based on link quality, and/or utilize summarization and redistribution to simplify planning?

As mentioned above, our network design is meant in part to balance the following three goals:

Further, NYC Mesh has no CEO, directors, or employees, and the board intentionally does not have decision-making authority over non-financial/legal matters; as documented in the NYC Mesh Commons License, the design, planning, maintenance and support of NYC Mesh is done solely by community members and volunteers. While we do have highly-skilled volunteer network engineers, the day-to-day maintenance and monitoring of the network is done by members with varied skill levels; we generally prefer easy-to-maintain solutions over highly customized configurations requiring extensive knowledge and training.

Finally, as we primarily rely on member donations to maintain and expand the network, we generally avoid high-end enterprise-grade hardware or software requiring recurring subscription fees and support contracts to minimize operational expenses. As NYC Mesh continues to grow, we may need to adopt more robust and dynamic routing and load-balancing techniques, and will look to our community to collectively decide on the path forward.

Scaling out the Hub-and-Spoke model

This baseline architecture works great in individual neighborhoods and on relatively linear routes, but with over a thousand nodes connecting to 60+ Hubs with links crisscrossing New York City, some planning and manual intervention is required to ensure stability and speed for all connected members.