While there has been much coverage of Sandy's devastation of NYC's transit systems and coastal communities, there has also been an ongoing - and largely invisible - herculean effort to maintain and repair the Internet. People without power quickly realize how much of their daily life depends on it, and the Internet is critical for the same reason. Social media in particular, although often derided as frivolous, has become vital for those in search of food, medical help, fuel, and updates on the safety of loved ones, as well as a primary news source.
This storm has caused widespread outages for residential users and has knocked out hosting for some major websites, including Gizmodo, BuzzFeed, Jezebel, Gawker and Huffington Post.
Since NYC law prohibits storing large amounts of fuel above ground in high-rise buildings, many datacenters are forced to keep their fuel tanks in the basement, while the generators sit on higher floors. Due to limitations of pump design, the pumps must be within 1-2 floors of the tanks, though they do not necessarily need to be in the basement. This was a major source of generator failures: once the pumps were submerged by floodwater, they were destroyed, and the generators lost access to fuel.
Unfortunately for one New York City datacenter, Peer1, their pumps were located in the basement, which was flooded. Peer1, however, did not accept the inevitability of the outage that loomed as their mezzanine-level diesel tank ran dry. Teams from Peer1, Fog Creek Software and SquareSpace organized a backbreaking effort, hand-carrying 5-gallon buckets of diesel up to the 18th-floor generators to keep them supplied with the required 200 gallons per hour of fuel. As of their last updates, they have fortunately acquired pumps capable of moving fuel up to their generators, though downtime may be required in the near future to remove contaminants from their fuel filter, likely introduced by the manual filling process.
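To put the bucket brigade in perspective, here is a rough back-of-the-envelope calculation using only the two figures above (5-gallon buckets, 200 gallons per hour). The function name is ours, and it assumes every bucket arrives full with nothing spilled, which is certainly optimistic.

```python
def bucket_rate(gallons_per_hour, bucket_gallons):
    """Return (buckets per hour, seconds between buckets) needed to keep up.

    Assumes each bucket is delivered completely full (an idealization).
    """
    buckets_per_hour = gallons_per_hour / bucket_gallons
    seconds_between_buckets = 3600 / buckets_per_hour
    return buckets_per_hour, seconds_between_buckets


# Figures from the article: 200 gallons per hour, 5-gallon buckets.
per_hour, interval = bucket_rate(200, 5)
print(f"{per_hour:.0f} buckets per hour, one every {interval:.0f} seconds")
```

In other words, a full bucket had to reach the 18th floor roughly every minute and a half, around the clock.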
Even the return of utility power is not risk-free. When power initially returned at 111 8th Avenue, the voltage was fluctuating. Several suites switched from generator to utility power too soon; their UPSes were unable to handle the voltage irregularities and powered off. This caused an outage for Telehouse's NYIIX peering exchange.
Our thoughts and hearts go out to all the people at the affected companies working long tireless hours to keep the Internet running.
ISPrime is a privately held professional managed service company which operates several datacenter facilities. Our Weehawken, New Jersey facility is located about 5 minutes from Manhattan, with multiple redundant fiber paths to a variety of transit providers and major Internet peering points in Manhattan and beyond. Through careful datacenter design and extensive planning, ISPrime has suffered no disruptions, although we ran on our diesel generators without grid power from late Monday until utility power was restored late Saturday.
Because of our proximity and our stability, we have an uninterrupted up-close view of how the Internet has handled this event and would like to help visualize some of these issues for the public.
Generally we show two types of graphs in this article. One shows the total number of networks reachable from an ISP, a group of ISPs, or the Internet as a whole. The other measures the number of updates received from an ISP or a group of ISPs. An "update" means that we have learned that a particular network is reachable, unreachable, or that the route to reach that network has changed.
These updates use CPU cycles on every router of every medium-to-large ISP in the world. An excessive number of updates can cause routers elsewhere in the world to exhaust their CPU capacity and lose connectivity themselves.
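For readers curious how these two measurements relate, here is a minimal sketch in Python. It is not the tooling behind our graphs; the event format (timestamp, "A" for announce or "W" for withdraw, prefix), the function name, and the 5-minute bucket size are all assumptions made for illustration. Every event counts as one update, while the reachable-network count only changes when a prefix appears or disappears.

```python
from collections import Counter


def summarize_updates(events, bucket_seconds=300):
    """Summarize a stream of routing events into the two graph types.

    events: iterable of (timestamp, kind, prefix) tuples, where kind is
            "A" (announce, including a changed route) or "W" (withdraw).
            This tuple format is hypothetical, chosen for the example.
    Returns (updates_per_bucket, reachable_per_bucket).
    """
    reachable = set()            # prefixes currently reachable
    updates = Counter()          # bucket start time -> number of updates
    reachable_counts = {}        # bucket start time -> prefixes at end of bucket

    for ts, kind, prefix in sorted(events, key=lambda e: e[0]):
        bucket = ts - ts % bucket_seconds
        updates[bucket] += 1     # every event is an update, even a re-announce
        if kind == "A":
            reachable.add(prefix)
        elif kind == "W":
            reachable.discard(prefix)
        reachable_counts[bucket] = len(reachable)

    return dict(updates), reachable_counts


# Example with documentation prefixes: two networks appear, one changes
# its route, and one is later withdrawn.
sample = [
    (0, "A", "192.0.2.0/24"),
    (10, "A", "198.51.100.0/24"),
    (20, "A", "192.0.2.0/24"),      # route change: an update, but no new network
    (310, "W", "198.51.100.0/24"),
]
updates, reachable = summarize_updates(sample)
print(updates)
print(reachable)
```

Note that a route change adds to the update count without changing the number of reachable networks, which is why the update graphs can be very noisy even when the reachability graphs look flat.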
This is a graph of networks reachable on the Internet (BGP prefixes), measured from several of ISPrime's transit providers. As you can tell, the Internet is a "noisy" place, with networks coming and going constantly, but the impact of the storm is immediately obvious.
Below is the same graph, showing just the past few days. All subsequent graphs will show only the past few days to highlight the effects of the storm alone.
As you can see, a large number of networks disappeared from the Internet on Monday night/Tuesday morning, and few have returned as of Thursday night.
You will also notice from these graphs that ISPrime has been fortunate not to suffer a single direct impact to our Internet transit connectivity.
Another way of looking at this same data is to look at the number of BGP updates over the same time period. This is a measurement of all routing updates within our network during this time period.
This is likely caused by networks being brought offline from power failures, flood damage, and fiber cuts, as well as systems coming back online, in some cases only to be lost again after generators failed or other problems we can only speculate about.
For comparison, you can see the same event as viewed from our router in Amsterdam -- somewhat less noise, but it is clear that every router in the world has been working overtime to process the constant updates of what networks are reachable (or not).
One of the major issues for many ISPs in the NYC area is that a critical peering point, Equinix at 111 8th Avenue, lost power and cooling for various portions of the day on Wednesday. This is a measurement of the total number of updates from one of our core routers at 111 8th Avenue.
This includes update traffic from all around the world, but the effects of NYC's issues are clearly visible, as are Equinix's problems on Wednesday. Each of the massive spikes corresponds to the entire facility losing power or being powered back up. This facility houses most of the large cable companies in the USA.
While we do not know the specifics of what caused all of this noise, here is a measurement of updates from a large nationwide tier-2 ISP.
This is a measurement of updates from the "TIE-NYC" peering exchange, again clearly showing the storm's effect on the participants.
In some cases, datacenters didn't even properly plan for utility power returning. On Friday night, utility restoration caused a UPS malfunction and power loss to a suite at 111 8th Avenue. This suite houses portions of one of the largest regional Internet exchanges, NYIIX. This graph clearly shows the flurry of incoming BGP updates that resulted as peers rejoined that exchange.
This is a very large regional provider in New Jersey (Net Access Corporation), a full-service Internet, datacenter and managed services provider. While Net Access has not experienced any power failures or connectivity problems at their primary facilities, many of their DSL, T1, and T3 customers lost connectivity due to failures elsewhere. Net Access facilities at Parsippany and Cedar Knolls have even become a home to displaced workers and companies.
Net Access experienced a total outage to customers served through Covad's DSL/T1 network, and saw entire Verizon Central Offices go offline.
Alex Rubenstein, CEO of Net Access, commented that while remote access customers did lose service due to failures in Verizon's network, most of them remained unreachable even after Verizon's network was brought back into service, because of power losses of their own. As of Monday, November 5, Alex estimates that approximately 50-75% of those customers have returned to service. Alex also reports that due to the massive failures at other facilities, Net Access experienced a large influx of emergency customer installations at their datacenters in the week immediately following the storm.
This is a measurement of one of the largest residential/business Internet providers in Staten Island, Brooklyn, and New Jersey.
We have shown three graphs here. The first should already be recognizable. The second shows the total number of networks they are announcing, indicating that half of the damaged sections of their network are still unreachable. The third shows total traffic to their network.
There have also been some surprising international effects.
This is a measurement of one of the largest cable companies in Europe, from their peering point in NYC.
We suspect this is due to transoceanic cables being disrupted at their landing stations in Tuckerton, NJ and Long Island, NY.
Another international problem: a major German ISP lost a large number of routes.
Here we see one of the largest ISPs in Russia -- note that for a period of time, their network's connectivity to the USA was completely disrupted.