The Cloud: Why it's not scary, and why meteorologists should embrace it.

Most of the time, meteorologists get into weather because they are fascinated by a particular weather phenomenon. A good portion of those phenomena require clouds, and every once in a while, you can spot a weather enthusiast because they are constantly looking up at them. There’s even a Cloud Appreciation Society people can join, which includes names of many types of clouds.

Unfortunately, this is not the cloud I am talking about today. Cloud computing is a term that has become catchy and trendy these days. Having skills in this field can be quite desirable in the analytical world. But what is it exactly, and why should you care? I should start out by saying there are multiple cloud options out there, from Amazon to Google to Microsoft and many more. For the purposes of this post, I am going to refer to Amazon Web Services (AWS). Also, while the cloud can do many different things, I am only going to talk about two features of interest to weather and climate enthusiasts: storage and computing.

First off, the cloud architecture doesn’t exist in the sky. In fact, there are no literal clouds involved, but rather large computer farms around the United States and the globe. It’s a way to store and access data in a safe and secure place, without the user having to worry about buying their own hardware. This can be very helpful for small companies that cannot afford to buy their own server and also handle the costs of upkeep.

STORAGE

Cloud storage is not remotely new. If you’ve ever used Dropbox, Google Drive, or iCloud Photos, you’ve essentially stored data in the cloud. It’s an easy way to free up storage on your device, and it gives you nearly instantaneous access to your files. In the meteorological world, there are vast amounts of data, from ground stations to weather satellites to model data running to the end of the 21st century. According to NOAA’s National Centers for Environmental Information (NCEI), users have been able to access 9 petabytes of environmental information over the internet. A petabyte is equal to roughly 1,000 terabytes, and 1 terabyte is about 1,000 gigabytes. For context, a standard laptop holds about 512 GB of data, which means it would take the equivalent of about 18,000 laptops to hold the information NCEI has. And that’s not even all the weather and climate data out there. A recent reanalysis dataset from ECMWF is scheduled to be an additional 9 petabytes in size by the end of 2019. That would be another 18,000 laptops for one more dataset! Given the cost of laptops, storing data this way would be infeasible. One could build a computer server to hold a good chunk of this data, but the cost to maintain the hardware, and even update it every few years, can be considerable. We’re talking tens, if not hundreds, of thousands of dollars. Many research and private institutions can afford this cost, but a startup might not be able to.
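For anyone who wants to check the laptop math, here is a minimal Python sketch. It assumes decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB) and the 512 GB laptop used above; the function name is just an illustration.

```python
# Decimal (SI) units, as in the text: 1 PB = 1,000 TB and 1 TB = 1,000 GB.
GB_PER_TB = 1_000
TB_PER_PB = 1_000


def laptops_needed(petabytes, laptop_gb=512):
    """Return how many laptops of `laptop_gb` capacity would hold `petabytes` of data."""
    total_gb = petabytes * TB_PER_PB * GB_PER_TB
    return total_gb / laptop_gb


ncei_laptops = laptops_needed(9)
print(f"{ncei_laptops:,.0f} laptops")  # 17,578 laptops, i.e. roughly 18,000
```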

This is where the cloud comes in. The environmental data is stored on a remote server, where the user can get to it quickly. Amazon stores this information in a storage container, or as they call it, a “bucket.” A user can point to this bucket and download just the data they need. NCEI, along with the NOAA Big Data Partnership, has recognized the need for its data to be stored in the cloud, and has worked over the past few years to do just that. A good number of datasets are now hosted and kept up to date in Amazon buckets. Some examples are below:

  • NEXRAD. Rainfall data from 150+ radars across the US, going back to the early 2000s.

  • GHCN-Daily. Database of 100,000+ weather stations around the world. We recently highlighted this dataset in an Alphabet Soup piece.

  • GOES-16. Satellite data depicting cloud information for the eastern part of the United States.
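To give a feel for what “pointing to a bucket” looks like, here is a rough Python sketch that builds the public web address of one file and, if you call the last function, downloads its first few lines. The bucket name (noaa-ghcn-pds) and the csv/by_station/ folder layout are my assumptions about how GHCN-Daily is laid out on AWS, and the helper names are made up for illustration, so check the dataset's own documentation before relying on them.

```python
from urllib.request import urlopen

# ASSUMPTION: bucket name and key layout for GHCN-Daily on AWS; verify before use.
GHCN_BUCKET = "noaa-ghcn-pds"


def public_object_url(bucket, key):
    """Build the HTTPS address of a publicly readable S3 object."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"


def station_csv_url(station_id):
    """Address of one station's CSV file, under the assumed key layout."""
    return public_object_url(GHCN_BUCKET, f"csv/by_station/{station_id}.csv")


def fetch_first_lines(url, n=5):
    """Download the object and return its first n lines (needs internet access)."""
    with urlopen(url, timeout=30) as response:
        return [response.readline().decode("utf-8").rstrip() for _ in range(n)]


# Example station identifier, used only to show the shape of the URL.
print(station_csv_url("USW00094728"))
```

Calling fetch_first_lines(station_csv_url("USW00094728")) would then pull a few rows straight from the bucket, with no account and no local copy of the full dataset needed, as long as the assumed layout holds.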

COMPUTING

A good question I get when I talk about the cloud is “SO WHAT?” Many institutions have the ability to store data, and numerous public entities (such as NOAA and NASA) open their datasets so that users can access them free of charge. While this is all true, there’s one component these people might not be thinking about.

Let’s say you’ve developed a script that does an analysis on a weather station using GHCN-D. It takes approximately 2 seconds to run. No big deal, right? But let’s say you want to run it on every station in the database. Currently, GHCN-D has 108,081 stations. Running the 2-second script on every station would take 216,162 seconds, or approximately 60 hours. That’s two and a half days to run the whole thing! What if it needs to be updated each day (the D in GHCN-D stands for daily, by the way)? Also, what if you found a bug in your script halfway through the run? It would be a nightmare to kill the process, fix the bug, and then wait another 2.5 days for the update.
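The back-of-the-envelope numbers above are easy to reproduce. This short Python snippet just multiplies the station count by the per-station run time and converts to hours and days; the variable names are mine.

```python
STATIONS = 108_081         # GHCN-D station count quoted above
SECONDS_PER_STATION = 2    # approximate run time of the script for one station

total_seconds = STATIONS * SECONDS_PER_STATION
total_hours = total_seconds / 3600
total_days = total_hours / 24

# Prints: 216162 60.0 2.5
print(total_seconds, round(total_hours, 1), round(total_days, 1))
```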

This is where I think cloud computing will help out meteorologists and climatologists. Not only is the data stored there, but you can build an environment that can run this process in a parallel fashion. Instead of running the script 100,000 times on one server, it could in theory be run once on each of 100,000 servers, giving a total processing time of only 2 seconds! What’s also nice is that you can configure a server to your specifications. For example, you might want a Linux server with a certain amount of RAM or CPU to run a process, or a Windows server that has SQL and GIS capabilities built in. If you’ve built it locally, it can be built in the cloud!
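Here is a small Python sketch of that idea. The first function estimates the ideal run time when the stations are split evenly across some number of workers, ignoring startup and data transfer overhead. The second part fans a placeholder analysis out over a pool of workers on one machine, which stands in for a fleet of cloud servers. Everything here (function names, the dummy analysis, the worker counts) is hypothetical and only meant to show the pattern.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def wall_clock_seconds(n_stations, seconds_per_station, n_workers):
    """Ideal run time when stations are split evenly across workers (no overhead)."""
    batches = math.ceil(n_stations / n_workers)
    return batches * seconds_per_station


def analyze_station(station_id):
    """Hypothetical stand-in for the real 2-second analysis script."""
    return station_id, len(station_id)


def run_in_parallel(station_ids, n_workers=8):
    """Apply analyze_station to every station using a pool of workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(analyze_station, station_ids))


for workers in (1, 100, 108_081):
    print(workers, "workers:", wall_clock_seconds(108_081, 2, workers), "seconds")
```

Under these idealized assumptions, 100 workers already bring the job down from about 60 hours to about 36 minutes, so you don't need anywhere near 100,000 servers to see a big payoff.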

SUMMARY: EMBRACE IT!

Lots of people (myself included) are often hesitant to try something new, maybe because there’s a learning curve, or because what they are currently doing is good enough. In the case of cloud computing, I would argue that not only is it here to stay, but it will become the norm very soon in the physical science community.

The biggest question people ask is the cost. While it does vary by company and by hardware / software requirements, in the end, it’s not as expensive as people think. My interactive page was processed in the cloud and is currently stored in an Amazon S3 bucket. When my first bill came, I was surprised that it came to a whopping $0.04.

If that doesn’t convince you, I would suggest giving it a try yourself. The big companies like Microsoft, Google, and Amazon provide a free tier that will let you play with it for a year. It’s a pretty sweet deal, and I would suggest checking it out!