by David Simmons
It’s no longer a question whether the Internet of Things (IoT) is going to be huge; it already is. For years there have been predictions about how many devices would be connected to the Internet as part of the IoT, and those numbers now look likely to become reality: billions of devices collecting sensor and other data over the next few years, and billions more every year after that. But the device count isn’t really the whole story, is it? The real story is the data those devices generate.
Imagine that each device connected to the internet as part of the IoT generates one byte per second. That’s an astonishingly low number, but it makes the calculations easy. Every billion devices then generate a gigabyte of data per second. That’s a terabyte of data every 16 or 17 minutes (16.67 minutes, to be precise), and a petabyte roughly every 11.5 days. That’s a staggering amount of data. Now multiply it by 25 billion devices.
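The arithmetic above is easy to verify. Here is a short back-of-the-envelope sketch, assuming the same one byte per second per device and one billion devices:

```python
# Back-of-the-envelope IoT data volumes.
# Assumption from the text: 1 byte/second/device, 1 billion devices.
BYTES_PER_DEVICE_PER_SEC = 1
DEVICES = 1_000_000_000

bytes_per_sec = DEVICES * BYTES_PER_DEVICE_PER_SEC  # 1 GB/s aggregate

secs_per_tb = 1_000_000_000_000 / bytes_per_sec     # seconds to reach 1 TB
secs_per_pb = 1_000_000_000_000_000 / bytes_per_sec # seconds to reach 1 PB

print(f"{secs_per_tb / 60:.2f} minutes per terabyte")   # ~16.67 minutes
print(f"{secs_per_pb / 86400:.2f} days per petabyte")   # ~11.57 days
```

Scaling to 25 billion devices divides those intervals by 25: a petabyte arrives roughly every 11 hours.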
The real challenge with the IoT is not the devices; it’s the data. There’s a famous quote: “Big Data, for better or worse: 90% of world’s data generated over last two years.” It has been used constantly since it was first published in 2013. IoT shrinks that horizon from years to months, to weeks, and finally to hours. We really are going to be generating that much data.
How are we going to deal with the data?
We’re going to have to find new ways to deal with IoT data if we have any hope of managing the tidal wave headed our way. The old approach of simply stuffing it into a database somewhere won’t work: the storage costs alone would bankrupt companies, and processing that much data in any meaningful way would exceed the computing power we currently have. And yes, no single entity would collect and process all of that data, but if all of it were sent to, say, AWS IoT, it would cripple the service.
And that says nothing of the network load IoT data will generate. The scale of these problems is staggering, and it has not been well thought through. Everyone is focused on device counts, not data volumes.
And let’s remember the purpose of all those devices in the first place: the data. IoT is first and foremost about gathering sensor data in order to make faster, better, more targeted and more accurate business decisions about things that can be measured. IoT data is for real-time analysis and action in order to improve efficiency and lower costs.
Data collection, processing, and analysis will have to be highly distributed, since devices already produce far more than one byte per second. We’re going to have to start collecting, processing, and analyzing data at the edge, not in the data center, if we’re to have any hope of keeping up.
Where is the edge?
There’s a lot of discussion about where “the edge” is when it comes to IoT. Is the sensor device itself the edge? Technically, yes. It’s the furthest point from a data center or processing hub, so it is the absolute edge. But most sensors are not designed to be capable of processing and analyzing their own data. In addition, much of the data analysis that needs to occur requires synthesizing data from multiple sensors. A temperature from one sensor, the flow-rate from another sensor, and the pressure from a third might be what’s required to make an informed decision on whether there is a critical situation or just a small anomaly.
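The kind of multi-sensor judgment described above can be sketched in a few lines. The thresholds and field names here are hypothetical, chosen only to illustrate the idea that no single reading decides between a critical situation and a small anomaly:

```python
from dataclasses import dataclass

@dataclass
class Readings:
    temperature_c: float
    flow_rate_lpm: float
    pressure_kpa: float

# Hypothetical limits, for illustration only.
TEMP_MAX = 90.0
FLOW_MIN = 5.0
PRESSURE_MAX = 800.0

def assess(r: Readings) -> str:
    """Combine three sensors into one judgment: no single reading decides."""
    exceeded = sum([
        r.temperature_c > TEMP_MAX,
        r.flow_rate_lpm < FLOW_MIN,
        r.pressure_kpa > PRESSURE_MAX,
    ])
    if exceeded >= 2:
        return "critical"  # multiple correlated signals: act now
    if exceeded == 1:
        return "anomaly"   # single outlier: flag for review
    return "normal"
```

Logic like this has to run somewhere that sees all three sensors, which is exactly why the individual sensor cannot be the edge.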
So the edge is not the sensor. But it’s also not the edge of the cloud. Instead, the true “edge” is the first-hop from the sensors themselves. It’s as close as you can get to where the data is generated where you have the possibility of having enough processing power to effectively do at least some data processing.
The edge needs to be data-aware, capable of processing data and responding to it on its own, even while disconnected from the larger internet, at least for a time.
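One common pattern for surviving a dropped uplink is store-and-forward: the edge node keeps recording locally and drains its backlog when connectivity returns. A minimal sketch, assuming a bounded in-memory queue and a caller-supplied `send` function:

```python
import collections

class EdgeBuffer:
    """Store-and-forward queue: keep handling data locally while the
    upstream link is down, then drain once connectivity returns."""

    def __init__(self, max_points=10_000):
        # Bounded deque: the oldest points are dropped first under pressure.
        self.queue = collections.deque(maxlen=max_points)

    def record(self, point):
        self.queue.append(point)

    def flush(self, send):
        """Attempt to forward everything; requeue what fails and stop."""
        while self.queue:
            point = self.queue.popleft()
            try:
                send(point)
            except ConnectionError:
                self.queue.appendleft(point)  # link still down; retry later
                break
```

A real deployment would persist the buffer to local storage rather than memory, but the shape of the problem is the same.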
About the cloud
Cloud-based IoT data solutions are great, but IoT deployments cannot reliably depend on one. Why? Because in many manufacturing and other IoT deployment scenarios, exposing the system to the wider Internet, which a public cloud infrastructure requires, is simply too dangerous. In manufacturing especially, exposing IoT projects to the internet can have devastating, even deadly, consequences.
The most glaring example of what can happen when industrial devices are disrupted is the well-known Stuxnet worm. Stuxnet was a highly targeted, highly destructive piece of malware aimed at the specific PLCs controlling the centrifuges in Iran’s nuclear program. It caused them to spin out of control and destroy themselves. Imagine a similar attack loose in a manufacturing plant, causing machines to spin wildly out of control where nearby workers could be hurt or killed. Still think the Industrial IoT can just use the cloud?
A Data layer for IoT
Coming full circle, what IoT needs is a data layer: one that pervades the entire IoT deployment, allowing for data collection, analysis, alerting, responses, and even dashboards at any point along the data chain; one that can exist with or without a cloud implementation; and one that is scalable, easily deployed, and resilient to transient failures.
An IoT data layer would be deployable at all layers and on virtually all nodes, and would ideally be derived from a single codebase to reduce cost and complexity. It would be pervasive, fault-tolerant, and easily segmented, and it would enable nearly autonomous collection of, and response to, data events throughout the system. A true IoT data layer would be just as powerful on an embedded gateway device or Wi-Fi router as on an enterprise server or cloud-based cluster.
When looking for a data layer for your IoT deployment we recommend specifically looking for the following features:
- High efficiency data ingestion that can be distributed across the deployment from edge devices to public- or private-cloud instances
- A purpose-built, high-performance storage engine for storing and querying your IoT data – again, one that can be deployed at all levels of the architecture
- A distributed dashboard system that allows for customized visualization of your IoT data at any point across the infrastructure, giving the people who most need the data access to it, in an easy-to-understand format, as close to them as possible
- A high-performance data analysis and event processing engine that can automate processes where appropriate and handle the downsampling and forwarding of data and events upstream for permanent storage
- An overall data layer that incorporates automatic expiration of aging data while still preserving data integrity and historical trends for long-term analysis
Make sure that you have a true layer across your entire infrastructure dedicated to the collection, processing, analysis, display and reaction to your data in real time. Remember, if you’re not taking action on your IoT data, then why are you collecting and storing it in the first place?
David Simmons is a senior developer and IoT evangelist at InfluxData.