Data management is a critical task in IoT, as billions of interconnected IoT devices generate a huge amount of data. This has opened up several new challenges in IoT data management, giving rise to data science and big data technologies.
This enormous volume of data offers big opportunities, fuelling the digital economy with new directions such as Cloudonomics and IoTonomics, where data can be considered a utility: a commodity to be properly managed, curated, stored, and traded. Properly managing data in IoT contexts is therefore not only critical but also of strategic importance for business players as well as for users, who are evolving into prosumers (producers-consumers).
For more information, watch this video clip: https://youtu.be/8NbP07OEGsQ
The following paragraphs are based on Introduction to the IoT.
Data generation and production are relevant parts of IoT, involving sensors that probe the physical system. In a cyber-physical-social system view, such sensors can also be virtual (e.g. software) or even human (e.g. citizens, crowdsensing). The main issues in data production relate to the type and format of data, heterogeneity in measurements, and similar concerns. Semantics is the key to solving these issues, also through specific standards such as Sensor Web Enablement and Semantic Sensor Network.
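To make the heterogeneity problem concrete, the following minimal sketch maps temperature readings reported in different units onto one common record shape. The `Reading` type, the unit labels, and the function name are illustrative assumptions, not part of Sensor Web Enablement or Semantic Sensor Network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical common record for a single measurement."""
    sensor_id: str
    quantity: str  # what was measured, e.g. "temperature"
    value: float   # value expressed in the canonical unit
    unit: str      # canonical unit label, here "Cel" (degrees Celsius)


# Illustrative converters from a source unit label to degrees Celsius.
TO_CELSIUS = {
    "Cel": lambda v: v,
    "degF": lambda v: (v - 32.0) * 5.0 / 9.0,
    "K": lambda v: v - 273.15,
}


def normalize_temperature(sensor_id: str, value: float, unit: str) -> Reading:
    """Convert a raw temperature reading into the common record."""
    try:
        convert = TO_CELSIUS[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return Reading(sensor_id, "temperature", convert(value), "Cel")
```

Downstream code can then compare readings from different sensors without caring which unit each device originally used.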
Once data is generated, it should be gathered and made available for processing. The collection process needs to ensure that the data gathered is both well defined and accurate, so that subsequent decisions based on the findings are valid. Types of data collection include census collection, which covers everything in a group or statistical population; sample survey collection, which includes only part of the total population; and administrative by-product collection, where data is a by-product of an organisation’s day-to-day operations. IoT smart objects and things usually deliver data to collection points over wireless communication technologies such as Zigbee, Bluetooth, LoRa, NB-IoT, Wi-Fi, and 3G, 4G, and 5G networks.
Data filtering is a specific preprocessing activity, usually performed at data source or data collector (IoT) nodes, e.g. motes, base stations, hotspots, and gateways. It aims at cleaning noisy data by filtering out noise and information that is not useful.
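As an illustration of filtering at a collector node, the sketch below combines two simple and common techniques: a plausibility range check and a sliding median filter. The function names and thresholds are assumptions chosen for the example; real deployments pick filters to suit the sensor.

```python
from statistics import median


def drop_out_of_range(samples, low, high):
    """Discard samples outside the physically plausible range [low, high]."""
    return [s for s in samples if low <= s <= high]


def median_filter(samples, window=3):
    """Smooth isolated spikes with a centred sliding median.

    Near the edges the window is truncated, so the output has the same
    length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - half)
        end = min(len(samples), i + half + 1)
        smoothed.append(median(samples[start:end]))
    return smoothed
```

For example, `drop_out_of_range(raw, -40, 60)` removes a sentinel value such as `-999.0`, and `median_filter` then suppresses any single-sample spike that remains.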
To reduce bandwidth usage before data is sent to processing nodes, it is further elaborated, compressed, aggregated, and fused, which reduces the overall volume of raw data to be transmitted and stored.
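One simple form of aggregation is to replace the raw samples in each time window with a short summary. The sketch below assumes readings arrive as `(timestamp_in_seconds, value)` pairs; the function name and the choice of summary statistics are illustrative.

```python
from collections import defaultdict


def aggregate_by_window(readings, window_s=60):
    """Summarise (timestamp_s, value) pairs per fixed time window.

    Returns a list of (window_start, count, minimum, mean, maximum)
    tuples, ordered by window start.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        window_start = int(ts // window_s) * window_s
        buckets[window_start].append(value)
    return [
        (start, len(vals), min(vals), sum(vals) / len(vals), max(vals))
        for start, vals in sorted(buckets.items())
    ]
```

With a 60-second window, a sensor sampling once per second sends one summary tuple per minute instead of 60 raw values, at the cost of losing the individual samples.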
Once data is properly collected, filtered, aggregated, and fused, it can be processed. Processing can be both local and remote, and usually also includes preprocessing activities that prepare data for the actual processing. Local processing, when possible, is mainly fast, lightweight computation at the edge (Edge computing), quickly providing results and local analytics. More complex computation is usually delegated to remote physical or virtual servers, provided either by local nodes, e.g. communication servers in a Fog computing fashion, or by Cloud providers as virtual machines hosted in data centers. This kind of computation can also involve historical data, providing global analytics, but it can hardly serve time-constrained applications with real-time requirements.
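The split between local and remote processing can be sketched as follows: an edge node keeps a cheap running mean and raises threshold alerts immediately, while raw values are buffered and handed off in batches for heavier remote analysis. The `EdgeNode` class, its parameters, and the batching policy are hypothetical and only illustrate the division of work.

```python
class EdgeNode:
    """Hypothetical edge node: lightweight local analytics plus batching."""

    def __init__(self, alert_threshold, batch_size=100):
        self.alert_threshold = alert_threshold
        self.batch_size = batch_size
        self.count = 0
        self.mean = 0.0
        self._buffer = []

    def ingest(self, value):
        """Process one reading locally.

        Returns (alert, batch): alert is True when the value exceeds the
        threshold; batch is a list of buffered values ready to be sent to
        a remote server, or None if the buffer is not yet full.
        """
        # Incremental mean: constant memory, no history needed on the device.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        self._buffer.append(value)
        alert = value > self.alert_threshold
        batch = None
        if len(self._buffer) >= self.batch_size:
            batch, self._buffer = self._buffer, []
        return alert, batch
```

The alert is available as soon as the reading arrives, whereas anything computed from the batch depends on the round trip to the remote server.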
Remote servers are also used for permanently storing and archiving data, making it available for further processing, even by third parties. Databases are often used for this purpose, mainly based on distributed NoSQL key-value store technologies to improve reliability and performance. Time-series databases, such as InfluxDB, are also becoming more common.
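The access pattern typical of time-series storage (append readings, then query one sensor over a time range) can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a self-contained stand-in; it is neither a distributed NoSQL store nor InfluxDB, and the table layout and function names are assumptions.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small readings store keyed by sensor and time."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " sensor_id TEXT NOT NULL,"
        " ts INTEGER NOT NULL,"   # Unix timestamp in seconds
        " value REAL NOT NULL,"
        " PRIMARY KEY (sensor_id, ts))"
    )
    return conn


def store(conn, sensor_id, ts, value):
    """Insert one reading; a repeated (sensor_id, ts) overwrites the old value."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO readings VALUES (?, ?, ?)",
            (sensor_id, ts, value),
        )


def query_range(conn, sensor_id, start_ts, end_ts):
    """Return (ts, value) rows for one sensor with start_ts <= ts < end_ts."""
    cur = conn.execute(
        "SELECT ts, value FROM readings"
        " WHERE sensor_id = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (sensor_id, start_ts, end_ts),
    )
    return cur.fetchall()
```

Keying on `(sensor_id, ts)` keeps each sensor's readings ordered by time, which is the property range queries over historical data rely on.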
The results of processing activities then have to be delivered to requestors and users, so they must be properly organized and formatted, ready for end users. IoT data visualization is becoming an integral part of the IoT. It provides a way to display this avalanche of collected data in meaningful ways that clearly present the insights hidden within this mass of information.
Data privacy and security are among the most critical issues to address in IoT data management. Reliable techniques for secure data transmission, such as TLS, are available. As a result, IoT data security issues mainly concern securing the IoT devices themselves, since they are usually resource constrained and often cannot run traditional cryptographic schemes for data encryption and decryption. Data privacy and integrity should also be enforced on remote storage servers, by anonymizing data and by allowing owners to properly manage it (monitoring, removing) while ensuring availability. Indeed, security and privacy issues span the whole IoT stack vertically. Blockchain is a promising technique for addressing IoT security issues and is attracting growing interest from both the academic and business communities.
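As one small example of enforcing integrity, a device and a server that share a secret key can attach an HMAC-SHA256 tag to each message, so that tampering in transit or in storage is detectable. This is a minimal sketch using Python's standard `hmac` and `hashlib` modules. The message layout and function names are assumptions, key distribution is out of scope, and the tag provides integrity only, not confidentiality.

```python
import hashlib
import hmac
import json


def sign(key: bytes, payload: dict) -> dict:
    """Serialise the payload deterministically and attach an HMAC tag."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify(key: bytes, message: dict) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(
        key, message["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, message["tag"])
```

A receiver should discard any message for which `verify` returns `False`, whether the body was altered or the tag was produced with a different key.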