At its simplest the Internet of Things is about devices, data and connectivity. There is no point filling your factory or workplace with intelligent devices unless they’re linked up.
Equally there’s no point in those connections unless you do something with the data that is collected.
There are two main challenges in any IoT project: how to deal with the avalanche of information, and how to turn that data into knowledge and insight for your organisation.
The first challenge is about connectivity and computing power – linking all those devices so that the data can be collected.
This is a relatively straightforward infrastructure project, especially if you put at least some of the intelligence at the edge of your network – near where the sensors and devices are.
Putting intelligence at the edge is also integral to the second challenge – turning data into decisions.
The dirty secret of big data projects, and of data science more generally, is that the vast majority of data is not usable in the form in which it is collected.
The process of turning data collected into data which a system can use is called ‘data cleaning’ or ‘data cleansing’.
This is the part of big data projects which tends to get ignored, yet it can take up more than half the time and resources involved. The popular perception of feeding raw data into a magic box which produces brilliant business decisions is quite wrong. Some experts say as much as 80 per cent of time and resources will be absorbed in preparing the data, with just 20 per cent consumed by the actual analysis.
The old computing saying of “rubbish in, rubbish out” remains true – if this crucial step is not carried out properly then the analysis will lead to the wrong decisions.
This might be as simple as a formatting issue, putting all the data into a format which your intelligent system can understand and use.
But it is more likely to involve much more complex cleaning processes.
Some of this, like formatting, can be quite easily automated. But data cleansing remains as much an art as it is a science.
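As a minimal sketch of the part that can be automated, consider sensor readings arriving from two vendors with mixed timestamp formats and temperatures in both Celsius and Fahrenheit. The field names and formats here are hypothetical, not from any particular IoT platform:

```python
from datetime import datetime

# Hypothetical raw readings: mixed timestamp formats, mixed units,
# and one missing value -- typical of data fresh from the sensors.
RAW_READINGS = [
    {"ts": "2024-03-01 14:00", "temp": "21.5", "unit": "C"},
    {"ts": "01/03/2024 14:05", "temp": "70.7", "unit": "F"},
    {"ts": "2024-03-01 14:10", "temp": "", "unit": "C"},
]

TS_FORMATS = ("%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M")

def parse_ts(value):
    """Try each known timestamp format in turn."""
    for fmt in TS_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None

def clean(readings):
    """Normalise timestamps and units, dropping rows we cannot repair."""
    cleaned = []
    for row in readings:
        ts = parse_ts(row["ts"])
        if ts is None or row["temp"] == "":
            continue  # unrepairable row: discard
        temp = float(row["temp"])
        if row["unit"] == "F":
            temp = (temp - 32) * 5 / 9  # convert to Celsius
        cleaned.append({"ts": ts, "temp_c": round(temp, 1)})
    return cleaned

print(clean(RAW_READINGS))
```

Even this toy version shows why the human element matters: someone has to decide which formats to accept and what to do with rows that cannot be repaired.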
There is a risk in removing all human oversight. For example, a common way to quickly clean experimental data is to ignore the outliers – results from a repeated experiment which are very different from the average. The assumption is that something simply went wrong, so these numbers can be safely ignored.
But in business those outliers can be the very things which can lead to insight.
The difficulty is separating what is just dodgy data from what is crucial to improving business decisions. If your data cleansing process is removing inconvenient data then the decisions made will be worthless.
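One way to keep that human judgment in the loop is to flag outliers for review rather than silently discarding them. The sketch below uses a simple z-score rule; the threshold of 2.0 and the production figures are illustrative assumptions, not a recommendation:

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    """Split readings into typical values and outliers for review,
    rather than silently deleting anything unusual."""
    mu = mean(values)
    sigma = stdev(values)
    typical, review = [], []
    for v in values:
        if sigma > 0 and abs(v - mu) / sigma > threshold:
            review.append(v)  # keep for a human to inspect
        else:
            typical.append(v)
    return typical, review

# Hypothetical hourly output figures from a production line: the
# spike at 940 may be a faulty sensor -- or a genuine surge worth
# understanding -- so it is flagged, not deleted.
readings = [102, 98, 101, 99, 103, 97, 100, 940]
typical, review = flag_outliers(readings)
print(review)
```

The design choice is the point: discarding the 940 makes the dataset tidier, but routing it to a person preserves the chance that it is the very insight the project exists to find.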
The increasing quantity of data collected by IoT devices makes this challenge ever more difficult to solve.
Machine learning can play a vital role in making this process work in a timely manner. These systems can focus the kaleidoscope of collected data into glimmers of insight into business processes.
But if you are starting out with a big data project don’t be disheartened if the first data set your system provides seems to be filled with background noise.
It is far better to be collecting everything and discarding what is not immediately useful than to build a system which ignores data that does not fit a pre-determined template.