A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity. © Copyright 2024 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.

IEEE

Publications

IEEE Talks Big Data - Check out our new Q&A article series with big data experts!

Call for Papers - Check out the many opportunities to submit your own paper. This is a great way to get published, and to share your research in a leading IEEE magazine!

Publications - See the list of various IEEE publications related to big data and analytics here.

Call for Blog Writers!

IEEE Cloud Computing Community is a key platform for researchers, academicians, and industry practitioners to share and exchange ideas regarding cloud computing technologies and services, as well as to identify the emerging trends and research topics that are defining the future direction of cloud computing. Come be part of this revolution: we invite blog posts on topics including, but not limited to, the list below:

  • Cloud Deployment Frameworks
  • Cloud Architecture
  • Cloud Native Design Patterns
  • Testing Services and Frameworks
  • Storage Architectures
  • Big Data and Analytics
  • Internet of Things
  • Virtualization techniques
  • Legacy Modernization
  • Security and Compliance
  • Pricing Methodologies
  • Service Oriented Architecture
  • Microservices
  • Container Technology
  • Cloud Computing Impact and Trends shaping today’s business
  • High availability and reliability

Call for Papers

No call for papers at this time.

IEEE Publications on Big Data

Read more at IEEE Computer Society.

  

IEEE Computer Magazine Special Issue on Big Data Management

  • Big Data: Promises and Problems

Connecting the Dots With Big Data

  • Better Health Care Through Data
  • The Future of Crime Prevention
  • Census and Sensibility
  • Landing a Job in Big Data

Read more at The Institute.

Download full issue. (PDF, 5 MB)

IEEE Internet Computing - July/August 2014

Web-Scale Datacenters

This issue of Internet Computing surveys issues surrounding Web-scale datacenters, particularly in the areas of cloud provisioning as well as networking optimization and configuration. They include workload isolation, recovery from transient server availability, network configuration, virtual networking, and content distribution.

Read more at IEEE Computer Society.

IEEE Network - July 2014

Networking for Big Data

The most current information for communications professionals involved with the interconnection of computing systems, this bimonthly magazine covers all aspects of data and computer communications.

Read more at IEEE Communications Society.

Special Issue on Big Data

Big data is transforming our lives, but it is also placing an unprecedented burden on our compute infrastructure. As data expansion rates outpace Moore's law and supply voltage scaling grinds to a halt, the IT industry is being challenged in its ability to effectively store, process, and serve the growing volumes of data. Delivering on the promise of big data in the post-Dennard era calls for specialization and tight integration across the system stack, with the aim of maximizing energy efficiency, performance scalability, resilience, and security.

A comprehensive and systematic literature review on the big data management techniques in the internet of things

  • Original Paper
  • Published: 15 November 2022
  • Volume 29, pages 1085–1144 (2023)

  • Arezou Naghib,
  • Nima Jafari Navimipour,
  • Mehdi Hosseinzadeh &
  • Arash Sharifi

The Internet of Things (IoT) is a communication paradigm and a collection of heterogeneous interconnected devices. It produces large-scale, distributed, and diverse data called big data. Big Data Management (BDM) in the IoT is used for knowledge discovery and intelligent decision-making and is one of the most significant research challenges today. There are several mechanisms and technologies for BDM in the IoT. This paper aims to study the important mechanisms in this area systematically, covering articles published between 2016 and August 2022. Initially, 751 articles were identified, but a paper selection process reduced the number of articles to 110 significant studies. BDM mechanisms in the IoT are studied in four categories: BDM processes, BDM architectures/frameworks, quality attributes, and big data analytics types. This paper also presents a detailed comparison of the mechanisms in each category. Finally, the development challenges and open issues of BDM in the IoT are discussed. The results show that predictive analysis and classification methods are used in many articles, whereas some quality attributes, such as confidentiality, accessibility, and sustainability, receive less attention. Also, none of the articles uses key-value databases for data storage. This study can help researchers develop more effective BDM methods for the IoT in complex environments.

1 Introduction

The Internet of Things (IoT) is an emerging information technology model and a dynamic network that enables interaction between self-configuring, smart, and interconnected devices and humans [ 1 ]. The IoT's ubiquitous data collection devices (such as Radio-Frequency Identification (RFID) tags, sensors, Global Positioning Systems (GPS), Geographical Information Systems (GIS), drives, Near-Field Communication (NFC), actuators, and mobile phones) collect and share real-time, mobile, and environmental data for automatic monitoring, identification, processing, maintenance, and control in real-time [ 2 , 3 , 4 ]. The IoT ecosystem generally has five main components: IoT devices, including sensors and actuators, that collect data and perform actions on things; IoT connectivity, including protocols and gateways, that is responsible for communication between smart devices, gateways, and the cloud; an IoT cloud that is responsible for data storage, processing, analysis, and decision-making; IoT analytics and data management, which are responsible for processing the data; and end-user devices and user interfaces, which help to control and configure the system [ 5 ]. The most important applications of IoT include environmental monitoring, disaster management, smart homes/buildings, smart farms, healthcare, smart cities, urban planning, smart manufacturing, intelligent transport systems, smart floods, financial risk management, supply chain management, water management, enterprise culture, cultural heritage, smart surveillance, military tracking and environment, digital forensics, underwater environments, and understanding social phenomena [ 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 ]. The IoT devices and sensors in Wireless Sensor Networks (WSN) generate large volumes of data. According to an International Data Corporation (IDC) forecast, there will be 41.6 billion IoT devices generating 79.4 zettabytes of data in 2025. This massive structured, semi-structured, and unstructured data, which is expanding rapidly with time, results in "big data" [ 23 ]. Big data technologies are a new generation of distributed architectures and technologies that provide distributed data mining capabilities to extract value inexpensively and effectively from huge datasets with characteristics such as volume, velocity, variety, variability, veracity, and value [ 24 ]. Big data provides both opportunities and problems for organizations and enterprises. Big data can improve data precision, be used for forecasting and decision-making, and give stakeholders more in-depth analytical findings [ 2 ]. Traditional data processing systems cannot collect, process, manage, and interpret such data effectively. Therefore, big data requires a scalable architecture or framework for effective capture, storage, management, and analysis [ 25 ].

A major challenge in implementing IoT in real and complex environments is analyzing heterogeneous data volumes that contain a wide variety of knowledge content [ 26 ]. Various platforms, tools, and technologies have been developed for big data monitoring, collecting, ingesting, storing, processing, analysis, and visualization [ 10 , 27 ]. These platforms and tools are Apache Hadoop, MapReduce, 1010data, Apache Storm, Cloudera, Cassandra, HP-HAVEn, SAP-Hana, Hortonworks, MongoDB, Apache Kafka, Apache Spark, Infobright, etc. Industries and enterprises use Big Data Analytics (BDA) with IoT technologies to handle the timely analysis of information streams and intelligent decision-making [ 28 , 29 , 30 ]. BDM in the IoT involves different analytic types [ 31 ]. Marjani et al. [ 29 ] discussed analytical types in real-time, offline, memory, business intelligence, and at massive levels. Singh and Yassine [ 28 ] divided analytical types into preprocessing, pattern mining, and classification. Gandomi and Haider [ 32 ] divided big data processing into two major phases: data management and data analytics. Also, Ahmed et al. [ 33 ] provided five aspects of big data: acquisition and storage; programming model; benchmark process; analysis; and application. Finally, ur Rehman et al. [ 34 ] divided BDA into five main steps: data ingestion, cleaning, conformation, transformation, and shaping.

However, despite the importance of BDM in the IoT and the rising challenges in this area, as far as we know, there is not any complete and detailed systematic review in this field. Hence, this paper tries to analyze the mechanisms of BDM in the IoT. The main contributions of this paper are as follows:

  • Presenting a study of the existing methods for BDM in the IoT.
  • Dividing BDM methods in the IoT into four main categories: BDM processes, BDM architectures/frameworks, quality attributes, and big data analytics types.
  • Dividing the BDM process in the IoT into six main steps: data collection, communication, data ingestion, data storage, processing and analysis, and post-processing.
  • Dividing the BDM architecture/framework in the IoT into two main subcategories: BDM architectures/frameworks in IoT-based applications and BDM architectures/frameworks in the IoT paradigms.
  • Exploring the primary challenges, issues, and future works for BDM in the IoT.

The following subsection discusses related work to show the main differences between this review and similar studies. Also, the abbreviations used in this paper are presented in Table 1 .

1.1 Related work and contributions of this review

This section studies review and survey articles on BDM in the IoT to highlight the need for a new review. In addition, it describes the main advantages and disadvantages of those articles to distinguish this review from them.

Ahmed et al. [ 27 ] analyzed several techniques for IoT-based big data. This article categorizes the literature based on parameters, including big data sources, system components, big data enabling technologies, functional elements, and analytics types. The authors also discussed connectivity, storage, quality of services, real-time analytics, and benchmarking as the critical requirements for big data processing and analytics.

Constante Nicolalde et al. [ 35 ] overviewed the technical tools used to process big data and discussed the relationship between BDA and IoT. The big data challenges are divided into four general categories: data storage and analysis; the discovery of knowledge and computational complexities; information security; and scalability and data visualization.

Talebkhah et al. [ 36 ] investigated the architecture, challenges, and opportunities of big data systems in smart cities. This article suggested a 4-layer architecture for BDM in smart cities. The layers of this architecture are data acquisition, data preprocessing, data storage, and data analytics. This article also considered the opportunities and challenges for smart cities, such as heterogeneity, design and maintenance costs, failure management, throughput, etc.

Bansal et al. [ 37 ] investigated state-of-the-art research on IoT and BDM. This article proposed a taxonomy based on BDM in the IoT applications, including smart transport, smart cities, smart buildings, and smart living. BDM steps are considered as data acquisition, communication, storage, processing, and retrieval. Also, the related surveys on BDM were divided into three general categories: surveys on IoT BDA, domain-specific surveys on IoT big data, and surveys on challenges in IoT big data. The authors classified the articles based on four major vendor services (Google, Amazon, Microsoft, and IBM) to integrate IoT and IoT big data with case studies. The big data management challenges in the IoT are considered based on 13 V’s challenges.

Marjani et al. [ 29 ] investigated state-of-the-art research efforts directed toward big IoT data analytics and proposed a new architecture for big IoT data analytics. This article discusses big IoT data analytic types under real-time, offline, memory-level, business intelligence, and massive level analytics categories.

Simmhan and Perera [ 38 ] presented the analytics requirements of IoT applications. They defined the relationship between data volume capacity and processing latency of new big data platforms. This article divided decision systems into visual analytics, alerts and warnings, reactive systems, control and optimization, complex systems, knowledge-driven intelligent systems, and behavioral and probabilistic systems.

Shoumy et al. [ 39 ] discussed frameworks and techniques for multimodal big data analytics. They divided multimodal big data analytics techniques into four topics: affective framework; multimodal framework; big data and analytics framework; and fusion techniques. Furthermore, Ge et al. [ 40 ] discussed the similarities and differences among big data technologies used in IoT domains and developed a conceptual framework. This article interpreted big data research and application opportunities in eight IoT domains (healthcare, energy, transportation, building automation, smart cities, agriculture, industry, and military) and discussed the advantages and disadvantages of big data technologies. In addition, it examined four aspects of big data processes: storage, cleaning/cleansing, analysis/analytics, and visualization.

Siow et al. [ 41 ] considered the analytics infrastructure from data generation, collection, integration, storage, and computing. This article presented a comprehensive classification of analytical capabilities consisting of five categories: descriptive, diagnostic, discovery, predictive, and prescriptive analytics. In addition, a 3-layered taxonomy of data analytics was presented, including data, analytics, and applications.

Fawzy et al. [ 42 ] investigated the techniques and technologies of IoT systems from BDA architectures and software engineering perspectives. This article proposed a taxonomy based on BDA systems in the IoT, including smart environments, human, network, energy, and environmental analytics. It also considered the BDA target, approach, technology, challenges, software architecture and design, model-driven engineering, separation of concerns, and system validation and verification. The authors presented the IoT data features as multidimensional, massive, timely, heterogeneous, inconsistent, traded, valuable, and spatially correlated. The proposed domain-independent BDA-based IoT architecture has six layers: data manager, system resources controller, system recovery manager, BDA handler, software engineering handler, and security manager.

Zhong et al. [ 43 ] investigated using BDA and data mining techniques in the IoT. This article divided the review articles into four categories: architecture and platform, framework, applications, and security. The data mining methods for BDA in the IoT were discussed in these four categories. The challenges investigated in the article are as follows: data volume, data diversity, speed, data value, security, data visualization, knowledge extraction, and real-time analysis.

Hajjaji et al. [ 44 ] discussed applications, tools, technologies, architectures, current developments, challenges, and opportunities in big data and IoT-based applications in smart environments. This article divided the benefits of combining the IoT and big data into six categories: multi-source and heterogeneous data; connectivity; data storage; data analysis; and cost-effectiveness.

Ahmadova et al. [ 45 ] discussed big data applications in the IoT. They proposed a taxonomy of big data in the IoT that includes healthcare, smart cities, security, big data algorithms, industry, and general view. In the article, the authors discussed big data technologies' advantages and disadvantages for IoT domains. Also, the evaluation factors that are considered in the article are security, throughput, cost, energy consumption, reliability, response time, and availability.

Table 2 summarizes the contributions of the related survey articles. The publication year, methodology, discussion, and disadvantages are shown for each article in this table. Because of the weaknesses of the existing review articles, this paper presents a systematic literature review and a proper categorization of BDM mechanisms in the IoT that addresses these shortcomings as follows:

  • This paper provides a complete research methodology that includes research questions and the article selection process.
  • This paper discusses the newly proposed mechanisms for BDM in the IoT between 2016 and August 2022.
  • This paper considers the architectures/frameworks of IoT-based applications, including healthcare, smart cities, smart homes/buildings, intelligent transport, traffic control and energy, urban planning, and other IoT applications (smart IoT systems, smart flood, smart farms, disaster management, laundry, digital manufacturing, and smart factory).
  • This paper investigates the quality attributes and categorizes the review articles based on the quality attributes used and the reference model of standard software quality attributes, i.e., ISO 25010.
  • This paper classifies the review articles based on BDA types in the IoT and their tactics.
  • This paper considers the big data storage systems and tools in the IoT based on relational databases, NoSQL databases, distributed file systems, and cloud/edge/fog/mist storage.
  • This paper discusses the BDM process in six steps: data collection, communication, data ingestion, data storage, processing and analysis, and post-processing, and proposes the main tools in each step.
  • This paper presents open issues and challenges on BDM in the IoT and divides challenges into two categories: BDM in the IoT and quality attributes challenges.

The rest of the paper is structured as follows: Sect. 2 explains the research methodology and the article selection process. The categories of the BDM methods in the IoT and their comparison are described in Sect. 3. Section 4 discusses the challenges and some open issues. Finally, Sect. 5 presents the conclusion and the paper's limitations.

2 Research methodology

Systematic literature review (SLR) is a research methodology that examines data and findings of the researchers relative to specified questions [ 46 , 47 ]. It aims to find as much relevant research on the defined questions as possible and to use explicit methods to identify what can reliably be said based on these studies [ 48 , 49 ]. This section provides an SLR to understand the BDM techniques in the IoT. The following subsection will explain the research questions and the article selection process.

2.1 Research questions

This study focuses on articles related to BDM in the IoT, in particular on their advantages and disadvantages, architectures, processing and analysis methods, storage systems, evaluation metrics, and tools. To achieve these goals, the following research questions are presented.

RQ1: What is BDM in IoT?

Section  1 answered this question.

RQ2: What is the importance of BDM in the IoT?

This question aims to show the number of published articles about BDM in IoT between 2016 and August 2022.

Section  2 answers this question.

RQ3: How are the articles searched and chosen to be assessed?

Section 2.2 discusses the question.

RQ4: What are the classifications of BDM methods in the IoT?

This question aims to show the existing methods of BDM in the IoT environment. Section  3 will discuss this answer.

RQ5: What are the challenges and technical issues of BDM in the IoT?

This question identifies the challenges for BDM in the IoT and provides open issues for future research. Section  4 will discuss this answer.

2.2 Article selection process

In this study, the article search and selection process consists of three stages. These stages are shown in Fig. 1. In the first stage, articles published between 2016 and August 2022 were searched based on the keywords and terms presented in Table 3. These articles are the results of searching popular electronic databases, including Google Scholar, Elsevier, ACM, IEEE Xplore, Emerald Insight, MDPI, Springer Link, Taylor and Francis, Wiley, JST, DBLP, DOAJ, and ProQuest. The articles include journals, chapters, conference papers, books, notes, technical reports, and special issues. 751 articles were found in Stage 1. In Stage 2, there are two steps to select the final number of articles to review. First, the articles are considered based on the inclusion criteria in Fig. 2. There are 314 articles left after this step. Next, the review articles are removed; of the 314 remaining articles, 85 (27.07%) were review articles. Elsevier has the highest number of review articles (31.76%, 27 articles). Emerald and Taylor and Francis have the lowest number of review articles (2.35%, one article). The highest number of published review articles is in 2019 (24.71%), and the lowest is in 2022 (8.24%). The number of remaining articles at this stage is 229. In Stage 3, the title and abstract of the articles are reviewed. Also, to ensure that the articles are relevant to the study, we reviewed the methodology, evaluation, discussion, and conclusion sections. The number of selected articles retained at this stage is 110. Elsevier publishes most of the selected articles (30.91%, 34 articles). The lowest number is related to ACM (0.91%, one article). 2018 has the highest number of published articles (26.36%, 29 articles). The Future Generation Computer Systems journal publishes the highest number of articles (11.82%, 13 articles).

Fig. 1 Article search and selection process stages

Fig. 2 Inclusion criteria in the article selection process
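The counts and percentages reported for the selection process can be checked with a few lines of arithmetic. The following is a minimal sketch; the variable and function names are illustrative and not part of the paper, and only figures stated in the text are used.

```python
# Counts as reported in the article selection process (Sect. 2.2).
identified = 751        # Stage 1: results of the keyword search
after_inclusion = 314   # Stage 2, step 1: inclusion criteria applied
review_articles = 85    # Stage 2, step 2: review articles to be removed
selected = 110          # Stage 3: articles retained for the review


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Articles remaining once the review articles are removed.
after_reviews_removed = after_inclusion - review_articles

print(after_reviews_removed)                    # remaining after Stage 2
print(share(review_articles, after_inclusion))  # share of review articles
print(share(27, review_articles))               # Elsevier share of review articles
print(share(34, selected))                      # Elsevier share of selected articles
print(share(29, selected))                      # share of articles published in 2018
print(share(13, selected))                      # Future Generation Computer Systems share
```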

3 Big data management approaches in the IoT

This section presents four different categories for the reviewed articles. These categories include the BDM process in the IoT (Sect. 3.1 ), BDM architectures/frameworks for IoT applications (Sect. 3.2 ), quality attributes (Sect. 3.3 ), and big data analytics types (Sect. 3.4 ). Each category has subcategories that will be considered in its relevant section. Figure  3 shows this taxonomy.

Fig. 3 Taxonomy of the selected articles

3.1 Big data management process in the IoT

This section categorizes articles based on BDM process mechanisms and presents a comprehensive framework for BDM in the IoT. The comprehensive framework for BDM in the IoT is shown in Fig.  4 . The steps of BDM in IoT include data collection, communication, data ingestion, storage, processing and analysis, and post-processing.

Fig. 4 Big data management framework in IoT

3.1.1 Data collection

A variety of sources generates IoT data. There are different mechanisms for IoT data collection, but there is still no fully efficient and adaptive mechanism for IoT data collection [ 50 ]. This paper divides IoT sources into sensors, applications, devices, and other resources. Figure  5 shows the classification of the sources based on these four categories.

Fig. 5 Big data source categories in IoT

3.1.2 Communication

The data sources are located on various networks, such as IoT sensor networks, wired and wireless sensor networks, fiber-optic sensor networks, and machine-to-machine communications. Communication technologies are required to transfer data from these sources so that it can be processed and analyzed [ 51 , 52 ]. There are several communication technologies and protocols in the IoT. The communication protocols used in the articles are IPv6, RPL, MQTT, CoAP, SSL, AMQP, WebSocket, 6LoWPAN, AllJoyn, TCP/IP, and HTTP/IP. Communication technologies are compared based on frequency, data rate, range, power usage, cost, latency, etc., and can be categorized in several ways. This paper divides big data communication technologies in the IoT by distance into three categories: PAN, LAN, and WAN. Table 4 shows the articles' classification based on these three categories. Wi-Fi, ZigBee, Bluetooth, and 4G LTE are the most frequently used communication technologies, appearing in 29, 19, 17, and 17 articles, respectively.
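The distance-based grouping can be illustrated with a small lookup. This is only a sketch: the article counts come from the text, but the assignment of each technology to a range class follows conventional usage and is an assumption here, since Table 4 is not reproduced in this section.

```python
from collections import defaultdict

# Assumed range class per technology (conventional usage, NOT taken from Table 4).
RANGE_CLASS = {
    "Bluetooth": "PAN",
    "ZigBee": "PAN",
    "Wi-Fi": "LAN",
    "4G LTE": "WAN",
}

# Number of articles using each technology, as reported in the text.
ARTICLE_COUNTS = {"Wi-Fi": 29, "ZigBee": 19, "Bluetooth": 17, "4G LTE": 17}


def group_by_range(counts, range_class):
    """Group per-technology article counts under their range class."""
    grouped = defaultdict(dict)
    for tech, n in counts.items():
        grouped[range_class[tech]][tech] = n
    return dict(grouped)


grouped = group_by_range(ARTICLE_COUNTS, RANGE_CLASS)
for cls in ("PAN", "LAN", "WAN"):
    print(cls, grouped.get(cls, {}))
```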

3.1.3 Data ingestion

Data ingestion is the process of importing and transporting data in different formats from various sources (shown in Fig. 4 ) to a storage medium, processing and analyzing platform, and decision support engines [ 93 , 94 ]. The quality of the dataset used by ML-based prediction models (classification) plays a vital role in BDM in the IoT. A prediction model requires a lot of correctly labeled data for correct construction, assessment, and accurate result generation [ 95 ]. Therefore, the data ingestion layer should handle enormous-volume, high-velocity, varied, and variable data and deliver valuable, validated data to the processing and analysis step. The tasks assigned to this layer differ between articles. The data ingestion layer in [ 96 ] includes identification, filtration, validation, noise reduction, integration, transformation, and compression. The data ingestion layer in [ 97 ] provides data synchronization, data slicing, data splitting, and data indexing. Also, the data ingestion layer in [ 98 ] includes data stream acquisition, data stream extraction, enrichment, integration, and data stream distribution. Finally, the data ingestion layer in [ 99 ] includes data cleaning, data integration, and data compression.

There are three categories of data ingestion technologies: real-time data ingestion, batch data ingestion, and a combination of both. Real-time data ingestion is used for time-sensitive data and real-time intelligent decision-making. Batch data ingestion is used for data collection from sources at regular intervals (daily reports and schedules) [ 100 ]. There are many tools and platforms for data ingestion, such as Apache Kafka, Apache NiFi, Apache Storm, Apache Flume, Apache Sqoop, Apache Samza, Apache MiNiFi, Confluent Platform, and Elastic Logstash. These tools can be compared based on throughput, latency, scalability, and security [ 98 ]. The data ingestion layer in this paper includes data cleaning, data integration, data transformation/discretization, and data reduction. Each of these steps uses special tools, methods, and algorithms. Table 5 shows the categorization of articles based on the tools that are used for data ingestion. Data ingestion tools have been compared based on ingestion type, throughput, reliability, latency, scalability, security, and fault tolerance. Platforms in some articles use a combination of these tools, such as the Hortonworks DataFlow platform in [ 101 ], which includes Apache NiFi/MiNiFi, Apache Kafka, Apache Storm, and Druid. As Table 5 shows, Apache Kafka is the most frequently used data ingestion tool, appearing in 8 articles. Also, Table 6 shows the categorization of articles based on the big data preprocessing stage in the IoT.
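The four ingestion steps named above (cleaning, integration, transformation/discretization, and reduction) can be sketched in plain Python. This is an illustrative toy pipeline, not the method of any reviewed article: the record layout (`ts`, `temp`), the threshold, and the window size are hypothetical.

```python
from statistics import mean


def clean(readings):
    """Data cleaning: drop records whose value is missing or non-numeric."""
    return [r for r in readings if isinstance(r.get("temp"), (int, float))]


def integrate(*sources):
    """Data integration: merge several sources into one list ordered by timestamp."""
    merged = [r for src in sources for r in src]
    return sorted(merged, key=lambda r: r["ts"])


def discretize(readings, threshold=30.0):
    """Transformation/discretization: map each numeric value to a label."""
    return [
        {**r, "level": "high" if r["temp"] >= threshold else "normal"}
        for r in readings
    ]


def reduce_by_window(readings, window=10):
    """Data reduction: keep one average value per fixed-size time window."""
    buckets = {}
    for r in readings:
        buckets.setdefault(r["ts"] // window, []).append(r["temp"])
    return {w * window: mean(values) for w, values in sorted(buckets.items())}


# Two hypothetical sensor feeds; one record has a missing value.
sensor_a = [{"ts": 1, "temp": 21.0}, {"ts": 12, "temp": None}, {"ts": 15, "temp": 31.0}]
sensor_b = [{"ts": 3, "temp": 23.0}, {"ts": 18, "temp": 33.0}]

merged = integrate(clean(sensor_a), clean(sensor_b))
labeled = discretize(merged)
reduced = reduce_by_window(labeled)
print(reduced)
```

In a real deployment these steps would typically run inside one of the ingestion tools listed in the text; the sketch only shows the order and purpose of the steps.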

3.1.4 Data storage

This subsection categorizes articles based on storage mechanisms. The articles use various methods and tools to store big data. This study divides these mechanisms into four categories: relational, NoSQL, Distributed File Systems (DFS), and cloud/edge/fog/mist storage. Each of these categories has subcategories. One of the most critical big data challenges is the categorization and scalability that traditional relational databases such as MySQL, SQL Server, and Postgres cannot overcome. Therefore, NoSQL databases are used to store big data. NoSQL technologies are divided into four categories: key-value, column-oriented, document-oriented, and graph-oriented [ 102 ]. These NoSQL technologies have many platforms to support their operations. Key-value storage is the most straightforward and highly flexible type of NoSQL database and stores all the data as pairs of keys and values. A column-oriented database stores data as a set of columns, whereas in a relational database, data is stored in rows and read row-by-row. A document-oriented database stores data as semi-structured documents. A graph database focuses on the relationships between data elements, and each element is stored as a node. Tables 7 and 8 show the types of storage methods used in articles. Table 7 shows the classification of articles based on relational databases, NoSQL databases, and DFS. As can be seen, none of the 110 selected articles uses a key-value database. Hive is the most commonly used relational database, HBase the most commonly used NoSQL database, and HDFS the most commonly used distributed file system. Table 7 also compares these storage tools and platforms based on in-memory or disk-based storage, data type, scalability, security, availability, flexibility, performance, fault tolerance, ease of use, and replication.
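To make the four NoSQL categories concrete, the sketch below expresses hypothetical sensor data in each model using ordinary Python structures. It does not use any real database API; all keys, field names, and values are invented for illustration.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"sensor:42:latest": '{"temp": 21.5, "unit": "C"}'}

# Column-oriented: values grouped per column, so one column can be scanned alone.
column_store = {
    "sensor_id": [42, 42, 7],
    "temp": [21.5, 22.0, 19.0],
}

# Document-oriented: self-describing, possibly nested documents.
documents = [{"_id": 1, "sensor": {"id": 42, "room": "lab"}, "temp": 21.5}]

# Graph-oriented: nodes plus explicit relationships (edges) between them.
nodes = {"sensor42": {"type": "sensor"}, "lab": {"type": "room"}}
edges = [("sensor42", "LOCATED_IN", "lab")]


def column_average(store, column):
    """Aggregate one column without touching the others."""
    values = store[column]
    return sum(values) / len(values)


def neighbors(edge_list, node):
    """Follow outgoing relationships from a node."""
    return [dst for src, _, dst in edge_list if src == node]


latest = json.loads(key_value["sensor:42:latest"])  # value must be decoded by the client
print(latest["temp"])
print(column_average(column_store, "temp"))
print(documents[0]["sensor"]["room"])
print(neighbors(edges, "sensor42"))
```

The access pattern differs per model: lookup by key, scan of a single column, navigation inside a nested document, and traversal of relationships.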

Table 8 shows the classification of articles based on cloud/edge/fog/mist storage. Cloud computing provides scalable computing, large data storage capacity, and processing power, and ensures the quality of the applications. However, it faces challenges such as latency, network overhead, bandwidth, data privacy, lower real-time responsiveness, location awareness, security, reliability, data availability, and accessibility [ 103 ]. Network architectures such as fog, edge, and mist computing came into existence to overcome these challenges; they move the data and computation closer to the consumer and take some of the workload off the cloud [ 104 ].

Fog computing is a type of decentralized computing that sits between cloud storage and IoT devices. Fog computing reduces service latency, bandwidth, energy consumption, storage, and computing costs and improves the QoS [ 149 ]. The fog computing model for the IoT supports real-time services, mobility, and geographic distribution [ 150 ]. Another alternative approach to cloud computing is edge computing. Data storage and processing in edge computing occur closer to the device or data source to improve data locality, performance, and decision-making [ 151 ]. Edge computing is less scalable than fog computing but provides near real-time analytics and high-speed data access and reduces data leakage during transmission [ 104 , 152 ]. Mist computing is an intermediate layer between fog/cloud and edge computing. It can mitigate fog/cloud challenges such as response time, location awareness, data privacy, local decision-making, network overhead, latency, and computing and storage costs. Mist nodes have low processing power and storage [ 153 ]. In some articles, HDFS and NoSQL databases are used alongside cloud/edge/fog/mist storage. The goal is to overcome the disadvantages of these technologies by using them together.

3.1.5 Processing and analysis

Big data processing and analysis in the IoT are techniques or programming models for extracting knowledge from large amounts of data to support and provide intelligent decisions [ 154 ]. Efficient big data processing and analysis in IoT can help mitigate many challenges in event management, action management, control and monitoring, customer service, cost savings, and business relationships [ 155 ]. This paper divides the big data processing and analysis step in IoT into a set of sub-steps: batch and stream processing, query processing, statistical and numerical analysis, graph processing, ML, resource management, and infrastructure/containers. Table 9 shows the articles' classification and a comparison of the tools based on the following criteria: throughput, reliability, availability, latency, scalability, security, flexibility, ease of use, and cost-effectiveness. Big data processing in the IoT is generally done at both batch and stream levels. Many tools, platforms, and frameworks exist for batch and stream processing. The tools used in the articles are Apache Hadoop, Apache Spark, MapReduce, Apache Storm, Apache Flink, Anaconda, Apache S4, Weka, streaming analytics manager, and CEP.

As you can see in Table 9, Apache Hadoop, MapReduce, and Apache Spark are the most frequently used tools, appearing in 45, 32, and 31 articles, respectively. Some of these tools include a set of libraries and procedures for efficient processing and analysis. In the study, the libraries and functions used by the articles are Hadoop-pcap-lib, Hadoop-pcap-serde, Hadoop-pcap-input (Apache Hadoop), MLlib, GraphX, Spark Streaming, Spark SQL, Spark Core (Apache Spark), Map, FlatMap, Filter, Reduce, Shuffle (MapReduce), Gelly, FlinkML, Table and SQL, FlinkCEP (Apache Flink), NumPy [ 132 ], Keras [ 108 ], Pandas [ 59 ], and Scikit-Learn, Paho-MQTT (Anaconda). Also, various algorithms and methods are used to process and analyze data, such as classification, clustering, regression, optimization algorithms, and SVM. Most of these tools provide implementations of these algorithms.
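The Map, Shuffle, and Reduce functions listed above can be sketched in a few lines of single-process Python. This is an illustrative word count under simplifying assumptions, not the Hadoop API: the function names are hypothetical, and a real MapReduce job distributes each phase across cluster nodes.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records):
    """FlatMap each line into (word, 1) pairs."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group all values that share a key, as the shuffle step does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce each group of values to a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(records):
    """Chain the three phases: map, shuffle, reduce."""
    return reduce_phase(shuffle_phase(map_phase(records)))
```

For example, `word_count(["fog edge", "edge cloud edge"])` counts `edge` three times and `fog` and `cloud` once each.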

3.1.6 Post-processing

The post-processing step is another vital task in knowledge discovery from big data in the IoT. This paper divides the post-processing step into evaluation and selection (data governance), virtualization/dashboard, intelligent decision, and service and application. The evaluation and selection stage evaluates results obtained using test methods on different types of datasets. There are various criteria for assessing the results. In this section, the articles are categorized based on the methods they used for the test. These methods are divided into four categories: test methods, classification, clustering, and regression. Each of them uses various criteria for evaluation. Table 10 shows the articles’ classification based on these four categories. The virtualization/dashboard stage uses tools, graphs, tables [ 75 ], graphical user interfaces [ 59 ], and charts [ 92 ] to display the results. The tools used in this stage are Kibana, Plotly, Tableau, Microsoft Power BI, Grafana, vSphere, NodeJS, and Matplotlib [ 59 , 105 , 106 , 109 , 110 , 113 , 140 ]. Intelligent decisions can be made using stochastic binary decisions [ 156 ], ML, pattern recognition, soft computing, and decision models [ 51 , 53 , 74 ].
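For the classification category of the evaluation stage, commonly used criteria are accuracy, precision, recall, and F-score. The sketch below shows how they are derived from a confusion matrix for a binary problem; the function name and the choice of `1` as the positive label are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```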

Tables 11 and 12 show the relevant datasets that the articles used for investigating and numerically assessing techniques for BDM in the IoT. These datasets are divided into two categories: 1) datasets described by name, repository, dataset characteristics, attribute characteristics, number of instances/size, and number of attributes; and 2) datasets described by name, website address, and size. As you can see, the UCI machine learning repository has been repeatedly used in articles as a repository to assess techniques for BDM in the IoT.

3.2 Big data management architectures/frameworks in the IoT

This subsection investigates and analyzes the 71 articles that presented frameworks and architectures for BDM techniques in the IoT. These articles are divided into two categories: BDM architectures/frameworks in IoT-based applications (63 articles) and BDM architectures/frameworks in IoT paradigms (8 articles).

3.2.1 Big data management architectures/frameworks in the IoT applications

The architectural models used in the selected articles are layered, component-based, and cloud/fog-based architecture. A layered architecture is organized hierarchically, and each layer performs a service. The layered architecture ensures the system is more adaptable to emerging technologies at each layer and improves the acquisition and integration of data processes [ 167 ]. Component-based architecture is a framework that decomposes the system into reusable and logical components. The advantages of component-based architecture are increased quality, reliability, component reusability, and reduced time. In cloud-based or fog-based architectures, the operations and components related to processing or storage are placed in the cloud or fog. Most of the proposed architectures are layered, and the most common types of BDM architectures in the IoT are 3-layer and 4-layer (22 and 20 articles, respectively). Also, most of the proposed architectures are in IoT-based healthcare (33.33%), followed by IoT-based smart cities (22.22%). The selected articles in this study used nine different operating systems (OS) for BDM in the IoT. Ubuntu is the most widely used OS, with 18 articles. Articles used programming languages to analyze and process big data in the IoT. Java, Python, and MATLAB are the major programming languages. In the following, these architectures and frameworks will be examined. For a better presentation, we have divided these architectures and frameworks into seven categories in terms of IoT applications (healthcare, smart cities, smart home/building, intelligent transport, traffic control and energy, urban planning, and other IoT applications (smart IoT systems, smart flood, smart farms, disaster management, laundry, digital manufacturing, and smart factory)). Then we review the attributes of the architectures and frameworks, including layers, the functions of the layers, the operating system, the programming language, and the advantages and disadvantages of each.

3.2.1.1 BDM architectural/framework for IoT-based healthcare

Predicting health and disease and preventing deaths are essential in our modern world [ 168 , 169 ]. Healthcare IoT (e.g., electronic and mobile health) uses wireless body sensor networks for monitoring the patients’ environmental, physiological, and behavioral parameters [ 170 ]. Wearables and other IoT devices within the healthcare industry generate a large amount of data. The health data must be collected, stored, processed, and analyzed for future intelligent decision-making. BDA plays a vital role in minimizing computation time, predicting the future status of individuals, providing reliable health services, prevention, healthy living, population health, early detection, and optimal management [ 133 , 158 , 171 ]. BDM mechanisms have different objectives and requirements for different types of medical data [ 172 ]. Many studies have presented mechanisms for BDM in IoT-based healthcare, each with advantages and disadvantages. Therefore, this subsection examines the articles (21 articles; 33.33%) that discussed the architectures or frameworks of BDM in IoT-based healthcare.

Rathore et al. [ 58 ] proposed Hadoop-based intelligent healthcare using a BDA approach. This system collected the big data and directed them to a 3-unit smart building for storing and processing. The units of this system are big data collection, Hadoop processing, and analysis and decision. This system used the 5-layer architecture for parallel, real-time, and offline processing. The layers of this architecture are the data collection, communication, processing, management, and service. The data collection layer includes data sensing, acquisition, buffering, and filtration. The big data are divided into small pieces in the processing layer, processed in parallel using HDFS and MapReduce, and stored. The management layer uses medical expert systems for processing the results and recommending corresponding actions.

Chui et al. [ 126 ] proposed a 6-layer architecture for patient behavior monitoring based on big data and IoT. Message queue, Apache Hadoop, behavior analytics, Mongo database, distributed stream processing, and exposer are the layers of this architecture. This architecture uses Hadoop for processing (descriptive, diagnostic, predictive, and prescriptive analytics), MongoDB for storing, Spark/Flink/Storm for stream processing, and Apache Kafka for breaking up the data stream into several partitions. Also, the authors have discussed the challenges of trust, security, privacy, and interoperability in the healthcare research field.
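Breaking up a data stream into several partitions, as described above, is typically done by hashing a record key so that all records with the same key land in the same partition and keep their relative order. The following sketch imitates that idea in memory. It does not use the Kafka client API, and Kafka's own partitioner uses a different hash function; CRC32 is used here only as a stable stand-in, and the patient keys are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a record key to a partition index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_stream(events, num_partitions):
    """Distribute (key, value) events over partitions, preserving per-key order."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


if __name__ == "__main__":
    # Hypothetical heart-rate readings keyed by patient ID.
    stream = [("patient-1", 72), ("patient-2", 88), ("patient-1", 75)]
    for index, partition in enumerate(partition_stream(stream, 3)):
        print(index, partition)
```

Because the partition depends only on the key, downstream consumers can process partitions in parallel while still seeing each patient's readings in order.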

Ullah et al. [ 140 ] proposed a lightweight Semantic Interoperability Model for Big-Data in IoT (SIMB-IoT). The SIMB-IoT model has two main components: user interface and semantic interoperability. The semantic interoperability component is divided into three subcomponents: semantic interoperability, cloud services, and big data analytics. IoT data is collected and directed into an intelligent health cloud for online storage and processing. After processing, it sends suitable medicines to the patient’s IoT devices. This article used the SPARQL query to find hidden patterns.

Elhoseny et al. [ 173 ] presented a Parallel Particle Swarm Optimization (PPSO) algorithm for IoT big data analysis in cloud computing healthcare applications. This article aims to optimize virtual machine selection and storage by using GA, PSO, and PPSO algorithms, to support real-time processing, and to reduce the execution time. This architecture has four components: stakeholders’ devices, tasks, cloud broker, and network administrator. The cloud broker sends requests to the cloud and receives its responses. The network administrator finds the optimal selection of virtual machines in the cloud for task scheduling.

Manogaran et al. [ 141 ] proposed a secured cloud-fog-based architecture for storing and processing real-time data for healthcare applications. This architecture has two sub-architectures: the meta fog-redirection architecture and the grouping and choosing architecture. The meta fog-redirection architecture has three phases: data collection, data transfer, and big data storage. The data collection phase collects data from sensors in fog computing. The data transfer phase uses the ‘s3cmd utility’ method for transferring data to Amazon S3. The big data storage phase uses Apache Pig and Apache HBase for storage. The grouping and choosing architecture protects data and provides security services in fog and cloud environments. Also, this architecture uses MapReduce for prediction.

García-Magariño et al. [ 156 ] proposed an agent-based simulation framework for IoT BDA in smart beds. This framework has two layers: the primary mechanism for simulating sleepers' postures and the information analyzer. The first layer simulates the sleepers' poses. The second layer analyzes the data collected from the first layer. The agent types in this framework are sleeper agent, weight sensor agent, bed agent, observer agent, analyzer agent, stochastic sleeper agent, bed sleeper agent, restless sleeper agent, and healthy sleeper agent. This framework helps researchers test different sleeper posture recognition algorithms, study other sleeper behaviors, and perform online or offline detection.

Yacchirema et al. [ 59 ] proposed a 3-layer architecture for sleep monitoring based on IoT and big data at the network's edge. The layers of this architecture are the IoT layer, the fog layer, and the cloud layer. The IoT layer collected and aggregated the big data and directed them to the fog layer. The fog layer is responsible for connectivity and interoperability between heterogeneous devices, preprocessing the collected data, and sending notifications to react in real-time. The big data is stored, processed, and analyzed in the cloud layer for intelligent decision-making. This layer has three modules: data management, big data analyzer, and web application. This architecture used HDFS for data storage and Spark for offline and real-time processing.

BigReduce [ 137 ] is a cloud-based IoT framework for big data reduction for health monitoring in smart cities that focuses on reducing energy costs. This framework has two schemes: real-time big data reduction and intelligent big data decision-making. The big data reduction is made in two phases: at the time of acquisition and before transmission using an event-insensitive frequency content process.

Ma et al. [ 33 ] proposed a 3-layer architecture for the IoT big health system based on cloud-to-end fusion. The layers of this architecture are the big health perception layer, transport layer, and big health cloud service layer. In the big health perception layer, data are collected and preprocessed. The transport layer sends data to sensor nodes and receives data from the perception layer using network technologies. The big health cloud service layer has two sub-layers: the cloud service support and the cloud service application. The cloud service support sub-layer is responsible for compressing, storing, processing, and analyzing the real-time data. The cloud service application sub-layer is the interface between users and health networking. This sub-layer controls the sensor nodes and visualizes the big data.

Rathore et al. [ 61 ] proposed the 5-layer architecture for big data IoT analytics-based real-time medical emergency response systems. The data collection layer is responsible for data sensing, acquisition, buffering, filtration, and processing. This layer collected and aggregated data using a coordinator or relay node and transmitted them to a polarization mode dispersion. The communication layer provides device-to-device communication to various smart devices. The processing layer divides big data into small chunks. Each chunk is processed separately, aggregated, and stored. This article used MapReduce, HDFS, and Spark for data processing and analysis. The management layer is responsible for managing all types of outcomes using a medical expert system. The service layer is the interface between end-users and health networking. This architecture minimized the processing time and increased the throughput.
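The processing-layer pattern described above (divide the data into chunks, process each chunk separately, then aggregate the partial results) can be sketched as follows. This is a single-machine illustration under stated assumptions: a thread pool stands in for the cluster nodes that HDFS, MapReduce, or Spark would use, and the function names and heart-rate values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Divide the data set into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def summarize_chunk(chunk):
    """Process one chunk independently of the others."""
    return {"count": len(chunk), "total": sum(chunk), "max": max(chunk)}


def aggregate(partials):
    """Combine the per-chunk results into one global summary."""
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    return {"count": count, "mean": total / count, "max": max(p["max"] for p in partials)}


def process(records, chunk_size=4, workers=2):
    """Split, process chunks in parallel, and aggregate."""
    if not records:
        raise ValueError("records must not be empty")
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_chunk, chunks))
    return aggregate(partials)


if __name__ == "__main__":
    heart_rates = [72, 80, 95, 60, 110, 88]  # hypothetical sensor readings
    print(process(heart_rates))
```

The aggregation works only because count, sum, and maximum can be combined from partial results; statistics such as the median would need a different strategy.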

El‐Hasnony et al. [ 84 ] proposed a hybrid real-time remote patient monitoring framework based on mist, fog, and cloud computing. This article provided the 5-layer architecture for near real-time data analysis. The layers are the perception layer, the mist layer, the fog layer, the cloud layer, and the service provider layer. The mist layer is responsible for data filtering, data fusion, anomaly detection, and data transmission to the fog layer. The fog layer has done local monitoring and analysis, data aggregation, local storage, data pre-analysis, and data transmission to the cloud layer. The cloud layer implemented several data analytics techniques for intelligent decision-making and storage. This article presented a case study comparing traditional data mining techniques, including REPtree, MLP, Naive Bayes (NB), and sequential minimal optimization algorithms. The results showed that the REPtree algorithm achieved better accuracy, and the NB achieved the least time.

Harb et al. [ 106 ] proposed the 4-layer architecture for real-time BDA for patient monitoring and decision-making in healthcare applications. The layers of this platform are real-time patient monitoring, real-time decision and data storage, patient classification, and disease diagnosis, and data retrieval and visualization. The first layer is responsible for data ingestion using Kafka and Sqoop tools. The second layer processes and stores data using Spark and Hadoop HDFS. This layer preprocesses data and finds the missing records using MissRec (a script for Spark). The third layer is responsible for classification data using stability-based K-means, an adapted version of K-means clustering, and disease diagnosis using a modified version of the association rule mining algorithm. The last layer retrieves and visualizes data to understand the patient’s situation using Hive, SparkSQL, and Matplotlib.

Zhou et al. [ 62 ] proposed a data mining technology based on the IoT. The layers of the proposed functional architecture are the data acquisition layer, data transmission layer, data storage layer, and cloud service center layer. This article used the WIT120 system for data collection, the adaptive k-means clustering method based on the MapReduce framework for data preprocessing, HDFS for storing, and the GM (1,1) grey model for users’ health status prediction.
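The GM (1,1) grey model mentioned above forecasts a short, positive time series by fitting a first-order equation to its cumulative sum. The sketch below follows the textbook formulation of GM (1,1) and is not taken from the cited article; the function name and the sample series are assumptions.

```python
import math


def gm11_forecast(series, steps=1):
    """Forecast the next `steps` values of a positive series with GM(1,1)."""
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) needs at least four observations")

    # 1) Accumulated generating operation: x1[k] = series[0] + ... + series[k].
    x1, running = [], 0.0
    for value in series:
        running += value
        x1.append(running)

    # 2) Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(series[1:])

    # 3) Least-squares fit of y = -a * z + b.
    z_mean, y_mean = sum(z) / len(z), sum(y) / len(y)
    sxx = sum((zi - z_mean) ** 2 for zi in z)
    sxy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    a = -sxy / sxx
    b = y_mean + a * z_mean
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is zero; series is not suitable")

    # 4) Time response of the accumulated series (k is a 0-based index).
    def x1_hat(k):
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # 5) Inverse accumulation recovers forecasts of the original series.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    readings = [100.0, 110.0, 121.0, 133.1, 146.41]  # hypothetical health index
    print(gm11_forecast(readings, steps=2))
```

The model suits short series with roughly exponential trends; for noisy or oscillating data, its forecasts should be treated with caution.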

Hong-Tan et al. [ 90 ] proposed a real-time Ambient Intelligence assisted Student Health Monitoring System (AmIHMS). The data required by time ambient intelligence environments are collected from the WSN and sent to the cloud for handling. Their work developed a framework for real-time effective alerting of student health information. The AmIHMS architecture has three layers. The IoT layer collects health data from medical devices and sensors and saves it on one mobile computer or smartphone. The cloud layer receives the data through internet platforms such as 4G, 5G, LTE, etc., and executes the mining algorithms to extract relevant data for processing. The student health monitoring layer performs four stages to provide information and warnings about student health status. These stages include data retrieval, preprocessing, normalization, and classification/health status recognition.

Li [ 30 ] designed the fog-based Smart and Real-time Healthcare Information Processing (SRHIP) system. The SRHIP architecture has three layers. The IoT body sensor network layer performs data collection (health, environment, and locality), aggregation, compression, and encryption. The fog processing and computation layer uses Spark and the Hadoop ecosystem for information extraction, data normalization, rule engine, data filtration, and data processing. This layer performs the classification using the NB classifier. The cloud computation layer performs in-depth data analysis, storage, and decision-making. SRHIP minimizes the delay, transmission cost, and data size. This article uses hierarchical symmetric key data encryption to increase confidentiality.

The Improved Bayesian Convolution Network (IBCN) was proposed for human activity recognition [ 87 ]. The system architecture includes Wi-Fi and clouds onboard applications. The combination of a variable autoencoder with a standard deep net classifier is used to improve the performance of IBCN. This article used the convolution layers to extract the features and Enhanced Deep Learning (EDL) for security issues. IBCN provided the ability to download data via traditional radio frequency or low-power back-distribution communication. According to the experimental analysis, the proposed method allows the network to be continuously improved as new training sets are added and distinguishes between data-dependency and model-dependency. This architecture has high accuracy, versatility, flexibility, and reliability.

Sengupta and Bhunia [ 88 ] implemented a 3-layer IoT-enabled e-health framework for secure real-time data management using Cloudlet. The IoT layer uses IoT Hub for communicating with IoT devices. The Cloudlet layer is an intermediate layer between the IoT and cloud layers. This layer performs in-depth healthcare data analytics and processing. The cloud layer performs various analytics applications and processes queries. This framework uses SQLite for data storage in IoT Hub and Cassandra for future storage of sensed data. The results demonstrated that this framework has high efficiency and low data transmission time, communication energy, data-packet loss, and query response time.

IBDAM [ 133 ] is an Intelligent BDA Model for efficient cardiac disease prediction in the IoT using multi-level fuzzy rules and valuable feature selection. This article used the open-source UCI database. First, it performs preprocessing on the UCI database, and the next step uses multi-level fuzzy rule generation for feature selection. IBDAM uses an optimized Recurrent Neural Network (RNN) to train the features. Finally, the features are classified into labeled classes according to the risk of evaluation by a medical practitioner. The results of this article demonstrate that this architecture has high performance and is quick and accurate.

Ahmed et al. [ 158 ] proposed an IoT-based health monitoring framework for pandemic disease analysis, prediction, and detection, such as COVID-19, using BDA. In this framework, the COVID-19 data set is collected from different data sources. Four data analysis techniques are performed on these data, including descriptive, diagnostic, predictive, and prescriptive. The experts opine on the results, and then users receive the results of these analyses through the internet and cloud servers. This article uses a neural network-based model for diagnosing and predicting the pandemic. The results of this article indicated that the accuracy, precision, F-score, and recall of the proposed architecture are better than AdaBoost, k-Nearest Neighbors (KNN), logistic regression, NB, and linear Support Vector Machine (SVM).

Ahanger et al. [ 71 ] proposed an IoT-based healthcare architecture for real-time COVID-19 data monitoring and prediction based on fog and cloud computing. This architecture has four layers. The data collection layer collects data from sensors and uses protocols to guarantee information security. The information classification layer classifies the information into four classes: health data, meteorological data, location data, and environmental data. The COVID-19-mining and extraction layer is responsible for splitting information into two groups using a fuzzy C-means procedure in the fog layer. The COVID-19 prediction and decision modeling layer uses a temporal RNN for estimating the results of the COVID-19 measure and a self-organization map-based technique to increase the perceived viability of the model. Compared with existing methods, the proposed architecture has high classification efficiency, viability, precision, and reliability.

Oğur et al. [ 109 ] proposed a real-time data analytics architecture for smart healthcare in IoT. This architecture has two domains. The software-defined networking-based WSN and RFID technology are used in the vertical domain, and data analytics tools, including Kafka, Spark, MongoDB, and NodeJS, are used in the horizontal domain. The data collected from the WSN using RFID are transmitted to the Kafka platform over TCP sockets. Kafka sends the data to three consumers: the Apache Spark analysis engine, which analyzes data in real-time; the NodeJS web application, which visualizes patient data; and the MongoDB database, which stores data. This article uses logistic regression and Apache Spark MLlib for data classification. The results demonstrated that this architecture has high performance and accuracy and is appropriate for a time-saving experimental environment.

Table 13 shows the result of the analysis of the articles. This table shows each article's architecture or framework name, OS name, programming language, advantages, and disadvantages. As you can see, layered architecture is the most common, with 14 articles.

3.2.1.2 BDM architectural/framework for IoT-based smart cities

According to United Nations forecasts, about 67% of the world population will live in urban areas by 2050, resulting in environmental pollution, ecosystem destruction, energy shortage, emission reduction, and resource limitation [ 36 , 174 , 175 ]. Smart cities are large-scale distributed systems that could be a solution to overcoming these problems and improving intelligent services for residents [ 112 , 176 ]. Smart cities deploy many sensing devices that generate large amounts of data. These data must be stored, processed, and analyzed to extract valuable information [ 177 ]. BDM plays a significant role in this context and facilitates better resource management and decision-making [ 176 ]. Many studies have focused on BDM mechanisms in IoT-based smart cities with different objectives, including improving monitoring and communication, real-time control, and improved quality attributes (such as reliability, throughput, energy conservation, accuracy, scalability, delay, and bandwidth usage). Therefore, this subsection examines the articles (14 articles; 22.22%) that have discussed the architectures or frameworks of BDM in IoT-based smart cities.

Jindal et al. [ 85 ] proposed a tensor-based big data processing technique for energy consumption in smart cities. This article aims to reduce the dimensionality of data and decrease the overall complexity. The proposed framework has two phases. The first phase is the 3-layer data gathering and processing architecture. The layers of this architecture are data acquisition, transmission, and processing. In the second phase, the collected data are represented in tensor form, and SVM is used to identify the loads to manage the demand response services in smart cities. The technique reduces data storage by 38%.

ESTemd [ 105 ] is a distributed stream processing middleware framework for real-time analysis using big data techniques on Apache Kafka. The layers of this framework are the data ingestion layer, the data broker layer (source), the stream data processing engine and services, the data broker layer (sink), and the event hub. The data broker layer is responsible for data processing and transformation, with the support of multiple transport protocols. The third layer does stream processing and consists of the predictive data analytics model and Kafka CEP operators. This framework helps with performance improvement through data integration and distributed applications' interoperability.

CPSO [ 115 ] is a self-adaptive preprocessing approach for big data stream classification. This approach handles four mechanisms: sub-window processing, feature extraction, feature selection, and optimization of the window size and feature picking. CPSO uses clustering-based PSO for data stream mining, the sliding window technique for data segmentation, statistical feature extraction for variable partitioning, and correlation feature selection and information gain for feature selection. The proposed approach improves classification accuracy.
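The sliding window segmentation and statistical feature extraction steps mentioned above can be sketched as follows. This is a generic illustration, not the CPSO implementation: the window size and step are fixed hypothetical parameters here, whereas CPSO optimizes the window size, and the chosen statistics are only examples.

```python
from statistics import mean, pstdev


def sliding_windows(stream, size, step):
    """Yield consecutive windows of `size` items, advancing by `step`."""
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]


def window_features(window):
    """Summarize one window with simple statistical features."""
    return {
        "mean": mean(window),
        "std": pstdev(window),
        "min": min(window),
        "max": max(window),
    }


def extract_features(stream, size, step):
    """Segment the stream and compute one feature vector per window."""
    return [window_features(w) for w in sliding_windows(stream, size, step)]


if __name__ == "__main__":
    readings = [1, 2, 3, 4, 5, 6]  # hypothetical sensor values
    for features in extract_features(readings, size=4, step=2):
        print(features)
```

Each resulting feature vector can then be passed to feature selection and to a classifier.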

Rani and Chauhdary [ 72 ] proposed a novel approach for smart city applications based on BDA and a new protocol for mobile IoT. They presented the 5-layer architecture where the layers are: data source, technology, data management, application, and utility programs. The data source layer collects, compresses, and filters data. The technology layer is responsible for communication between sensor nodes, edge nodes, and base station. The management layer used MapReduce, SQL, and Hbase for analyzing, storing, and processing. The utility program layer used WSN and IoT protocols to work with the other layers. Also, this article presented a new protocol that reduces energy consumption, increases throughput, and reduces the delay and transmission time.

SCDAP [ 107 ] is the 3-layer BDA architecture for smart cities. The first layer is the platform that includes hardware clusters, the operating system, communication protocols, and other required computing nodes. The second layer is security. The last layer is the data processing layer that supports online and batch data processing. This layer has ten components: data acquisition; data preprocessing; online analytics; real-time analytics; batch data repository; batch data analytics; model management; model aggregation; smart application; and user interface. This architecture used Hadoop and Spark for data analysis. Also, this article presented a taxonomy of literature reviews based on six characteristics: focus, goal, organization, perspective, audience, and coverage.

Chilipirea et al. [ 80 ] proposed a data flow-based architecture for big data processing in smart cities. The architecture has seven steps: data sources, data normalization, data brokering, data storage, data analysis, data visualization, and decision support systems. This article used Extract, Transform, and Load (ETL) and Electronic Batchload Service (EBS) for normalizing the real-time and batch data. The data brokering step created the links between the collected data and the relevant context. This architecture used Hadoop for batch data processing and Storm for real-time data processing.

Gohar et al. [ 92 ] proposed a four-layer architecture for analyzing and storing data on the Internet of Small Things (IoST). The layers of this architecture are the small things layer, the infrastructure layer, the platform layer, and the application layer. The first layer collected data by using the LoRa gateway from LoRa devices. The infrastructure layer provides connectivity to devices by using the Internet. The platform layer is responsible for data preprocessing. For processing, this layer employs Max–Min normalization, the Kalman filter, the Round-Robin load balancing technique, the Least Slack Time algorithm (LST), the divide-and-conquer approach for aggregation, and NoSQL databases for storage. In the last layer, data is visualized for decision-making. This article implemented the architecture by using Hadoop, Spark, and GraphX. In this article, throughput has increased with the rise in data size.
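Two of the preprocessing techniques named above, Max–Min normalization and the Kalman filter, are easy to illustrate for a one-dimensional sensor signal. The sketch below uses the standard textbook forms and is not the cited article's implementation; the noise variances and sample values are assumed for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal carries no range information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def kalman_smooth(measurements, process_var=1e-3, measurement_var=1.0):
    """Smooth a noisy scalar signal with a constant-state Kalman filter."""
    estimate = measurements[0]
    error = 1.0  # initial uncertainty of the estimate (assumed)
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed constant, so only uncertainty grows.
        error += process_var
        # Update: blend the prediction with the new measurement.
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)
        error *= 1 - gain
        smoothed.append(estimate)
    return smoothed


if __name__ == "__main__":
    raw = [10.0, 12.0, 8.0, 11.0]  # hypothetical sensor readings
    print(min_max_normalize(raw))
    print(kalman_smooth(raw))
```

A larger `measurement_var` makes the filter trust new readings less and smooth more aggressively.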

Farmanbar and Rong [ 113 ] proposed an interactive cloud-based dashboard for online data visualization and a data analytics toolkit for smart city applications. The proposed architecture has three layers: the data layer, application and analysis layer, and presentation layer. The data layer is the core of the architecture and contains data acquisition units, data ingestion, data storage, and data access. This architecture used Logstash for data ingestion, Elasticsearch for storage, and Kibana for access and real-time monitoring. This platform has been tested on five datasets: transportation data, electricity consumption, cargo e-bikes, parking vacancies, and energy. The results showed that this architecture is robust and scalable and improves communication between users and urban service providers.

He et al. [ 116 ] proposed a big data architecture to achieve high Quality of Experience (QoE) performance in smart cities. This architecture has three planes: the data storage plane, the data processing plane, and the data application plane. This article used MongoDB and HDFS for data storage and Spark and a deep-learning-based greedy algorithm for data processing. The simulation results indicated that the proposed architecture's accuracy, precision, and recall are better than those of SVM and KNN.

Khan et al. [ 128 ] proposed an SDN-based 3-tier architecture that includes data collection, data processing and management, and an application layer for real-time big data processing in smart cities with two intermediate levels that work on SDN principles. This architecture uses Spark and GraphX with Hadoop for offline and real-time data analysis and processing. Also, this article proposed an adaptive job scheduling mechanism for load balancing and achieving high performance. The results showed that when clusters and processing time increase, the proposed system's performance also increases.

IoTDeM [ 73 ] is an IoT big data-oriented multiple edge-cloud architecture for MapReduce performance prediction with varying cluster scales. This architecture consists of three parts: multiple edge cloud redirectors, an edge cloud-based big data platform, and a centralized cloud-based big data platform. This architecture used historical job execution records and Locally Weighted Linear Regression (LWLR) techniques for predicting jobs' execution times and Ceph for storing them. Because of Ceph, there was no need to transfer data to the newly added slave node. This article validated the accuracy of the proposed model by using the TestDFSIO and Sort benchmark applications in a general implementation scenario based on Hadoop2 and Ceph and achieved an average relative error of less than 10%.
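Locally Weighted Linear Regression fits a separate weighted line for every query point, giving more weight to historical records that lie close to the query. The sketch below is a generic one-feature version with a Gaussian kernel, not the IoTDeM implementation; the bandwidth, the feature (cluster size), and the sample execution times are assumptions.

```python
import math


def lwlr_predict(x_query, xs, ys, bandwidth=1.0):
    """Predict y at x_query with locally weighted linear regression."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and equally long")

    # Gaussian kernel: records near the query get weights close to 1.
    weights = [math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    w_sum = sum(weights)

    # Weighted means of the inputs and targets.
    x_mean = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum

    # Weighted least-squares slope around the weighted means.
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys))
    if sxx == 0:
        return y_mean  # all inputs identical: fall back to the weighted mean
    slope = sxy / sxx
    return y_mean + slope * (x_query - x_mean)


if __name__ == "__main__":
    cluster_sizes = [2, 4, 8, 16]          # hypothetical number of nodes
    exec_times = [400.0, 230.0, 140.0, 95.0]  # hypothetical job times (s)
    print(lwlr_predict(6, cluster_sizes, exec_times, bandwidth=3.0))
```

A small bandwidth follows local structure closely but is sensitive to noise, while a large bandwidth approaches ordinary linear regression.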

Ahab [ 112 ] is a generic, scalable, fault-tolerant, cloud-based framework for online and offline big data processing. The framework has four components: the user API, repositories, messaging infrastructure, and stream processing. The API directs the published data streams from different sources. Ahab uses component, stream, policy, and action repositories to store data streams, management policies, and actions. It uses distributed messaging to handle data streams, which minimizes unnecessary network traffic and lets components freely choose an appropriate communication point. The Ahab architecture has two layers: the streaming layer and the service layer. The streaming layer is implemented as a lambda architecture with three sub-layers for data stream processing: the batch layer, the speed layer, and the serving layer. HDFS and Apache Spark are used for data storage and stream processing. The service layer is responsible for analyzing, managing, and adapting components.
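
The interplay of the batch, speed, and serving layers in a lambda architecture can be sketched in a few lines. This is an illustrative toy that counts events per sensor in memory. The class name `LambdaPipeline` and its methods are hypothetical and are not taken from Ahab, which relies on HDFS and Apache Spark.

```python
from collections import Counter

class LambdaPipeline:
    """Toy lambda architecture that counts events per sensor."""

    def __init__(self):
        self.master_log = []         # append-only raw data (input of the batch layer)
        self.batch_view = Counter()  # view recomputed from the whole log
        self.speed_view = Counter()  # incremental view of events since the last batch run

    def ingest(self, sensor_id):
        # Every event goes to the master log and to the speed layer.
        self.master_log.append(sensor_id)
        self.speed_view[sensor_id] += 1

    def run_batch(self):
        # Batch layer: rebuild the view from scratch, then reset the speed layer.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, sensor_id):
        # Serving layer: merge the batch view with the real-time delta.
        return self.batch_view[sensor_id] + self.speed_view[sensor_id]

pipeline = LambdaPipeline()
for event in ["s1", "s1", "s2"]:
    pipeline.ingest(event)
before_batch = pipeline.query("s1")
pipeline.run_batch()
pipeline.ingest("s1")
after_batch = pipeline.query("s1")
```

Queries stay correct whether or not the batch job has run, because the speed view covers exactly the events the batch view has not yet absorbed.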

Mobi-Het [ 81 ] is a mobility-aware optimal resource allocation architecture for remote big data task execution in mobile cloud computing. The article uses the SMOOTH random mobility model to model the free movement of mobile devices and estimate their speed and direction. Mobi-Het has three layers: mobile devices, cloudlets, and the master cloud. The mobile devices layer has a decision-maker module that decides whether tasks should be executed remotely or locally. The master cloud layer implements the resource allocation algorithm. The proposed architecture achieves low execution time, high execution reliability, and efficiency in timeliness.

Hossain et al. [ 132 ] proposed a knowledge-driven framework that automatically selects suitable data mining and ML algorithms for a dynamic IoT smart city dataset. The system architecture has four units: data knowledge extraction, extractGoalKnowledge, extractAlgoKnowledge, and matchKnowledge. The framework takes three key inputs: datasets, goals, and data mining and ML algorithms. The article discussed both supervised and unsupervised data mining. The results show that the framework reduces computational time and complexity and increases performance and flexibility while dynamically choosing a high-accuracy solution.

Table 14 summarizes the analysis of the articles. It lists the architecture or framework name, OS name, programming language, advantages, and disadvantages of each article. Layered architecture is the most common, with 13 articles.

3.2.1.3 BDM architectural/framework for IoT-based smart home/ building

BDM mechanisms and IoT (architecture/ frameworks) have a crucial role in smart home/building, including processing data collected by the home sensors; analyzing, classifying, monitoring, and managing energy consumption and saving; intelligently identifying user behavior patterns and home activities; and increasing safety and comfort at home [ 76 ]. This subsection presents a review of the articles (8 articles; 12.70%) that have discussed the architectures or frameworks of BDM in the IoT-based smart home/ building.

Al-Ali et al. [ 68 ] proposed a smart home energy management architecture using IoT and BDA approaches. This architecture is divided into two sub-architectures: hardware architecture and software architecture. The hardware architecture includes sensors and actuators, high-end microcontrollers, and server blocks. The software architecture comprises the data acquisition module on the edge device, a middleware module, and a client application module. The first module monitors and collects data and transmits them to the middleware module. The second module uses several tools to provide different services, including facilitating communication between edge devices and middleware, data storage, data analysis, and sending results to the requester. The third module develops the front-end mobile user interface using a cross-platform integrated development environment. This article is evaluated using a prototype. The results showed the proposed architecture has high scalability, security, privacy, throughput, and speed.

Silva et al. [ 55 ] proposed a real-time BDA embedded architecture for the smart city with the RESTful web of things. The article integrated the web and smart control systems using a smart gateway system. The proposed architecture consists of four levels: data creation and collection; data processing and management; event and decision management; and application. The data processing and management level uses HDFS for primary data storage, MapReduce for processing, HBase to speed up processing, and Hive for data querying and management. The event and decision management level classifies events into two types, service events and resource events, based on the processed information. The application level provides remote access to the smart city services and has three sub-layers: the departmental layer, the services layer, and the sub-services layer. The architecture achieves high performance and throughput, low processing time, and reduced energy consumption.

Khan et al. [ 57 ] proposed a scheduling algorithm, an IoT BDA architecture, and a real-time platform for managing sensors' energy consumption. This architecture has four steps: appliance discovery, sensor configuration and deployment, event management and scheduling, and information gathering and processing. Appliances are identified and classified in the first step based on user availability and usage time. The second step used Poisson distribution for sensor distribution in an IoT environment. In the third step, the appliance sleep-scheduling mechanism is presented for job scheduling. In the last step, the collected data from sensors were directed to Hadoop, Spark, and GraphX for processing and analysis. This step used HDFS for data storage. This article minimized total execution time and energy consumption.

HEMS-IoT [ 76 ] is a 7-layer architecture based on big data and ML for in-home energy management. The layers of this architecture are the presentation layer, IoT services layer, security layer, management layer, communication layer, data layer, and device layer. The management layer uses the J48 ML algorithm (Weka's implementation of the C4.5 decision tree) and the Weka API for energy consumption reduction and user behavior pattern extraction. This layer also classifies the data and houses based on energy consumption using the C4.5 algorithm. The IoT services layer provides different REST-based web services. The security layer guarantees data confidentiality and has two components: authorization and authentication. The article uses RuleML and Apache Mahout to generate energy-saving recommendations.

Yassine et al. [ 56 ] proposed a platform for IoT smart homes based on fog and cloud computing. The components of the proposed platform are smart home components, IoT management and integration services, fog computing nodes, and cloud systems. The smart home component is divided into three tiers. The three tiers are: 1) the cyber-physical tier is responsible for interacting with the outside world through the second tier; 2) the connectivity tier is responsible for communicating with the smart home; and 3) the context-aware tier consists of user-defined rules and policies that create a privacy and security configuration. The IoT management and integration services component is in charge of providing interoperability, handling requests, authentication, and service registration. The fog computing nodes performed preprocessing, pattern mining, event detection, behavioral and predictive analytics, and visualization functions. The cloud system is responsible for storing and performing historical data analytics.

Luo et al. [ 131 ] proposed a 4-layer ML-based predictive model for smart building energy demands. First, the sensorization layer collects data and transfers them to the storage layer. The storage layer performs data cleaning and storage. The model’s smart core is in the analytics support layer, where an Artificial Neural Network (ANN) and k-means clustering are used to identify features in weather profile patterns. The service layer is an interface between the proposed model and the smart building management system. The proposed model improved accuracy and decreased the mean absolute percentage error.

Bashir et al. [ 110 ] proposed an Integrated Big Data Management and Analytics (IBDMA) framework for smart buildings. The reference architecture and the metamodel are two phases of this framework. The reference architecture has eight layers: data monitoring, sourcing, ingestion, storage, analysis, visualization, decision-making, and action. People, processes, technology, information, and facility are the components of the metamodel phase. The core component of the metamodel is people (IoT policymakers, developers, and residents of intelligent buildings). The process component includes data monitoring, sourcing, ingesting, storage, decision-making, analytics, and action/control. The technology component consists of the tools and software packages to implement the IBDMA. Some of these tools are Apache Flume for data ingesting; HDFS for data storing; Apache Spark for data analysis; Microsoft Power BI for static data visualization; and Elasticsearch and Kibana for near-real-time data visualization. The information element manages disasters and controls various facilities based on results obtained by using the technology stack. The last element is the facility that improves the comfort, safety, and living conditions for the people of the building.

Table 15 summarizes the analysis of the articles. It lists each article's architecture or framework name, OS name, programming language, advantages, and disadvantages. Layered architecture is the most common, with five articles.

3.2.1.4 BDM architectural/framework for IoT-based intelligent transport

Safety, reliability, fault diagnosis, data transmission, and early warning in the intelligent transport system are critical for decision-making [ 178 ]. The intelligent transport system uses digital technologies, sensor networks, ML, and BDA mechanisms to overcome the challenges, including accident prevention, road safety, pollution reduction, automated driving, traffic control, intelligent navigation, and parking systems [ 179 ]. This subsection presents a review of the articles (2 articles; 3.17%) that have discussed the architectures or frameworks of BDM in IoT-based intelligent transport.

SMART TSS [ 129 ] is a modular BDA architecture for intelligent transportation systems. The architecture has four units: a big data acquisition and preprocessing unit, a big data processing unit, a big data analytics unit, and a data visualization unit. The big data processing unit stores offline data in the cloud system for future analysis. Online data is sent to the extraction and filtration unit for load balancing on NoSQL databases. The big data analytics unit uses the MapReduce mechanism for analysis. The article uses Hadoop, Spark, and GraphX for big data processing and analysis. The throughput of the proposed system increases with data size, but the system has low accuracy and security.

Babar and Arif [ 89 ] proposed a real-time IoT big data analytics architecture for the smart transportation system. This architecture has three phases: big data organization and management, big data processing and analysis, and big data service management. The first phase performed data preprocessing, including big data detection, logging, integration, reduction, transformation, and cleaning. This phase used the divide-and-conquer technique for data aggregation, the Min–Max method for data transformation, and the Kalman filter technique for data cleaning. The second phase used Hadoop for big data processing; HDFS, Hive, and HBase for data storage; and Spark for data stream analysis. This phase also performed load balancing, which increased throughput, minimized processor use, and reduced response time. The third phase is responsible for intelligent decision-making and event management.
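
The two preprocessing techniques named above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the scalar random-walk Kalman model, the noise variances `process_var` and `meas_var`, and the sample speed readings are assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (Min-Max transformation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def kalman_smooth(measurements, process_var=1e-3, meas_var=0.5):
    """Scalar Kalman filter for a slowly drifting level observed with noise.

    Expects a non-empty list of measurements.
    """
    estimate = measurements[0]
    error = 1.0  # initial estimate variance (assumed)
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, its uncertainty grows.
        error += process_var
        # Update: blend the prediction and the measurement by the Kalman gain.
        gain = error / (error + meas_var)
        estimate += gain * (z - estimate)
        error *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed

# Hypothetical noisy vehicle speed readings in km/h.
raw = [20.0, 20.8, 19.1, 20.4, 19.7]
cleaned = kalman_smooth(raw)
scaled = min_max_scale(cleaned)
```

Cleaning before scaling keeps a single noisy spike from stretching the Min-Max range. Whether the original work applies the steps in this order is not stated in the summary above.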

Table 16 summarizes the analysis of the articles. It lists the architecture or framework name, OS name, programming language, advantages, and disadvantages of each article. Layered architecture is the most common, with two articles.

3.2.1.5 BDM architectural/framework for IoT-based traffic control and energy

Two reviewed articles discussed the architectures or frameworks of BDM in IoT-based traffic control and energy and used ML for this purpose. ML4IoT [ 108 ] is a container-based ML framework for IoT data analytics and coordinating ML workflows. The framework aims to define and automate the execution of ML workflows and uses several types of ML algorithms. ML4IoT has two layers: ML4IoT data management and ML4IoT core. The ML4IoT core layer trains and deploys ML models and consists of five components: a workflow designer, a workflow orchestrator, a workflow scheduler, container-based components, and a distributed data processing engine. The ML4IoT data management layer is responsible for data ingestion and storage and has three sub-components: a messaging system, a distributed file system, and a NoSQL database. The results of this article reveal that the framework has high elasticity, scalability, robustness, and performance.

Furthermore, Chhabra et al. [ 111 ] proposed a scalable and flexible cyber-forensics framework for IoT BDA with high precision and sensitivity. This framework consists of four modules: the data collector and information generator; feature analytics and extraction; designing ML models; and analyzing models on various efficiency metrics. The article used Google’s programming model, MapReduce, as the core for traffic translation, extraction, and analysis of dynamic traffic features. The authors also presented a comparative study of globally accepted ML models for peer-to-peer malware analysis in mocked real-time.

Table 17 summarizes the analysis of the articles. It lists the architecture or framework name, OS name, programming language, advantages, and disadvantages of each article. Component-based architecture is the most common, with two articles.

3.2.1.6 BDM architectural/framework for IoT-based urban planning

To improve the quality, planning, design, sustainability, living standards, dynamic organization, and mobility of urban space and structure, and to maintain urban services, BDM is responsible for offline and online aggregation, management, processing, and analysis of the large amounts of big data in urbanization [ 180 , 181 , 182 ]. Rathore et al. [ 51 ] proposed a 4-layer IoT-based BDA architecture for smart city development and urban planning. The first layer generated, aggregated, registered, and filtered data from various IoT sources. Using communication technologies, the second layer created communication between sensors and the relay node. The third layer used HDFS, HBase, Hive, and SQL for storage; MapReduce for offline analysis; and Spark, VoltDB, and Storm for real-time analysis. The last layer is responsible for showing the study results for intelligent and fast decision-making. The results show that the architecture provides efficient outcomes even on IoT big data sets. Throughput increased with the rise in data size, and the processing time decreased.
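
The MapReduce model used for offline analysis in this and several other reviewed architectures can be illustrated with a single-process sketch. The three functions below mimic the map, shuffle, and reduce phases in plain Python. The sensor names and readings are hypothetical, and a real deployment would run these phases on a Hadoop cluster rather than in memory.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit (key, (value, 1)) so that sums and counts can be combined later.
    for sensor_id, value in records:
        yield sensor_id, (value, 1)

def shuffle(pairs):
    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: compute the average reading per sensor.
    result = {}
    for key, values in groups.items():
        total = sum(v for v, _ in values)
        count = sum(c for _, c in values)
        result[key] = total / count
    return result

# Hypothetical (sensor, reading) records from city sensors.
records = [("traffic-1", 40.0), ("traffic-2", 10.0), ("traffic-1", 60.0)]
averages = reduce_phase(shuffle(map_phase(records)))
```

Emitting a count alongside each value keeps the reduction associative, which is what allows partial results from different nodes to be merged.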

Silva et al. [ 63 ] proposed a reliable 3-layer BDA-embedded architecture for urban planning. The layers of this architecture are data aggregation, data management, and service management. The purpose of the article is to increase throughput and minimize processing time. The real-time data management layer is the main layer and performs data filtration, analysis, processing, and storage. This layer used data filtration and min–max normalization techniques to improve energy data. The architecture used MapReduce for offline data processing, Spark for online data processing, and HBase for storage.

Table 18 summarizes the analysis of the articles. It lists the architecture or framework name, OS name, programming language, advantages, and disadvantages of each article. Layered architecture is the most common, with two articles.

3.2.1.7 BDM architectural/framework for other IoT-based applications

This subsection presents a review of the articles (14 articles) that have discussed the architectures or frameworks of BDM in other IoT-based applications. These IoT applications are smart IoT systems (4 articles), smart flood (1 article), smart farms (2 articles), disaster management (1 article), laundry (1 article), smart pipeline (1 article), network traffic (1 article), digital manufacturing (1 article), smart factory (2 articles).

Al-Osta et al. [ 121 ] proposed an event-driven and semantic rules-based approach for IoT data processing. The main levels of this system are sensor, edge, and cloud levels. This article has two purposes: reducing the required resources and the volume of data before transfer to the cloud for storage. The collected data is first aggregated, filtered, and classified at the gateway level. This causes a saving in bandwidth and minimizes the network traffic. This approach used semantic rules for data filtering. It also employed a complex event processing module to analyze input events and detect processing priority.

Wang et al. [ 148 ] proposed a 3-layer edge-based architecture and a dynamic switching algorithm for IoT big data analytics. The layers of this architecture are the cloud layer, edge layer, and IoT layer. The edge layer performed some functions, including identifying IoT applications, classifying them, and sending classification results to the cloud layer. The LibSVM method is used for IoT application identification and classification based on system status and requirements. Also, this article presented a new algorithm, namely the dynamic switching algorithm, for task offloading from cloud to edge based on the delay and network conditions. This algorithm performed task offloading based on classification results. The results showed the proposed architecture reduced delay, processing time, and energy consumption.

IODML-BDA [ 124 ] is a model for Intelligent Outlier Detection in Apache Spark using ML-powered BDA for mobile edge computing. This model performs four steps: data preprocessing, outlier detection, feature selection, and classification. This article employs an Adaptive Synthetic Sampling (ADASYN)-based technique for outlier detection, the Oppositional Swallow Swarm Optimization (OSSO) for feature selection, and a Long Short-Term Memory (LSTM) model for classification. This model has high performance and accuracy in BDA.

Kumar et al. [ 3 ] presented a novel 4-layer architecture for IoT big data management in cloud computing networks and a collaborative filtering recommender system. The information layer collects data and transmits them to the second layer. The transport layer uses GPRS/CDMA, wireless RFID, or Ethernet channels for communication and data uploading in the data mining layer. The data mining layer utilizes the ML method for data analysis. The application layer is responsible for data visualization based on extracted information from the data mining layer. The article also proposed a collaborative filtering algorithm to improve the prediction accuracy based on the time-weighted decay function and asymmetrical influence degree. The result of this article demonstrated that this architecture has high accuracy.

Sood et al. [ 75 ] proposed a 4-layer flood forecasting and monitoring architecture based on IoT, High-Performance Computing (HPC), and big data convergence. The IoT layer is responsible for IoT device installation and data collection. The fog computing layer reduces the latency of application execution during real-time flood prediction. The data analysis layer received, stored, and analyzed the collected data. This layer used Singular Value Decomposition (SVD) for data reduction and a k-means clustering algorithm to estimate the flood situation and rating. The Holt-Winters forecasting method is also used to forecast the flood. The last layer is the presentation layer, which generates information for decision-making. The results showed that the proposed architecture reduced latency, complexity, completion time, and energy consumption.

Muangprathub et al. [ 79 ] proposed a WSN system for agriculture data analysis based on the IoT for watering crops. This system consists of three components: hardware, a web application, and a mobile application. The hardware component collected data and sent them to the web application for real-time analysis. This component is responsible for data preprocessing, data reduction using the equal-width histogram technique, data modeling/discovery using the association rule mining technique, and solution analysis. The web application manages real-time information. The mobile application component controlled crop watering remotely. The architecture of this system has three layers: the environmental data acquisition layer, the data and communication layer, and the application layer. This system can help to reduce costs and increase agricultural productivity.
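
Equal-width histogram reduction replaces raw readings with counts over fixed-width intervals. The sketch below shows one way to do this. The function name `equal_width_bins`, the number of bins, and the soil moisture values are assumptions for illustration and are not taken from the reviewed system.

```python
def equal_width_bins(values, num_bins):
    """Summarize values as counts over num_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything falls in the first bin
        else:
            # The maximum value is clamped into the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

# Hypothetical soil moisture readings in percent.
moisture = [10, 12, 15, 20, 22, 30, 38, 40]
edges, counts = equal_width_bins(moisture, 3)
```

Here eight readings are reduced to three counts plus the bin edges, which is the kind of volume reduction that makes the later rule mining step cheaper.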

Al-Qurabat et al. [ 65 ] proposed a two-level system for data traffic management in smart agriculture based on compression and Minimum Description Length (MDL) techniques. The first level is the sensor node level. This level monitors the features of the environment using a lightweight lossless compression algorithm based on Differential Encoding (DE) and Huffman techniques. The second level is the edge gateway level. This level is responsible for processing, analyzing, filtering, storing, and sending the data to the cloud, and minimizes the first level dataset using MDL and hierarchical clustering. The results demonstrated the suggested method has a high compression ratio and accuracy and decreases data and energy consumption.
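
A lossless scheme that combines differential encoding with Huffman coding can be sketched as below. This illustrates the general technique only, not the authors' algorithm: the sample readings are hypothetical, and the first reading is simply treated as one more symbol.

```python
import heapq
from collections import Counter

def delta_encode(readings):
    """Keep the first reading, then store successive differences."""
    return [readings[0]] + [b - a for a, b in zip(readings, readings[1:])]

def delta_decode(deltas):
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, partial code table); the unique
    # tiebreak keeps the dictionaries from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Hypothetical slowly changing sensor readings (e.g., temperature x 10).
readings = [200, 201, 201, 202, 202, 202, 203, 203]
deltas = delta_encode(readings)
code = huffman_code(deltas)
bitstream = "".join(code[d] for d in deltas)
```

Slowly varying signals produce many small, repeated differences, so the difference stream has a skewed symbol distribution that Huffman coding can exploit.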

Shah et al. [ 53 ] proposed the 5-layer architecture for IoT BDA in a disaster-resilient smart city. The purpose of this architecture is to store, mine, and process big data from IoT devices. This architecture's layers include data resource, transmission, aggregation, analytics and management, and application and support services. This architecture used Apache Flume and Apache Sqoop for unstructured and structured data collection; Hadoop and Spark for real-time and offline data analysis; and HDFS for data storage. The proposed implementation model comprises data harvesting, data aggregation, data preprocessing, and a big data analytics and service platform. This article used a variety of datasets for validation and evaluation based on processing time and throughput.

Liu et al. [ 14 ] proposed a cloud laundry business model based on the IoT and BDA. This model combined big data analytics, intelligent logistics management, and ML techniques. The model minimized human interference and increased system efficiency.

Tang et al. [ 7 ] proposed a 4-layer distributed fog computing-based architecture for big data analysis in smart cities. The layers of this architecture are the data center on the cloud layer, the intermediate computing nodes layer, the edge devices layer, and the sensing networks on the critical infrastructure layer. This architecture reduces the communication bandwidth and data size. First, data was collected from the fiber sensor network and transmitted to the edge computing nodes layer. This layer performed two tasks: identifying potential threat patterns and feature extraction using supervised and unsupervised ML algorithms. The intermediate computing nodes layer used the hidden Markov model for big data analysis and hazardous event detection. The results showed that the proposed architecture reduced the service response time and the number of service requests submitted to the cloud.

Kotenko et al. [ 136 ] introduced a framework for security monitoring of mobile IoT based on big data processing and ML. This framework consists of three layers: 1) extraction and decomposition of a data set using the heuristic approach; 2) compression of feature vectors using Principal Component Analysis (PCA); and 3) learning and classification using SVM, the k-nearest neighbors method, Gaussian naive Bayes (NB), an artificial neural network, and a decision tree. This framework has high performance and accuracy in the detection of attacks.

Bi et al. [ 157 ] proposed a new enterprise architecture that integrates IoT and BDA for managing the complexity and stability of the digital manufacturing system. This article used Shannon entropy to measure the complexity of a system based on the number of events and the probabilities of event occurrences. This architecture performs three processes: data acquisition, management, and utilization. The result of this article demonstrated that this architecture decreases the system complexity and increases flexibility, resilience, responsiveness, agility, and adaptability.
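
Shannon entropy quantifies complexity from event probabilities as H = -Σ p_i log2(p_i). The sketch below estimates the probabilities from observed event frequencies. The event names are hypothetical, and estimating probabilities from a finite log is an assumption rather than the procedure described in the article.

```python
import math
from collections import Counter

def shannon_entropy(events):
    """Entropy in bits of the empirical distribution of a non-empty event list."""
    counts = Counter(events)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical event logs from two manufacturing lines.
stable_line = ["ok", "ok", "ok", "ok"]
complex_line = ["ok", "jam", "retool", "stop"]

h_stable = shannon_entropy(stable_line)
h_complex = shannon_entropy(complex_line)
```

A line that always produces the same event has zero entropy, while four equally likely events give two bits. More event types and more even probabilities therefore translate into a higher complexity score.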

Yu et al. [ 118 ] presented a BDA and IoT-based framework for health state monitoring in a smart factory. This framework consists of four phases. The data ingestion phase is responsible for extracting different data types, managing data collection, data security, data transformation using a secure file transfer protocol, and data storage issues. The big data management phase uses optimized HDFS for data storage on the cloud nodes and processing using Apache Spark. The data preparation phase performs sensor selection and noise detection processing to produce high-quality data. This phase uses the high-variance feature removal method for feature selection and a novel method for noise detection. The predictive modeling phase has four stages: PCA model training, streaming anomaly detection, contribution analysis, and alarm sequence analysis.

Kahveci et al. [ 183 ] proposed a secure, interoperable, resilient, scalable, and real-time end-to-end BDA platform for IoT-based smart factories. The platform architecture has five layers and several components that perform data collection, data integration, data storage, data analytics, and data visualization. The layers of the architecture are the control and sensing layer, the data collection layer, the data integration layer, the data storage and analytics layer, and the data presentation layer. All kinds of sensing and control activities are performed in the first layer. The data collection layer communicates with the first layer through a multi-node client/server architecture. The data integration layer uses a RESTful application program interface to transfer the collected data to the data storage layer. The data storage layer uses InfluxDB for industrial metrics and events. Using this architecture, production line performance is improved, bottlenecks are identified, product quality is improved, and production costs are reduced.

Table 19 summarizes the analysis of the articles. It lists the architecture or framework name, OS name, programming language, advantages, and disadvantages of each article. Layered architecture is the most common, with 14 articles.

3.2.2 BDM architectural/framework for IoT paradigms

Another category presented in this article is BDM architectures and frameworks in two important IoT paradigms, i.e., the Social Internet of Things (SIoT) and the Multiple Internet of Things (MIoT). SIoT is the integration of the IoT with social networking, which leads to improved scalability in information and service discovery, trustworthy relationships, security, performance, and high network navigability [ 91 , 184 ]. The SIoT establishes relationships and interactions between human-to-human, human-to-object, and object-to-object social networks in which humans are considered intellectual and relational objects [ 185 , 186 ]. The types of relationships between smart, complex, and social objects in SIoT are parental object relationships, co-location object relationships, co-work object relationships, ownership object relationships, social object relationships, stranger object relationships, guest object relationships, sibling object relationships, and service object relationships [ 187 , 188 ]. A MIoT is a collection of connected things with different kinds of relationships and objects.

In contrast to SIoT, the number of relationships in MIoT is not predefined. Therefore, SIoT is a specific case of MIoT where the number of possible relationship types is limited [ 187 ]. The MIoT paradigm has advantages over the IoT and SIoT. Through MIoT, the IoT can be divided into multiple networks of interconnected smart objects. The MIoT can handle situations where the same objects behave differently in different networks, and it allows objects from various networks to communicate without being directly connected [ 189 ]. Social objects in the SIoT and MIoT can perform tasks including physical condition detection, data collection, information exchange, big data processing and analysis, visualization for decision-making, predicting human behavior, and increasing efficiency and scalability. Because the heterogeneous communication and social networks of SIoT and MIoT objects generate high-volume, multi-source, dynamic, and sparse data, BDA is a vital issue in these paradigms. BDA in SIoT and MIoT requires a large amount of memory, processing power, and bandwidth to store, define, process, predict, and assist humans within a limited time [ 64 , 91 ]. Different researchers have examined BDA in these paradigms in various ways.

Paul et al. [ 91 ] proposed a system called SmartBuddy that performs BDA on SIoT-based smart city data to define real-time human dynamics. The architecture has three domains: the object domain, the SIoT server domain, and the application domain. The object domain collects the data and sends them to the SIoT server for balancing, storing, querying, processing, defining, and predicting human behavior. The application domain has four main components: security, cloud server, result storage devices, and data server. This domain compiles the results of the SIoT server domain. The article uses MapReduce programming for offline data analysis and Apache Spark for real-time analysis. SmartBuddy has high throughput and applicability.

HABC [ 52 ] is a Hadoop-based architecture for social IoT big data feature selection and analysis. This architecture has four layers: data collection, communication, feature selection and processing, and service. The data collection layer collected, registered, and filtered data. The communication layer provided end-to-end connectivity to various devices and used the Kalman filter to remove noise. The feature selection and processing layer used MapReduce for data analysis and HDFS, HBase, and Hive for manipulation and storage. The Artificial Bee Colony (ABC) algorithm is used for feature selection. The results indicate that the architecture increases throughput and accuracy and is more scalable.

Lakshmanaprabu et al. [ 64 ] proposed a hierarchical framework for feature extraction in SIoT big data using the MapReduce framework and a supervised classifier model. This framework has five steps: SIoT data collection, filtering, database reduction, feature selection, and classification. The article used the Gabor filter to reduce noisy data, Hadoop MapReduce for database reduction, Elephant Herd Optimization (EHO) for feature selection, and a linear kernel SVM-based classifier for data classification. The results showed that the proposed framework achieves high accuracy, specificity, sensitivity, and throughput.

Socio-cyber network [ 66 ] is a 4-layer architecture that integrates the social network with the technical network for analyzing human behavior using big data. This architecture uses the user's geolocation information to make friendships and graph theory to examine the trust index. The data generation layer is responsible for data collection, aggregation, registration, and filtration. The communication layer provides end-to-end connectivity to various devices. This layer creates a graph of data, and when new data are added to the system, this graph is updated. The data storage and processing layer performs the load balancing algorithm and graph processing. This layer uses MapReduce for data processing, the Spark GraphX tool for real-time analysis, and HDFS for data storage. The article uses the Knowledge Pyramid for knowledge extraction. The service layer shows the results to users.

Shaji et al. [ 120 ] presented a 5-phase approach for big data classification in SIoT. The phases of this approach are the data acquisition phase, data filtering phase, reduction phase, feature selection phase, and classification phase. This article uses an adaptive Savitzky–Golay filter for filtering and eliminating noisy data; the Hadoop MapReduce framework for data reduction; a modified relief technique for optimal feature selection; and a deep neural network-based marine predator algorithm for classification. This article has high accuracy, precision, specificity, sensitivity, throughput, and low energy consumption.

Floris et al. [ 67 ] proposed a 4-layer architecture based on SIoT to deploy a full-stack smart parking solution. The layers of this architecture are the hardware layer, virtualization layer, aggregation layer, and application layer. The hardware layer collects data and consists of a vehicle detection board, Bluetooth beacon, data transmission board, and concentrator. The SIoT paradigm is implemented in the virtualization layer using device virtualization. ML algorithms are implemented in the aggregation layer for data aggregation and data processing. The application layer includes the management platform that supports the control dashboard for smart parking management and the Android app for citizens.

Cauteruccio et al. [ 166 ] presented a framework for anomaly detection and classification in MIoT scenarios. This framework investigated two problems: the anomaly effects analysis on the MIoT and the source of the anomaly detection. The anomalies in MIoT are divided into three categories: presence anomalies versus success anomalies, hard anomalies versus soft anomalies, and contact anomalies versus content anomalies.

Lo Giudice et al. [ 189 ] proposed a definition of a thing’s profile and topic-guided virtual IoT. The profile of a thing has two components: a content-based component (past behavior) and a collaborative filtering component (principal characteristics of those things it has previously interacted with the most). This article uses supervised and unsupervised approaches to build topic-guided virtual IoTs in a MIoT scenario. Table 20 shows the result of the analysis of the articles. For each article, this table lists the architecture or framework name, the OS name, the programming language, and the advantages and disadvantages. As the table shows, layered architecture is the most common, with five articles.

3.3 Categories based on quality attributes

Systems have different attributes generally divided into qualitative or functional attributes and non-qualitative or non-functional attributes. This section considers the quality attributes of the selected articles. Quality attributes indicate the system’s characteristics, operating conditions, and constraints. There are different software quality models, such as McCall [ 190 ], Boehm [ 191 ], ISO/IEC 9126, and FURPS [ 192 ]. As far as we know, no systematic review has completely categorized articles based on qualitative characteristics. Therefore, this paper categorized the selected articles based on 18 qualitative attributes presented in Table 21 . In this table, the first column shows the names of these 18 quality attributes. The reviewed articles used these quality attributes to show the characteristics, quality attribute analysis, and performance analysis of the proposed approaches, architectures, and frameworks and to compare them with other works. Performance attributes have been analyzed in different articles based on different criteria. The reviewed articles utilized 12 quality attributes for performance attribute analysis. These quality attributes are load balancing, energy conservation, network lifetime, processing/execution time, response time, delay, CPU usage, memory usage, bandwidth usage, throughput, latency, and concurrency. In Table 21 , ↓ indicates the reduction of that quality characteristic and ↑ indicates the increase of that quality characteristic. The second column in this table shows the articles that have used these features. The performance, efficiency, accuracy, and scalability attributes are the most critical quality attributes, with 79, 62, 58, and 47 articles, respectively. From another point of view, the reference model of standard software quality attributes, i.e., ISO 25010, has been used to classify articles based on quality attributes. Table 22 shows the articles' classification according to this standard.
In the following, some quality attributes and their importance will be defined.

Performance: Performance refers to the ability of BDM techniques in the IoT to provide results and services with high load balancing, energy conservation, throughput, concurrency, low processing/execution time, delay, CPU/memory/ bandwidth usage, and latency.

Feasibility: Feasibility refers to the ability to perform successfully or study the current mode of operation, evaluate alternatives, and develop BDM techniques in the IoT.

Scalability: Scalability refers to the ability of BDM techniques in the IoT to exploit increasing computing resources effectively to maintain service quality when the real data volumes increase. BDM techniques in IoT must be scalable in performance and data storage. Some methods and advanced systems are used to improve the scalability of big data analysis, like parallel implementation, HPC systems, and clouds [ 193 ].

Accuracy: Accuracy refers to the ability to describe data and represent a real-world object or event correctly [ 194 ]. In the reviewed articles, various definitions of accuracy are provided, including clustering accuracy, classification accuracy, the accuracy of features selecting/extracting, and the accuracy of the prediction model. Each of these cases is evaluated in different ways.
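For the classification case, accuracy and the related measures reported in several reviewed articles (sensitivity, specificity, precision) can all be derived from a confusion matrix. The sketch below shows the standard binary definitions; the function name and the toy labels are assumptions for illustration and do not come from any reviewed article.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary classification measures from label lists (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on positives
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # recall on negatives
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy labels (assumed): 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

Because these measures weight errors differently, two articles that both report "accuracy" may not be directly comparable unless they use the same definition and class balance.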

Efficiency: Efficiency refers to BDM techniques in IoT with minimum energy and response time and high throughput, accuracy, and performance.

Reliability: Reliability refers to the ability of BDM techniques in the IoT to apply the specified functions under specified conditions and within the expected duration.

Availability: The main goal of many researchers is the availability of information and their analysis from heterogeneous data sources. Availability is one of the components of service trust and is part of reliability.

Interoperability: Interoperability refers to the ability to interconnect and communicate among smart objects, heterogeneous IoT devices, and different operating systems. Low-cost device interoperability is a vital issue in IoT [ 53 , 54 , 195 ].

Flexibility: Flexibility refers to the capacity of BDM techniques in the IoT to be adapted for different environments and situations to face external changes [ 196 ].

Robustness: Robustness refers to a stable BDM system in the IoT that can function despite erroneous, exceptional, or unexpected inputs and unexpected events.

3.4 Big data analytics types in IoT

There are different types of analytics. This study uses Gartner’s classification (see Footnote 2), which includes four types of analysis: descriptive analysis (“What happened?”), diagnostic analysis (“Why did it happen?”), predictive analysis (“What could happen?”), and prescriptive analysis (“What should we do?”). In descriptive analytics, historical business data is analyzed to describe what happened in the past. Diagnostic analytics investigates and identifies the causes of trends and why they occurred. The goal of predictive analytics is to forecast the future using a variety of statistical and ML techniques. Prescriptive analytics proposes the best action to take to accomplish a business’s objective using the data collected from descriptive and predictive analytics for decision-making based on future situations [ 197 ].

This paper investigates the applied methods for data analysis and categorizes them based on the type of analysis these methods provide. Organizations need statistics, AI, deep learning, data mining, prediction mechanisms, etc., for BDA and to evaluate the data [ 198 ]. The articles used ML algorithms to perform various analyses in the steps of BDA. ML algorithm is an appropriate approach or tool for BDA; decision-making; meaningful, precise, and valuable information extraction; and detecting hidden patterns in big datasets [ 199 , 200 ]. Utilizing the ML algorithms in BDA has advantages such as improving and optimizing BDM processes; heterogeneous big data analysis; sustainability; fault detection, prediction, and prevention; accurate and reliable real-time processing; resource management and reduction; and increased quality prediction, visual inspection, and productivity in IoT applications [ 83 , 201 ]. These algorithms are divided into four types: supervised, semi-supervised, unsupervised, and reinforcement ML algorithms [ 53 , 202 ]. Table 23 shows the categorization of articles based on BDA types. The most common tactics that the selected articles use for BDM in the IoT include classification (51 articles), simulation (38 articles), optimization (30 articles), and clustering (25 articles).

The reason for using more classification algorithms is that they help to categorize unstructured and high-volume data. Therefore, BDM in the IoT is faster and more efficient. Before classification begins, it must optimize the classification algorithm's inputs. Data reduction strategies extract optimal and required data from a large amount of data. These strategies include dimensionality reduction, numerosity reduction, and data compression. Some reviewed articles used Principal Components Analysis (PCA) to standardize, reduce the data redundancy and dimensionality, reduce the cost and processing time, and maintain the original data [ 69 , 114 , 118 , 135 , 136 ]. Also, the authors in [ 160 ] used the fuzzy C-means algorithm to reduce the amount of data. Feature selection methods improve classification accuracy and reduce the number of features in BDA. The collected data from IoT applications and monitoring systems are usually anomalous, and it is difficult to distinguish between the original data and the anomaly [ 201 ]. The anomaly and outlier data reduce the accuracy of the classification and prediction models. For instance, NRDD-DBSCAN [ 114 ], DBSCAN-based outlier detection [ 83 ], GA, and One-Class Support Tucker Machine (OCSTuM) [ 122 , 124 ] are some of the high-robust, high-performance, and anti-noisy methods for anomaly detection that are presented in reviewed articles.
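As an illustration of density-based outlier detection, the sketch below implements plain DBSCAN in pure Python and treats points labeled as noise as anomalies. It is the textbook algorithm, not the NRDD-DBSCAN variant of [ 114 ] or the method of [ 83 ]; the `eps` and `min_pts` values and the sample (temperature, humidity) readings are assumptions.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels


# Assumed sample readings: two dense groups and one isolated point.
readings = [(20.0, 50.0), (20.2, 50.5), (19.8, 49.7), (20.1, 50.2),
            (35.0, 10.0),
            (30.0, 80.0), (30.3, 80.4), (29.8, 79.9), (30.1, 80.1)]
labels = dbscan(readings, eps=1.0, min_pts=3)
outliers = [p for p, lab in zip(readings, labels) if lab == -1]
```

This version computes neighborhoods by brute force, which is quadratic in the number of points; at IoT scale a spatial index or a distributed implementation would be needed.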

SVM is the most common method based on classification (10 articles) for BDM in the IoT in supervised classification. SVM is a non-parametric, memory-efficient, error-reduction classification method that performs well in theoretical analysis and real-world applications. It can model non-linear, complex, and real-world problems in high-dimensional feature space [ 2 , 69 , 203 ]. However, SVM is difficult to interpret, has a high computational cost, and is not scalable [ 204 ]. In unsupervised classification, the k-means clustering algorithm is the most common strategy (6 articles). The standard k-means clustering algorithm is a simple partitioning method that works well for small and structured datasets. It is sensitive to the number of clusters, initial input, and noisy data. The standard k-means clustering must be modified to be used in BDA. Some research focuses on the MapReduce/Spark implementation of traditional k-means clustering that improves the accuracy and reduces the time complexity [ 205 ]. Also, articles used the k-means clustering algorithm for flood prediction [ 75 ], security monitoring [ 136 ], energy management and improving prediction accuracy [ 56 , 131 ], and data access and resource utilization [ 144 ] in IoT. Association rules are an unsupervised learning approach used to discover interesting and hidden relationships and correlations between variables and objects in large databases and for data modeling in IoT [ 79 ]. Association rule mining uses various algorithms to identify frequent item sets, such as the apriori algorithm, FP growth algorithm, and maximal frequent itemset algorithm [ 79 , 106 ]. Neural networks (NN) perform big data processing and analysis efficiently. NN has self-learning ability and plays a significant role in BDA in IoT. NN is used for classification, big data mining, hidden pattern recognition, correlation recognition in raw big data, and decision-making in IoT applications. There are several different kinds of neural network algorithms, including LSTM [ 108 ], radial basis functions network [ 69 ], Deep NN [ 101 , 162 ], convolutional NN [ 163 ], etc.
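The MapReduce formulation of k-means mentioned above can be sketched in a few lines: the map step assigns each point to its nearest centroid, and the reduce step averages the points in each group. The code below runs in a single process to show the structure only; it is not a Hadoop or Spark job, and the function names, sample points, and initial centroids are assumptions.

```python
import math
from collections import defaultdict


def map_assign(point, centroids):
    """Map step: emit (index of nearest centroid, point)."""
    idx = min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
    return idx, point


def reduce_mean(points):
    """Reduce step: average all points that share a centroid index."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_mapreduce(points, centroids, max_iter=20):
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        groups = defaultdict(list)           # stands in for the shuffle phase
        for key, value in (map_assign(p, centroids) for p in points):
            groups[key].append(value)
        new_centroids = [
            reduce_mean(groups[i]) if groups[i] else centroids[i]  # keep empty clusters in place
            for i in range(len(centroids))
        ]
        if new_centroids == centroids:       # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids


# Assumed sample data: two well-separated groups in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
final_centroids = kmeans_mapreduce(points, [(1.0, 1.0), (9.0, 8.5)])
```

In a real cluster the map calls would run in parallel over data partitions, and the sensitivity to the initial centroids noted above still applies.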

Deep learning is a modern machine learning model that employs supervised or unsupervised methods to learn and extract multiple-level, high-level, and hierarchical features for big data classification tasks and pattern recognition [ 163 , 206 ]. Deep learning is a BDA tool that can speed up big data decision-making and feature extraction, improve the extracted information QoE level, resolve security issues, data dimensionality, and unlabeled and un-categorized big data processing in IoT applications [ 116 , 207 ]. In the reviewed articles, deep learning methods are used for human activity recognition [ 87 ], flood detection [ 130 ], smart cities [ 116 ], and feature learning on big data in the IoT [ 163 ]. Optimization refers to selecting the best solution from a set of alternatives by minimizing or maximizing a specified objective function [ 208 ]. Bio-inspired algorithms are stochastic search techniques used by many researchers to solve optimization problems in BDM processes in the IoT, including data ingestion, processing, analytics, and virtualization [ 209 ]. The features of these algorithms are good applicability, simplicity, robustness, flexibility, self-organization, and the possibility of dealing with real-world problems [ 210 ]. There are different types of categories for these algorithms in various articles. For instance, in [ 211 ], these algorithms are categorized into six categories: local search-based and global search-based; single-solution based and population-based; memory-based and memoryless; greedy and iterative; parallel; and nature-inspired and hybridized. In the reviewed articles, GA and NN are used more for BDM in the IoT (6 articles). GA has been used for feature extraction and selection, outlier detection, scheduling, optimizing energy consumption, reducing execution time and delay, and optimizing the predictive model in IoT applications [ 69 , 86 , 115 , 122 , 146 , 173 ].
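To make the use of GA for feature selection concrete, the following is a minimal sketch of a genetic algorithm over binary feature masks with tournament selection, one-point crossover, bit-flip mutation, and elitism. It does not reproduce any reviewed article's method; the toy fitness function (which rewards a hypothetical set of useful features and penalizes every selected feature) and all parameter values are assumptions.

```python
import random


def genetic_feature_selection(fitness, n_features, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Search for a high-fitness 0/1 feature mask with a simple GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]                        # elitism: best mask survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical objective: features 0, 3, and 5 are useful; every selected
# feature costs 1, and every useful selected feature earns 2.
USEFUL = {0, 3, 5}


def toy_fitness(mask):
    return 2 * sum(mask[i] for i in USEFUL) - sum(mask)


best_mask = genetic_feature_selection(toy_fitness, n_features=8)
```

In practice the fitness function would be a validation score of a classifier trained on the selected features, which makes each evaluation far more expensive than in this toy setting.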

4 Open issues and challenges

This section offers a variety of vital issues and challenges that require future work. IoT faces many challenges and open issues, including security, privacy, hardware, heterogeneity, data analysis, and virtualization challenges. IoT devices produce big data that must be monitored and managed using particular data patterns. For efficient decision-making, BDA in the IoT is applied to large datasets to reveal unseen patterns and correlations. So the key challenge in big data in the IoT is analyzing that data for knowledge discovery and virtualization. Various types of research have presented different categories for challenges and open issues for BDM in the IoT. Romero et al. [ 212 ] divided challenges into principal worries, security and monitoring, technological development, standardization, and privacy. Santana et al. [ 213 ] divided challenges into privacy, data management, heterogeneity, energy management, communication, scalability, security, lack of testbed, city models, and platform maintenance. Ahmed et al. [ 27 ] divided challenges into four categories: diversity, security, data provenance, data management, and data governance and regulation. This study divides challenges into BDM in the IoT and quality attributes challenges.

4.1 Big data management in the IoT challenges

In many reviewed articles, IoT big data management depends on centralized centers, including cloud-based servers, and has technical limitations. These architectures are platform-centric and have costly customized access mechanisms. A centralized architecture can have a single point of failure, which is very inefficient in terms of scalability and reliability. Also, in these architectures, unauthorized access to the server might easily result in the modification, leak, or manipulation of critical data [ 215 ]. In some research, authors used blockchain technology to overcome these problems [ 215 , 216 ]. But this technology has some challenges. For example, blockchain platforms can consume IoT devices' computational resources extensively. During the review in Sect.  3.1 , the process of BDM in the IoT includes data collection, communication, data ingestion, data storage, processing and analysis, and post-processing, each of which faces a variety of challenges and problems. This section examines the challenges involved in each of these steps.

4.1.1 Data collection

Big data in the IoT is generated from different, distributed, and multisource heterogeneous unsupervised domains [ 217 , 218 ]. Collecting this large amount of diverse data faces challenges such as energy consumption, limited battery life in sensors and other data collection devices, different hardware and operating systems, and multiple and disparate resources that must be combined. It can be difficult to obtain complete and accurate data and to maintain data quality. IoT and WSN encompass a large number of distributed mobile nodes. Mobile nodes [ 219 ] must increase the amount of data collected while minimizing the power consumption of both the mobile node and IoT devices. Therefore, the main challenge is mobile data collection management, determining and planning mobile sink trajectories for collecting data from nodes. Most existing mobile data collection approaches are static and only find a solution for a scenario with fixed parameters [ 220 ]. These solutions do not consider the change in the amount of data generated by the IoT nodes or devices when an IoT device can move from one situation to another. For future work, we propose using AI techniques, including ML or deep learning, for intelligent management of mobile data collection.

4.1.2 Communication

Transferring data from different sources to the data processing and analysis stage is one of the steps in BDM in the IoT. Communication protocols and technologies must share data at high speeds and on time. The connectivity challenges include interoperability, bandwidth, reducing traffic, energy consumption, security, network, transport protocols, delivery of services, network congestion, and communication cost. Another connectivity challenge is nodes accessing other nodes' information under different network topologies with different channel fading [ 221 ]. Concerning advances in mobile information infrastructure, integration of the 6G technologies, mobile satellite communications, and AI can increase frequency band, network speed, and network coverage and improve the number of connections [ 222 ]. Different approaches are proposed for optimizing data transmission and overcoming these limitations, such as parsimonious/compressive sensing [ 223 , 224 ]. Compressive sensing is a theory of acquiring and compressing signals that uses the sparsity of natural signals at the sensing stage to minimize power consumption and reduce data dimensionality [ 225 ]. In compressive sensing technology, the collected data from different sensors are first compressed and then transmitted. Therefore, the complexity is transferred to the receiver side from the sensors, which are usually resource-constrained and self-powered [ 226 ]. For future work, we propose combining compressive sensing with AI technologies to present a lightweight, real-time, and dynamic compressive sensing method for overcoming the communication challenges in BDM in the IoT.

4.1.3 Data ingestion

Big data in the IoT have various characteristics: they are enormous, high-speed, heterogeneous in format, complex, of differing resolutions, abnormal and incorrect, ambiguous, unbalanced, massively redundant, multidimensional, granular, continuous, inconsistent, probabilistic, sparse, sequential, dynamic, timely, non-randomly distributed, and misplaced [ 56 , 63 , 89 , 117 , 119 , 125 , 135 , 137 , 173 , 227 ]. Each data ingestion step discussed in Sect. 3.1.3 has challenges. These issues include anomaly detection, missing data, outlier detection, feature selection/extraction, dimensionality reduction, redundancy, standardization, rule discovery, computational cost, and normalization, and different mechanisms have been used to address them. Missing data could lead to the loss of a large amount of valuable and reliable information and bad decision-making. Many articles utilize the delete, ignore, mean/median value, or constant global methods for handling missing data. These methods are risky and may yield biased and untrustworthy results [ 228 ]. Therefore, developing new techniques that offer higher efficiency, high accuracy, minimal computational complexity, and less time consumption is an interesting direction for future work. For this purpose, we can use ML and nature-inspired optimization algorithms or a combination thereof. Parallel technology has made data ingestion and processing more efficient in recent years, and it saves space and time by eliminating the need to decompress data [ 229 ]. Also, BDA types in the IoT are used in this stage, which are discussed in Sect. 3.4 . Each of these methods has challenges. For example, clustering has challenges such as real-time clustering, local optima, determining the number of clusters, updating the clustering centers, and determining the initial clustering centers. ANN faces many issues, including how to determine the number of layers, the training and test samples, and the number of nodes, how to choose an operable objective function, and how to improve the training speed of the network in a big data environment. Various articles solve these problems using meta-heuristic algorithms. However, these algorithms cannot handle big IoT data sets within the specified time due to high computation costs, limited memory and processing units, and premature convergence [ 145 , 230 ]. For future work, we propose using new optimization meta-heuristic algorithms and AI methods based on these techniques by utilizing the strengths of MapReduce and Apache Spark.
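One well-known way in which simple missing-data handling can bias results is that mean imputation preserves the sample mean but shrinks the variance. The short sketch below demonstrates this effect on a toy series; the readings are assumed values, with `None` marking a missing sample.

```python
import statistics


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Assumed sensor readings with three missing samples.
readings = [21.0, None, 23.0, 19.0, None, 25.0, 22.0, None]
observed = [v for v in readings if v is not None]
imputed = mean_impute(readings)

mean_before = statistics.fmean(observed)
mean_after = statistics.fmean(imputed)
var_before = statistics.pvariance(observed)   # spread of the real observations
var_after = statistics.pvariance(imputed)     # spread after filling with the mean
```

Here the mean is unchanged while the variance drops, so any downstream model that relies on the spread of the data (for example, an outlier threshold) sees an artificially narrow distribution.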

4.1.4 Data storage

Data storage is another major challenge in BDM in the IoT. The big data storage mechanisms in the IoT were discussed in Sect.  3.1.4 . The challenges in this regard can be categorized as IoT-based big data storage systems in cloud computing and complex environments such as industry 4.0 applications and data storage architecture. The main data storage challenges are IoT data replication and consistency management. Many researchers have proposed strategies for determining the best location for copy storage in geo-distributed storage systems based on cloud and fog computing. But many of them, due to the geographical distance between distributed storage systems, cannot handle the problems of high data access latencies and replica synchronization costs [ 231 ]. Also, data consistency management strategies must manage the massive amounts of data with different data consistency requirements and system heterogeneity.

4.1.5 Processing and analysis

The big data processing and analysis in BDM in the IoT has different challenges, including task scheduling, real-time data analysis, developing the IoT data analysis infrastructure, data management in the cloud-IoT environments, and query optimization. The authors used data mining and AI algorithms to overcome these challenges. The challenges of using AI technologies for data analytics in the IoT are to balance the computational costs (or response time) and improve the accuracy of the prediction and analysis results [ 232 ]. Also, many multi-objective optimization problems have more than three objective functions, which present challenges, including the diversity and convergence speed of the algorithm [ 152 ]. However, determining an algorithm to process a dynamic IoT dataset based on some application-specific goals for better accuracy remains a challenge. Also, most current methods cannot meet user demands for the fundamental features of cloud-IoT environments, including heterogeneity, dynamism, reliability, flexibility, responsiveness, and elasticity. For future work, we propose studies of various optimization algorithms, including metaheuristic algorithms (many-objective) and ML algorithms, and combined versions of these algorithms for big data processing and analysis in the IoT. Regarding the limitations of wireless nodes (low power and computational) and cloud servers (high latency, privacy, performance bottleneck, context unawareness, etc.) for processing and analysis computing tasks, using mobile edge or fog computing to overcome these problems is helpful.
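When more than three objectives must be traded off, candidate solutions are usually compared by Pareto dominance rather than by a single score. The sketch below filters a set of candidates down to its non-dominated front under minimization; the four objectives (response time, energy, error rate, cost) and the candidate values are hypothetical and serve only to show the comparison.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]


# Hypothetical candidates: (response time ms, energy J, error rate, cost).
candidates = [
    (120, 5.0, 0.10, 3),
    (100, 6.0, 0.12, 3),
    (130, 5.5, 0.11, 4),   # worse than the first candidate in every objective
    (90, 7.0, 0.08, 5),
]
front = pareto_front(candidates)
```

As the number of objectives grows, a larger share of candidates becomes mutually non-dominated, which is one reason diversity and convergence become harder to maintain in many-objective algorithms.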

4.1.6 Post-processing

Providing insight from processed and analyzed data in the IoT requires selecting appropriate visualization techniques. Most of the reviewed methods use simulator tools such as CloudSim [ 143 , 173 ], TRNSYS [ 131 ], Cooja [ 82 ], and Extend-Sim [ 8 ] for evaluation. Additional studies are needed to evaluate the mentioned approaches in real-world systems and datasets.

4.2 QoS management

QoS is one of the critical factors in BDM in the IoT and needs research, management, and optimization (discussed in Sect.  3.3 ). The reviewed articles used these parameters and metrics for evaluation. No article considers these parameters thoroughly for its proposed architecture. Therefore, it is exciting to compare various architectures by considering the different QoS parameters and quality attributes in the future. Security, privacy, and trust are critical issues in IoT BDA that most reviewed articles did not address, and the proposed architectures or frameworks did not involve the data perception layer. The security frame generally consists of confidentiality, integrity, authentication, non-repudiation, availability, and privacy [ 233 ]. We concede that no comprehensive and highly secure scheme or platform for all types of data collection, analysis, and sharing meets all security requirements. The other main challenges are integrating privacy protection methods with data sharing platforms and selecting the best privacy protection algorithms to use during data processing [ 172 ]. Therefore, it is suggested for the future to utilize cryptographic mechanisms in different layers of architectures or frameworks, add a data perception layer, and develop security protocols specifically for IoT devices because of their heterogeneity and resource limitations.

The blockchain framework is widely used in IoT to improve protection, trust, reputation, management, control, and security. The blockchain framework provides decentralized security, authentication rules, and privacy for IoT devices. However, there are major challenges, such as high energy consumption, delay, and computational overhead, because of the resource constraints in IoT devices. Many studies have proposed solutions to these problems. For instance, Corradini et al. [ 234 ] proposed a two-tier Blockchain framework for increasing the security and autonomy of smart objects in the IoT by implementing a trust-based protection mechanism. The tiers of this framework are a point-to-point local tier and a community-oriented global tier. Pincheira et al. [ 235 ] proposed a cost-effective blockchain-based architecture for ensuring data integrity, auditability, and traceability and increasing trust and trustworthiness in IoT devices. This architecture has four components: the cloud module, mobile app, connected tool, and blockchain module. Tchagna Kouanou et al. [ 236 ] proposed a 4-layer blockchain-based architecture to secure data in the IoT to increase security, integrity, scalability, flexibility, and throughput. The layers of this architecture are tokens, smart contracts, blockchain, and peers. In future research, we suggest using AI techniques and a lightweight blockchain framework to increase protection, trust, reputation, and security in the IoT.
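The data-integrity property these architectures rely on comes from hash-linking records, which can be sketched without any blockchain platform. The code below chains sensor readings with SHA-256 so that later tampering is detectable; it deliberately omits consensus, distribution, and smart contracts, and it is not the design of [ 234 ], [ 235 ], or [ 236 ]. The record layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first block


def make_block(reading, prev_hash):
    """Create a block whose hash covers both the reading and the previous hash."""
    payload = json.dumps({"reading": reading, "prev": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"reading": reading, "prev": prev_hash, "hash": digest}


def build_chain(readings):
    chain, prev = [], GENESIS
    for r in readings:
        block = make_block(r, prev)
        chain.append(block)
        prev = block["hash"]
    return chain


def verify_chain(chain):
    """Recompute every hash and check each link to the previous block."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if make_block(block["reading"], prev)["hash"] != block["hash"]:
            return False
        prev = block["hash"]
    return True


# Assumed sample readings from a hypothetical device.
chain = build_chain([{"device": "s1", "temp": 21.5},
                     {"device": "s1", "temp": 21.7},
                     {"device": "s1", "temp": 21.6}])
```

Even this minimal scheme requires one hash computation per record, which hints at why full blockchain platforms can be costly for resource-constrained IoT devices.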

Trust and reputation management are vital issues in the SIoT and MIoT scenarios. In [ 237 ], the authors defined trust and reputation in the MIoT as the trust of an instance in another one of the same IoT; the trust of an object in another one of the MIoT; the reputation of an instance in an IoT; the reputation of an object in a MIoT; the reputation of an IoT in a MIoT; the trust of an IoT in another IoT; and the trust of an object in an IoT. Security in the SIoT aims to differentiate between secure and malicious things and increase the safety and protection of SIoT networks [ 185 ]. Investigating trust and reputation in SIoT and MIoT has many benefits, such as identifying, isolating, managing malicious objects, supporting collaboration, and identifying and evaluating the objects’ QoS parameters. Also, the lack of trust and reputation management in SIoT and MIoT causes problems such as loss of accessibility, privacy, and security [ 237 ]. To overcome these issues, we suggest utilizing trust and reputation with AI methods to develop detection techniques for anomalous and malicious behaviors of things in the MIoT and SIoT in future works.

5 Conclusion

This paper presented a systematic review of the BDM mechanisms in the IoT. First, we discussed the advantages and disadvantages of some systematic and review articles about BDM in the IoT and then explained the purpose of this paper. Then, the research methodology and details of 110 selected articles were presented. These articles were divided into four main categories: BDM processes, BDM architectures/frameworks, quality attributes, and data analytics types in IoT. Some of these categories have been divided into subcategories: BDM process in IoT was divided into data collection, communication, data ingestion, data storage, processing and analysis, and post-processing; big data architectures/frameworks in the IoT were divided into BDM architectures/frameworks in the IoT-based applications and BDM architectures/frameworks in the IoT paradigms; big data analytics types were divided into descriptive, diagnostic, predictive, and prescriptive analysis; and big data storage systems in the IoT were divided into relational databases, NoSQL databases, DFS, and cloud/edge/fog/mist storage. Also, the advantages and disadvantages of each of the BDM mechanisms in the IoT were discussed. The tools and platforms used for BDM in the IoT in the articles were reviewed and compared based on criteria. The most common type of analysis that articles use is predictive analysis, with 57.27%, which uses ML algorithms. The classification, optimization, and clustering algorithms are the most widely used for big data analysis in the IoT. Some articles present architectures mostly in IoT-based healthcare, with 33.33%, and IoT-based smart cities, with 22.22%. These architectures have two to eight layers, each performing a set of functions. In the review of qualitative characteristics, we observed that most articles evaluated their proposals based on criteria including performance, efficiency, accuracy, and scalability. Meanwhile, some features are less used, including confidentiality, sustainability, accessibility, portability, generality, and maintainability. The NoSQL database and DFS are used more to store data than other databases. The BDM process in the IoT uses different algorithms and tools with various features. Various programming languages and operating systems are used to evaluate and implement the proposed mechanisms. The Java and Python programming languages and the Ubuntu operating system are used most often.

This paper tries to review the BDM mechanisms in the IoT. Specifically, it considers studies published in high-quality international journals. The most recent works on BDM mechanisms in the IoT have been compared and analyzed in this paper. We hope that this study will be helpful for the next generation of studies for developing BDM mechanisms in real-complex environments.

https://www.idc.com/ .

http://www.gartner.com/it-glossary/predictive-analytics/ .

Cao, B., Zhang, Y., Zhao, J., Liu, X., Skonieczny, Ł, & Lv, Z. (2021). Recommendation based on large-scale many-objective optimization for the intelligent internet of things system. IEEE Internet of Things Journal . https://doi.org/10.1109/JIOT.2021.3104661

Hou, R., Kong, Y., Cai, B., & Liu, H. (2020). Unstructured big data analysis algorithm and simulation of internet of things based on machine learning. Neural Computing and Applications, 32 , 5399–5407.

Kumar, M., Kumar, S., & Kashyap, P. K. (2021). Towards data mining in IoT cloud computing networks: Collaborative filtering based recommended system. Journal of Discrete Mathematical Sciences and Cryptography, 24 , 1309–1326.

Cao, B., Zhao, J., Lv, Z., & Yang, P. (2020). Diversified personalized recommendation optimization based on mobile data. IEEE Transactions on Intelligent Transportation Systems, 22 , 2133–2139.

Sanislav, T., Mois, G. D., Zeadally, S., & Folea, S. C. (2021). Energy harvesting techniques for internet of things (IoT). IEEE Access, 9 , 39530–39549.

Zhou, H., Sun, G., Fu, S., Liu, J., Zhou, X., & Zhou, J. (2019). A Big data mining approach of PSO-based BP Neural network for financial risk management with IoT. IEEE Access, 7 , 154035–154043.

Tang, B., Chen, Z., Hefferman, G., Pei, S., Wei, T., He, H., et al. (2017). Incorporating intelligence in fog computing for big data analysis in smart cities. IEEE Transactions on Industrial Informatics, 13 , 2140–2150.

Jiang, W. (2019). An intelligent supply chain information collaboration model based on internet of things and big data. IEEE Access, 7 , 58324–58335.

Xiao, S., Yu, H., Wu, Y., Peng, Z., & Zhang, Y. (2017). Self-evolving trading strategy integrating internet of things and big data. IEEE Internet of Things Journal, 5 , 2518–2525.

Sowe, S. K., Kimata, T., Dong, M., & Zettsu, K. (2014). Managing heterogeneous sensor data on a big data platform: IoT services for data-intensive science. In 2014 IEEE 38th International Computer Software and Applications Conference Workshops , Vasteras, Sweden, pp. 295–300.

Nie, X., Fan, T., Wang, B., Li, Z., Shankar, A., & Manickam, A. (2020). Big data analytics and IoT in operation safety management in under water management. Computer Communications, 154 , 188–196.

Liu, H., & Liu, X. (2019). A novel research on the influence of enterprise culture on internal control in big data and internet of things. Mobile Networks and Applications, 24 , 365–374.

Piccialli, F., Benedusi, P., Carratore, L., & Colecchia, G. (2020). An IoT data analytics approach for cultural heritage. Personal and Ubiquitous Computing . https://doi.org/10.1007/s00779-019-01323-z

Liu, C., Feng, Y., Lin, D., Wu, L., & Guo, M. (2020). IoT based laundry services: An application of big data analytics, intelligent logistics management, and machine learning techniques. International Journal of Production Research . https://doi.org/10.1080/00207543.2019.1677961

Wang, J., Wu, Y., Yen, N., Guo, S., & Cheng, Z. (2016). Big data analytics for emergency communication networks: A survey. IEEE Communications Surveys & Tutorials, 18 , 1758–1778.

Jahanbakht, M., Xiang, W., Hanzo, L., & Azghadi, M. R. (2020). Internet of underwater things and big marine data analytics--a comprehensive survey. arXiv preprint arXiv:2012.06712 .

Stoyanova, M., Nikoloudakis, Y., Panagiotakis, S., Pallis, E., & Markakis, E. K. (2020). A survey on the internet of things (IoT) forensics: Challenges, approaches, and open issues. IEEE Communications Surveys & Tutorials, 22 , 1191–1221.

Aldalahmeh, S. A., & Ciuonzo, D. (2022). Distributed detection fusion in clustered sensor networks over multiple access fading channels. IEEE Transactions on Signal and Information Processing over Networks, 8 , 317–329.

Rajavel, R., Ravichandran, S. K., Harimoorthy, K., Nagappan, P., & Gobichettipalayam, K. R. (2022). IoT-based smart healthcare video surveillance system using edge computing. Journal of Ambient Intelligence and Humanized Computing, 13 , 3195–3207.

Shahid, H., Shah, M. A., Almogren, A., Khattak, H. A., Din, I. U., Kumar, N., et al. (2021). Machine learning-based mist computing enabled internet of battlefield things. ACM Transactions on Internet Technology (TOIT), 21 , 1–26.

Thomas, D., Orgun, M., Hitchens, M., Shankaran, R., Mukhopadhyay, S. C., & Ni, W. (2020). A graph-based fault-tolerant approach to modeling QoS for IoT-based surveillance applications. IEEE Internet of Things Journal, 8 , 3587–3604.

Vahdat, S. (2020). The role of IT-based technologies on the management of human resources in the COVID-19 era. Kybernetes .

Hassan, M., Awan, F. M., Naz, A., deAndrés-Galiana, E. J., Alvarez, O., Cernea, A., et al. (2022). Innovations in genomics and big data analytics for personalized medicine and health care: A review. International Journal of Molecular Sciences, 23 , 4645.

Honar Pajooh, H., Rashid, M. A., Alam, F., & Demidenko, S. (2021). IoT big data provenance scheme using blockchain on Hadoop ecosystem. Journal of Big Data, 8 , 1–26.

Priyadarshini, S. B. B., Bhusan Bagjadab, A., & Mishra, B. K. (2019). The role of IoT and big data in modern technological arena: A comprehensive study. In Internet of Things and Big Data Analytics for Smart Generation . Springer, pp. 13–25.

Zheng, W., Yin, L., Chen, X., Ma, Z., Liu, S., & Yang, B. (2021). Knowledge base graph embedding module design for Visual question answering model. Pattern Recognition, 120 , 108153.

Ahmed, E., Yaqoob, I., Hashem, I. A. T., Khan, I., Ahmed, A. I. A., Imran, M., et al. (2017). The role of big data analytics in internet of things. Computer Networks, 129 , 459–471.

Singh, S., & Yassine, A. (2018). IoT big data analytics with fog computing for household energy management in smart grids. In International Conference on Smart Grid and Internet of Things . pp. 13–22.

Marjani, M., Nasaruddin, F., Gani, A., Karim, A., Hashem, I. A. T., Siddiqa, A., et al. (2017). Big IoT data analytics: Architecture, opportunities, and open research challenges. IEEE Access, 5 , 5247–5261.

Li, C. (2020). Information processing in internet of things using big data analytics. Computer Communications, 160 , 718–729.

Kwon, O., Lee, N., & Shin, B. (2014). Data quality management, data usage experience and acquisition intention of big data analytics. International Journal of Information Management, 34 , 387–394.

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35 , 137–144.

Ahmed, M., Choudhury, S., & Al-Turjman, F. (2019). Big data analytics for intelligent internet of things. In Artificial Intelligence in IoT . Springer, pp. 107–127.

Urrehman, M. H., Ahmed, E., Yaqoob, I., Hashem, I. A. T., Imran, M., & Ahmad, S. (2018). Big data analytics in industrial IoT using a concentric computing model. IEEE Communications Magazine, 56 , 37–43.

Constante Nicolalde, F., Silva, F., Herrera, B., & Pereira, A. (2018). Big data analytics in IOT: challenges, open research issues and tools. In World conference on information systems and technologies , pp. 775–788.

Talebkhah, M., Sali, A., Marjani, M., Gordan, M., Hashim, S. J., & Rokhani, F. Z. (2021). IoT and big data applications in smart cities: Recent advances, challenges, and critical issues. IEEE Access, 9 , 55465–55484.

Bansal, M., Chana, I., & Clarke, S. (2020). A survey on IoT big data: Current status, 13 V’s challenges, and future directions. ACM Computing Surveys (CSUR), 53 , 1–59.

Simmhan, Y., & Perera, S. (2016). Big data analytics platforms for real-time applications in IoT. In Big data analytics . Springer, pp. 115–135.

Shoumy, N. J., Ang, L.-M., Seng, K. P., Rahaman, D. M., & Zia, T. (2020). Multimodal big data affective analytics: A comprehensive survey using text, audio, visual and physiological signals. Journal of Network and Computer Applications, 149 , 102447.

Ge, M., Bangui, H., & Buhnova, B. (2018). Big data for internet of things: A survey. Future Generation Computer Systems, 87 , 601–614.

Siow, E., Tiropanis, T., & Hall, W. (2018). Analytics for the internet of things: A survey. ACM Computing Surveys (CSUR), 51 , 1–36.

Fawzy, D., Moussa, S. M., & Badr, N. L. (2022). The internet of things and architectures of big data analytics: Challenges of intersection at different domains. IEEE Access, 10 , 4969–4992.

Zhong, Y., Chen, L., Dan, C., & Rezaeipanah, A. (2022). A systematic survey of data mining and big data analysis in internet of things. The Journal of Supercomputing . https://doi.org/10.1007/s11227-022-04594-1

Hajjaji, Y., Boulila, W., Farah, I. R., Romdhani, I., & Hussain, A. (2021). Big data and IoT-based applications in smart environments: A systematic review. Computer Science Review, 39 , 100318.

Ahmadova, U., Mustafayev, M., Kiani Kalejahi, B., Saeedvand, S., & Rahmani, A. M. (2021). Big data applications on the internet of things: A systematic literature review. International Journal of Communication Systems, 34 , e5004.

Doewes, R. I., Gharibian, G., Zadeh, F. A., Zaman, B. A., Vahdat, S., & Akhavan-Sigari, R. (2022). An updated systematic review on the effects of aerobic exercise on human blood lipid profile. Current Problems in Cardiology . https://doi.org/10.1016/j.cpcardiol.2022.101108

Zadeh, F. A., Bokov, D. O., Yasin, G., Vahdat, S., & Abbasalizad-Farhangi, M. (2021). Central obesity accelerates leukocyte telomere length (LTL) shortening in apparently healthy adults: A systematic review and meta-analysis. Critical Reviews in Food Science and Nutrition . https://doi.org/10.1080/10408398.2021.1971155

Esmailiyan, M., Amerizadeh, A., Vahdat, S., Ghodsi, M., Doewes, R. I., & Sundram, Y. (2021). Effect of different types of aerobic exercise on individuals with and without hypertension: An updated systematic review. Current Problems in Cardiology . https://doi.org/10.1016/j.cpcardiol.2021.101034

Vahdat, S., & Shahidi, S. (2020). D-dimer levels in chronic kidney illness: a comprehensive and systematic literature review. Proceedings of the National Academy of Sciences, India Section b: Biological Sciences . https://doi.org/10.1007/s40011-020-01172-4

Zhou, D., Yan, Z., Fu, Y., & Yao, Z. (2018). A survey on network data collection. Journal of Network and Computer Applications, 116 , 9–23.

Rathore, M. M., Ahmad, A., Paul, A., & Rho, S. (2016). Urban planning and building smart cities based on the internet of things using big data analytics. Computer Networks, 101 , 63–80.

Ahmad, A., Khan, M., Paul, A., Din, S., Rathore, M. M., Jeon, G., et al. (2018). Toward modeling and optimization of features selection in big data based social Internet of Things. Future Generation Computer Systems, 82 , 715–726.

Shah, S. A., Seker, D. Z., Rathore, M. M., Hameed, S., Yahia, S. B., & Draheim, D. (2019). Towards disaster resilient smart cities: Can internet of things and big data analytics be the game changers? IEEE Access, 7 , 91885–91903.

Celesti, A., & Fazio, M. (2019). A framework for real time end to end monitoring and big data oriented management of smart environments. Journal of Parallel and Distributed Computing, 132 , 262–273.

Silva, B. N., Khan, M., & Han, K. (2017). Integration of big data analytics embedded smart city architecture with RESTful web of things for efficient service provision and energy management. Future Generation Computer Systems . https://doi.org/10.1016/j.future.2017.06.024

Yassine, A., Singh, S., Hossain, M. S., & Muhammad, G. (2019). IoT big data analytics for smart homes with fog and cloud computing. Future Generation Computer Systems, 91 , 563–573.

Khan, M., Han, K., & Karthik, S. (2018). Designing smart control systems based on internet of things and big data analytics. Wireless Personal Communications, 99 , 1683–1697.

Rathore, M. M., Paul, A., Ahmad, A., Anisetti, M., & Jeon, G. (2017). Hadoop-based intelligent care system (HICS) analytical approach for big data in IoT. ACM Transactions on Internet Technology (TOIT), 18 , 1–24.

Yacchirema, D. C., Sarabia-Jácome, D., Palau, C. E., & Esteve, M. (2018). A smart system for sleep monitoring by integrating IoT with big data analytics. IEEE Access, 6 , 35988–36001.

Ma, Y., Wang, Y., Yang, J., Miao, Y., & Li, W. (2016). Big health application system based on health internet of things and big data. IEEE Access, 5 , 7885–7897.

Rathore, M. M., Ahmad, A., Paul, A., Wan, J., & Zhang, D. (2016). Real-time medical emergency response system: Exploiting IoT and big data for public health. Journal of Medical Systems, 40 , 283.

Zhou, Q., Zhang, Z., & Wang, Y. (2019). WIT120 data mining technology based on internet of things. Health Care Management Science . https://doi.org/10.1007/s10729-019-09497-x

Silva, B. N., Khan, M., Jung, C., Seo, J., Muhammad, D., Han, J., et al. (2018). Urban planning and smart city decision management empowered by real-time data processing using big data analytics. Sensors, 18 , 2994.

Lakshmanaprabu, S., Shankar, K., Khanna, A., Gupta, D., Rodrigues, J. J., Pinheiro, P. R., et al. (2018). Effective features to classify big data using social internet of things. IEEE Access, 6 , 24196–24204.

Al-Qurabat, A. K. M., Mohammed, Z. A., & Hussein, Z. J. (2021). Data traffic management based on compression and MDL techniques for smart agriculture in IoT. Wireless Personal Communications, 120 , 2227–2258.

Ahmad, A., Babar, M., Din, S., Khalid, S., Ullah, M. M., Paul, A., et al. (2019). Socio-cyber network: The potential of cyber-physical system to define human behaviors using big data analytics. Future Generation Computer Systems, 92 , 868–878.

Floris, A., Porcu, S., Atzori, L., & Girau, R. (2022). A Social IoT-based platform for the deployment of a smart parking solution. Computer Networks, 205 , 108756.

Al-Ali, A.-R., Zualkernan, I. A., Rashid, M., Gupta, R., & AliKarar, M. (2017). A smart home energy management system using IoT and big data analytics approach. IEEE Transactions on Consumer Electronics, 63 , 426–434.

Moreno, M. V., Terroso-Sáenz, F., González-Vidal, A., Valdés-Vela, M., Skarmeta, A. F., Zamora, M. A., et al. (2016). Applicability of big data techniques to smart cities deployments. IEEE Transactions on Industrial Informatics, 13 , 800–809.

Nasiri, H., Nasehi, S., & Goudarzi, M. (2019). Evaluation of distributed stream processing frameworks for IoT applications in smart cities. Journal of Big Data, 6 , 52.

Ahanger, T. A., Tariq, U., Nusir, M., Aldaej, A., Ullah, I., & Sulman, A. (2022). A novel IoT–fog–cloud-based healthcare system for monitoring and predicting COVID-19 outspread. The Journal of Supercomputing, 78 , 1783–1806.

Rani, S., & Chauhdary, S. H. (2018). A novel framework and enhanced QoS big data protocol for smart city applications. Sensors, 18 , 3980.

Lu, Z., Wang, N., Wu, J., & Qiu, M. (2018). IoTDeM: An IoT big data-oriented MapReduce performance prediction extended model in multiple edge clouds. Journal of Parallel and Distributed Computing, 118 , 316–327.

Rathore, M. M., Paul, A., Hong, W.-H., Seo, H., Awan, I., & Saeed, S. (2018). Exploiting IoT and big data analytics: Defining smart digital city using real-time urban data. Sustainable Cities and Society, 40 , 600–610.

Sood, S. K., Sandhu, R., Singla, K., & Chang, V. (2018). IoT, big data and HPC based smart flood management framework. Sustainable Computing: Informatics and Systems, 20 , 102–117.

Machorro-Cano, I., Alor-Hernández, G., Paredes-Valverde, M. A., Rodríguez-Mazahua, L., Sánchez-Cervantes, J. L., & Olmedo-Aguirre, J. O. (2020). HEMS-IoT: A big data and machine learning-based smart home system for energy saving. Energies, 13 , 1097.

Raptis, T. P., Passarella, A., & Conti, M. (2018). Performance analysis of latency-aware data management in industrial IoT networks. Sensors, 18 , 2611.

Seng, K. P., & Ang, L.-M. (2018). A big data layered architecture and functional units for the multimedia Internet of Things. IEEE Transactions on Multi-Scale Computing Systems, 4 , 500–512.

Muangprathub, J., Boonnam, N., Kajornkasirat, S., Lekbangpong, N., Wanichsombat, A., & Nillaor, P. (2019). IoT and agriculture data analysis for smart farm. Computers and Electronics in Agriculture, 156 , 467–474.

Chilipirea, C., Petre, A.-C., Groza, L.-M., Dobre, C., & Pop, F. (2017). An integrated architecture for future studies in data processing for smart cities. Microprocessors and Microsystems, 52 , 335–342.

Enayet, A., Razzaque, M. A., Hassan, M. M., Alamri, A., & Fortino, G. (2018). A mobility-aware optimal resource allocation architecture for big data task execution on mobile cloud in smart cities. IEEE Communications Magazine, 56 , 110–117.

Plageras, A. P., Psannis, K. E., Stergiou, C., Wang, H., & Gupta, B. B. (2018). Efficient IoT-based sensor BIG data collection–processing and analysis in smart buildings. Future Generation Computer Systems, 82 , 349–357.

Syafrudin, M., Alfian, G., Fitriyani, N. L., & Rhee, J. (2018). Performance analysis of IoT-based sensor, big data processing, and machine learning model for real-time monitoring system in automotive manufacturing. Sensors, 18 , 2946.

El-Hasnony, I. M., Mostafa, R. R., Elhoseny, M., & Barakat, S. I. (2021). Leveraging mist and fog for big data analytics in IoT environment. Transactions on Emerging Telecommunications Technologies . https://doi.org/10.1002/ett.4057

Jindal, A., Kumar, N., & Singh, M. (2020). A unified framework for big data acquisition, storage, and analytics for demand response management in smart cities. Future Generation Computer Systems, 108 , 921–934.

Hussain, M. M., Beg, M. S., & Alam, M. S. (2020). Fog computing for big data analytics in IoT aided smart grid networks. Wireless Personal Communications . https://doi.org/10.1007/s11277-020-07538-1

Zhou, Z., Yu, H., & Shi, H. (2020). Human activity recognition based on improved Bayesian convolution network to analyze health care data using wearable IoT device. IEEE Access, 8 , 86411–86418.

Sengupta, S., & Bhunia, S. S. (2020). Secure data management in cloudlet assisted IoT enabled e-health framework in smart city. IEEE Sensors Journal, 20 , 9581–9588.

Babar, M., & Arif, F. (2019). Real-time data processing scheme using big data analytics in internet of things based smart transportation environment. Journal of Ambient Intelligence and Humanized Computing, 10 , 4167–4177.

Hong-Tan, L., Cui-hua, K., Muthu, B., & Sivaparthipan, C. (2021). Big data and ambient intelligence in IoT-based wireless student health monitoring system. Aggression and Violent Behavior . https://doi.org/10.1016/j.avb.2021.101601

Paul, A., Ahmad, A., Rathore, M. M., & Jabbar, S. (2016). Smartbuddy: Defining human behaviors using big data analytics in social internet of things. IEEE Wireless Communications, 23 , 68–74.

Gohar, M., Ahmed, S. H., Khan, M., Guizani, N., Ahmed, A., & Rahman, A. U. (2018). A big data analytics architecture for the internet of small things. IEEE Communications Magazine, 56 , 128–133.

Armoogum, S., & Li, X. (2019). Big data analytics and deep learning in bioinformatics with hadoop. In Deep Learning and Parallel Computing Environment for Bioengineering Systems . Elsevier, pp. 17–36.

Demchenko, Y., Turkmen, F., de Laat, C., Hsu, C. H., Blanchet, C., & Loomis, C. (2017). Cloud computing infrastructure for data intensive applications. In Big Data Analytics for Sensor-Network Collected Intelligence . Elsevier, pp. 21–62.

Wu, X., Zheng, W., Xia, X., & Lo, D. (2021). Data quality matters: A case study on data label correctness for security bug report prediction. IEEE Transactions on Software Engineering . https://doi.org/10.1109/TSE.2021.3063727

Erraissi, A., & Belangour, A. (2018). Data sources and ingestion big data layers: Meta-modeling of key concepts and features. International Journal of Engineering & Technology, 7 , 3607–3612.

Ji, C., Shao, Q., Sun, J., Liu, S., Pan, L., Wu, L., et al. (2016). Device data ingestion for industrial big data platforms with a case study. Sensors, 16 , 279.

Isah, H., & Zulkernine, F. (2018). A scalable and robust framework for data stream ingestion. In 2018 IEEE International Conference on Big Data (Big Data) . pp. 2900–2905.

Dai, H.-N., Wong, R.C.-W., Wang, H., Zheng, Z., & Vasilakos, A. V. (2019). Big data analytics for large-scale wireless networks: Challenges and opportunities. ACM Computing Surveys (CSUR), 52 , 1–36.

Chawla, H., & Khattar, P. (2020). Data ingestion. In Data Lake Analytics on Microsoft Azure . Springer, pp. 43–85.

Sankaranarayanan, S., Rodrigues, J. J., Sugumaran, V., & Kozlov, S. (2020). Data flow and distributed deep neural network based low latency IoT-edge computation model for big data environment. Engineering Applications of Artificial Intelligence, 94 , 103785.

Davoudian, A., Chen, L., & Liu, M. (2018). A survey on NoSQL stores. ACM Computing Surveys (CSUR), 51 , 1–43.

Cao, B., Sun, Z., Zhang, J., & Gu, Y. (2021). Resource allocation in 5G IoV architecture based on SDN and fog-cloud computing. IEEE Transactions on Intelligent Transportation Systems, 22 , 3832–3840.

Sonbol, K., Özkasap, Ö., Al-Oqily, I., & Aloqaily, M. (2020). EdgeKV: Decentralized, scalable, and consistent storage for the edge. Journal of Parallel and Distributed Computing, 144 , 28–40.

Akanbi, A., & Masinde, M. (2020). A distributed stream processing middleware framework for real-time analysis of heterogeneous data on big data platform: case of environmental monitoring. Sensors, 20 , 3166.

Harb, H., Mroue, H., Mansour, A., Nasser, A., & Motta Cruz, E. (2020). A hadoop-based platform for patient classification and disease diagnosis in healthcare applications. Sensors, 20 , 1931.

Osman, A. M. S. (2019). A novel big data analytics framework for smart cities. Future Generation Computer Systems, 91 , 620–633.

Alves, J. M., Honório, L. M., & Capretz, M. A. (2019). ML4IoT: A framework to orchestrate machine learning workflows on internet of things data. IEEE Access, 7 , 152953–152967.

Oğur, N. B., Al-Hubaishi, M., & Çeken, C. (2022). IoT data analytics architecture for smart healthcare using RFID and WSN. ETRI Journal, 44 , 135–146.

Bashir, M. R., Gill, A. Q., Beydoun, G., & Mccusker, B. (2020). Big data management and analytics metamodel for IoT-enabled smart buildings. IEEE Access, 8 , 169740–169758.

Chhabra, G. S., Singh, V. P., & Singh, M. (2018). Cyber forensics framework for big data analytics in IoT environment using machine learning. Multimedia Tools and Applications . https://doi.org/10.1007/s11042-018-6338-1

Vögler, M., Schleicher, J. M., Inzinger, C., & Dustdar, S. (2017). Ahab: A cloud-based distributed big data analytics framework for the internet of things. Software: Practice and Experience, 47 , 443–454.

Farmanbar, M., & Rong, C. (2020). Triangulum city dashboard: An interactive data analytic platform for visualizing smart city performance. Processes, 8 , 250.

Ghallab, H., Fahmy, H., & Nasr, M. (2020). Detection outliers on internet of things using big data technology. Egyptian Informatics Journal, 21 , 131–138.

Lan, K., Fong, S., Song, W., Vasilakos, A. V., & Millham, R. C. (2017). Self-adaptive pre-processing methodology for big data stream mining in internet of things environmental sensor monitoring. Symmetry, 9 , 244.

He, X., Wang, K., Huang, H., & Liu, B. (2018). QoE-driven big data architecture for smart city. IEEE Communications Magazine, 56 , 88–93.

Singh, A., Garg, S., Batra, S., Kumar, N., & Rodrigues, J. J. (2018). Bloom filter based optimization scheme for massive data handling in IoT environment. Future Generation Computer Systems, 82 , 440–449.

Yu, W., Liu, Y., Dillon, T., Rahayu, W., & Mostafa, F. (2021). An integrated framework for health state monitoring in a smart factory employing IoT and big data techniques. IEEE Internet of Things Journal, 9 , 2443–2454.

Zhang, Q., Zhu, C., Yang, L. T., Chen, Z., Zhao, L., & Li, P. (2017). An incremental CFS algorithm for clustering large data in industrial Internet of Things. IEEE Transactions on Industrial Informatics, 13 , 1193–1201.

Shaji, B., Lal Raja Singh, R., & Nisha, K. (2022). A novel deep neural network based marine predator model for effective classification of big data from social internet of things. Concurrency and Computation: Practice and Experience . https://doi.org/10.1002/cpe.7244

Al-Osta, M., Bali, A., & Gherbi, A. (2019). Event driven and semantic based approach for data processing on IoT gateway devices. Journal of Ambient Intelligence and Humanized Computing, 10 , 4663–4678.

Deng, X., Jiang, P., Peng, X., & Mi, C. (2018). An intelligent outlier detection method with one class support tucker machine and genetic algorithm toward big sensor data in Internet of Things. IEEE Transactions on Industrial Electronics, 66 , 4672–4683.

Yao, X., Wang, J., Shen, M., Kong, H., & Ning, H. (2019). An improved clustering algorithm and its application in IoT data analysis. Computer Networks, 159 , 63–72.

Mansour, R. F., Abdel-Khalek, S., Hilali-Jaghdam, I., Nebhen, J., Cho, W., & Joshi, G. P. (2021). An intelligent outlier detection with machine learning empowered big data analytics for mobile edge computing. Cluster Computing . https://doi.org/10.1007/s10586-021-03472-4

Karyotis, V., Tsitseklis, K., Sotiropoulos, K., & Papavassiliou, S. (2018). Big data clustering via community detection and hyperbolic network embedding in IoT applications. Sensors, 18 , 1205.

Chui, K. T., Liu, R. W., Lytras, M. D., & Zhao, M. (2019). Big data and IoT solution for patient behaviour monitoring. Behaviour & Information Technology, 38 , 940–949.

Song, C.-W., Jung, H., & Chung, K. (2019). Development of a medical big-data mining process using topic modeling. Cluster Computing, 22 , 1949–1958.

Khan, M., Iqbal, J., Talha, M., Arshad, M., Diyan, M., & Han, K. (2018). Big data processing using internet of software defined things in smart cities. International Journal of Parallel Programming . https://doi.org/10.1007/s10766-018-0573-y

Gohar, M., Muzammal, M., & Rahman, A. U. (2018). SMART TSS: Defining transportation system behavior using big data analytics in smart cities. Sustainable Cities and Society, 41 , 114–119.

Anbarasan, M., Muthu, B., Sivaparthipan, C., Sundarasekar, R., Kadry, S., Krishnamoorthy, S., et al. (2020). Detection of flood disaster system based on IoT, big data and convolutional deep neural network. Computer Communications, 150 , 150–157.

Luo, X., Oyedele, L. O., Ajayi, A. O., Monyei, C. G., Akinade, O. O., & Akanbi, L. A. (2019). Development of an IoT-based big data platform for day-ahead prediction of building heating and cooling demands. Advanced Engineering Informatics, 41 , 100926.

Hossain, M. A., Ferdousi, R., Hossain, S. A., Alhamid, M. F., & El Saddik, A. (2020). A novel framework for recommending data mining algorithm in dynamic IoT environment. IEEE Access, 8 , 157333–157345.

Safa, M., & Pandian, A. (2021). Intelligent big data analytics model for efficient cardiac disease prediction with IoT devices in WSN using fuzzy rules. Wireless Personal Communications . https://doi.org/10.1007/s11277-021-08788-3

Alsaig, A., Alagar, V., Chammaa, Z., & Shiri, N. (2019). Characterization and efficient management of big data in IoT-driven smart city development. Sensors, 19 , 2430.

Tang, R., & Fong, S. (2018). Clustering big IoT data by metaheuristic optimized mini-batch and parallel partition-based DGC in Hadoop. Future Generation Computer Systems, 86 , 1395–1412.

Kotenko, I., Saenko, I., & Branitskiy, A. (2018). Framework for mobile internet of things security monitoring based on big data processing and machine learning. IEEE Access . https://doi.org/10.1109/ACCESS.2018.2881998

Wang, T., Bhuiyan, M. Z. A., Wang, G., Rahman, M. A., Wu, J., & Cao, J. (2018). Big data reduction for a smart city’s critical infrastructural health monitoring. IEEE Communications Magazine, 56 , 128–133.

Kaur, I., Lydia, E. L., Nassa, V. K., Shrestha, B., Nebhen, J., Malebary, S., et al. (2021). Generative adversarial networks with quantum optimization model for mobile edge computing in IoT big data. Wireless Personal Communications . https://doi.org/10.1007/s11277-021-08706-7

Lakshmanaprabu, S., Shankar, K., Ilayaraja, M., Nasir, A. W., Vijayakumar, V., & Chilamkurti, N. (2019). Random forest for big data classification in the internet of things using optimal features. International Journal of Machine Learning and Cybernetics, 10 , 2609–2618.

Ullah, F., Habib, M. A., Farhan, M., Khalid, S., Durrani, M. Y., & Jabbar, S. (2017). Semantic interoperability for big-data in heterogeneous IoT infrastructure for healthcare. Sustainable Cities and Society, 34 , 90–96.

Manogaran, G., Varatharajan, R., Lopez, D., Kumar, P. M., Sundarasekar, R., & Thota, C. (2018). A new architecture of Internet of Things and big data ecosystem for secured smart healthcare monitoring and alerting system. Future Generation Computer Systems, 82 , 375–387.

Hendawi, A., Gupta, J., Liu, J., Teredesai, A., Ramakrishnan, N., Shah, M., et al. (2019). Benchmarking large-scale data management for internet of things. The Journal of Supercomputing, 75 , 8207–8230.

Mo, Y. (2019). A data security storage method for IoT under hadoop cloud computing platform. International Journal of Wireless Information Networks, 26 , 152–157.

Tu, L., Liu, S., Wang, Y., Zhang, C., & Li, P. (2019). An optimized cluster storage method for real-time big data in internet of things. The Journal of Supercomputing , 1–17.

Tripathi, A. K., Sharma, K., Bala, M., Kumar, A., Menon, V. G., & Bashir, A. K. (2020). A parallel military-dog-based algorithm for clustering big data in cognitive industrial internet of things. IEEE Transactions on Industrial Informatics, 17 , 2134–2142.

Alelaiwi, A. (2017). A collaborative resource management for big IoT data processing in Cloud. Cluster Computing, 20 , 1791–1799.

Meerja, K. A., Naidu, P. V., & Kalva, S. R. K. (2019). Price versus performance of big data analysis for cloud based internet of things networks. Mobile Networks and Applications, 24 , 1078–1094.

Wang, T., Liang, Y., Zhang, Y., Arif, M., Wang, J., & Jin, Q. (2020). An intelligent dynamic offloading from cloud to edge for smart IoT systems with big data. IEEE Transactions on Network Science and Engineering . https://doi.org/10.1109/TNSE.2020.2988052

Vasconcelos, D., Andrade, R., Severino, V., & Souza, J. D. (2019). Cloud, fog, or mist in IoT? That is the question. ACM Transactions on Internet Technology (TOIT), 19 , 1–20.

Jamil, B., Ijaz, H., Shojafar, M., Munir, K., & Buyya, R. (2022). Resource allocation and task scheduling in fog computing and internet of everything environments: A taxonomy, review, and future directions. ACM Computing Surveys (CSUR) . https://doi.org/10.1145/3513002

Javadzadeh, G., & Rahmani, A. M. (2020). Fog computing applications in smart cities: A systematic survey. Wireless Networks, 26 , 1433–1457.

Cao, B., Zhang, J., Liu, X., Sun, Z., Cao, W., Nowak, R. M., et al. (2021). Edge–cloud resource scheduling in space–air–ground-integrated networks for internet of vehicles. IEEE Internet of Things Journal, 9 , 5765–5772.

Linaje, M., Berrocal, J., & Galan-Benitez, A. (2019). Mist and edge storage: Fair storage distribution in sensor networks. IEEE Access, 7 , 123860–123876.

Mehdipour, F., Noori, H., & Javadi, B. (2016). Energy-efficient big data analytics in datacenters. In Advances in Computers . Vol. 100. Elsevier, pp. 59–101.

Zhou, L., Mao, H., Zhao, T., Wang, V. L., Wang, X., & Zuo, P. (2022). How B2B platform improves Buyers’ performance: Insights into platform’s substitution effect. Journal of Business Research, 143 , 72–80.

García-Magariño, I., Lacuesta, R., & Lloret, J. (2017). Agent-based simulation of smart beds with Internet-of-Things for exploring big data analytics. IEEE Access, 6 , 366–379.

Bi, Z., Jin, Y., Maropoulos, P., Zhang, W.-J., & Wang, L. (2021). Internet of things (IoT) and big data analytics (BDA) for digital manufacturing (DM). International Journal of Production Research . https://doi.org/10.1080/00207543.2021.1953181

Ahmed, I., Ahmad, M., Jeon, G., & Piccialli, F. (2021). A framework for pandemic prediction using big data analytics. Big Data Research, 25 , 100190.

Puschmann, D., Barnaghi, P., & Tafazolli, R. (2016). Adaptive clustering for dynamic IoT data streams. IEEE Internet of Things Journal, 4 , 64–74.

Bu, F. (2018). An efficient fuzzy c-means approach based on canonical polyadic decomposition for clustering big data in IoT. Future Generation Computer Systems, 88 , 675–682.

Zhang, Q., Yang, L. T., Chen, Z., & Li, P. (2018). High-order possibilistic c-means algorithms based on tensor decompositions for big data in IoT. Information Fusion, 39 , 72–80.

Lavalle, A., Teruel, M. A., Maté, A., & Trujillo, J. (2020). Improving sustainability of smart cities through visualization techniques for big data from IoT devices. Sustainability, 12 , 5595.

Li, P., Chen, Z., Yang, L. T., Zhang, Q., & Deen, M. J. (2017). Deep convolutional computation model for feature learning on big data in internet of things. IEEE Transactions on Industrial Informatics, 14 , 790–798.

Patterson, E. K., Gurbuz, S., Tufekci, Z., & Gowdy, J. N. (2002). CUAVE: A new audio-visual database for multimodal human-computer interface research. In 2002 IEEE International conference on acoustics, speech, and signal processing , pp. II-2017-II-2020.

Zhang, Q., Yang, L. T., & Chen, Z. (2015). Deep computation model for unsupervised feature learning on big data. IEEE Transactions on Services Computing, 9 , 161–171.

Cauteruccio, F., Cinelli, L., Corradini, E., Terracina, G., Ursino, D., Virgili, L., et al. (2021). A framework for anomaly detection and classification in Multiple IoT scenarios. Future Generation Computer Systems, 114 , 322–335.

Liang, W., Li, W., & Feng, L. (2021). Information security monitoring and management method based on big data in the internet of things environment. IEEE Access, 9 , 39798–39812.

Vahdat, S. (2022). A review of pathophysiological mechanism, diagnosis, and treatment of thrombosis risk associated with COVID-19 infection. IJC Heart & Vasculature . https://doi.org/10.1016/j.ijcha.2022.101068

Abbasi, S., Naderi, Z., Amra, B., Atapour, A., Dadkhahi, S. A., Eslami, M. J., et al. (2021). Hemoperfusion in patients with severe COVID-19 respiratory failure, lifesaving or not? Journal of Research in Medical Sciences, 26 , 34.

Li, W., Chai, Y., Khan, F., Jan, S. R. U., Verma, S., Menon, V. G., et al. (2021). A comprehensive survey on machine learning-based big data analytics for IoT-enabled smart healthcare system. Mobile Networks and Applications, 26 , 234–252.

Biswas, R. (2022). Outlining big data analytics in health sector with special reference to Covid-19. Wireless Personal Communications, 124 , 2097–2108.

Wu, X., Zhang, Y., Wang, A., Shi, M., Wang, H., & Liu, L. (2020). MNSSp3: Medical big data privacy protection platform based on Internet of things. Neural Computing and Applications . https://doi.org/10.1007/s00521-020-04873-z

Elhoseny, M., Abdelaziz, A., Salama, A. S., Riad, A. M., Muhammad, K., & Sangaiah, A. K. (2018). A hybrid model of internet of things and cloud computing to manage big data in health services applications. Future generation computer systems, 86 , 1383–1394.

Jan, M. A., He, X., Song, H., & Babar, M. (2021). Machine learning and big data analytics for IoT-enabled smart cities. Mobile Networks and Applications, 26 , 156–158.

Liu, Z., Wang, Y., & Feng, J. (2022). Vehicle-type strategies for manufacturer’s car sharing. Kybernetes . https://doi.org/10.1108/K-11-2021-1095

Khan, M. A., Siddiqui, M. S., Rahmani, M. K. I., & Husain, S. (2021). Investigation of big data analytics for sustainable smart city development: An emerging country. IEEE Access, 10 , 16028–16036.

Sivaparthipan, C., Muthu, B. A., Manogaran, G., Maram, B., Sundarasekar, R., Krishnamoorthy, S., et al. (2020). Innovative and efficient method of robotics for helping the Parkinson’s disease patient using IoT in big data analytics. Transactions on Emerging Telecommunications Technologies, 31 , e3838.

Yang, L., Xiong, Z., Liu, G., Hu, Y., Zhang, X., & Qiu, M. (2021). An analytical model of page dissemination for efficient big data transmission of C-ITS. IEEE Transactions on Intelligent Transportation Systems . https://doi.org/10.1109/TITS.2021.3134557

Zantalis, F., Koulouras, G., Karabetsos, S., & Kandris, D. (2019). A review of machine learning and IoT in smart transportation. Future Internet, 11 , 94.

Guo, J., Liu, R., Cheng, D., Shanthini, A., & Vadivel, T. (2022). Urbanization based on IoT using big data analytics the impact of internet of things and big data in urbanization. Arabian Journal for Science and Engineering . https://doi.org/10.1007/s13369-021-06124-2

Shao, N. (2022). Research on architectural planning and landscape design of smart city based on computational intelligence. Computational Intelligence and Neuroscience, 2022.

Jia, T., Cai, C., Li, X., Luo, X., Zhang, Y., & Yu, X. (2022). Dynamical community detection and spatiotemporal analysis in multilayer spatial interaction networks using trajectory data. International Journal of Geographical Information Science . https://doi.org/10.1080/13658816.2022.2055037

Kahveci, S., Alkan, B., Mus’ab, H. A., Ahmad, B., & Harrison, R. (2022). An end-to-end big data analytics platform for IoT-enabled smart factories: A case study of battery module assembly system for electric vehicles. Journal of Manufacturing Systems, 63 , 214–223.

Nitti, M., Girau, R., & Atzori, L. (2013). Trustworthiness management in the social internet of things. IEEE Transactions on knowledge and data engineering, 26 , 1253–1266.

Shahab, S., Agarwal, P., Mufti, T., & Obaid, A. J. (2022). SIoT (social internet of things): A review. ICT Analysis and Applications . https://doi.org/10.1007/978-981-16-5655-2_28

Atzori, L., Iera, A., Morabito, G., & Nitti, M. (2012). The social internet of things (siot)–when social networks meet the internet of things: Concept, architecture and network characterization. Computer networks, 56 , 3594–3608.

Baldassarre, G., Giudice, P. L., Musarella, L., & Ursino, D. (2019). The MIoT paradigm: Main features and an “ad-hoc” crawler. Future Generation Computer Systems, 92 , 29–42.

Meghana, J., Hanumanthappa, J., & Prakash, S. S. (2021). Performance comparison of machine learning algorithms for data aggregation in social internet of things. Global Transitions Proceedings, 2 , 212–219.

Lo Giudice, P., Nocera, A., Ursino, D., & Virgili, L. (2019). Building topic-driven virtual iots in a multiple iots scenario. Sensors, 19 , 2956.

McCall, J. (1994). Quality factors. In Encyclopedia of software engineering (Vol. 2, p. 760). New York: Wiley.

Boehm, B., & In, H. (1996). Identifying quality-requirement conflicts. IEEE software, 13 , 25–35.

Grady, R. B. (1992). Practical software metrics for project management and process improvement. Prentice-Hall, Inc.

Talia, D. (2019). A view of programming scalable data analysis: From clouds to exascale. Journal of Cloud Computing, 8 , 1–16.

Firmani, D., Mecella, M., Scannapieco, M., & Batini, C. (2016). On the meaningfulness of “big data quality.” Data Science and Engineering, 1 , 6–20.

Jabbar, S., Ullah, F., Khalid, S., Khan, M., & Han, K. (2017). Semantic interoperability in heterogeneous IoT infrastructure for healthcare. Wireless Communications and Mobile Computing, 2017

Rialti, R., Marzi, G., Caputo, A., & Mayah, K. A. (2020). Achieving strategic flexibility in the era of big data. Management Decision.

Roy, D., Srivastava, R., Jat, M., & Karaca, M. S. (2022). A complete overview of analytics techniques: descriptive, predictive, and prescriptive. Decision intelligence analytics and the implementation of strategic business management, 15–30.

Rahul, K., Banyal, R. K., Goswami, P., & Kumar, V. (2021). Machine learning algorithms for big data analytics. In Computational Methods and Data Engineering , Springer, pp. 359–367.

Nti, I. K., Quarcoo, J. A., Aning, J., & Fosu, G. K. (2022). A mini-review of machine learning in big data analytics: Applications, challenges, and prospects. Big Data Mining and Analytics, 5 , 81–97.

Rajendran, R., Sharma, P., Saran, N. K., Ray, S., Alanya-Beltran, J., & Tongkachok, K. (2022). An exploratory analysis of machine learning adaptability in big data analytics environments: A data aggregation in the age of big data and the internet of things. In 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM), pp. 32–36.

Angelopoulos, A., Michailidis, E. T., Nomikos, N., Trakadas, P., Hatziefremidis, A., Voliotis, S., et al. (2019). Tackling faults in the industry 4.0 era—a survey of machine-learning solutions and key aspects. Sensors, 20 , 109.

Zhou, L., Pan, S., Wang, J., & Vasilakos, A. V. (2017). Machine learning on big data: Opportunities and challenges. Neurocomputing, 237 , 350–361.

Prastyo, D. D., Khoiri, H. A., Purnami, S. W., Fam, S.-F., & Suhermi, N. (2020). Survival support vector machines: A simulation study and its health-related application. Supervised and Unsupervised Learning for Data Science (pp. 85–100). Cham: Springer.

Pink, C. M. (2016). Forensic ancestry assessment using cranial nonmetric traits traditionally applied to biological distance studies. In Biological Distance Analysis , Elsevier, pp. 213–230.

Lu, W. (2019). Improved K-means clustering algorithm for big data mining under Hadoop parallel framework. Journal of Grid Computing . https://doi.org/10.1007/s10723-019-09503-0

Zheng, W., Liu, X., & Yin, L. (2021). Research on image classification method based on improved multi-scale relational network. PeerJ Computer Science, 7 , e613.

Goswami, S., & Kumar, A. (2022). Survey of deep-learning techniques in big-data analytics. Wireless Personal Communications . https://doi.org/10.1007/s11277-022-09793-w

Roni, M., Karim, H., Rana, M., Pota, H., Hasan, M., & Hussain, M. (2022). Recent trends in bio-inspired meta-heuristic optimization techniques in control applications for electrical systems: A review. International Journal of Dynamics and Control . https://doi.org/10.1007/s40435-021-00892-3

Swayamsiddha, S. (2020). Bio-inspired algorithms: principles, implementation, and applications to wireless communication. In Nature-Inspired Computation and Swarm Intelligence . Elsevier, pp. 49–63.

Ni, J., Wu, L., Fan, X., & Yang, S. X. (2016). Bioinspired intelligent algorithm and its applications for mobile robot control: a survey. Computational intelligence and neuroscience, 2016 .

Game, P. S., & Vaze, D. (2020). Bio-inspired Optimization: metaheuristic algorithms for optimization. arXiv preprint arXiv:2003.11637 .

Romero, C. D. G., Barriga, J. K. D., & Molano, J. I. R. (2016). Big data meaning in the architecture of IoT for smart cities. In International Conference on Data Mining and Big Data, pp. 457–465.

Santana, E. F. Z., Chaves, A. P., Gerosa, M. A., Kon, F., & Milojicic, D. S. (2017). Software platforms for smart cities: Concepts, requirements, challenges, and a unified reference architecture. ACM Computing Surveys (Csur), 50 , 1–37.

Granat, J., Batalla, J. M., Mavromoustakis, C. X., & Mastorakis, G. (2019). Big data analytics for event detection in the IoT-multicriteria approach. IEEE Internet of Things Journal, 7 , 4418–4430.

Xiong, Z., Zhang, Y., Luong, N. C., Niyato, D., Wang, P., & Guizani, N. (2020). The best of both worlds: A general architecture for data management in blockchain-enabled Internet-of-Things. IEEE Network, 34 , 166–173.

Oktian, Y. E., Lee, S.-G., & Lee, B.-G. (2020). Blockchain-based continued integrity service for IoT big data management: A comprehensive design. Electronics, 9 , 1434.

Liu, F., Zhang, G., & Lu, J. (2020). Multisource heterogeneous unsupervised domain adaptation via fuzzy relation neural networks. IEEE Transactions on Fuzzy Systems, 29 , 3308–3322.

Dong, J., Cong, Y., Sun, G., Fang, Z., & Ding, Z. (2021). Where and how to transfer: knowledge aggregation-induced transferability perception for unsupervised domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence . https://doi.org/10.1109/TPAMI.2021.3128560

Zenggang, X., Xiang, L., Xueming, Z., Sanyuan, Z., Fang, X., Xiaochao, Z., et al. (2022). A service pricing-based two-stage incentive algorithm for socially aware networks. Journal of Signal Processing Systems . https://doi.org/10.1007/s11265-022-01768-1

Benhamaid, S., Lakhlef, H., & Bouabdallah, A. (2021). Towards energy efficient mobile data collection in cluster-based IoT networks. In 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), pp. 340–343.

Sun, W., Lv, X., & Qiu, M. (2020). Distributed estimation for stochastic Hamiltonian systems with fading wireless channels. IEEE Transactions on Cybernetics .

Lv, Z., Qiao, L., & You, I. (2020). 6G-enabled network in box for internet of connected vehicles. IEEE transactions on intelligent transportation systems, 22 , 5275–5282.

Xifilidis, T., & Psannis, K. E. (2022). Correlation-based wireless sensor networks performance: The compressed sensing paradigm. Cluster Computing, 25 , 965–981.

Mohammadi, A., Ciuonzo, D., Khazaee, A., & Rossi, P. S. (2022). Generalized locally most powerful tests for distributed sparse signal detection. IEEE Transactions on Signal and Information Processing over Networks, 8 , 528–542.

Aziz, A., Osamy, W., Khedr, A. M., El-Sawy, A. A., & Singh, K. (2020). Grey Wolf based compressive sensing scheme for data gathering in IoT based heterogeneous WSNs. Wireless Networks, 26 , 3395–3418.

Djelouat, H., Amira, A., & Bensaali, F. (2018). Compressive sensing-based IoT applications: A review. Journal of Sensor and Actuator Networks, 7 , 45.

Wang, K., Zhang, B., Alenezi, F., & Li, S. (2022). Communication-efficient surrogate quantile regression for non-randomly distributed system. Information Sciences, 588 , 425–441.

Lee, G. H., Han, J., & Choi, J. K. (2021). MPdist-based missing data imputation for supporting big data analyses in IoT-based applications. Future Generation Computer Systems, 125 , 421–432.

Zhang, F., Zhai, J., Shen, X., Mutlu, O., & Du, X. (2021). POCLib: A high-performance framework for enabling near orthogonal processing on compression. IEEE Transactions on Parallel and Distributed Systems, 33 , 459–475.

Abualigah, L., Diabat, A., & Elaziz, M. A. (2021). Intelligent workflow scheduling for big data applications in IoT cloud computing environments. Cluster Computing, 24 , 2957–2976.

Naas, M. I., Lemarchand, L., Raipin, P., & Boukhobza, J. (2021). IoT data replication and consistency management in fog computing. Journal of Grid Computing, 19 , 1–25.

Ma, Z., Zheng, W., Chen, X., & Yin, L. (2021). Joint embedding VQA model based on dynamic word vector. PeerJ Computer Science, 7 , e353.

Rahouma, K. H., Aly, R. H., & Hamed, H. F. (2020). Challenges and solutions of using the social internet of things in healthcare and medical solutions—a survey. Toward Social Internet of Things (SIoT): Enabling Technologies, Architectures and Applications (pp. 13–30). Cham: Springer.

Corradini, E., Nicolazzo, S., Nocera, A., Ursino, D., & Virgili, L. (2022). A two-tier Blockchain framework to increase protection and autonomy of smart objects in the IoT. Computer Communications, 181 , 338–356.

Pincheira, M., Antonini, M., & Vecchio, M. (2022). Integrating the IoT and blockchain technology for the next generation of mining inspection systems. Sensors, 22 , 899.

Tchagna Kouanou, A., Tchito Tchapga, C., Sone Ekonde, M., Monthe, V., Mezatio, B. A., Manga, J., et al. (2022). Securing data in an internet of things network using blockchain technology: smart home case. SN Computer Science, 3 , 1–10.

Ursino, D., & Virgili, L. (2020). An approach to evaluate trust and reputation of things in a Multi-IoTs scenario. Computing, 102 , 2257–2298.

Author information

Arezou Naghib

Present address: Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Authors and Affiliations

Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Arash Sharifi

Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Kadir Has University, Istanbul, Turkey

Nima Jafari Navimipour

Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran

Institute of Research and Development, Duy Tan University, Da Nang, Vietnam

Mehdi Hosseinzadeh

School of Medicine and Pharmacy, Duy Tan University, Da Nang, Vietnam

Computer Science, University of Human Development, Sulaymaniyah, 0778-6, Iraq

Corresponding author

Correspondence to Nima Jafari Navimipour .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Naghib, A., Jafari Navimipour, N., Hosseinzadeh, M. et al. A comprehensive and systematic literature review on the big data management techniques in the internet of things. Wireless Netw 29 , 1085–1144 (2023). https://doi.org/10.1007/s11276-022-03177-5

Accepted : 19 October 2022

Published : 15 November 2022

Issue Date : April 2023

DOI : https://doi.org/10.1007/s11276-022-03177-5

  • Big data management
  • Internet of things
  • Knowledge discovery
  • Systematic literature review (SLR)

Explore Big Data Analytics Applications and Opportunities: A Review

1. Introduction

2. Literature Review

2.1. Big Data and Analytics

2.2. Characteristics of Big Data

2.3. The Types of Data Analytics

  • Descriptive
  • Prescriptive
  • Business Intelligence, which consists of both descriptive analytics and diagnostic analytics
  • Advanced Analytics, which covers the data analytics types at a higher maturity level, namely predictive and prescriptive analytics [55, 56, 59].

3. Big Data Analytics Opportunities and Applications

4. Big Data Applications Pre the COVID-19 Pandemic

4.1. Big Data in Healthcare

4.2. Big Data in Education

4.3. Big Data in Transportation

  • The first and primary source of data is direct physical sensing. It is represented by road-side static sensors such as LiDAR, microwave radars, and acoustic sensors, which measure speed, noise, and traffic flow [100]. Other examples are mobile phone technologies such as GPS, GSM, and Bluetooth [97].
  • The second source of data is social media, also called "human and social sensing", which arises from motorists' use of smartphone-compatible platforms [101], for instance Instagram, Twitter, and others [97].
  • The third category of data source is urban sensing, which is generated by transportation operators. Data captured in this category can be used to analyze urban mobility in terms of congestion and traffic flows [102]. It is collected via credit cards and smart cards scanned through urban sensors in public transit, retail scanners, and digital toll systems [97].

4.4. Big Data in Banking

5. Big Data Applications Peri the COVID-19 Pandemic

5.1. Big Data in Healthcare

5.2. Big Data in Education

5.3. Big Data in Transportation

5.4. Big Data in Banking

5.5. Big Data Analytics across Industry

6. Conclusions

Author Contributions

Institutional Review Board Statement

Data Availability Statement

Conflicts of Interest

  • Alhomdy, S.; Thabit, F.; Abdulrazzak, F.A.H.; Haldorai, A.; Jagtap, S. The role of cloud computing technology: A savior to fight the lockdown in COVID-19 crisis, the benefits, characteristics and applications. Int. J. Intell. Netw. 2021, 2, 166–174.
  • Alsunaidi, S.J.; Almuhaideb, A.M.; Ibrahim, N.M.; Shaikh, F.S.; Alqudaihi, K.S.; Alhaidari, F.A.; Khan, I.U.; Aslam, N.; Alshahrani, M.S. Applications of Big Data Analytics to Control COVID-19 Pandemic. Sensors 2021, 21, 2282.
  • Rothberg, H.N.; Erickson, G.S. Big data systems: Knowledge transfer or intelligence insights? J. Knowl. Manag. 2017, 21, 92–112.
  • Adrian, C.; Abdullah, R.; Atan, R.; Jusoh, Y.Y. Conceptual Model Development of Big Data Analytics Implementation Assessment Effect on Decision-Making. Int. J. Interact. Multimed. Artif. Intell. 2018, 5, 101–106.
  • Gupta, M.; George, J.F. Toward the development of a big data analytics capability. Inf. Manag. 2016, 53, 1049–1064.
  • Kalema, B.M.; Mokgadi, M. Developing countries organizations’ readiness for Big Data analytics. Probl. Perspect. Manag. 2017, 15, 260–270.
  • De Bruin, T.; Kulkarni, U.; Rosemann, M.; Freeze, R. Understanding the Main Phases of Developing a Maturity Assessment Model; ACIS: Sydney, Australia, 2005.
  • Rialti, R.; Marzi, G.; Ciappei, C.; Busso, D. Big data and dynamic capabilities: A bibliometric analysis and systematic literature review. Manag. Decis. 2019, 57, 2052–2068.
  • Al-sai, Z.A.; Abdullah, R.; Husin, M.H.; Syed-mohamad, S.M. A Preliminary Systematic Literature Review On Critical Success Factors Categories For Big Data. In Proceedings of the AiIC2019, Toyama, Japan, 7–11 July 2019.
  • Al-Sai, Z.A.; Abdullah, R.; Husin, M.H. Big Data Impacts and Challenges: A Review. In Proceedings of the 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology, Amman, Jordan, 9–11 April 2019; pp. 150–155.
  • Al-Sai, Z.A.; Abdullah, R.; Husin, M.H. A review on big data maturity models. In Proceedings of the 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology, Amman, Jordan, 9–11 April 2019; pp. 156–161.
  • Zulkarnain, N.; Meyliana, M.; Prabowo, H.; Nizar Hidayanto, A. The critical success factors for big data adoption in government. Int. J. Mech. Eng. Technol. 2019, 10, 864–875.
  • Al-Sai, Z.A.; Abualigah, L.M. Big Data and E-government: A review. In Proceedings of the 2017 8th International Conference on Information Technology, Amman, Jordan, 17–18 May 2017; pp. 580–587.
  • Malik, P. Governing Big Data: Principles and practices. IBM J. Res. Dev. 2013, 57, 1:1–1:13.
  • Big Data Analytics and Financial Reporting Quality: Qualitative Evidence from Canada. Available online: https://doi.org/10.1108/JFRA-12-2021-0489 (accessed on 1 January 2021).
  • Measuring Your Big Data Maturity. Available online: https://michaelskenny.com/points-of-view/measuring-your-big-data-maturity/ (accessed on 1 January 2021).
  • Davenport, T.; Dyché, J. Big Data in Big Companies. Baylor Bus. Rev. 2013, 32, 20–21.
  • Ward, J.S.; Barker, A. Undefined By Data: A Survey of Big Data Definitions. arXiv 2013, arXiv:1309.5821.
  • Bertot, J.C.; Choi, H. Big data and e-government: Issues, policies, and recommendations. In Proceedings of the 14th Annual International Conference on Digital Government Research, New York, NY, USA, 17 June 2013; pp. 1–10.
  • Braun, H. Evaluation of Big Data Maturity Models—A Benchmarking Study To Support Big Data Maturity Assessment In Organizations; Tampere University of Technology: Tampere, Finland, 2015.
  • Manyika, J.; Chui, M.; Brown, B.; Bughin, J.; Dobbs, R.; Roxburgh, C.; Byers, A.H. Big Data: The Next Frontier for Innovation, Competition, and Productivity; McKinsey Global Institute: Chicago, IL, USA, 2011.
  • Henke, N.; Bughin, J.; Chui, M.; Manyika, J.; Saleh, T.; Wiseman, B.; Sethupathy, G. The Age of Analytics: Competing in a Data-Driven World; McKinsey Global Institute: Chicago, IL, USA, 2016; Volume 12, pp. 904–920.
  • Chen, M.; Mao, S.; Liu, Y. Big data: A survey. Mob. Netw. Appl. 2014, 19, 171–209.
  • Romijn, B.-J. Big Data in the Public Sector: Uncertainties and Readiness in the Dutch Public Executive Sector. Inf. Syst. Frontiers 2017, 19, 267–283.
  • Sun, S.; Cegielski, C.G.; Jia, L.; Hall, D.J. Understanding the Factors Affecting the Organizational Adoption of Big Data. J. Comput. Inf. Syst. 2016, 58, 193–203.
  • Zainal, N.Z.; Hussin, H.; Nazri, M.N.M. Big Data Initiatives by Governments—Issues and Challenges: A Review. In Proceedings of the 2016 6th International Conference on Information and Communication Technology for The Muslim World (ICT4M), Jakarta, Indonesia, 22–24 November 2016; pp. 304–309.
  • Esteves, J.; Curto, J. A risk and benefits behavioral model to assess intentions to adopt big data. J. Intell. Stud. Bus. 2013, 3, 37–46.
  • Kaka, E.S. E-Government Adoption and Framework for Big Data Analytics. In Proceedings of the Second Covenant University Conference on E-Governance in Nigeria (CUCEN 2015), Covenant University Canaanland, Ota Ogun State, Nigeria, 10–12 June 2015; pp. 1–28.
  • Batko, K.; Ślęzak, A. The use of Big Data Analytics in healthcare. J. Big Data 2022, 9, 3.
  • Al Nuaimi, E.; Al Neyadi, H.; Mohamed, N.; Al-Jaroodi, J. Applications of big data to smart cities. J. Internet Serv. Appl. 2015, 6, 25.
  • Lutfi, A.; Alrawad, M.; Alsyouf, A.; Almaiah, M.A.; Al-Khasawneh, A.; Al-Khasawneh, A.L.; Alshira’h, A.F.; Alshirah, M.H.; Saad, M.; Ibrahim, N. Drivers and impact of big data analytic adoption in the retail industry: A quantitative investigation applying structural equation modeling. J. Retail. Consum. Serv. 2023, 70, 103129.
  • Big Data Analytics. Available online: https://www.ibm.com/analytics/big-data-analytics (accessed on 1 January 2021).
  • Brock, V.; Khan, H.U. Big data analytics: Does organizational factor matters impact technology acceptance? J. Big Data 2017, 4, 21.
  • Chen, J.; Chen, Y.; Du, X.; Li, C.; Lu, J.; Zhao, S.; Zhou, X. Big Data Challenge: A Data Management Perspective. Front. Comput. Sci. 2013, 7, 157–164.
  • Hood-Clark, S.F. Influences On The Use And Behavioral Intention To Use Big Data. Ph.D. Thesis, Capella University ProQuest Dissertations Publishing, Minneapolis, MN, USA, 2016.
  • Gandomi, A.; Haider, M. Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manag. 2015, 35, 137–144.
  • Munné, R. Big data in the public sector. In New Horizons for a Data-Driven Economy; Springer: Cham, Switzerland, 2016; pp. 195–208.
  • Comuzzi, M.; Patel, A. How organisations leverage Big Data: A maturity model. Ind. Manag. Data Syst. 2016, 116, 1468–1492.
  • Laney, D. META Delta. Appl. Deliv. Strateg. 2001, 949, 4.
  • Kanan, T.; Mughaid, A.; Al-Shalabi, R.; Al-Ayyoub, M.; Elbe, M.; Sadaqa, O. Business intelligence using deep learning techniques for social media contents. Cluster Comput. 2022, 1–12.
  • Singh, S.; Singh, N. Big Data analytics. In Proceedings of the 2012 International Conference on Communication, Information & Computing Technology, Mumbai, India, 19–20 October 2012; pp. 1–4.
  • Where Big Data Projects Fail. Available online: https://www.forbes.com/sites/bernardmarr/2015/03/17/where-big-data-projects-fail/?sh=22d01455239f (accessed on 1 January 2021).
  • Cato, P.; Golzer, P.; Demmelhuber, W. An investigation into the implementation factors affecting the success of big data systems. In Proceedings of the 2015 11th International Conference on Innovations in Information Technology, Dubai, United Arab Emirates, 1–3 November 2015; pp. 134–139.
  • Motau, M.; Kalema, B.M. Big Data Analytics Readiness: A South African Public Sector Perspective. In Proceedings of the 2016 IEEE International Conference on Emerging Technologies and Innovative Business Practices for the Transformation of Societies, Balaclava, Mauritius, 3–6 August 2016.
  • Big Security for Big Data: Addressing Security Challenges for the Big Data Infrastructure. Available online: https://doi.org/10.1007/978-3-319-06811-4 (accessed on 1 January 2021).
  • Soon, K.W.K.; Lee, C.A.; Boursier, P. A study of the determinants affecting adoption of big data using integrated Technology Acceptance Model (TAM) and diffusion of innovation (DOI) in Malaysia. Int. J. Appl. Bus. Econ. Res. 2016, 14, 17–47.
  • Kaisler, S.; Armour, F.; Espinosa, J.A.; Money, W. Big Data: Issues and Challenges Moving Forward. In Proceedings of the 2013 46th Hawaii International Conference on System Sciences, Wailea, HI, USA, 7–10 January 2013; pp. 995–1004.
  • Khan, N.; Yaqoob, I.; Hashem, I.A.; Inayat, Z.; Ali, W.M.; Shiraz, M.; Gani, A.; Member, S. Big Data: Survey, Technologies, Opportunities, and Challenges. Sci. World J. 2014, 2014, 712826.
  • Sagiroglu, S.; Sinanc, D. Big data: A review. In Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, San Diego, CA, USA, 20–24 May 2013; pp. 42–47.
  • Saxena, S. Integrating Open and Big Data via e-Oman: Prospects and issues—Read. Contemp. Arab Aff. 2016, 9, 607–621.
  • Mohanty, S.K.; Jagadeesh, M.; Srivatsa, H.K. Big Data Imperatives: Enterprise Big Data Warehouse, BI Implementations and Analytics. Available online: https://www.ptonline.com/articles/how-to-get-better-mfi-results (accessed on 1 January 2021).
  • Harnessing the Potential of Big Data in Post-Pandemic Southeast Asia. Available online: https://www.adb.org/publications/potential-big-data-post-pandemic-southeast-asia (accessed on 1 January 2021).
  • Long, C.K.; Agrawal, R.; Trung, H.Q.; Van Pham, H. A big data framework for E-Government in Industry 4.0. Open Comput. Sci. 2021, 11, 461–479.
  • Sivarajah, U.; Kamal, M.M.; Irani, Z.; Weerakkody, V. Critical analysis of Big Data challenges and analytical methods. J. Bus. Res. 2017, 70, 263–286.
  • Sheng, J.; Amankwah-Amoah, J.; Khan, Z.; Wang, X. COVID-19 Pandemic in the New Era of Big Data Analytics: Methodological Innovations and Future Research Directions. Br. J. Manag. 2021 , 32 , 1164–1183. [ Google Scholar ] [ CrossRef ]
  • Delen, D.; Zolbanin, H.M. The analytics paradigm in business research. J. Bus. Res. 2018 , 90 , 186–195. [ Google Scholar ] [ CrossRef ]
  • Rustagi, V.; Bajaj, M.; Tanvi; Singh, P.; Aggarwal, R.; AlAjmi, M.F.; Hussain, A.; Hassan, M.I.; Singh, A.; Singh, I.K. Analyzing the Effect of Vaccination Over COVID Cases and Deaths in Asian Countries Using Machine Learning Models. Front. Cell. Infect. Microbiol. 2022 , 11 , 806265. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Lepenioti, K.; Bousdekis, A.; Apostolou, D.; Mentzas, G. Prescriptive analytics: Literature review and research challenges. Int. J. Inf. Manag. 2020 , 50 , 57–70. [ Google Scholar ] [ CrossRef ]
  • Delen, D.; Ram, S. Research challenges and opportunities in business analytics. J. Bus. Anal. 2018 , 1 , 2–12. [ Google Scholar ] [ CrossRef ]
  • Sekli, G.F.M.; De La Vega, I. Adoption of big data analytics and its impact on organizational performance in higher education mediated by knowledge management. J. Open Innov. Technol. Mark. Complex. 2021 , 7 , 221. [ Google Scholar ] [ CrossRef ]
  • Nageshwaran, G.; Harris, R.C.; El Guerche-Seblain, C. Review of the role of big data and digital technologies in controlling COVID-19 in Asia: Public health interest vs. privacy. Digit. Health 2021 , 7 , 20552076211002953. [ Google Scholar ] [ CrossRef ]
  • Henke, N.; Puri, A.; Saleh, T. Accelerating analytics to navigate COVID-19 and the next normal. McKinsey Anal. 2020 , 9. [ Google Scholar ]
  • Sözen, M.E.; Sarlyer, G.; Ataman, M.G. Big data analytics and COVID-19: Investigating the relationship between government policies and cases in Poland, Turkey and South Korea. Health Policy Plan. 2022 , 37 , 100–111. [ Google Scholar ] [ CrossRef ]
  • Nwanga, M.E.; Onwuka, E.N.; Aibinu, A.M.; Ubadike, O.C. Impact of Big Data Analytics to Nigerian mobile phone industry. In Proceedings of the 2015 International Conference on Industrial Engineering and Operations Management, Dubai, United Arab Emirates, 3–5 March 2015; pp. 1314–1319. [ Google Scholar ] [ CrossRef ]
  • Mathrani, S.; Lai, X. Big data analytic framework for organizational leverage. Appl. Sci. 2021 , 11 , 2340. [ Google Scholar ] [ CrossRef ]
  • Akter, S.; Wamba, S.F. Big data analytics in E-commerce: A systematic review and agenda for future research. Electron. Mark. 2016 , 26 , 173–194. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Naik, K.; Joshi, A. Role of Big Data in various sectors. In Proceedings of the 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 10–11 February 2017; pp. 117–122. [ Google Scholar ] [ CrossRef ]
  • Chen, Y.; Alspaugh, S.; Katz, R. Interactive analytical processing in big data systems: A crossindustry study of mapreduce workloads. Proc. VLDB Endow. 2012 , 5 , 1802–1813. [ Google Scholar ] [ CrossRef ]
  • Chakraborty, S.; Saha, A.K.; Ezugwu, A.E.; Agushaka, J.O.; Zitar, R.A.; Abualigah, L. Differential Evolution and Its Applications in Image Processing Problems: A Comprehensive Review. Arch. Comput. Methods Eng. 2022 , 1–56. [ Google Scholar ] [ CrossRef ]
  • Yousefinaghani, S.; Dara, R.; Mubareka, S.; Papadopoulos, A.; Sharif, S. An analysis of COVID-19 vaccine sentiments and opinions on Twitter. Int. J. Infect. Dis. 2021 , 108 , 256–262. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Mellado, B.; Wu, J.; Kong, J.D.; Bragazzi, N.L.; Asgary, A.; Kawonga, M.; Choma, N.; Hayasi, K.; Lieberman, B.; Mathaha, T.; et al. Leveraging artificial intelligence and big data to optimize covid-19 clinical public health and vaccination roll-out strategies in Africa. Int. J. Environ. Res. Public Health 2021 , 18 , 7890. [ Google Scholar ] [ CrossRef ]
  • E-Government Survey 2020—Digital Government in the Decade of Action for Sustainable Development: With Addendum on COVID-19. Available online: https://www.scienceopen.com/book?vid=bc74a872-5582-485a-aafe-260bd9a415bd. (accessed on 1 January 2021).
  • REal-Time Data Monitoring for Shared, Adaptive, Multi-Domain and Personalised Prediction and Decision Making for Long-Term Pulmonary Care Ecosystems (RE-SAMPLE). Available online: https://clinicaltrials.gov/ct2/show/NCT04955080 (accessed on 1 January 2021).
  • Gherheș, V.; Stoian, C.E.; Fărcașiu, M.A.; Stanici, M. E-learning vs. Face-to-face learning: Analyzing students’ preferences and behaviors. Sustainability 2021 , 13 , 4381. [ Google Scholar ] [ CrossRef ]
  • OECD. An Assessment of the Impact of COVID-19 on Job and Skills Demand Using Online Job Vacancy Data ; OECD: Paris, France, 2021; pp. 1–19. [ Google Scholar ]
  • Innovations From The Nation. Available online: https://www.mbrcgi.gov.ae/en/enrich/innovations-from-the-nation (accessed on 1 January 2021).
  • Haleem, A.; Javaid, M.; Khan, I.H.; Vaishya, R. Significant Applications of Big Data in COVID-19 Pandemic. Indian J. Orthop. 2020 , 54 , 526–528. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Perspective, A.W. Winning in the Next Normal with Insights: How Banks & FIs Are Preparing. 2022. Available online: https://www.wns.com/%0Aperspectives/blogs/blogdetail/932/%0Awinning-in-the-next-normal-with-insights-how-banks--fis-are-preparing%0A (accessed on 1 January 2021).
  • Big Data Analytics in Healthcare. Available online: https://www.hindawi.com/journals/jhe/si/971905/ (accessed on 1 January 2021).
  • Strickland, N.H. PACS (picture archiving and communication systems): Filmless radiology. Arch. Dis. Child. 2000 , 83 , 82–86. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Dash, S.; Shakyawar, S.K.; Sharma, M.; Kaushik, S. Big data in healthcare: Management, analysis and future prospects. J. Big Data 2019 , 6 , 54. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Dayrit, M.M.; Lagrada, L.P.; Picazo, O.F.; Pons, M.C.; Villaverde, M.C. The Philippines Health System Review. Health Syst. Transit. 2018 , 8 , 2. [ Google Scholar ]
  • Wu, J.; Wang, J.; Nicholas, S.; Maitland, E.; Fan, Q. Application of big data technology for COVID-19 prevention and control in China: Lessons and recommendations. J. Med. Internet Res. 2020 , 22 , e21980. [ Google Scholar ] [ CrossRef ]
  • Berendt, B.; Littlejohn, A.; Kern, P.; Mitros, P.; Shacklock; Blakemore, M. Big Data for Monitoring Educational Systems ; Publications Office of the European Union: Luxembourg, 2017. [ Google Scholar ]
  • Dishon, G. New data, old tensions: Big data, personalized learning, and the challenges of progressive education. Theory Res. Educ. 2017 , 15 , 272–289. [ Google Scholar ] [ CrossRef ]
  • Ruiz-Palmero, J.; Colomo-Magaña, E.; Ríos-Ariza, J.M.; Gómez-García, M. Big data in education: Perception of training advisors on its use in the educational system. Soc. Sci. 2020 , 9 , 53. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Fischer, C.; Pardos, Z.A.; Baker, R.S.; Williams, J.J.; Smyth, P.; Yu, R.; Slater, S.; Baker, R.; Warschauer, M. Mining Big Data in Education: Affordances and Challenges. Rev. Res. Educ. 2020 , 44 , 130–160. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ochoa, X.; Worsley, M. Editorial: Augmenting Learning Analytics with Multimodal Sensory Data. J. Learn. Anal. 2016 , 3 , 213–219. [ Google Scholar ] [ CrossRef ]
  • Crossley, S.; Ocumpaugh, J.; Labrum, M.; Bradfield, F.; Dascalu, M.; Baker, R.S. Modeling math identity and math success through sentiment analysis and linguistic features. In Proceedings of the International Conference on Educational Data Mining (EDM), Raleigh, NC, USA, 16–20 July 2018. [ Google Scholar ]
  • Chaturapruek, S.; Dee, T.S.; Johari, R.; Kizilcec, R.F.; Stevens, M.L. How a data-driven course planning tool affects college students’ GPA: Evidence from two field experiments. In Proceedings of the 5th Annual ACM Conference on Learning at Scale, London, UK, 26–28 June 2018. [ Google Scholar ] [ CrossRef ]
  • Lukosch, H.K.; Bekebrede, G.; Kurapati, S.; Lukosch, S.G. A Scientific Foundation of Simulation Games for the Analysis and Design of Complex Systems. Simul. Gaming 2018 , 49 , 279–314. [ Google Scholar ] [ CrossRef ]
  • Teaching, E.; Educational, L.T.; Mining, D.; Sin, K.; Muthu, L.; Prakash, B.R.; Hanumanthappa, M.; Kavitha, V. Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief. ICTACT J. Soft Comput. 2015 , 5 , 1035–1049. [ Google Scholar ]
  • Anaya, A.R.; Boticario, J.G. A data mining approach to reveal representative collaboration indicators in open collaboration frameworks. In Proceedings of the International Conference on Educational Data Mining (EDM), Cordoba, Spain, 1–3 July 2009; pp. 210–219. [ Google Scholar ]
  • Wang, X.; Guo, B.; Shen, Y. Predicting the At-Risk Online Students Based on the Click Data Distribution Characteristics. Sci. Program. 2022 , 2022 , 9938260. [ Google Scholar ] [ CrossRef ]
  • Zhu, Z.T.; Yu, M.H.; Riezebos, P. A research framework of smart education. Smart Learn. Environ. 2016 , 3 , 4. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Torre-Bastida, A.I.; Del Ser, J.; Laña, I.; Ilardia, M.; Bilbao, M.N.; Campos-Cordobés, S. Big Data for transportation and mobility: Recent advances, trends and challenges. IET Intell. Transp. Syst. 2018 , 12 , 742–755. [ Google Scholar ] [ CrossRef ]
  • Selod, H.; Soumahoro, S. Big Data in Transportation: An Economics Perspective ; World Bank: Washington, DC, USA, 2020. [ Google Scholar ]
  • Hou, Y.; Chen, J.; Wen, S. The effect of the dataset on evaluating urban traffic prediction. Alex. Eng. J. 2021 , 60 , 597–613. [ Google Scholar ] [ CrossRef ]
  • A Research Agenda for Transport Policy. Available online: https://www.e-elgar.com/shop/gbp/a-research-agenda-for-transport-policy-9781788970198.html (accessed on 1 January 2021).
  • New Traffic Data Sources—An Overview. Available online: https://www.bitre.gov.au/sites/default/files/2019-12/NewDataSources-BackgroundPaper-April%202014.pdf. (accessed on 1 January 2021).
  • Mining the Datasphere: Big Data, Technologies, and Transportation Disaster Management. Available online: https://sbenrc.com.au/app/uploads/2021/03/SBEnrc-Project-1.45-Milestone-1-Congestion-v2.pdf (accessed on 1 January 2021).
  • Blazquez, D.; Domenech, J. Big Data sources and methods for social and economic analyses. Technol. Forecast. Soc. Chang. 2018 , 130 , 99–113. [ Google Scholar ] [ CrossRef ]
  • Nobanee, H.; Dilshad, M.N.; Al Dhanhani, M.; Al Neyadi, M.; Al Qubaisi, S.; Al Shamsi, S. Big Data Applications the Banking Sector: A Bibliometric Analysis Approach. SAGE Open 2021 , 11 , 4. [ Google Scholar ] [ CrossRef ]
  • Gasser, L.Z.U.; Gassmann, O.; Hens, T.; Leifer, L.; Puschmann, T. Digital Banking 2025. 2017. Available online: http://www.dv.co.th/blog-th/digital-banking-trend/ (accessed on 1 January 2021).
  • Bhasin, M.L. Combatting Bank Frauds by Integration of Technology: Experience of a Developing Country. Br. J. Res. 2016 , 3 , 64–92. [ Google Scholar ]
  • Vives, X. Digital Disruption in Banking and Its Impact on Competition ; OECD: Paris, France, 2020; pp. 1–50. [ Google Scholar ]
  • Alexandru, A.G.; Radu, I.M.; Bizon, M.-L. Big Data in Healthcare—Opportunities and Challenges. Inform. Econ. 2018 , 22 , 43–54. [ Google Scholar ] [ CrossRef ]
  • Bourdeaux, M. Reimagining the Role of Technology in Education. Relig. Communist Lands 2017 , 9 , 2–3. [ Google Scholar ] [ CrossRef ]
  • Modeling the Big Data Challenges in Context of Smart Cities—An Integrated Fuzzy ISM-DEMATEL Approach. Available online: https://doi.org/10.1108/IJBPA-02-2021-0027 (accessed on 1 January 2021).
  • World Bank. Data Big in Action for Government ; World Bank: Washington, DC, USA, 2017; p. 18. [ Google Scholar ]
  • Use of Technology in the Ebola Response in West Africa. Available online: https://pdf.usaid.gov/pdf_docs/PA00K99H.pdf (accessed on 1 January 2021).
  • Oussous, A.; Benjelloun, F.Z.; Ait Lahcen, A.; Belfkih, S. Big Data technologies: A survey. J. King Saud Univ.—Comput. Inf. Sci. 2018 , 30 , 431–448. [ Google Scholar ] [ CrossRef ]
  • ENOVA Annual Report 2013. Available online: https://www.enova.no/upload_images/5649E609BFEA4B2A890777583DD1654D.pdf (accessed on 1 January 2021).
  • Data Analytics Tools in Higher Education. Available online: https://ceur-ws.org/Vol-3061/ERIS_2021-art10(sh).pdf (accessed on 1 January 2021).
  • Learning Analytics in Higher Education: A Review of UK and International Practice Full Report. Available online: https://www.jisc.ac.uk/sites/default/files/learning-analytics-in-he-v2_0.pdf%0Ahttps://www.jisc.ac.uk/reports/learning-analytics-in-higher-education (accessed on 1 January 2021).
  • Alsrehin, N.O.; Klaib, A.F.; Magableh, A. Intelligent Transportation and Control Systems Using Data Mining and Machine Learning Techniques: A Comprehensive Study. IEEE Access 2019 , 7 , 49830–49857. [ Google Scholar ] [ CrossRef ]
  • Big Data in Banking for Marketers—How to Derive Value from Big Data. Available online: https://dataanalytics.report/Resources/Whitepapers/be309286-97b2-497d-9c6d-af9de5f620e1_bank-2020---big-data---whitepaper.pdf (accessed on 1 January 2021).
  • Corsi, A.; de Souza, F.F.; Pagani, R.N.; Kovaleski, J.L. Big data analytics as a tool for fighting pandemics: A systematic review of literature. J. Ambient Intell. Humaniz. Comput. 2021 , 12 , 9163–9180. [ Google Scholar ] [ CrossRef ]
  • Gigauri, I. Effects of COVID-19 on Human Resource Management from the Perspective of Digitalization and Work-life-balance. Int. J. Innov. Technol. Econ. 2020 , 31 , 1–8. [ Google Scholar ] [ CrossRef ]
  • OCED. The Territorial Impact of COVID-19: Managing The Crisis across Levels of Government. 2020. Available online: https://www.oecd.org/coronavirus/policy-responses/the-territorial-impact-of-covid-19-managing-the-crisis-across-levels-of-government-d3e314e1/ (accessed on 1 January 2021).
  • Big Data, Big Outcomes: How Analytics Can Transform Public Services and Improve Citizens’ Lives. Available online: https://assets.ey.com/content/dam/ey-sites/ey-com/en_gl/topics/future-of-government/ey-future-of-gov-digital-analytics-report.pdf?download#:~:text=By%20combining%20data%20from%20a,%2C%20well%2Dbeing%20and%20safety (accessed on 1 January 2021).
  • E-Government Survey 2020. Available online: https://publicadministration.un.org/egovkb/en-us/Reports/UN-E-Government-Survey-2020 (accessed on 1 January 2021).
  • Mihalis, K. Ten technologies to fight coronavirus. Eur. Parliam. Res. Serv. 2020 , 1–20. [ Google Scholar ]
  • Central Bank of the Russian Federation. Using Big Data in The Financial Sector and Risk to Financial Stability ; Central Bank of the Russian Federation: Moscow, Russia, 2021. [ Google Scholar ]
  • Asli Demirgüç-Kunt, J.H.; Klapper, L.; Singer, D.; Ansar, S. The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution ; World Bank: Washington, DC, USA, 2017; ISBN 9781464812590. [ Google Scholar ]
  • OECD. Artificial Intelligence, Machine Learning and Big Data in Finance: Opportunities, Challenges, and Implications for Policy Makers. 2021. Available online: https://www.oecd.org/finance/financial-markets/Artificial-intelligence-machine-learning-big-data-in-finance.pdf (accessed on 1 January 2021).


| Field | Data Analytics Type | How BDA Has Been Utilized | Data Processing Models Used to Analyse Big Data | Reference |
| --- | --- | --- | --- | --- |
| Healthcare | Descriptive and Predictive Data Analytics | Proactive actions and interventions based on predictive models to trigger any noncommunicable diseases. | Predictive models based on search engine and social media data; smartphone application tracking systems to identify infection hot spots | [ , ] |
| | Prescriptive Data Analytics | Vaccine distribution | Sentiment analysis to reduce community resistance towards the vaccine. | [ , , ] |
| | Diagnostic and Predictive Data Analytics | Vaccine distribution | Machine learning models to prioritize citizens' need and urgency for the vaccine | |
| | Diagnostic and Prescriptive Data Analytics | Monitoring live and frequent data on the spread of the disease; providing more personalized consultations by "virtual doctors" | Dashboards; AI chatbot | [ , ] |
| Education | Descriptive Data Analytics | Enhance online educational platform experience | Analyzing data captured from online educational platforms can ease educators' remote learning experience | [ , ] |
| | Diagnostic Data Analytics | Bridge the gap of unemployment | Analysis of data captured from job portals | [ , ] |
| Transportation | Descriptive and Prescriptive Data Analytics | Implementation of precautionary measures: ensure social distancing in public transportation | Capturing relevant data and using machine learning techniques to detect non-compliant actions | [ ] |
| | | Detect citizens' commute routes to store their travel history. | Use both AI and Big Data applications to capture, track and predict valuable insights about citizens' movement within and across cities and countries | [ ] |
| Banking | | Fraud detection | Use AI and ML techniques to describe and detect real-time abnormal activities and online transactions, and build ML models based on classification algorithms to predict any suspicious case. | [ ] |
| | Descriptive and Predictive Data Analytics | Risk assessment | Use both diagnostic and prescriptive data analytics models to analyze real-time data and assess the creditworthiness of customers, consequently developing the appropriate customer portfolio and tailoring services to clients' needs, which in turn boosts customers' satisfaction and loyalty and enhances banks' bottom-line records. | [ ] |

| Field | Opportunities | Description | Reference |
| --- | --- | --- | --- |
| Healthcare | Serve efficiently considering both value and costs to individual cases | BDA has a powerful ability to highlight the correlations and patterns between different variables, rather than finding the causal inference between them, and to serve individual patients' cases. | [ , , , , ] |
| Education | Improve the learning process; provide real-time feedback and construct development plans; construct a more personalized learning environment; enrich the learning environment; utilize BDA for institutions' marketing research purposes | BDA enables educational institutes and professionals to personalize the educational experience for students | [ , , , , , , , ] |
| Transportation | Serve as the base for researchers, economists and regulators to analyze traffic flow, congestion and their social, economic and environmental impacts; apply a combination of new methods of analysis, such as AI approaches, to pave the way for predicting and providing innovative solutions for the future in the field of transportation | BDA predictive capabilities and the incorporation of economic insights can go beyond understanding and analyzing past and real-time data to predict the optimal legislation for traffic congestion issues in smart cities. | [ , , , , , ] |
| Banks | Detect fraud cases; ease merger and acquisition operations; optimize banking supply chain performance; interpret clients' behaviors; provide valued and satisfactory services to clients; analyze, predict, and visualize both external market conditions and internal clients' trends and preferences; increase market share and enhance profitability | BDA supported the introduction of digital banking operations and virtual banking systems | [ , , ] |

| Field in Charge | Application Name | Description | Reference |
| --- | --- | --- | --- |
| Health | Ebola Open Data Initiative | West Africa: data has been utilized to develop an open-source global model for tracking Ebola cases in 2014 | [ , , ] |
| | HealthMap | A platform used to visualize disease trends that provides an early trigger for the proper response | [ , ] |
| | Proactive listening, mobile phone-based system | Brazil: to govern the issue of bribes in the health services, handle any related issues, and take immediate and effective action against corruption. | [ ] |
| Education | ENOVA | Mexico: through the utilization of data and data analytics, it can analyze and predict students' interactions, and consequently boost educational strategies and enhance the tools and techniques used in the teaching-learning process. | [ , ] |
| | (PASS) Personalized Adaptive Study Success | The Open University Australia: predicts course material and provides a more personalized studying environment. The predictive data analytics model is based on analyzing students' individual characteristics alongside other student-related data captured from other systems. The main goal of the application is to develop a more customized environment that ensures student involvement, engagement, and retention in an e-learning environment. | [ ] |
| Transportation | OpenTraffic platform | An application to support urban infrastructure decisions, based on data captured from both vehicles and smartphones, which is analyzed and visualized as both historic and real-time traffic situations. | [ , ] |
| | | Seoul, South Korea: the application is used to support night bus drivers and ease their journey from origin to destination, by capturing data from a tremendous number of call and text data points, as well as private and corporate taxi data sources. | [ ] |
| Banks | Avaloq, Finnova, SAP, Sungard and Temenos | OCBC is the largest bank in Singapore in terms of market capitalization. It operates in more than 15 countries globally and is a successful example of the utilization of BDA. For instance, the bank responded to customer actions, customers' personalized events and their demographic profiles. Hence, OCBC Bank succeeded in achieving higher customer engagement and increasing the level of customer satisfaction by 20% in comparison to a control group. Core banking applications such as Avaloq, Finnova, SAP, Sungard or Temenos were designed to handle large amounts of transactions in back-office processes for basic financial products and services, such as bank accounts, deposits, etc. | [ , ] |

| Field | Data Analytics Type | How BDA Has Been Utilized | Method/Model | Reference |
| --- | --- | --- | --- | --- |
| Healthcare | Descriptive and Predictive Data Analytics Models | Proactive actions and interventions based on predictive models to trigger any noncommunicable diseases. | Predictive models based on search engine and social media data; smartphone application tracking systems to identify infection hot spots | [ , ] |
| | Prescriptive Data Analytics | Vaccine distribution | Sentiment analysis to reduce community resistance towards the vaccine. | [ , ] |
| | Diagnostic and Predictive Data Analytics Models | Vaccine distribution | Machine learning models to prioritize citizens' need and urgency for the vaccine | |
| | Diagnostic and Prescriptive Data Analytics Models | Monitoring live and frequent data on the spread of the disease; providing more personalized consultations by "virtual doctors" | Dashboards; AI chatbot | [ , ] |
| Education | Descriptive Data Analytics Model | Enhance online educational platform experience | Analyzing data captured from online educational platforms can ease educators' remote learning experience | [ , ] |
| | Diagnostic Data Analytics Model | Bridge the gap of unemployment | Analysis of data captured from job portals | [ , ] |
| Transportation | Descriptive and Prescriptive Data Analytics Models | Implementation of precautionary measures: ensure social distancing in public transportation | Capturing relevant data and using machine learning techniques to detect non-compliant actions | [ ] |
| | Descriptive and Predictive Data Analytics Models | Detect citizens' commute routes to store their travel history. | Use both AI and Big Data applications to capture, track and predict valuable insights about citizens' movement within and across cities and countries | [ ] |
| Banking | | Fraud detection | Use AI and ML techniques to describe and detect real-time abnormal activities and online transactions, and build ML models based on classification algorithms to predict any suspicious case. | [ ] |
| | | Risk assessment | Use both diagnostic and prescriptive data analytics models to analyze real-time data and assess the creditworthiness of customers, consequently developing the appropriate customer portfolio and tailoring services to clients' needs, which in turn boosts customers' satisfaction and loyalty and enhances banks' bottom-line records. | [ ] |
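
The banking fraud-detection entry above refers to ML models based on classification algorithms. As a rough illustration only, the sketch below uses a nearest-centroid classifier, which is one simple classification technique and not necessarily what the cited works use. The transaction data, the two features (amount and hour of day), and the names `TRAINING`, `fit_centroids` and `classify` are all hypothetical.

```python
from math import dist

# Hypothetical labelled transactions: (amount, hour of day) -> label.
# These values are invented for illustration; they come from no real dataset.
TRAINING = [
    ((20.0, 13), "normal"),
    ((35.0, 18), "normal"),
    ((50.0, 10), "normal"),
    ((900.0, 3), "suspicious"),
    ((1200.0, 2), "suspicious"),
    ((1000.0, 4), "suspicious"),
]


def fit_centroids(samples):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }


def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    model = fit_centroids(TRAINING)
    print(classify(model, (30.0, 12)))   # small daytime payment
    print(classify(model, (950.0, 3)))   # large night-time payment
```

The features here are left unscaled to keep the sketch short; a practical system would normally scale them and use far richer features, since the amount otherwise dominates the distance.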
MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Al-Sai, Z.A.; Husin, M.H.; Syed-Mohamad, S.M.; Abdin, R.M.S.; Damer, N.; Abualigah, L.; Gandomi, A.H. Explore Big Data Analytics Applications and Opportunities: A Review. Big Data Cogn. Comput. 2022 , 6 , 157. https://doi.org/10.3390/bdcc6040157


  • DOI: 10.51594/csitrj.v4i3.1500
  • Corpus ID: 272315039

Real-Time Cybersecurity threat detection using machine learning and big data analytics: A comprehensive approach

  • Kingsley David Onyewuchi Ofoegbu , Olajide Soji Osundare , +2 authors Adebimpe Bolatito Ige
  • Published in Computer Science & IT… 30 December 2023
  • Computer Science, Engineering
  • Computer Science & IT Research Journal


Research on Data Science, Data Analytics and Big Data

INTERNATIONAL JOURNAL OF ENGINEERING, SCIENCE AND - Volume 9, Issue 5, May 2020 Pages: 99-105.

7 Pages Posted: 10 Jun 2020

Rahul Reddy Nadikattu

University of the Cumberlands; University of the Cumberlands (formerly Cumberland College) - Department of Information Technology

Date Written: April 17, 2020

Big Data refers to a huge volume of data of various types, i.e., structured, semi-structured, and unstructured. This data is generated through various digital channels such as mobile, Internet, social media, e-commerce websites, etc. Big Data has proven to be of great use since its inception, as companies started realizing its importance for various business purposes. Now that companies have started deciphering this data, they have witnessed exponential growth over the years. It has had an impact on sectors such as retail, banking and investment, fraud detection and analysis, customer-centric applications, and operational analysis.

Data Science deals with the slicing and dicing of big chunks of data, as well as finding insightful patterns and trends in them using technology, mathematics, and statistical techniques. Data Scientists are responsible for uncovering the facts hidden in the complex web of unstructured data so that they can be used in making business decisions. They do this by developing heuristic algorithms and models that can be reused in the future for significant purposes. This amalgamation of technology and concepts makes Data Science a potential field for lucrative career opportunities. McKinsey once predicted that there will be an acute shortage of Data Science professionals in the next decade. Data Science has had an impact on sectors such as web development, digital advertising, e-commerce, Internet search, finance, telecom, and utilities.

Data Analytics seeks to provide operational insights into complex business situations. The concept of big data has been around for years; most organizations now understand that if they capture all the data that streams into their businesses, they can apply analytics and get significant value from it. But even in the 1950s, decades before anyone uttered the term big data, businesses were using basic analytics (essentially numbers in a spreadsheet that were manually examined) to uncover insights and trends.

The new benefits that big data analytics brings to the table, however, are speed and efficiency. Whereas a few years ago a business would have gathered information, run analytics and unearthed information that could be used for future decisions, today that business can identify insights for immediate decisions. The ability to work faster, and stay agile, gives organizations a competitive edge they didn't have before. Looking into historical data from a modern perspective, finding new and challenging business scenarios, and applying methodologies to find a better solution are the prime concerns of a Data Analyst. A Data Analyst also predicts upcoming opportunities that the company can exploit. Data Analytics has shown such tremendous growth across the globe that Big Data market revenue is soon expected to grow by 50 percent. Data Analytics has had an impact on sectors such as traveling and transportation, financial analysis, retail, research, energy management, and healthcare.

Keywords: Data Science, Data Analytics, Big Data


Advances, Systems and Applications

  • Open access
  • Published: 06 August 2022

Big data analytics in Cloud computing: an overview

  • Blend Berisha 1 ,
  • Endrit Mëziu 1 &
  • Isak Shabani 1  

Journal of Cloud Computing, volume 11, Article number: 24 (2022)

Big Data and Cloud Computing, as two mainstream technologies, are at the center of concern in the IT field. Every day a huge amount of data is produced from different sources. This data is so big that traditional processing tools are unable to deal with it. Besides being big, this data moves fast and has a lot of variety. Big Data is a concept that deals with storing, processing and analyzing large amounts of data. Cloud computing, on the other hand, is about offering the infrastructure to enable such processes in a cost-effective and efficient manner. Many sectors, including businesses (small or large), healthcare and education, are trying to leverage the power of Big Data. In healthcare, for example, Big Data is being used to reduce costs of treatment, predict outbreaks of pandemics, prevent diseases etc. This paper presents an overview of Big Data Analytics as a crucial process in many fields and sectors. We start with a brief introduction to the concept of Big Data, the amount of data that is generated on a daily basis, and the features and characteristics of Big Data. We then delve into Big Data Analytics, where we discuss issues such as the analytics cycle, analytics benefits and the movement from the ETL to the ELT paradigm as a result of Big Data analytics in the Cloud. As a case study we analyze Google's BigQuery, which is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data. As a Platform as a Service (PaaS), it supports querying using ANSI SQL. We use the tool to perform different experiments, such as average read, average compute and average write, on datasets of different sizes.

Introduction

We live in the data age. We see data everywhere, owing to the great technological developments that have taken place in recent years. The rate of digitalization has increased significantly and we now rightly talk about "digital information societies". If 20 or 30 years ago only 1% of the information produced was digital, now over 94% of it is digital, and it comes from various sources such as our mobile phones, servers, sensor devices on the Internet of Things, social networks, etc. [1]. The year 2002 is considered the "beginning of the digital age", when an explosion of digitally produced equipment and information was seen.

The amount of information collected has increased significantly due to the growing number of devices that collect it, such as mobile devices, cheap and numerous sensor devices on the Internet of Things (IoT), remote sensing, software logs, cameras, microphones, RFID readers, wireless sensor networks, etc. [2]. According to statistics, the amount of data generated per day is about 44 zettabytes (44 × 10^21 bytes). Every second, 1.7 MB of data is generated per person [3]. Based on International Data Corporation (IDC) forecasts, the global amount of data will increase exponentially from 2020 to 2025, moving from 44 to 163 zettabytes [4]. Figure 1 shows the amount of global data generated, copied and consumed. As can be seen, in the years 2010–2015 the year-to-year rate of increase was smaller, while since 2018 this rate has increased significantly, making the trend exponential in nature [3].

Figure 1. Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2024 (estimated) [3]
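As a rough illustration of the forecast cited above (44 to 163 zettabytes between 2020 and 2025), the implied yearly growth rate can be worked out with a few lines of Python. This is a back-of-the-envelope sketch that assumes smooth compounding between the two endpoints; the function name is illustrative and not taken from any cited source.

```python
def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


# Endpoints quoted in the text: 44 ZB (2020) and 163 ZB (2025).
rate = compound_annual_growth_rate(44, 163, 2025 - 2020)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the volume of data would grow by roughly 30% per year, which is consistent with the steep curve described for the later years.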

To get a glimpse of the amount of data that is generated on a daily basis, let's look at a portion of the data that different platforms produce. On the Internet, there is so much information at our fingertips. We add to the stockpile every time we look for answers from our search engines. As a result, Google now processes more than 40,000 searches every second (approximately 3.5 billion searches per day) [5]. By the time of writing this article, this number must have changed! Social media, on the other hand, is a massive data producer.

People's 'love affair' with social media certainly fuels data creation. Every minute, Snapchat users share 527,760 photos, more than 120 professionals join LinkedIn, users watch 4,146,600 YouTube videos, 456,000 tweets are sent on Twitter and Instagram users post 46,740 photos [5]. Facebook remains the largest social media platform, with over 300 million photos uploaded every day and more than 510,000 comments posted and 293,000 statuses updated every minute.

The increase in the quantity of data has brought advantages but also challenges, as relational database management systems and other traditional systems have difficulties in processing and analyzing this quantity. For this reason, the term 'big data' arose not only to describe the amount of data but also the need for new technologies and ways of processing and analyzing this data. Cloud Computing has facilitated data storage, processing and analysis. Using the Cloud we have access to almost limitless storage and computing power offered by different vendors. Cloud delivery models such as IaaS (Infrastructure as a Service) and PaaS (Platform as a Service) can help organisations across different sectors handle Big Data more easily and faster. The aim of this paper is to provide an overview of how analytics of Big Data in Cloud Computing can be done. For this we use Google's BigQuery platform, which is a serverless data warehouse with built-in machine learning capabilities. It is very robust and has plenty of features to help with the analytics of data of different sizes and types.

What is big data?

Many authors and organizations have tried to provide a definition of 'Big Data'. According to [6], "Big Data refers to data volumes in the range of exabytes and beyond". In Wikipedia [7], big data is defined as an accumulation of datasets so huge and complex that it becomes hard to process using database management tools or traditional data processing applications, while the challenges include capture, storage, search, sharing, transfer, analysis, and visualization.

Sam Madden from the Massachusetts Institute of Technology (MIT) considers "Big Data" to be data that is too big, too fast, or too hard for existing tools to process [8]. 'Too big' means data that is at the petabyte level and that comes from various sources. 'Too fast' means data that grows quickly and should also be processed quickly. 'Too hard' refers to the difficulty that arises when the data does not fit the existing processing tools [9]. In PCMag (one of the most popular journals on technological trends), Big Data refers to the massive amounts of data collected over time that are difficult to analyze and handle using common database management tools [10]. There are many other definitions of Big Data, but we consider these sufficient to gain an impression of the concept.

Features and characteristics of big data

One question that researchers have struggled to answer is what might qualify as 'big data'. For this reason, in 2001 industry analyst Doug Laney from Gartner introduced the 3 V model, which names three features that data must exhibit to be considered "big data": volume, velocity and variety. Volume is the property that determines the size of data, usually reported in terabytes or petabytes. For example, social networks like Facebook store, among other things, photos of users. Due to the large number of users, it is estimated that Facebook stores about 250 billion photos and over 2.5 trillion posts of its users. This is an extremely large amount of data that needs to be stored and processed. Volume is the most representative feature of 'big data' [8]. In terms of volume, tera- or peta-level data is usually considered 'big', although this depends on the capacity of those analyzing the data and the tools available to them [8]. Figure 2 shows what each of the three V's represents.

Figure 2. 3 V's of Big Data [6]

The second property or characteristic is velocity. This refers to the rate at which data is generated or the speed at which this data must be processed and analyzed [8]. For example, Facebook users upload more than 900 million photos a day, which is approximately 10^4 uploaded photos per second. Facebook therefore needs to process, store and retrieve this information for its users in real time. Figure 3 shows some statistics obtained from [11] on the speed of data generation from different sources. As can be seen, social media and the Internet of Things (IoT) are the largest data generators, with a growing trend.

Figure 3. Examples of the velocity of Big Data [9]
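The per-second rate implied by the 900 million daily photo uploads mentioned above can be checked with simple arithmetic. The snippet below is only a unit conversion; the helper name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def per_second(events_per_day):
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


photos_per_second = per_second(900_000_000)
print(f"{photos_per_second:,.0f} photos per second on average")
```

The result is on the order of ten thousand uploads per second, which is the scale a real-time system has to sustain continuously rather than in occasional bursts.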

There are two main types of data processing: batch and stream. In batch processing, work is done on blocks of data that have been stored over a period of time. Data processed in batch is usually large, so it takes longer to process. Hadoop MapReduce is considered to be the best framework for processing data in batches [11]. This approach works well in situations where there is no need for real-time analytics and where it is important to process large volumes of data to get more detailed insights.

Stream processing, on the other hand, is key to the processing and analysis of data in real time. It allows data to be processed as it arrives. The data is immediately fed into analytics tools so that results are generated instantly. There are many scenarios where such an approach can be useful, such as fraud detection, where anomalies that signal fraud are detected in real time. Another use case is online retail, where real-time processing enables retailers to compile large histories of customer interactions so that additional purchases can be recommended to customers in real time [11].
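The difference between the two processing styles can be sketched in a few lines of Python. This is a toy illustration, not how Hadoop or a production stream engine works: the transaction amounts, the threshold rule and the function names are all assumptions made for the example. The batch function needs the whole stored block before it produces a result, while the stream function handles one record at a time and keeps only a small running state.

```python
from statistics import mean


def batch_average(stored_amounts):
    """Batch: process a whole block of previously stored records at once."""
    return mean(stored_amounts)


def stream_flag_anomalies(amounts, factor=3.0, warmup=3):
    """Stream: handle each record as it arrives, keeping only running totals.

    Yields (amount, is_anomaly). A record is flagged when it exceeds
    `factor` times the running average of the records seen before it.
    """
    count, total = 0, 0.0
    for amount in amounts:
        is_anomaly = count >= warmup and amount > factor * (total / count)
        yield amount, is_anomaly
        count += 1
        total += amount


# Hypothetical transaction amounts, invented for this example.
transactions = [20.0, 25.0, 22.0, 21.0, 400.0, 23.0]

batch_result = batch_average(transactions)
flagged = [amount for amount, bad in stream_flag_anomalies(transactions) if bad]
print(batch_result, flagged)
```

In the fraud-detection scenario described above, the streaming variant can raise an alert the moment the unusual transaction arrives, whereas the batch variant only produces its summary after the full block has been collected.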

The third property is variety, which refers to the different types of data that are generated from different sources. "Big Data" is usually classified into three major categories: structured data (transactional data, spreadsheets, relational databases etc.), semi-structured data (Extensible Markup Language - XML, web server logs etc.) and unstructured data (social media posts, audio, images, video etc.). The literature also mentions a fourth category, 'meta-data', which represents data about data. This is also shown in Fig. 4. Most of the data today belongs to the category of unstructured data (80%) [11].

Figure 4. Main categories of data variety in Big Data [9]

Over time, the three features of big data have been complemented by two additional ones: veracity and value. Veracity is equivalent to quality, which means data that is clean and accurate and that has something to offer [12]. The concept is also related to the reliability of the data that is extracted (e.g., customer sentiments in social media are not highly reliable data). The value of data is related to the social or economic value the data can generate. The degree of value that data can produce also depends on the knowledge of those who make use of it.

Big data analytics in cloud computing

Cloud Computing is the delivery of computing services such as servers, storage, databases, networking, software, analytics etc. over the Internet ("the cloud") with the aim of providing flexible resources, faster innovation and economies of scale [13]. Cloud computing has revolutionized the way computing infrastructure is abstracted and used. Cloud paradigms have been extended to include anything that can be considered as a service (hence "X as a Service"). The many benefits of cloud computing, such as elasticity, the pay-as-you-go or pay-per-use model and low upfront investment, have made it a viable and desirable choice for big data storage, management and analytics [13]. Because big data is now considered vital for many organizations and fields, service providers such as Amazon, Google and Microsoft are offering their own big data systems in a cost-efficient manner. These systems offer scalability for businesses of all sizes. This has led to the prominence of the term Analytics as a Service (AaaS) as a faster and more efficient way to integrate, transform and visualize different types of data.

Big data analytics cycle

According to [14], processing big data for analytics differs from processing traditional transactional data. In traditional environments, data is first explored, then a model design as well as a database structure is created. Figure 5 depicts the flow of big data analysis. As can be seen, it starts by gathering data from multiple sources, such as multiple files, systems, sensors and the Web. This data is then stored in the so-called "landing zone", which is a medium capable of handling the volume, variety and velocity of the data. This is usually a distributed file system. After the data is stored, different transformations are applied to it to preserve its efficiency and scalability. After that, the data is integrated into particular analytical tasks, operational reporting, databases or raw data extracts [14].

Figure 5. Flow in the processing of Big Data [11]

Moving from ETL to ELT paradigm

ETL (Extract, Transform, Load) is about taking data from a data source, applying the transformations that might be required and then loading it into a data warehouse to run reports and queries against it. The downside of this paradigm is that it is characterized by a lot of I/O activity, a lot of string processing, variable transformation and a lot of data parsing [15].

ELT (Extract, Load, Transform) is about taking the most compute-intensive activity (transformation) and doing it not in an on-premise service, which is already under pressure from regular transaction handling, but instead in the cloud [15]. This means that there is no need for data staging, because the data warehousing solution is used for different types of data, including structured, semi-structured, unstructured and raw data. This approach employs the concept of "data lakes", which differ from OLAP (Online Analytical Processing) data warehouses because they do not require data to be transformed before it is loaded [15]. Figure 6 illustrates the differences between the two paradigms. As seen, the main difference is where the transformation process takes place.

Figure 6. Differences between ETL and ELT [15]

ELT has many benefits over the traditional ETL paradigm. The most crucial, as mentioned, is that data of any format can be ingested as soon as it becomes available. Another is that only the data required for a particular analysis needs to be transformed. In ETL, the entire pipeline and the structure of the data in the OLAP warehouse may require modification if the previous structure does not allow for new types of analysis [16].
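The contrast between the two paradigms can be made concrete with a small, runnable sketch. It uses Python's built-in sqlite3 module as a local stand-in for a data warehouse; the table names, the sample rows and the cleaning rules are assumptions invented for this example. In the ETL function the cleaning happens in application code before loading, while in the ELT function the raw rows are loaded untouched and the same cleaning is expressed in SQL inside the database.

```python
import sqlite3

# Hypothetical raw input rows (day, country, count), invented for illustration.
raw_rows = [("2021-01-05", " kosovo ", "12"), ("2021-01-06", "Albania", "7")]


def etl(rows):
    """ETL: transform in application code first, then load the clean rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cases (day TEXT, country TEXT, n INTEGER)")
    cleaned = [(d, c.strip().title(), int(n)) for d, c, n in rows]
    db.executemany("INSERT INTO cases VALUES (?, ?, ?)", cleaned)
    return db.execute("SELECT country, n FROM cases ORDER BY day").fetchall()


def elt(rows):
    """ELT: load raw rows as-is, then transform inside the database with SQL."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE raw_cases (day TEXT, country TEXT, n TEXT)")
    db.executemany("INSERT INTO raw_cases VALUES (?, ?, ?)", rows)
    return db.execute(
        "SELECT UPPER(SUBSTR(TRIM(country), 1, 1)) || LOWER(SUBSTR(TRIM(country), 2)),"
        " CAST(n AS INTEGER) FROM raw_cases ORDER BY day"
    ).fetchall()


print(etl(raw_rows))
print(elt(raw_rows))
```

Both functions return the same cleaned result. The practical difference is that in the ELT variant the raw table remains available, so a new analysis can apply a different transformation later without re-extracting the data.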

Some advantages of big data analytics

As mentioned, companies across various sectors of industry are leveraging Big Data in order to promote data-driven decision making. Besides the tech industry, the usage and popularity of Big Data has expanded to include healthcare, governance, retail, supply chain management, education etc. Some of the benefits of Big Data Analytics mentioned in [17] include:

Data accumulation from different sources including the Internet, online shopping sites, social media, databases, external third-party sources etc.

Identification of crucial points that are hidden within large datasets in order to influence business decisions.

Identification of the issues regarding systems and business processes in real time.

Facilitation of service/product delivery to meet or exceed client expectations.

Responding to customer requests, queries and grievances in real time.

Some other benefits according to [ 16 ] are related to:

Cost optimization - One of the biggest advantages of Big Data tools such as Hadoop or Spark is that they offer cost advantages to businesses regarding the storage, processing and analysis of large amounts of data. The authors mention the logistics industry as an example to highlight the cost-reduction benefits of Big Data. In this industry, the cost of product returns is 1.5 times higher than the actual shipping cost. With Big Data Analytics, companies can estimate which products are most likely to be returned and take suitable measures to reduce losses on returns.

Efficiency improvements - Big Data can improve operational efficiency. Big Data tools can amass large amounts of useful customer data by interacting with customers and gathering their feedback. This data can then be analyzed and interpreted to extract meaningful patterns hidden within it, such as customer tastes and preferences, buying behaviors etc. This in turn allows companies to create personalized or tailored products/services.

Innovation - Insights from Big Data can be used to tweak business strategies, develop new products/services, optimize service delivery, improve productivity etc. These can all lead to more innovation.

As seen, Big Data Analytics has been mostly leveraged by businesses, but other sectors have also benefited. For example, in healthcare many states are now utilizing the power of Big Data to predict and also prevent epidemics, cure diseases, cut down costs etc. This data has also been used to establish many efficient treatment models. With Big Data more comprehensive reports were generated and these were then converted into relevant critical insights to provide better care [ 17 ].

In education, Big Data has also been used extensively. It has enabled teachers to measure, monitor and respond in real time to students' understanding of the material. Professors have created tailor-made materials for students with different knowledge levels to increase their interest [18].

Case study: Google's BigQuery for data processing and analytics

Google Cloud Platform (GCP) contains a number of services designed to analyze and process big data. In this section we describe and discuss the architecture and main components of BigQuery, one of the most used big data processing tools in GCP. BigQuery is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS) that supports querying using ANSI SQL. It also has built-in machine learning capabilities. Since its launch in 2011 it has gained a lot of popularity and many big companies have utilized it for their data analytics [19].

From a user perspective, BigQuery has an intuitive user interface which can be accessed in a number of ways depending on user needs. The simplest way to interact with this tool is to use its graphical web interface, as shown in Fig. 7. Slightly more complicated but faster approaches include using the cloud console or the BigQuery APIs. As Fig. 7 shows, the BigQuery web interface offers options to add or select existing datasets, schedule and construct queries, transfer data and display results.

Figure 7. BigQuery Interface

Data processing and query construction occur under the SQL workspace section. BigQuery offers a rich SQL-like syntax to compute and process large sets of data. It operates on relational datasets with a well-defined structure, including tables with specified columns and types. Figure 8 shows a simple query and highlights its execution details. The data displayed under the query results shows the main performance components of the executed query: elapsed time, consumed slot time, size of data processed, and average and maximum wait, write and compute times. The query in Fig. 8 combines three datasets which contain information regarding Covid-19 reported cases, deaths and recoveries from more than 190 countries from 2020 until January 2021. Google BigQuery is flexible in that it allows you to use and combine various datasets suitable for your task easily and with small delays. It contains an ever-growing list of public datasets at your disposal and also offers options to create, edit and import your own. Figure 9 shows the process of adding a table to a newly created dataset. In Fig. 9, a local CSV file is used as the source for table creation; this file is used to create the table schema and populate it with data. Aside from the local upload option, Google BigTable, Google Cloud Storage or Google Drive can be used as the source for creating the table. The newly created table with its respective data is then ready to be used to construct queries and obtain new insights, as shown in Fig. 8.

Figure 8. BigQuery execution details

Figure 9. Adding table to the created dataset
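The kind of query described for Fig. 8, which combines reported cases, deaths and recoveries, can be sketched in standard SQL. The example below runs against an in-memory SQLite database as a local stand-in for BigQuery, so that it is runnable without a cloud account. The table names, column names and the handful of rows are invented for illustration; they are not the public datasets or the exact query used in the case study.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Three small hypothetical tables, keyed by country and day.
db.executescript("""
CREATE TABLE cases (country TEXT, day TEXT, confirmed INTEGER);
CREATE TABLE deaths (country TEXT, day TEXT, deaths INTEGER);
CREATE TABLE recoveries (country TEXT, day TEXT, recovered INTEGER);
INSERT INTO cases VALUES
    ('A', '2020-12-01', 100), ('A', '2020-12-02', 150), ('B', '2020-12-01', 40);
INSERT INTO deaths VALUES
    ('A', '2020-12-01', 2), ('A', '2020-12-02', 3), ('B', '2020-12-01', 1);
INSERT INTO recoveries VALUES
    ('A', '2020-12-01', 50), ('A', '2020-12-02', 60), ('B', '2020-12-01', 10);
""")

# Join the three tables on (country, day) and aggregate per country.
query = """
SELECT c.country,
       SUM(c.confirmed) AS confirmed,
       SUM(d.deaths)    AS deaths,
       SUM(r.recovered) AS recovered
FROM cases AS c
JOIN deaths AS d     ON d.country = c.country AND d.day = c.day
JOIN recoveries AS r ON r.country = c.country AND r.day = c.day
GROUP BY c.country
ORDER BY c.country
"""
totals = db.execute(query).fetchall()
print(totals)
```

Because each table has at most one row per country and day, joining on both keys avoids duplicating rows before the sums are taken. The same join-and-aggregate shape would apply to tables imported into a BigQuery dataset, although performance characteristics there differ from a local database.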

One advantage of using imported data in the cloud is the option to manage its access and visibility within the scope of the cloud project and its members. Depending on how it is to be used, queried data can be saved directly to the local computer through the "save results" option in Fig. 8, which offers a variety of formats and data extension settings to choose from, but it can also be explored in different configurations using the "explore data" option. You can also save constructed queries for later use or schedule the query execution interval for more accurate data transmutation through API endpoints. Figure 10 shows how much the average compute time increases as the size of the dataset grows.

Figure 10. Average compute time dependence in dataset size

Experiments with different dataset sizes

Before moving to data exploration, let's analyze the performance of BigQuery on simple queries with variable dataset sizes. Table 1 shows the query execution details of five simple select queries run on five different datasets. The results are reported against six different performance categories. The data shows a correlation between the size of the dataset and its average read, write and compute times.

From the graph in Fig. 10 we see that the dependence of average compute time on dataset size is exponential, meaning that as the data size increases, the average compute time increases exponentially.
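One common way to check whether a dependence is exponential is to fit a straight line to the logarithm of the measured time against dataset size: a relationship of the form y = a·e^(b·x) becomes linear in log space. The sketch below shows the technique on synthetic points generated from a known exponential. These points are not the measurements from Table 1, and the function names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing log(y) on x; returns (a, b)."""
    b, log_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b


# Synthetic data generated from y = 2 * exp(0.5 * x), for demonstration only.
sizes = [1, 2, 3, 4, 5]
times = [2.0 * math.exp(0.5 * s) for s in sizes]

a, b = fit_exponential(sizes, times)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real measurements, the residuals of this log-space fit can be compared against those of a plain linear fit on the untransformed values. With only a handful of data points, as in a five-dataset experiment, such a comparison suggests a trend rather than proving it.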

Data returned from constructed queries, aside from being displayed in a simple tabular form or as a JSON object, can also be transferred to Data Studio, an integrated tool for displaying and visualizing the gathered information. One way of displaying the queried data from Fig. 8 with the Data Studio tool is shown in Fig. 11. In this case a bar table chart visualization option is chosen.

Figure 11. Using data studio for data visualization

Conclusion

Big Data is not a new term but has gained the spotlight due to the huge amounts of data that are produced daily from different sources. From our analysis we saw that big data is increasing at a fast pace, leading to benefits but also challenges. Cloud Computing is considered to be the best solution for storing, processing and analyzing Big Data. Companies like Amazon, Google and Microsoft offer their public services to facilitate the process of dealing with Big Data. From the analysis we saw that Big Data analytics provides multiple benefits for many different fields and sectors such as healthcare, education and business. We also saw that the interaction of Big Data with Cloud Computing has caused a shift in the way data is processed and analyzed. In traditional settings ETL is used, whereas in Big Data ELT is used. We saw that the latter has clear advantages over the former.

From our case study we saw that BigQuery is very good for running complex analytical queries, which means there is little point in using it for queries that do simple aggregation or filtering. BigQuery is suitable for heavy queries, those that operate on a big set of data. The bigger the dataset, the more likely it is to gain in performance compared to traditional relational databases, as BigQuery implements different parallel schemas to speed up execution.

BigQuery performs less well with joins; merging data into one table yields better execution times. It is good for scenarios where data does not change often, as it has a built-in cache. BigQuery can also be used when one wants to reduce the load on a relational database, as it offers different options and configurations to improve query performance. Two pricing models are available: a pay-as-you-go service, where charges are based on usage, and a flat-rate service, which offers a specific slot rate and charges on a daily, monthly or yearly plan.

Availability of data and materials

The datasets used during the current study are available from the corresponding author on reasonable request. The authors declare that they have no funder.

References

1. Hillbert M, Lopez P (2011) The world's technological capacity to store, communicate and compute information. Science III:62–65

2. J. Hellerstein, "Gigaom Blog," 2019. Available: https://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallelprogramming/ . Accessed 20 Jan 2021

3. Statista, "Statista," 2020. Available: https://www.statista.com/statistics/871513/worldwide-data-created/ . Accessed 21 Jan 2021

4. Reinsel D, Gantz J, Rydning J (2017) Data age 2025: the evolution of data to-life critical. International Data Corporation, Framingham

5. Forbes, "Forbes," 2020. Available: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-muchdata-do-we-create-every-day-the-mind-blowing-stats-everyone-shouldread/?sh=5936b00460ba

6. Kaisler S, Armour F, Espinosa J (2013) Big data: issues and challenges moving forward, Wailea, Maui, HI, s.n, pp 995–1004

7. Wikipedia, "Wikipedia," 2018. Available: https://www.en.wikipedia.org/wiki/Bigdata/ . Accessed 4 Jan 2021

8. D. Gewirtz, "ZDNet," 2018. Available: https://www.zdnet.com/article/volume-velocity-and-varietyunderstanding-the-three-vs-of-big-data/ . Accessed 1 Jan 2021

9. Weathington J (2012) Big Data Defined. Tech Republic. https://www.techrepublic.com/article/big-data-defined/

10. PCMagazine, "PC Magazine," 2018. Available: http://www.pcmag.com/encyclopedia/term/62849/big-data . Accessed 9 Jan 2021

11. Akhtar SMF (2018) Big Data Architect's Handbook, Packt

12. WhishWorks, "WhishWorks," 2019. Available: https://www.whishworks.com/blog/data-analytics/understanding-the3-vs-of-big-data-volume-velocity-and-variety/ . Accessed 23 Jan 2021

13. Yadav S, Sohal A (2017) Review paper on big data analytics in Cloud computing. Int J Comp Trends Technol (IJCTT) IX. 49(3):156–160

14. Kimball R, Ross M (2013) The data warehouse toolkit: the definitive guide to dimensional modeling, 3rd edn. John Wiley & Sons

15. LaprinthX, "LaprinthX," 2018. Available: https://laptrinhx.com/better-faster-smarter-elt-vs-etl-2084402419/ . Accessed 22 Jan 2021

16. Xplenty, "XPlenty," 2019. Available: https://www.xplenty.com/blog/etl-vs-elt/# . Accessed 20 Jan 2021

17. Forbes, "Forbes," 2018. Available: https://www.forbes.com/sites/forbestechcouncil/2019/11/06/fivebenefits-of-big-data-analytics-and-how-companies-can-getstarted/?sh=7e1b901417e4 . Accessed 13 Jan 202

18. EDHEC, "EDHEC," 2019. Available: https://master.edhec.edu/news/three-ways-educators-are-using-bigdata-analytics-improve-learning-process# . Accessed 6 Jan 2021

19. Google Cloud, "BigQuery," 2020. Available: https://cloud.google.com/bigquery . Accessed 5 Jan 2021

Acknowledgements

The authors would like to thank their colleagues and professors from the University of Prishtina for their insightful comments and suggestions, which helped improve the quality of the paper.

Author information

Authors and affiliations.

Faculty of Electrical and Computer Engineering, Department of Computer Engineering, University of Prishtina, 10000, Prishtina, Kosovo

Blend Berisha, Endrit Mëziu & Isak Shabani

Contributions

Blend Berisha wrote the Introduction, Features and characteristics of Big Data and Conclusions. Endrit Mëziu wrote Big Data Analytics in Cloud Computing and part of the case study. Isak Shabani contributed to the methodology and resources and supervised the work process. All authors prepared the figures and also reviewed the manuscript. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Isak Shabani.

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Berisha, B., Mëziu, E. & Shabani, I. Big data analytics in Cloud computing: an overview. J Cloud Comp 11, 24 (2022). https://doi.org/10.1186/s13677-022-00301-w

Received: 08 April 2022

Accepted: 24 July 2022

Published: 06 August 2022

DOI: https://doi.org/10.1186/s13677-022-00301-w

  • Cloud computing

big data related research papers

COMMENTS

  1. Big Data Research

    The journal aims to promote and communicate advances in big data research by providing a fast and high quality forum for researchers, practitioners and policy makers from the very many different communities working on, and with, this topic. The journal will accept papers on foundational aspects in …

  2. 15 years of Big Data: a systematic literature review

    Big Data is still gaining attention as a fundamental building block of the Artificial Intelligence and Machine Learning world. Therefore, a lot of effort has been pushed into Big Data research in the last 15 years. The objective of this Systematic Literature Review is to summarize the current state of the art of the previous 15 years of research about Big Data by providing answers to a set of ...

  3. Home page

    The Journal of Big Data publishes open-access original research on data science and data analytics. Deep learning algorithms and all applications of big data are welcomed. Survey papers and case studies are also considered. The journal examines the challenges facing big data today and going forward including, but not limited to: data capture and storage; search, sharing, and analytics; big ...

  4. Articles

    In recent years, mobile applications have proliferated across domains such as E-banking, Augmented Reality, E-Transportation, and E-Healthcare. These applications are often built using microservices, an archit... Abdul Rasheed Mahesar, Xiaoping Li and Dileep Kumar Sajnani. Journal of Big Data 2024 11:123. Research Published on: 4 September 2024.

  5. Frontiers in Big Data

    This innovative journal focuses on the power of big data - its role in machine learning, AI, and data mining, and its practical application from cybersecurity to climate science and public health. ... Submit your research. Start your submission and get more impact for your research by publishing with us. Author guidelines.

  6. Major Research Topics in Big Data: A Literature Analysis from 2013 to

    Big data is a popular phenomenon among practitioners as well as scholars. Due to its multidisciplinary background, big data research literature includes a wide spectrum of scientific publications in various research areas. With the aim of identification of research trends in big data literature, an empirical analysis based on probabilistic topic models was performed on peer reviewed articles ...

  7. Big Data Analytics: A Literature Review Paper

    Abstract. In the information era, enormous amounts of data have become available on hand to decision makers. Big data refers to datasets that are not only big, but also high in variety and velocity, which makes them difficult to handle using traditional tools and techniques. Due to the rapid growth of such data, solutions need to be studied and ...

  8. Publications

    Publications. IEEE Talks Big Data - Check out our new Q&A article series with big Data experts! Call for Papers - Check out the many opportunities to submit your own paper. This is a great way to get published, and to share your research in a leading IEEE magazine! Publications - See the list of various IEEE publications related to big data and analytics here.

  9. A comprehensive and systematic literature review on the big data

    The Internet of Things (IoT) is a communication paradigm and a collection of heterogeneous interconnected devices. It produces large-scale distributed, and diverse data called big data. Big Data Management (BDM) in IoT is used for knowledge discovery and intelligent decision-making and is one of the most significant research challenges today. There are several mechanisms and technologies for ...

  10. Explore Big Data Analytics Applications and Opportunities: A Review

    Big data applications and analytics are vital in proposing ultimate strategic decisions. The existing literature emphasizes that big data applications and analytics can empower those who apply Big Data Analytics during the COVID-19 pandemic. This paper reviews the existing literature specializing in big data applications pre and peri-COVID-19. A comparison between Pre and Peri of the pandemic ...

  11. Beyond the hype: Big data concepts, methods, and analytics

    A particular distinguishing feature of this paper is its focus on analytics related to unstructured data, which constitute 95% of big data. This paper highlights the need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats.

  12. Big Data Analytics and Knowledge Discovery

    This editorial accompanies the best papers from the 23rd International Conference on Big Data Analytics and Knowledge Discovery (DAWAK 2021), which was held virtually in August 2021 (originally planned to be held in Vienna, Austria). The series of DAWAK conferences has been running incessantly since 1999, with regular and short research papers ...

  13. Big Data Analytics: A Literature Review Paper

    related to big data, in order to serve the purpose of our research. The publication years range from 2008-2013, with most of the literature focusing on big data ranging from

  14. Exploring research trends in big data

    Topic analyses of 1308 big data-related papers covered by PubMed showed that themes in biomedical literature are aligned with the general characteristics of big data [10]. Mapping topics and geography of 406 ... Analyses of 1,415 big data research papers showed that around half of them have been published in areas other than computer science ...

  15. Privacy Prevention of Big Data Applications: A Systematic Literature

    This paper focuses on privacy and security concerns in Big Data. This paper also covers the encryption techniques by taking existing methods such as differential privacy, k-anonymity, T-closeness, and L-diversity. Several privacy-preserving techniques have been created to safeguard privacy at various phases of a large data life cycle.

  16. Big Data and Big Data Analytics: Concepts, Types and Technologies

    The term "Big Data" refers to the evolution and use of technologies that provide the right user at the right time with the right information from a mass of data that has been growing ...

  17. Big data analytics: a survey

    Expected trend of the marketing of big data between 2012 and 2018. Note that yellow, red, and blue of different colored box represent the order of appearance of reference in this paper for particular year. The report of IDC [9] indicates that the marketing of big data is about $16.1 billion in 2014.

  18. (PDF) An Overview of Big Data Concepts, Methods, and Analytics

    At the same time, technologies related to big data are also developing. The rapid growth of cloud computing and the Internet of Things (IoT) is accelerating the dramatic growth of data generation.

  19. Big Data Applications the Banking Sector: A Bibliometric Analysis

    The articles were selected from 2012 to 2020 and sorted by the citation rate in results and analysis. We have discovered 60 papers related to big data in banking. Although the applications of big data in the banking sector are growing rapidly, the research output in this field is limited.

  20. Business analytics and big data research in information systems

    The "Business Analytics and Big Data" track as a melting pot for topics in information systems (IS) and neighbouring disciplines has a long and successful history at the European Conference on Information Systems (ECIS). From its initial year in 2012 to 2021, the track has received 512 submissions.

  21. Real-Time Cybersecurity threat detection using machine learning and big

    The findings suggest that integrating ML and big data analytics in real-time threat detection systems significantly improves cybersecurity defenses, providing organizations with the tools to proactively counteract cyber threats. The rapid digitization of industries and the proliferation of connected devices have exponentially increased the surface area for cyber threats, making traditional ...

  22. A new theoretical understanding of big data analytics capabilities in

    Of the 70 papers satisfying our selection criteria, publication year and type (journal or conference paper) reveal an increasing trend in big data analytics over the last 6 years (Table 6). Additionally, journals produced more BDA papers than Conference proceedings (Fig. 2), which may be affected during 2020-2021 because of COVID, and fewer ...

  23. Research on Data Science, Data Analytics and Big Data

    Abstract. Big Data refers to a huge volume of data of various types, i.e., structured, semi structured, and unstructured. This data is generated through various digital channels such as mobile, Internet, social media, e-commerce websites, etc. Big Data has proven to be of great use since its inception, as companies started realizing its importance for various business purposes.

  24. Big data analytics in Cloud computing: an overview

    Big Data and Cloud Computing as two mainstream technologies, are at the center of concern in the IT field. Every day a huge amount of data is produced from different sources. This data is so big in size that traditional processing tools are unable to deal with them. Besides being big, this data moves fast and has a lot of variety. Big Data is a concept that deals with storing, processing and ...