LLaMA-Omni: Seamless Speech Interaction with Large Language Models

We build our model based on the latest Llama-3.1-8B-Instruct model.

PaperQA: Retrieval-Augmented Generative Agent for Scientific Research

whitead/paper-qa • 8 Dec 2023

We present PaperQA, a RAG agent for answering questions over the scientific literature.

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

We also introduce the VoiceAssistant-400K dataset to fine-tune models optimized for speech output.

MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery

Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby enhancing the generation quality of large language models (LLMs) through optimized context.

GeoCalib: Learning Single-image Calibration with Geometric Optimization

This single-image calibration can benefit various downstream applications like image editing and 3D mapping.

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation.
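The WSD scheduler is only named in passing above; as an illustration of the three-phase shape it describes (the peak rate, final rate, and phase fractions below are arbitrary assumptions, not values from the paper), such a schedule can be sketched as:

```python
def wsd_lr(step, total_steps, peak_lr=1e-3, final_lr=1e-4,
           warmup_frac=0.1, decay_frac=0.1):
    """Warmup-Stable-Decay: linear warmup, long constant plateau, linear decay.

    Illustrative sketch only; hyperparameters are placeholders.
    """
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to the peak rate.
        return peak_lr * step / max(1, warmup_steps)
    if step < stable_end:
        # Stable: hold the peak rate, convenient for continuous training.
        return peak_lr
    # Decay: ramp linearly from the peak rate to the final rate.
    t = (step - stable_end) / max(1, decay_steps)
    return peak_lr + t * (final_lr - peak_lr)
```

The long stable phase is what makes checkpoints reusable for continued pre-training or domain adaptation: training can resume from the plateau without re-warming.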

SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning

lamm-mit/SciAgentsDiscovery • 9 Sep 2024

A key challenge in artificial intelligence is the creation of systems capable of autonomously advancing scientific understanding by exploring novel domains, identifying complex patterns, and uncovering previously unseen connections in vast scientific data.

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

NoviScl/AI-Researcher • 6 Sep 2024

Recent advancements in large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery, with a growing number of works proposing research agents that autonomously generate and validate new ideas.

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Though current long-context large language models (LLMs) have demonstrated impressive capacities in answering user questions based on extensive text, the lack of citations in their responses makes user verification difficult, leading to concerns about their trustworthiness due to their potential hallucinations.

iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models

AuvaLab/itext2kg • 5 Sep 2024

Our method demonstrates superior performance compared to baseline methods across three scenarios: converting scientific papers to graphs, websites to graphs, and CVs to graphs.

The Journal of Machine Learning Research

Latest Issue

Volume 24, Issue 1

January 2023


Approximation bounds for hierarchical clustering: average linkage, bisecting k-means, and local search

Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. Having a well understood foundation would both support the currently used methods and ...

The Brier score under administrative censoring: problems and a solution

The Brier score is commonly used for evaluating probability predictions. In survival analysis, with right-censored observations of the event times, this score can be weighted by the inverse probability of censoring (IPCW) to retain its original ...
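For background on the weighting the entry refers to (this is the standard IPCW-weighted Brier score in the style of Graf et al., not the paper's proposed solution): with predicted survival $\hat S(t \mid x_i)$, observed times $T_i$, event indicators $\delta_i$, and an estimate $\hat G$ of the censoring survival function,

```latex
\widehat{\mathrm{BS}}(t) = \frac{1}{n}\sum_{i=1}^{n}\left[
  \frac{\hat S(t \mid x_i)^2 \,\mathbf{1}\{T_i \le t,\ \delta_i = 1\}}{\hat G(T_i)}
  \;+\;
  \frac{\bigl(1-\hat S(t \mid x_i)\bigr)^2 \,\mathbf{1}\{T_i > t\}}{\hat G(t)}
\right]
```

Observations censored before $t$ contribute nothing directly; the inverse-probability weights $1/\hat G(\cdot)$ reallocate their mass so the score stays unbiased under random right-censoring.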

Bayesian spiked Laplacian graphs

In network analysis, it is common to work with a collection of graphs that exhibit heterogeneity. For example, neuroimaging data from patient cohorts are increasingly available. A critical analytical task is to identify communities, and graph Laplacian-...

Efficient structure-preserving support tensor train machine

An increasing amount of the collected data are high-dimensional multi-way arrays (tensors), and it is crucial for efficient learning algorithms to exploit this tensorial structure as much as possible. The ever present curse of dimensionality for high ...

Cluster-specific predictions with multi-task Gaussian processes

A model involving Gaussian processes (GPs) is introduced to simultaneously handle multitask learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as well as a ...

AutoKeras: an AutoML library for deep learning

To use deep learning, one needs to be familiar with various software tools like TensorFlow or Keras, as well as various model architecture and optimization best practices. Despite recent progress in software usability, deep learning remains a highly ...

On distance and kernel measures of conditional dependence

Measuring conditional dependence is one of the important tasks in statistical inference and is fundamental in causal discovery, feature selection, dimensionality reduction, Bayesian network learning, and others. In this work, we explore the connection ...

A relaxed inertial forward-backward-forward algorithm for solving monotone inclusions with application to GANs

We introduce a relaxed inertial forward-backward-forward (RIFBF) splitting algorithm for approaching the set of zeros of the sum of a maximally monotone operator and a single-valued monotone and Lipschitz continuous operator. This work aims to extend ...

Sampling random graph homomorphisms and applications to network data analysis

A graph homomorphism is a map between two graphs that preserves adjacency relations. We consider the problem of sampling a random graph homomorphism from a graph into a large network. We propose two complementary MCMC algorithms for sampling random graph ...

A line-search descent algorithm for strict saddle functions with complexity guarantees

We describe a line-search algorithm which achieves the best-known worst-case complexity results for problems with a certain "strict saddle" property that has been observed to hold in low-rank matrix optimization problems. Our algorithm is adaptive, in ...

Optimal strategies for reject option classifiers

In classification with a reject option, the classifier is allowed in uncertain cases to abstain from prediction. The classical cost-based model of a reject option classifier requires the rejection cost to be defined explicitly. The alternative bounded-...

Learning-augmented count-min sketches via Bayesian nonparametrics

The count-min sketch (CMS) is a time and memory efficient randomized data structure that provides estimates of tokens' frequencies in a data stream of tokens, i.e. point queries, based on random hashed data. A learning-augmented version of the CMS, ...
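For context before the learning-augmented variant, the classical count-min sketch answers a point query by taking the minimum over several independently hashed counter rows. A minimal sketch (width, depth, and the hashing scheme below are arbitrary illustrative choices):

```python
import hashlib

class CountMinSketch:
    """Classical count-min sketch: d rows of w counters, one hash per row.

    Estimates are biased upward by hash collisions but never fall below
    the true count.
    """
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, token):
        # Derive a per-row hash by salting the token with the row number.
        h = hashlib.blake2b(f"{row}:{token}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, token, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, token)] += count

    def query(self, token):
        # The minimum over rows bounds the overestimate from collisions.
        return min(self.table[row][self._index(row, token)]
                   for row in range(self.depth))
```

The learning-augmented versions discussed in the paper replace or augment these uniform random hashes with learned structure; the min-over-rows estimate and the one-sided error guarantee are the part shared with the classical data structure.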

Adaptation to the range in K-armed bandits

We consider stochastic bandit problems with K arms, each associated with a distribution supported on a given finite range [m, M]. We do not assume that the range [m, M] is known and show that there is a cost for learning this range. Indeed, a new trade-off ...

Python package for causal discovery based on LiNGAM

Causal discovery is a methodology for learning causal graphs from data, and LiNGAM is a well-known model for causal discovery. This paper describes an open-source Python package for causal discovery based on LiNGAM. The package implements various LiNGAM ...

Extending adversarial attacks to produce adversarial class probability distributions

Despite the remarkable performance and generalization levels of deep learning models in a wide range of artificial intelligence tasks, it has been demonstrated that these models can be easily fooled by the addition of imperceptible yet malicious ...

Globally-consistent rule-based summary-explanations for machine learning models: application to credit-risk evaluation

We develop a method for understanding specific predictions made by (global) predictive models by constructing (local) models tailored to each specific observation (these are also called "explanations" in the literature). Unlike existing work that "...

Learning mean-field games with discounted and average costs

We consider learning approximate Nash equilibria for discrete-time mean-field games with stochastic nonlinear state dynamics subject to both average and discounted costs. To this end, we introduce a mean-field equilibrium (MFE) operator, whose fixed ...

An inertial block majorization minimization framework for nonsmooth nonconvex optimization

In this paper, we introduce TITAN, a novel inerTIal block majorizaTion minimizAtioN framework for nonsmooth nonconvex optimization problems. To the best of our knowledge, TITAN is the first framework of block-coordinate update method that relies on the ...

Regularized joint mixture models

Regularized regression models are well studied and, under appropriate conditions, offer fast and statistically interpretable results. However, large data in many applications are heterogeneous in the sense of harboring distributional differences between ...

Interpolating classifiers make few mistakes

This paper provides elementary analyses of the regret and generalization of minimum-norm interpolating classifiers (MNIC). The MNIC is the function of smallest Reproducing Kernel Hilbert Space norm that perfectly interpolates a label pattern on a finite ...

Graph-aided online multi-kernel learning

Multi-kernel learning (MKL) has been widely used in learning problems involving function learning tasks. Compared with the single-kernel learning approach, which relies on a preselected kernel, the advantage of MKL is the flexibility that results from combining a ...

Lower bounds and accelerated algorithms for bilevel optimization

Bilevel optimization has recently attracted growing interests due to its wide applications in modern machine learning problems. Although recent studies have characterized the convergence rate for several such popular algorithms, it is still unclear how ...

Bayesian data selection

Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic--...

Calibrated multiple-output quantile regression with representation learning

We develop a method to generate predictive regions that cover a multivariate response variable with a user-specified probability. Our work is composed of two components. First, we use a deep generative model to learn a representation of the response that ...

Discrete variational calculus for accelerated optimization

Many of the new developments in machine learning are connected with gradient-based optimization methods. Recently, these methods have been studied using a variational perspective (Betancourt et al., 2018). This has opened up the possibility of ...

Generalization bounds for noisy iterative algorithms using properties of additive noise channels

Machine learning models trained by different optimization algorithms under different data distributions can exhibit distinct generalization behaviors. In this paper, we analyze the generalization of models trained by noisy iterative algorithms. We derive ...

The SKIM-FA kernel: high-dimensional variable selection and nonlinear interaction discovery in linear time

Many scientific problems require identifying a small set of covariates that are associated with a target response and estimating their effects. Often, these effects are nonlinear and include interactions, so linear and additive methods can lead to poor ...

Impact of classification difficulty on the weight matrices spectra in deep learning and application to early-stopping

Much recent research effort has been devoted to explain the success of deep learning. Random Matrix Theory (RMT) provides an emerging way to this end by analyzing the spectra of large random matrices involved in a trained deep neural network (DNN) such ...

HiClass: a Python library for local hierarchical classification compatible with Scikit-learn

HiClass is an open-source Python library for local hierarchical classification entirely compatible with scikit-learn. It contains implementations of the most common design patterns for hierarchical machine learning models found in the literature, that is,...

Attacks against federated learning defense systems and their mitigation

The susceptibility of federated learning (FL) to attacks from untrustworthy endpoints has led to the design of several defense systems. FL defense systems enhance the federated optimization algorithm using anomaly detection, scaling the updates from ...


Machine Learning

  • Reports substantive results on a wide range of learning methods applied to various learning problems.
  • Provides robust support through empirical studies, theoretical analysis, or comparison to psychological phenomena.
  • Demonstrates how to apply learning methods to solve significant application problems.
  • Improves how machine learning research is conducted.
  • Prioritizes verifiable and replicable supporting evidence in all published papers.
  • Editor-in-Chief: Hendrik Blockeel

Latest issue

Volume 113, Issue 9

Latest articles

Towards a foundation large events model for soccer

  • Tiago Mendes-Neves
  • Luís Meireles
  • João Mendes-Moreira

Persistent Laplacian-enhanced algorithm for scarcely labeled data classification

  • Gokul Bhusal
  • Ekaterina Merkurjev
  • Guo-Wei Wei

Conformal prediction for regression models with asymmetrically distributed errors: application to aircraft navigation during landing maneuver

  • Solène Vilfroy
  • Lionel Bombrun
  • Philippe Carré

Evaluating large language models for user stance detection on X (Twitter)

  • Margherita Gambini
  • Caterina Senette
  • Maurizio Tesconi

In-game soccer outcome prediction with offline reinforcement learning

  • Pegah Rahimian
  • Balazs Mark Mihalyi
  • Laszlo Toka

Journal updates

CfP: Discovery Science 2023

Submission Deadline: March 4, 2024

Guest Editors: Rita P. Ribeiro, Albert Bifet, Ana Carolina Lorena

CfP: IJCLR Learning and Reasoning

Call for Papers: Conformal Prediction and Distribution-Free Uncertainty Quantification

Submission Deadline: January 7th, 2024

Guest Editors: Henrik Boström, Eyke Hüllermeier, Ulf Johansson, Khuong An Nguyen, Aaditya Ramdas

Call for Papers: Special Issue on Explainable AI for Secure Applications

Submissions Open: October 15, 2024

Submission Deadline: January 15, 2025

Guest Editors: Annalisa Appice, Giuseppina Andresini, Przemysław Biecek, Christian Wressnegger

Journal information

  • ACM Digital Library
  • Current Contents/Engineering, Computing and Technology
  • EI Compendex
  • Google Scholar
  • Japanese Science and Technology Agency (JST)
  • Mathematical Reviews
  • OCLC WorldCat Discovery Service
  • Science Citation Index Expanded (SCIE)
  • TD Net Discovery Service
  • UGC-CARE List (India)


© Springer Science+Business Media LLC, part of Springer Nature




Machine Learning: Models, Challenges, and Research Directions

1. Introduction

  • Brief discussion of data pre-processing;
  • Detailed classification of supervised, semi-supervised, unsupervised, and reinforcement learning models;
  • Study of known optimization techniques;
  • Challenges of machine learning in the field of cybersecurity.

2. Related Work and Research Methodology

Comparison of related surveys, including their coverage of data pre-processing, hyperparameter tuning, and supervised, unsupervised, semi-supervised, and reinforcement learning:

Reference | Year | Study highlights
[ ] | 2021 | Describes the known deep learning models, their principles, and characteristics.
[ ] | 2019 | Focuses on a limited set of machine learning techniques, for software-defined networking only.
[ ] | 2022 | Investigates known issues in system design that can be solved using machine learning techniques.
[ ] | 2021 | Presents a detailed description of a few supervised models and their optimization techniques.
[ ] | 2021 | Provides an overview of semi-supervised machine learning techniques and their existing algorithms.
[ ] | 2022 | Provides the state of the art, challenges, and limitations of supervised models in maritime risk analysis.
[ ] | 2022 | Reviews hardware architectures for reinforcement learning algorithms.
[ ] | 2022 | Presents existing algorithms for wireless sensor networks and describes the challenges of using such techniques.
[ ] | 2016 | Describes most of the known supervised algorithms for classification problems.
[ ] | 2019 | Provides a description of known supervised and unsupervised models.
[ ] | 2021 | Discusses supervised and unsupervised deep learning models for intrusion detection systems.
[ ] | 2021 | Surveys existing supervised and unsupervised techniques in the smart grid.
[ ] | 2021 | Explains known algorithms for image classification.
[ ] | 2022 | Illustrates unsupervised deep learning models and summarizes their challenges.
[ ] | 2023 | Discusses techniques for future energy usage.
[ ] | 2020 | Reviews various ML techniques for the security of the Internet of Things.
[ ] | 2020 | Proposes a taxonomy of machine learning techniques for the security of the Internet of Things.
[ ] | 2019 | Surveys the taxonomy of machine learning models in intrusion detection systems.
[ ] | 2022 | Covers ML techniques in industrial control systems.
[ ] | 2022 | Proposes a taxonomy of intrusion detection systems for supervised models.

3. Machine Learning Models

3.1. Supervised Learning

3.2. Semi-Supervised Learning

3.3. Unsupervised Learning

3.4. Reinforcement Learning

4. Machine Learning Processes

4.1. Data Pre-Processing

4.2. Tuning Approaches

4.3. Evaluation Metrics

4.3.1. Evaluation Metrics for Supervised Learning

4.3.2. Evaluation Metrics for Unsupervised Learning Models

4.3.3. Evaluation Metrics for Semi-Supervised Learning Models

4.3.4. Evaluation Metrics for Reinforcement Learning Models

5. Challenges and Future Directions

6. Conclusions

Author Contributions

Data Availability Statement

Conflicts of Interest

  • Sarker, I.H. Machine Learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021 , 2 , 160. [ Google Scholar ] [ CrossRef ]
  • Vinuesa, R.; Azizpour, H.; Leite, I.; Balaam, M.; Dignum, V.; Domisch, S.; Felländer, A.; Langhans, S.D.; Tegmark, M.; Nerini, F.F. The role of artificial intelligence in achieving the sustainable development goals. Nat. Commun. 2020 , 11 , 233. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Ullah, Z.; Al-Turjman, F.; Mostarda, L.; Gagliardi, R. Applications of artificial intelligence and machine learning in smart cities. Comput. Commun. 2020 , 154 , 313–323. [ Google Scholar ] [ CrossRef ]
  • Ozcanli, A.K.; Yaprakdal, F.; Baysal, M. Deep learning methods and applications for electrical power systems: A comprehensive review. Int. J. Energy Res. 2020 , 44 , 7136–7157. [ Google Scholar ] [ CrossRef ]
  • Zhao, S.; Blaabjerg, F.; Wang, H. An Overview of Artificial Intelligence Applications for Power Electronics. IEEE Trans. Power Electron. 2021 , 36 , 4633–4658. [ Google Scholar ] [ CrossRef ]
  • Mamun, A.A.; Sohel, M.; Mohammad, N.; Sunny, M.S.H.; Dipta, D.R.; Hossain, E. A Comprehensive Review of the Load Forecasting Techniques Using Single and Hybrid Predictive Models. IEEE Access 2020 , 8 , 134911–134939. [ Google Scholar ] [ CrossRef ]
  • Massaoudi, M.; Darwish, A.; Refaat, S.S.; Abu-Rub, H.; Toliyat, H.A. UHF Partial Discharge Localization in Gas-Insulated Switchgears: Gradient Boosting Based Approach. In Proceedings of the 2020 IEEE Kansas Power and Energy Conference (KPEC), Manhattan, KS, USA, 13–14 July 2020; pp. 1–5. [ Google Scholar ]
  • Ali, S.S.; Choi, B.J. State-of-the-Art Artificial Intelligence Techniques for Distributed Smart Grids: A Review. Electronics 2020 , 9 , 1030. [ Google Scholar ] [ CrossRef ]
  • Yin, L.; Gao, Q.; Zhao, L.; Zhang, B.; Wang, T.; Li, S.; Liu, H. A review of machine learning for new generation smart dispatch in power systems. Eng. Appl. Artif. Intell. 2020 , 88 , 103372. [ Google Scholar ] [ CrossRef ]
  • Peng, S.; Sun, S.; Yao, Y.-D. A Survey of Modulation Classification Using Deep Learning: Signal Representation and Data Preprocessing. In IEEE Transactions on Neural Networks and Learning Systems ; IEEE: New York, NY, USA, 2021. [ Google Scholar ]
  • Arjoune, Y.; Kaabouch, N. A Comprehensive Survey on Spectrum Sensing in Cognitive Radio Networks: Recent Advances, New Challenges, and Future Research Directions. Sensors 2019 , 19 , 126. [ Google Scholar ] [ CrossRef ]
  • Meng, T.; Jing, X.; Yan, Z.; Pedrycz, W. A survey on machine learning for data fusion. Inf. Fusion 2020 , 57 , 115–129. [ Google Scholar ] [ CrossRef ]
  • Carvalho, D.V.; Pereira, E.M.; Cardoso, J.S. Machine Learning Interpretability: A Survey on Methods and Metrics. Electronics 2019 , 8 , 832. [ Google Scholar ] [ CrossRef ]
  • Khoei, T.T.; Ismail, S.; Kaabouch, N. Boosting-based Models with Tree-structured Parzen Estimator Optimization to Detect Intrusion Attacks on Smart Grid. In Proceedings of the 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 1–4 December 2021; pp. 165–170. [ Google Scholar ] [ CrossRef ]
  • Hutter, F.; Lücke, J.; Schmidt-Thieme, L. Beyond manual tuning of hyperparameters. KI-Künstliche Intell. 2015 , 29 , 329–337. [ Google Scholar ] [ CrossRef ]
  • Khoei, T.T.; Aissou, G.; Hu, W.C.; Kaabouch, N. Ensemble Learning Methods for Anomaly Intrusion Detection System in Smart Grid. In Proceedings of the IEEE International Conference on Electro Information Technology (EIT), Mt. Pleasant, MI, USA, 14–15 May 2021; pp. 129–135. [ Google Scholar ] [ CrossRef ]
  • Waubert de Puiseau, C.; Meyes, R.; Meisen, T. On reliability of reinforcement learning based production scheduling systems: A comparative survey. J. Intell. Manuf. 2022 , 33 , 911–927. [ Google Scholar ] [ CrossRef ]
  • Moos, J.; Hansel, K.; Abdulsamad, H.; Stark, S.; Clever, D.; Peters, J. Robust Reinforcement Learning: A Review of Foundations and Recent Advances. Mach. Learn. Knowl. Extr. 2022 , 4 , 276–315. [ Google Scholar ] [ CrossRef ]
  • Latif, S.; Cuayáhuitl, H.; Pervez, F.; Shamshad, F.; Ali, H.S.; Cambria, E. A survey on deep reinforcement learning for audio-based applications. Artif. Intell. Rev. 2022 , 56 , 2193–2240. [ Google Scholar ] [ CrossRef ]
  • Passah, A.; Kandar, D. A lightweight deep learning model for classification of synthetic aperture radar images. Ecol. Inform. 2023 , 77 , 102228. [ Google Scholar ] [ CrossRef ]
  • Verbraeken, J.; Wolting, M.; Katzy, J.; Kloppenburg, J.; Verbelen, T.; Rellermeyer, J.S. A survey on distributed machine learning. ACM Comput. Surv. 2020 , 53 , 1–33. [ Google Scholar ] [ CrossRef ]
  • Dargan, S.; Kumar, M.; Ayyagari, M.R.; Kumar, G. A survey of deep learning and its applications: A new paradigm to machine learning. Arch. Comput. Methods Eng. 2020 , 27 , 1071–1092. [ Google Scholar ] [ CrossRef ]
  • Pitropakis, N.; Panaousis, E.; Giannetsos, T.; Anastasiadis, E.; Loukas, G. A taxonomy and survey of attacks against machine learning. Comput. Sci. Rev. 2019 , 34 , 100199. [ Google Scholar ] [ CrossRef ]
  • Wu, X.; Xiao, L.; Sun, Y.; Zhang, J.; Ma, T.; He, L. A survey of human-in-the-loop for machine learning. Futur. Gener. Comput. Syst. 2022 , 135 , 364–381. [ Google Scholar ] [ CrossRef ]
  • Wang, Q.; Ma, Y.; Zhao, K.; Tian, Y. A comprehensive survey of loss functions in machine learning. Ann. Data Sci. 2022 , 9 , 187–212. [ Google Scholar ] [ CrossRef ]
  • Choi, H.; Park, S. A Survey of Machine Learning-Based System Performance Optimization Techniques. Appl. Sci. 2021 , 11 , 3235. [ Google Scholar ] [ CrossRef ]
  • Rawson, A.; Brito, M. A survey of the opportunities and challenges of supervised machine learning in maritime risk analysis. Transp. Rev. 2022 , 43 , 108–130. [ Google Scholar ] [ CrossRef ]
  • Ahmad, R.; Wazirali, R.; Abu-Ain, T. Machine Learning for Wireless Sensor Networks Security: An Overview of Challenges and Issues. Sensors 2022 , 22 , 4730. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Singh, A.; Thakur, N.; Sharma, A. A review of supervised machine learning algorithms. In Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 16–18 March 2016; pp. 1310–1315. [ Google Scholar ]
  • Abdallah, E.E.; Eleisah, W.; Otoom, A.F. Intrusion Detection Systems using Supervised Machine Learning Techniques: A survey. Procedia Comput. Sci. 2022 , 201 , 205–212. [ Google Scholar ] [ CrossRef ]
  • Dike, H.U.; Zhou, Y.; Deveerasetty, K.K.; Wu, Q. Unsupervised Learning Based On Artificial Neural Network: A Review. In Proceedings of the 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS), 25–27 October 2018; pp. 322–327. [ Google Scholar ]
  • van Engelen, J.E.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn. 2020 , 109 , 373–440. [ Google Scholar ] [ CrossRef ]
  • Rothmann, M.; Porrmann, M. A Survey of Domain-Specific Architectures for Reinforcement Learning. IEEE Access 2022 , 10 , 13753–13767. [ Google Scholar ] [ CrossRef ]
  • Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2020 , 40 , 100379. [ Google Scholar ] [ CrossRef ]
  • Ray, S. A Quick Review of Machine Learning Algorithms. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; pp. 35–39. [ Google Scholar ]
  • Lansky, J.; Ali, S.; Mohammadi, M.; Majeed, M.K.; Karim, S.H.T.; Rashidi, S.; Hosseinzadeh, M.; Rahmani, A.M. Deep Learning-Based Intrusion Detection Systems: A Systematic Review. IEEE Access 2021 , 9 , 101574–101599. [ Google Scholar ] [ CrossRef ]
  • Massaoudi, M.; Abu-Rub, H.; Refaat, S.S.; Chihi, I.; Oueslati, F.S. Deep Learning in Smart Grid Technology: A Review of Recent Advancements and Future Prospects. IEEE Access 2021 , 9 , 54558–54578. [ Google Scholar ] [ CrossRef ]
  • Liu, H.; Lang, B. Machine Learning and Deep Learning Methods for Intrusion Detection Systems: A Survey. Appl. Sci. 2019 , 9 , 4396. [ Google Scholar ] [ CrossRef ]
  • Wu, N.; Xie, Y. A survey of machine learning for computer architecture and systems. ACM Comput. Surv. 2022 , 55 , 1–39. [ Google Scholar ] [ CrossRef ]
  • Schmarje, L.; Santarossa, M.; Schröder, S.-M.; Koch, R. A Survey on Semi-, Self- and Unsupervised Learning for Image Classification. IEEE Access 2021 , 9 , 82146–82168. [ Google Scholar ] [ CrossRef ]
  • Xie, J.; Yu, F.R.; Huang, T.; Xie, R.; Liu, J.; Wang, C.; Liu, Y. A Survey of Machine Learning Techniques Applied to Software Defined Networking (SDN): Research Issues and Challenges. In IEEE Communications Surveys & Tutorials ; IEEE: New York, NY, USA, 2019; Volume 21, pp. 393–430. [ Google Scholar ]
  • Yao, Z.; Lum, Y.; Johnston, A.; Mejia-Mendoza, L.M.; Zhou, X.; Wen, Y.; Aspuru-Guzik, A.; Sargent, E.H.; Seh, Z.W. Machine learning for a sustainable energy future. Nat. Rev. Mater. 2023 , 8 , 202–215. [ Google Scholar ] [ CrossRef ]
  • Al-Garadi, M.A.; Mohamed, A.; Al-Ali, A.K.; Du, X.; Ali, I.; Guizani, M. A Survey of Machine and Deep Learning Methods for Internet of Things (IoT) Security. In IEEE Communications Surveys & Tutorials ; IEEE: New York, NY, USA, 2020; Volume 22, pp. 1646–1685. [ Google Scholar ]
  • Messaoud, S.; Bradai, A.; Bukhari, S.H.R.; Quang, P.T.A.; Ahmed, O.B.; Atri, M. A survey on machine learning in internet of things: Algorithms, strategies, and applications. Internet Things 2020 , 12 , 100314. [ Google Scholar ] [ CrossRef ]
  • Umer, M.A.; Junejo, K.N.; Jilani, M.T.; Mathur, A.P. Machine learning for intrusion detection in industrial control systems: Applications, challenges, and recommendations. Int. J. Crit. Infrastruct. Prot. 2022 , 38 , 100516. [ Google Scholar ] [ CrossRef ]
  • Von Rueden, L.; Mayer, S.; Garcke, J.; Bauckhage, C.; Schuecker, J. Informed machine learning–towards a taxonomy of explicit integration of knowledge into machine learning. Learning 2019 , 18 , 19–20. [ Google Scholar ]
  • Waring, J.; Lindvall, C.; Umeton, R. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare. Artif. Intell. Med. 2020 , 104 , 101822. [ Google Scholar ] [ CrossRef ]
  • Wang, H.; Lv, L.; Li, X.; Li, H.; Leng, J.; Zhang, Y.; Thomson, V.; Liu, G.; Wen, X.; Luo, G. A safety management approach for Industry 5.0′ s human-centered manufacturing based on digital twin. J. Manuf. Syst. 2023 , 66 , 1–12. [ Google Scholar ] [ CrossRef ]
  • Reuther, A.; Michaleas, P.; Jones, M.; Gadepally, V.; Samsi, S.; Kepner, J. Survey and Benchmarking of Machine Learning Accelerators. In Proceedings of the 2019 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA USA, 24–26 September 2019; pp. 1–9. [ Google Scholar ]
  • Kaur, B.; Dadkhah, S.; Shoeleh, F.; Neto, E.C.P.; Xiong, P.; Iqbal, S.; Lamontagne, P.; Ray, S.; Ghorbani, A.A. Internet of Things (IoT) security dataset evolution: Challenges and future directions. Internet Things 2023 , 22 , 100780. [ Google Scholar ] [ CrossRef ]
  • Paullada, A.; Raji, I.D.; Bender, E.M.; Denton, E.; Hanna, A. Data and its (dis)contents: A survey of dataset development and use in machine learning research. Patterns 2021 , 2 , 100336. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Slimane, H.O.; Benouadah, S.; Khoei, T.T.; Kaabouch, N. A Light Boosting-based ML Model for Detecting Deceptive Jamming Attacks on UAVs. In Proceedings of the 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 26–29 January 2022; pp. 328–333. [ Google Scholar ]
  • Manesh, M.R.; Kenney, J.; Hu, W.C.; Devabhaktuni, V.K.; Kaabouch, N. Detection of GPS spoofing attacks on unmanned aerial systems. In Proceedings of the 16th IEEE Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 11–14 January 2019; pp. 1–6. [ Google Scholar ]
  • Sharifani, K.; Amini, M. Machine Learning and Deep Learning: A Review of Methods and Applications. World Inf. Technol. Eng. J. 2023 , 10 , 3897–3904. [ Google Scholar ]
  • Obaid, H.S.; Dheyab, S.A.; Sabry, S.S. The Impact of Data Pre-Processing Techniques and Dimensionality Reduction on the Ac-curacy of Machine Learning. In Proceedings of the 2019 9th Annual Information Technology, Electromechanical Engineering and Microelectronics Conference (IEMECON), Jaipur, India, 13–15 March 2019; pp. 279–283. [ Google Scholar ]
  • Liu, B.; Ding, M.; Shaham, S.; Rahayu, W.; Lin, Z. When machine learning meets privacy: A survey and outlook. ACM Comput. Surv. (CSUR) 2021 , 54 , 1–36. [ Google Scholar ] [ CrossRef ]
  • Singh, S.; Gupta, P. Comparative study ID3, cart and C4. 5 decision tree algorithm: A survey. Int. J. Adv. Inf. Sci. Technol. (IJAIST) 2014 , 27 , 97–103. [ Google Scholar ]
  • Zhang, M.-L.; Zhou, Z.-H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit. 2007 , 40 , 2038–2048. [ Google Scholar ] [ CrossRef ]
  • Musavi, M.T.; Ahmed, W.; Chan, K.H.; Faris, K.B.; Hummels, D.M. On the training of radial basis function classifiers. Neural Netw. 1992 , 5 , 595–603. [ Google Scholar ] [ CrossRef ]
  • Zhou, J.; Gandomi, A.H.; Chen, F.; Holzinger, A. Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics. Electronics 2021 , 10 , 593. [ Google Scholar ] [ CrossRef ]
  • Jiang, T.; Fang, H.; Wang, H. Blockchain-Based Internet of Vehicles: Distributed Network Architecture and Performance Analy-sis. IEEE Internet Things J. 2019 , 6 , 4640–4649. [ Google Scholar ] [ CrossRef ]
  • Jia, W.; Dai, D.; Xiao, X.; Wu, H. ARNOR: Attention regularization based noise reduction for distant supervision relation classifi-cation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1399–1408. [ Google Scholar ]
  • Abiodun, O.I.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018 , 4 , e00938. [ Google Scholar ] [ CrossRef ]
  • Izeboudjen, N.; Larbes, C.; Farah, A. A new classification approach for neural networks hardware: From standards chips to embedded systems on chip. Artif. Intell. Rev. 2014 , 41 , 491–534. [ Google Scholar ] [ CrossRef ]
  • Wang, D.; He, H.; Liu, D. Intelligent Optimal Control With Critic Learning for a Nonlinear Overhead Crane System. IEEE Trans. Ind. Informatics 2018 , 14 , 2932–2940. [ Google Scholar ] [ CrossRef ]
  • Wang, S.-C. Artificial Neural Network. In Interdisciplinary Computing in Java Programming ; Springer: Berlin/Heidelberg, Germany, 2003; pp. 81–100. [ Google Scholar ]
  • Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017. [ Google Scholar ]
  • Khoei, T.T.; Slimane, H.O.; Kaabouch, N. Cyber-Security of Smart Grids: Attacks, Detection, Countermeasure Techniques, and Future Directions. Commun. Netw. 2022 , 14 , 119–170. [ Google Scholar ] [ CrossRef ]
  • Gunturi, S.K.; Sarkar, D. Ensemble machine learning models for the detection of energy theft. Electr. Power Syst. Res. 2021 , 192 , 106904. [ Google Scholar ] [ CrossRef ]
  • Chafii, M.; Bader, F.; Palicot, J. Enhancing coverage in narrow band-IoT using machine learning. In Proceedings of the 2018 IEEE Wireless Communications and Networking Conference (WCNC), Barcelona, Spain, 15–18 April 2018; pp. 1–6. [ Google Scholar ]
  • Bithas, P.S.; Michailidis, E.T.; Nomikos, N.; Vouyioukas, D.; Kanatas, A.G. A Survey on Machine-Learning Techniques for UAV-Based Communications. Sensors 2019 , 19 , 5170. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Benos, L.; Tagarakis, A.C.; Dolias, G.; Berruto, R.; Kateris, D.; Bochtis, D. Machine Learning in Agriculture: A Comprehensive Updated Review. Sensors 2021 , 21 , 3758. [ Google Scholar ] [ CrossRef ]
  • Wagle, P.P.; Rani, S.; Kowligi, S.B.; Suman, B.H.; Pramodh, B.; Kumar, P.; Raghavan, S.; Shastry, K.A.; Sanjay, H.A.; Kumar, M.; et al. Machine Learning-Based Ensemble Network Security System. In Recent Advances in Artificial Intelligence and Data Engineering ; Springer: Berlin/Heidelberg, Germany, 2022; pp. 3–15. [ Google Scholar ]
  • Sutton, C.D. Classification and regression trees, bagging, and boosting. Handb. Stat. 2005 , 24 , 303–329. [ Google Scholar ]
  • Zaadnoordijk, L.; Besold, T.R.T.; Cusack, R. Lessons from infant learning for unsupervised machine learning. Nat. Mach. Intell. 2022 , 4 , 510–520. [ Google Scholar ] [ CrossRef ]
  • Khoei, T.T.; Kaabouch, N. A Comparative Analysis of Supervised and Unsupervised Models for Detecting Attacks on the Intrusion Detection Systems. Information 2023 , 14 , 103. [ Google Scholar ] [ CrossRef ]
  • Kumar, P.; Gupta, G.P.; Tripathi, R. An ensemble learning and fog-cloud architecture-driven cyber-attack detection framework for IoMT networks. Comput. Commun. 2021 , 166 , 110–124. [ Google Scholar ] [ CrossRef ]
  • Hady, M.; Abdel, A.M.F.; Schwenker, F. Semi-supervised learning. In Handbook on Neural Information Processing ; Springer: Berlin/Heidelberg, Germany, 2013. [ Google Scholar ]
  • Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res. 2019 , 20 , 1–21. [ Google Scholar ]
  • Luo, Y.; Zhu, J.; Li, M.; Ren, Y.; Zhang, B. Smooth neighbors on teacher graphs for semi-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Lake City, UT, USA, 18–22 June 2018; pp. 8896–8905. [ Google Scholar ]
  • Park, S.; Park, J.; Shin, S.; Moon, I. Adversarial dropout for supervised and semi-supervised learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 3917–3924. [ Google Scholar ]
  • Khoei, T.T.; Kaabouch, N. ACapsule Q-learning based reinforcement model for intrusion detection system on smart grid. In Proceedings of the IEEE International Conference on Electro Information Technology (eIT), Romeoville, IL, USA, 18–20 May 2023; pp. 333–339. [ Google Scholar ]
  • Polydoros, A.S.; Nalpantidis, L. Survey of model-based reinforcement learning: Applications on robotics. J. Intell. Robot. Syst. 2017 , 86 , 153–173. [ Google Scholar ] [ CrossRef ]
  • Degris, T.; Pilarski, P.M.; Sutton, R.S. Model-Free reinforcement learning with continuous action in practice. In Proceedings of the 2012 American Control Conference (ACC), Montreal, QC, Canada, 27–29 June 2012; pp. 2177–2182. [ Google Scholar ] [ CrossRef ]
  • Cao, D.; Hu, W.; Zhao, J.; Zhang, G.; Zhang, B.; Liu, Z.; Chen, Z.; Blaabjerg, F. Reinforcement learning and its applications in modern power and energy systems: A review. J. Mod. Power Syst. Clean Energy 2020 , 8 , 1029–1042. [ Google Scholar ] [ CrossRef ]
  • Zhang, J.M.; Harman, M.; Ma, L.; Liu, Y. Machine Learning Testing: Survey, Landscapes and Horizons. In IEEE Transactions on Software Engineering ; IEEE: New York, NY, USA, 2022; Volume 48, pp. 1–36. [ Google Scholar ]
  • Salahdine, F.; Kaabouch, N. Security threats, detection, and countermeasures for physical layer in cognitive radio networks: A survey. Phys. Commun. 2020 , 39 , 101001. [ Google Scholar ] [ CrossRef ]
  • Ramírez, J.; Yu, W.; Perrusquía, A. Model-free reinforcement learning from expert demonstrations: A survey. Artif. Intell. Rev. 2022 , 55 , 3213–3241. [ Google Scholar ] [ CrossRef ]
  • Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020 , 415 , 295–316. [ Google Scholar ] [ CrossRef ]
  • Dev, K.; Maddikunta, P.K.R.; Gadekallu, T.R.; Bhattacharya, S.; Hegde, P.; Singh, S. Energy Optimization for Green Communication in IoT Using Harris Hawks Optimization. In IEEE Transactions on Green Communications and Networking ; IEEE: New York, NY, USA, 2022; Volume 6, pp. 685–694. [ Google Scholar ]
  • Khodadadi, N.; Snasel, V.; Mirjalili, S. Dynamic Arithmetic Optimization Algorithm for Truss Optimization Under Natural Fre-quency Constraints. IEEE Access 2022 , 10 , 16188–16208. [ Google Scholar ] [ CrossRef ]
  • Cummins, C.; Wasti, B.; Guo, J.; Cui, B.; Ansel, J.; Gomez, S.; Jain, S.; Liu, J.; Teytaud, O.; Steinerm, B.; et al. CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research. In Proceedings of the 2022 IEEE/ACM In-ternational Symposium on Code Generation and Optimization (CGO), Seoul, Republic of Korea, 2–6 April 2022; pp. 92–105. [ Google Scholar ]
  • Zhang, W.; Gu, X.; Tang, L.; Yin, Y.; Liu, D.; Zhang, Y. Application of machine learning, deep learning and optimization algo-rithms in geoengineering and geoscience: Comprehensive review and future challenge. Gondwana Res. 2022 , 109 , 1–17. [ Google Scholar ] [ CrossRef ]
  • Mittal, S.; Vaishay, S. A survey of techniques for optimizing deep learning on GPUs. J. Syst. Arch. 2019 , 99 , 101635. [ Google Scholar ] [ CrossRef ]
  • Zhang, Q.; Yang, L.T.; Chen, Z.; Li, P. A survey on deep learning for big data. Inf. Fusion 2018 , 42 , 146–157. [ Google Scholar ] [ CrossRef ]
  • Oyelade, O.N.; Ezugwu, A.E.-S.; Mohamed, T.I.A.; Abualigah, L. Ebola Optimization Search Algorithm: A New Nature-Inspired Metaheuristic Optimization Algorithm. IEEE Access 2022 , 10 , 16150–16177. [ Google Scholar ] [ CrossRef ]
  • Blank, J.; Deb, K. Pymoo: Multi-Objective Optimization in Python. IEEE Access 2020 , 8 , 89497–89509. [ Google Scholar ] [ CrossRef ]
  • Qiao, K.; Yu, K.; Qu, B.; Liang, J.; Song, H.; Yue, C. An Evolutionary Multitasking Optimization Framework for Constrained Multi-objective Optimization Problems. IEEE Trans. Evol. Comput. 2022 , 26 , 263–277. [ Google Scholar ] [ CrossRef ]
  • Riaz, M.; Ahmad, S.; Hussain, I.; Naeem, M.; Mihet-Popa, L. Probabilistic Optimization Techniques in Smart Power System. Energies 2022 , 15 , 825. [ Google Scholar ] [ CrossRef ]
  • Yu, T.; Zhu, H. Hyper-parameter optimization: A review of algorithms and applications. arXiv 2020 , arXiv:2003.05689. [ Google Scholar ]
  • Yang, X.; Song, Z.; King, I.; Xu, Z. A Survey on deep semi-supervised learning. arXiv 2021 , arXiv:2103.00550. [ Google Scholar ] [ CrossRef ]
  • Gibson, B.R.; Rogers, T.T.; Zhu, X. Human semi-supervised learning. Top. Cogn. Sci. 2013 , 5 , 132–172. [ Google Scholar ] [ CrossRef ]
  • Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. 2020 , 50 , 3826–3839. [ Google Scholar ] [ CrossRef ]
  • Canese, L.; Cardarilli, G.C.; Di Nunzio, L.; Fazzolari, R.; Giardino, D.; Re, M.; Spanò, S. Multi-Agent Reinforcement Learning: A Review of Challenges and Applications. Appl. Sci. 2021 , 11 , 4948. [ Google Scholar ] [ CrossRef ]
  • Du, W.; Ding, S. A survey on multi-agent deep reinforcement learning: From the perspective of challenges and applications. Artif. Intell. Rev. 2020 , 54 , 3215–3238. [ Google Scholar ] [ CrossRef ]
  • Salwan, D.; Kant, S.; Pareek, H.; Sharma, R. Challenges with reinforcement learning in prosthesis. Mater. Today Proc. 2022 , 49 , 3133–3136. [ Google Scholar ] [ CrossRef ]
  • Narkhede, M.S.; Chatterji, S.; Ghosh, S. Trends and challenges in optimization techniques for operation and control of Mi-crogrid—A review. In Proceedings of the 2012 1st International Conference on Power and Energy in NERIST (ICPEN), Nirjuli, India, 28–29 December 2012; pp. 1–7. [ Google Scholar ]
  • Khoei, T.T.; Ismail, S.; Kaabouch, N. Dynamic Selection Techniques for Detecting GPS Spoofing Attacks on UAVs. Sensors 2022 , 22 , 662. [ Google Scholar ] [ CrossRef ]
  • Khoei, T.T.; Ismail, S.; Al Shamaileh, K.; Devabhaktuni, V.K.; Kaabouch, N. Impact of Dataset and Model Parameters on Machine Learning Performance for the Detection of GPS Spoofing Attacks on Unmanned Aerial Vehicles. Appl. Sci. 2022 , 13 , 383. [ Google Scholar ] [ CrossRef ]
  • Khoei, T.T.; Kaabouch, N. Densely Connected Neural Networks for Detecting Denial of Service Attacks on Smart Grid Network. In Proceedings of the IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 26–29 October 2022; pp. 0207–0211. [ Google Scholar ]
  • Khan, A.; Khan, S.H.; Saif, M.; Batool, A.; Sohail, A.; Khan, M.W. A Survey of Deep Learning Techniques for the Analysis of COVID-19 and their usability for Detecting Omicron. J. Exp. Theor. Artif. Intell. 2023 , 1–43. [ Google Scholar ] [ CrossRef ]
  • Gopinath, M.; Sethuraman, S.C. A comprehensive survey on deep learning based malware detection techniques. Comput. Sci. Rev. 2023 , 47 , 100529. [ Google Scholar ]
  • Gheisari, M.; Ebrahimzadeh, F.; Rahimi, M.; Moazzamigodarzi, M.; Liu, Y.; Pramanik, P.K.D.; Heravi, M.A.; Mehbodniya, A.; Ghaderzadeh, M.; Feylizadeh, M.R.; et al. Deep learning: Applications, architectures, models, tools, and frameworks: A com-prehensive survey. In CAAI Transactions on Intelligence Technology ; IET: Stevenage, UK, 2023. [ Google Scholar ]
  • Morgan, D.; Jacobs, R. Opportunities and challenges for machine learning in materials science. Annu. Rev. Mater. Res. 2020 , 50 , 71–103. [ Google Scholar ] [ CrossRef ]
  • Phoon, K.K.; Zhang, W. Future of machine learning in geotechnics. Georisk Assess. Manag. Risk Eng. Syst. Geohazards 2023 , 17 , 7–22. [ Google Scholar ] [ CrossRef ]
  • Krishnam, N.P.; Ashraf, M.S.; Rajagopal, B.R.; Vats, P.; Chakravarthy, D.S.K.; Rafi, S.M. Analysis of Current Trends, Advances and Challenges of Machine Learning (Ml) and Knowledge Extraction: From Ml to Explainable AI. Ind. Qualif.-Stitute Adm. Manag. UK 2022 , 58 , 54–62. [ Google Scholar ]
  • Li, Z.; Yoon, J.; Zhang, R.; Rajabipour, F.; Srubar, W.V., III; Dabo, I.; Radlińska, A. Machine learning in concrete science: Applications, challenges, and best practices. NPJ Comput. Mater. 2022 , 8 , 127. [ Google Scholar ] [ CrossRef ]
  • Houssein, E.H.; Abohashima, Z.; Elhoseny, M.; Mohamed, W.M. Machine learning in the quantum realm: The state-of-the-art, challenges, and future vision. Expert Syst. Appl. 2022 , 194 , 116512. [ Google Scholar ] [ CrossRef ]
  • Khan, T.; Tian, W.; Zhou, G.; Ilager, S.; Gong, M.; Buyya, R. Machine learning (ML)-centric resource management in cloud computing: A review and future directions. J. Netw. Comput. Appl. 2022 , 204 , 103405. [ Google Scholar ] [ CrossRef ]
  • Esterhuizen, J.A.; Goldsmith, B.R.; Linic, S. Interpretable machine learning for knowledge generation in heterogeneous catalysis. Nat. Catal. 2022 , 5 , 175–184. [ Google Scholar ] [ CrossRef ]
  • Bharadiya, J.P. Leveraging Machine Learning for Enhanced Business Intelligence. Int. J. Comput. Sci. Technol. 2023 , 7 , 1–19. [ Google Scholar ]
  • Talaei Khoei, T.; Ould Slimane, H.; Kaabouch, N. Deep learning: Systematic review, models, challenges, and research directions. In Neural Computing and Applications ; Springer: Berlin/Heidelberg, Germany, 2023; pp. 1–22. [ Google Scholar ]
  • Ben Amor, S.; Belaid, F.; Benkraiem, R.; Ramdani, B.; Guesmi, K. Multi-criteria classification, sorting, and clustering: A bibliometric review and research agenda. Ann. Oper. Res. 2023 , 325 , 771–793. [ Google Scholar ] [ CrossRef ]
  • Valdez, F.; Melin, P. A review on quantum computing and deep learning algorithms and their applications. Soft Comput. 2023 , 27 , 13217–13236. [ Google Scholar ] [ CrossRef ]
  • Fihri, W.F.; Arjoune, Y.; Hassan El Ghazi, H.; Kaabouch, N.; Abou El Majd, A.B. A particle swarm optimization based algorithm for primary user emulation attack detection. In Proceedings of the 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8–10 January 2018; pp. 823–827. [ Google Scholar ]


Table: Classification categories of supervised learning models, compared by characteristics, advantages, and disadvantages: Bayesian-based, tree-based, instance-based, regularization-based, neural network-based, and ensemble-based.
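As a concrete illustration of the Bayesian-based category, a minimal Gaussian naive Bayes classifier fits per-class priors, means, and variances, then scores classes by log-likelihood. This is a sketch with toy data of our own, not code from the surveyed paper:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),        # prior P(c)
                     Xc.mean(axis=0),         # feature means
                     Xc.var(axis=0) + 1e-9)   # feature variances (smoothed)
    return params

def predict_gaussian_nb(params, x):
    """Pick the class maximizing log P(c) + sum_i log N(x_i | mu_i, var_i)."""
    best, best_score = None, -np.inf
    for c, (prior, mu, var) in params.items():
        score = np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var)
                                             + (x - mu) ** 2 / var)
        if score > best_score:
            best, best_score = c, score
    return best

# Two well-separated toy classes
X = np.array([[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [3.2, 2.9]])
y = np.array([0, 0, 1, 1])
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, np.array([0.1, 0.0])))  # → 0
print(predict_gaussian_nb(model, np.array([3.1, 3.0])))  # → 1
```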
Table: Classification categories of semi-supervised learning models, compared by characteristics, advantage, and disadvantage.
  • Inductive-based. Characteristics: generates a model that can create predictions for any sample in the input space. Advantage: the predictions of new samples are independent of old samples. Disadvantage: the same model can be used in training and predicting new data samples.
  • Transductive-based. Characteristics: predictive strength is limited to the objects processed during the training steps. Advantage: no difference between the training and testing steps. Disadvantage: no distinction between the transductive algorithms in a supervised manner.
Table: Classification categories of unsupervised learning models.
  • Cluster-based: divides uncategorized data into similar groups.
  • Dimensionality reduction-based: decreases the number of features in the given dataset.
  • Neural network-based: inspired by the human brain.
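A minimal example of the cluster-based category is k-means (Lloyd's algorithm): assign each point to its nearest centroid, recompute centroids, repeat. The toy blobs below are ours, not from the article:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm with random initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # distance of every point to every centroid, then nearest assignment
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # skip empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two obvious blobs: points 0-1 vs. points 2-3
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels, _ = kmeans(X, k=2)
print(labels[0] == labels[1], labels[2] == labels[3])  # → True True
```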
Table: Classification categories of reinforcement learning models.
  • Model-based: optimal actions are learned via a model of the environment.
  • Model-free: no transition probability distribution or reward function of the Markov decision process is learned.
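A minimal model-free example is tabular Q-learning, which updates action values from sampled transitions without ever learning the transition probabilities. The toy chain environment below is our own illustration:

```python
import random

# Toy 5-state chain: move left/right, reward 1 only on reaching the last state.
N_STATES, ACTIONS = 5, (-1, +1)

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, 1.0 if s2 == N_STATES - 1 else 0.0

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy action selection
            a = random.choice(ACTIONS) if random.random() < eps \
                else max(ACTIONS, key=lambda b: Q[(s, b)])
            s2, r = step(s, a)
            # model-free update: bootstrap from the sampled next state only
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                                  - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
# The greedy policy should move right in every non-terminal state.
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(N_STATES - 1)))  # → True
```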
Table: Data preprocessing steps, methodologies, and techniques.
  • Data transformation. Standardization and normalization (extract the given data and convert it to a usable format): unit vector normalization, max-abs scaler, quantile transformer scaler, robust scaler, min-max scaling, power transformer scaler, standard scaler.
  • Data cleaning. Missing value imputation (loss of efficiency, strong bias, and complications in handling data): complete case analysis, frequent category imputation, mean/median imputation, mode imputation, end-of-tail imputation, nearest neighbor imputation, iterative imputation, hot and cold deck imputation, exploration imputation, interpolation imputation, regression-based imputation. Noise treatment: data polishing, noise filters.
  • Data reduction/increase (decrease or increase the number of samples or features that are not important in the process of training). Feature selection: wrapper, filter, embedded. Feature extraction: principal component analysis, linear discriminant analysis, independent component analysis, partial least squares, multifactor dimensionality reduction, nonlinear dimensionality reduction, autoencoder, tensor decomposition. Instance generation: condensation algorithms, edition algorithms, hybrid algorithms.
  • Discretization (loss of information, simplicity, readability, and faster learning process): chi-squared discretization, efficient discretization.
  • Imbalanced learning (presents true evaluation results). Under-sampling: random under-sampling, Tomek links, condensed nearest neighbor, edited nearest neighbor, near-miss under-sampling. Oversampling: random oversampling, synthetic minority oversampling technique (SMOTE), adaptive synthetic sampling, borderline-SMOTE.
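To make a few of the preprocessing steps above concrete, here is a small NumPy sketch of mean imputation, min-max scaling, and standardization; the toy matrix is our own example:

```python
import numpy as np

# A feature matrix with one missing value (NaN), as in the data cleaning step.
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 400.0],
              [4.0, 600.0]])

# Mean imputation: replace NaNs with the column mean of the observed values.
col_mean = np.nanmean(X, axis=0)
X_imp = np.where(np.isnan(X), col_mean, X)

# Min-max scaling to [0, 1] per column (a data transformation technique).
mn, mx = X_imp.min(axis=0), X_imp.max(axis=0)
X_minmax = (X_imp - mn) / (mx - mn)

# Standardization: zero mean, unit variance per column.
X_std = (X_imp - X_imp.mean(axis=0)) / X_imp.std(axis=0)

print(X_minmax.min(), X_minmax.max())        # → 0.0 1.0
print(np.allclose(X_std.mean(axis=0), 0.0))  # → True
```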
Table: Hyperparameter tuning methods, compared by strengths and limitations: grid search, random search, genetic algorithms, gradient-based techniques, Bayesian optimization with Gaussian processes, particle swarm optimization, Bayesian optimization with a tree-structured Parzen estimator, Hyperband, Bayesian optimization with SMAC, and population-based methods.
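As a sketch of the first two methods in the table, grid search scores every combination on a fixed grid, while random search samples the same budget of configurations at random. The toy objective below is a stand-in for a real validation score, and its parameter names and ranges are our own:

```python
import itertools
import random

def objective(lr, depth):
    """Stand-in for a validation score; peaks at lr=0.1, depth=6."""
    return -((lr - 0.1) ** 2) - 0.01 * (depth - 6) ** 2

# Grid search: exhaustively score every combination on the grid.
grid = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}
best_grid = max(itertools.product(grid["lr"], grid["depth"]),
                key=lambda p: objective(*p))

# Random search: draw the same number of configurations at random
# (log-uniform for the learning rate, uniform for the depth).
random.seed(0)
samples = [(10 ** random.uniform(-3, 0), random.randint(2, 8))
           for _ in range(16)]
best_rand = max(samples, key=lambda p: objective(*p))

print(best_grid)  # → (0.1, 6)
```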
Table: Evaluation metric categories by learning paradigm: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
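For the supervised learning category, the most common metrics derive directly from the confusion matrix. A minimal NumPy version, with toy labels of our own:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall from binary predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    accuracy = np.mean(y_true == y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

acc, prec, rec = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
print(acc, prec, rec)  # all three equal 2/3 on this toy example
```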
Table: Open challenges in machine learning: interpretability and explainability, bias and fairness, adversarial robustness, privacy and security, reinforcement learning, quantum computing, and multi-criteria models.

Share and Cite

Talaei Khoei, T.; Kaabouch, N. Machine Learning: Models, Challenges, and Research Directions. Future Internet 2023, 15, 332. https://doi.org/10.3390/fi15100332


Title: Machine Learning and Deep Learning

Abstract: Today, intelligent systems that offer artificial intelligence capabilities often rely on machine learning. Machine learning describes the capacity of systems to learn from problem-specific training data to automate the process of analytical model building and solve associated tasks. Deep learning is a machine learning concept based on artificial neural networks. For many applications, deep learning models outperform shallow machine learning models and traditional data analysis approaches. In this article, we summarize the fundamentals of machine learning and deep learning to generate a broader understanding of the methodical underpinning of current intelligent systems. In particular, we provide a conceptual distinction between relevant terms and concepts, explain the process of automated analytical model building through machine learning and deep learning, and discuss the challenges that arise when implementing such intelligent systems in the field of electronic markets and networked business. These naturally go beyond technological aspects and highlight issues in human-machine interaction and artificial intelligence servitization.
Comments: Published online first in Electronic Markets
Subjects: Artificial Intelligence (cs.AI)
Cite as: [cs.AI]


AIM | In AI Mysteries
Last Updated: September 13, 2024

Top Machine Learning Research Papers


  • by Dr. Nivash Jeevanandam


Advances in machine learning and deep learning research are reshaping our technology. Machine learning and deep learning have accomplished various astounding feats, and key research papers have led to technical advances used by billions of people. Research in this field moves at a breakneck pace, making it hard to keep up. Here is a collection of the most important research papers in machine learning.

Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training

The authors of this work examined why ACGAN training becomes unstable as the number of classes in the dataset grows. They found that the instability stems from a gradient explosion problem caused by the unboundedness of the input feature vectors and by the classifier's poor classification capability during the early training stage. To alleviate the instability and reinforce ACGAN, the researchers presented the Data-to-Data Cross-Entropy loss (D2D-CE) and the Rebooted Auxiliary Classifier Generative Adversarial Network (ReACGAN). Additionally, extensive tests demonstrate that ReACGAN is robust to hyperparameter selection and compatible with a variety of architectures and differentiable augmentations.

This article is ranked #1 on CIFAR-10 for Conditional Image Generation.

For the research paper, read here .

For code, see here .

Dense Unsupervised Learning for Video Segmentation

In this study, the authors presented a straightforward and computationally fast unsupervised strategy for learning dense spacetime representations from unlabeled videos. The approach converges quickly during training and is highly data-efficient: the researchers obtain VOS accuracy superior to previous results despite using only a fraction of the previously required training data. The researchers acknowledge that the findings could be misused, for example for unlawful surveillance, and note that they are keen to investigate how this capability might be used to learn a broader spectrum of invariances by exploiting larger temporal windows in videos with complex (ego-)motion, which is more prone to disocclusions.

This study is ranked #1 on DAVIS 2017 for Unsupervised Video Object Segmentation (val).

Temporally-Consistent Surface Reconstruction using Metrically-Consistent Atlases

The authors offer an atlas-based technique for producing unsupervised temporally consistent surface reconstructions by requiring a point on the canonical shape representation to translate to metrically consistent 3D locations on the reconstructed surfaces. Finally, the researchers envisage a plethora of potential applications for the method. For example, by substituting an image-based loss for the Chamfer distance, one may apply the method to RGB video sequences, which the researchers feel will spur development in video-based 3D reconstruction.

This article is ranked #1 on ANIM in the category of Surface Reconstruction. 
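The Chamfer distance mentioned above measures how well two point sets match by averaging nearest-neighbor distances in both directions. Conventions differ (squared vs. unsquared distances); a minimal unsquared NumPy sketch with toy point sets of our own:

```python
import numpy as np

def chamfer_distance(A, B):
    """Symmetric Chamfer distance between point sets A (N, 3) and B (M, 3)."""
    # pairwise distances between every point in A and every point in B
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # A→B nearest-neighbor average plus B→A nearest-neighbor average
    return d.min(axis=1).mean() + d.min(axis=0).mean()

A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]])  # A shifted by 1 in z
print(chamfer_distance(A, A))  # → 0.0
print(chamfer_distance(A, B))  # → 2.0
```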

EdgeFlow: Achieving Practical Interactive Segmentation with Edge-Guided Flow

The researchers propose a revolutionary interactive architecture called EdgeFlow that uses user interaction data without resorting to post-processing or iterative optimisation. The suggested technique achieves state-of-the-art performance on common benchmarks due to its coarse-to-fine network design. Additionally, the researchers create an effective interactive segmentation tool that enables the user to improve the segmentation result through flexible options incrementally.

This paper is ranked #1 on Interactive Segmentation on PASCAL VOC

Learning Transferable Visual Models From Natural Language Supervision

The authors of this work examined whether it is possible to transfer the success of task-agnostic web-scale pre-training in natural language processing to another domain. The findings indicate that adopting this formula resulted in the emergence of similar behaviours in the field of computer vision, and the authors examine the social ramifications of this line of research. CLIP models learn to accomplish a range of tasks during pre-training to optimise their training objective. Using natural language prompting, CLIP can then use this task learning to enable zero-shot transfer to many existing datasets. When applied at a large scale, this technique can compete with task-specific supervised models, while there is still much space for improvement.

This research is ranked #1 on Zero-Shot Transfer Image Classification on SUN
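CLIP's zero-shot transfer reduces to a similarity search: embed the image and one text prompt per class into a shared space, then pick the class whose prompt is most similar. A minimal sketch in which toy pre-computed vectors stand in for the real CLIP encoders:

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels, temperature=0.01):
    """Cosine-similarity zero-shot classification, CLIP-style."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature          # scaled cosine similarities
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    probs /= probs.sum()
    return labels[int(np.argmax(probs))], probs

# Toy stand-ins for encoder outputs (real CLIP would produce these).
labels = ["a photo of a cat", "a photo of a dog"]
text_embs = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]])
image_emb = np.array([0.8, 0.2, 0.1])  # closer to the "cat" prompt

print(zero_shot_classify(image_emb, text_embs, labels)[0])
# → "a photo of a cat"
```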

CoAtNet: Marrying Convolution and Attention for All Data Sizes

The researchers in this article conduct a thorough examination of the features of convolutions and transformers, resulting in a principled approach for combining them into a new family of models dubbed CoAtNet. Extensive experiments demonstrate that CoAtNet combines the advantages of ConvNets and Transformers, achieving state-of-the-art performance across a range of data sizes and compute budgets. Take note that this article is currently concentrating on ImageNet classification for model construction. However, the researchers believe their approach is relevant to a broader range of applications, such as object detection and semantic segmentation.

This paper is ranked #1 on Image Classification on ImageNet (using extra training data).

SwinIR: Image Restoration Using Swin Transformer

The authors of this article propose SwinIR, an image restoration model based on the Swin Transformer . The model comprises three modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. For deep feature extraction, the researchers employ a stack of residual Swin Transformer blocks (RSTB), each composed of Swin Transformer layers, a convolution layer, and a residual connection.

This research article is ranked #1 on Image Super-Resolution on Manga109 – 4x upscaling.

Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

Ways to incorporate historical data are still unclear: initialising reward estimates with historical samples can suffer from spurious data and imbalanced data coverage, leading to computational and storage issues, particularly in continuous action spaces. The paper addresses these obstacles by proposing 'Artificial Replay', a meta-algorithm that incorporates historical data into any base bandit algorithm.

Read the full paper here . 
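The core idea can be sketched in a few lines: whenever the base algorithm selects an arm that still has unused historical samples, consume one of those samples instead of pulling the real arm. The epsilon-greedy base algorithm and the toy arms below are our own illustration, not the paper's implementation:

```python
import random

def artificial_replay(base_select, base_update, history, pull, rounds, seed=0):
    """Wrap a base bandit algorithm: reuse logged historical rewards whenever
    the selected arm still has unused history; otherwise pull for real."""
    random.seed(seed)
    hist = {a: list(rs) for a, rs in history.items()}  # unused logged rewards
    t = 0
    while t < rounds:
        a = base_select()
        if hist.get(a):
            base_update(a, hist[a].pop())    # free: consume logged data
        else:
            base_update(a, pull(a))          # costly: real interaction
            t += 1

# A simple epsilon-greedy base algorithm over two arms.
counts, sums = {0: 0, 1: 0}, {0: 0.0, 1: 0.0}
def select(eps=0.1):
    if random.random() < eps or min(counts.values()) == 0:
        return random.choice([0, 1])
    return max(counts, key=lambda a: sums[a] / counts[a])
def update(a, r):
    counts[a] += 1; sums[a] += r

# Arm 1 is better; the history contains logged pulls of arm 0.
pull = lambda a: random.gauss(0.3 if a == 0 else 0.7, 0.1)
artificial_replay(select, update, {0: [0.3, 0.2, 0.35]}, pull, rounds=200)
print(counts[1] > counts[0])  # the better arm should dominate
```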

Bootstrapped Meta-Learning

Author(s) – Sebastian Flennerhag et al.

The paper proposes an algorithm in which the meta-learner teaches itself to overcome the meta-optimisation challenge. The algorithm focuses on meta-learning with gradients, which guarantees performance improvements. Furthermore, the paper also looks at how bootstrapping opens up possibilities. 

Read the full paper here .

LaMDA: Language Models for Dialog Applications

Author(s) – Romal Thoppilan et al.

The research describes the LaMDA system which caused chaos in AI this summer when a former Google engineer claimed that it had shown signs of sentience. LaMDA is a family of large language models for dialogue applications based on Transformer architecture. The interesting feature of the model is its fine-tuning with human-annotated data and the possibility of consulting external sources. This is a very interesting model family, which we might encounter in many applications we use daily. 

Competition-Level Code Generation with AlphaCode

Author(s) – Yujia Li et al.

Code generation systems can help programmers become more productive. This research addresses the problems of incorporating recent AI innovations into such systems. AlphaCode is a system that creates solutions for problems that require deeper reasoning.

Privacy for Free: How does Dataset Condensation Help Privacy?

Author(s) – Tian Dong et al.

The paper focuses on privacy-preserving machine learning, specifically reducing the leakage of sensitive data in machine learning. It puts forth one of the first propositions to use dataset condensation techniques to preserve data efficiency during model training while furnishing membership privacy.

Why do tree-based models still outperform deep learning on tabular data?

Author(s) – Léo Grinsztajn, Edouard Oyallon and Gaël Varoquaux

The research examines why deep learning models still find it hard to compete with tree-based models on tabular data. It shows that MLP-like architectures are more sensitive to uninformative features in the data than their tree-based counterparts.

Multi-Objective Bayesian Optimisation over High-Dimensional Search Spaces 

Author(s) – Samuel Daulton et al.

The paper proposes ‘MORBO’, a scalable method for multi-objective Bayesian optimisation over high-dimensional search spaces. MORBO significantly improves sample efficiency, and it delivers these gains in settings where existing BO algorithms fail.

A Path Towards Autonomous Machine Intelligence Version 0.9.2

Author(s) – Yann LeCun

The research offers a vision of how to progress towards general AI. The study combines several concepts: a configurable predictive world model, behaviour driven through intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.

TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data

Author(s) – Shreshth Tuli, Giuliano Casale and Nicholas R. Jennings

This is a specialised paper applying the Transformer architecture to unsupervised anomaly detection in multivariate time series. Many architectures that succeed in other fields are eventually applied to time series as well, and the research shows improved performance on several well-known datasets.

Differentially Private Bias-Term only Fine-tuning of Foundation Models

Author(s) – Zhiqi Bu et al. 

In the paper, researchers study the problem of differentially private (DP) fine-tuning of large pre-trained models—a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data. Existing work has demonstrated that high accuracy is possible under strong privacy constraints yet requires significant computational overhead or modifications to the network architecture.

ALBERT: A Lite BERT

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks, but training times become longer. To address these problems, the authors presented two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. They also used a self-supervised loss that focuses on modelling inter-sentence coherence, which consistently helped downstream tasks with multi-sentence inputs. According to the results, this model established new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters than BERT-large.

Check the paper here.

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

Microsoft Research, along with the University of Washington and the University of California, introduced in this paper a model-agnostic and task-agnostic methodology for testing NLP models known as CheckList, which also won the best paper award at this year's ACL conference. It includes a matrix of general linguistic capabilities and test types that facilitates comprehensive test ideation, as well as a software tool to quickly generate a large and diverse number of test cases.

Linformer

Linformer is a Transformer architecture for tackling the self-attention bottleneck in Transformers. It reduces self-attention to an O(n) operation in both space and time complexity, via a new self-attention mechanism that computes the contextual mapping in time and memory linear in the sequence length.

Read more about the paper here.
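A minimal numpy sketch of the low-rank attention trick: the length-n keys and values are projected down to k rows before the softmax, so the score matrix is (n, k) rather than (n, n). Here the projection matrices are random stand-ins; in Linformer they are learned.

```python
import numpy as np

def linformer_attention(Q, K, V, E, F):
    """Self-attention with Linformer's low-rank projections E, F of shape
    (k, n): compressing the sequence axis of K and V makes the attention
    cost O(n*k) instead of O(n^2)."""
    Kp, Vp = E @ K, F @ V                      # (k, d): compressed keys/values
    scores = Q @ Kp.T / np.sqrt(Q.shape[-1])   # (n, k) instead of (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ Vp                        # (n, d) output, as usual

rng = np.random.default_rng(0)
n, k, d = 128, 16, 32
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
out = linformer_attention(Q, K, V, E, F)
```

The output keeps the full sequence length; only the intermediate attention map is compressed.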

Plug and Play Language Models

Plug and Play Language Models (PPLM) combine a pre-trained language model with one or more simple attribute classifiers. This, in turn, assists in text generation without any further training. According to the authors, model samples demonstrated control over sentiment styles, and extensive automated and human-annotated evaluations showed attribute alignment and fluency.

Reformer 

The researchers at Google, in this paper, introduced Reformer. This work showcased that the architecture of a Transformer can be executed efficiently on long sequences and with small memory. The authors believe that the ability to handle long sequences opens the way for the use of the Reformer on many generative tasks. In addition to generating very long coherent text, the Reformer can bring the power of Transformer models to other domains like time-series forecasting, music, image and video generation.

An Image is Worth 16X16 Words

The irony here is that the Transformer, one of the most popular language-model architectures, has been made to do computer vision tasks. In this paper, the authors claimed that the Vision Transformer could go toe-to-toe with state-of-the-art models on image recognition benchmarks, reaching accuracies as high as 88.36% on ImageNet and 94.55% on CIFAR-100. For this, the Vision Transformer receives its input as a one-dimensional sequence of token embeddings: the image is reshaped into a sequence of flattened 2D patches. The Transformer in this work uses a constant width through all of its layers.
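The patch-flattening step described above can be sketched in numpy. The 224x224 image size and 16x16 patch size below are the paper's defaults; the learned linear projection to the embedding dimension is omitted.

```python
import numpy as np

def image_to_patches(img, p):
    """Flatten an (H, W, C) image into a sequence of p x p patches, the ViT
    input format: the result has shape (H*W / p**2, p*p*C)."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0
    x = img.reshape(H // p, p, W // p, p, C)   # split both spatial axes
    x = x.transpose(0, 2, 1, 3, 4)             # bring the patch grid up front
    return x.reshape(-1, p * p * C)            # flatten each patch

patches = image_to_patches(np.zeros((224, 224, 3)), p=16)
```

For a 224x224 RGB image and 16x16 patches this yields 196 tokens of dimension 768, matching the sequence lengths quoted in the paper.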

Unsupervised Learning of Probably Symmetric Deformable 3D Objects

Winner of the CVPR best paper award, this work proposes a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method uses an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. The authors showed that reasoning about illumination can be used to exploit the underlying object symmetry even when the appearance is not symmetric due to shading.

Generative Pretraining from Pixels

In this paper, OpenAI researchers examined whether GPT-like models can learn useful representations for images. They trained a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, they found that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, it achieved 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. An even larger model, trained on a mixture of ImageNet and web images, is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of its features.

Deep Reinforcement Learning and its Neuroscientific Implications

In this paper, the authors provide a high-level introduction to deep RL, discuss some of its initial applications to neuroscience, survey its wider implications for research on brain and behaviour, and conclude with a list of opportunities for next-stage research. Although deep RL seems promising, the authors write that it is still a work in progress and that its implications for neuroscience should be seen as a great opportunity. For instance, deep RL provides an agent-based framework for studying the way that reward shapes representation, and how representation, in turn, shapes learning and decision making, two issues which together span a large swath of what is most central to neuroscience.

Dopamine-based Reinforcement Learning

Much of why humans do certain things is linked to dopamine, a neurotransmitter that acts as the brain's reward system (think: the likes on your Instagram page). With this in mind, DeepMind, with the help of Harvard labs, analysed dopamine cells in mice, recording how the mice received rewards while they learned a task. They then checked these recordings for consistency between the activity of the dopamine neurons and standard temporal difference algorithms. The paper proposes an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning. The authors hypothesised that the brain represents possible future rewards not as a single mean but as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel.

Lottery Tickets In Reinforcement Learning & NLP

In this paper, the authors bridged natural language processing (NLP) and reinforcement learning (RL). They examined both recurrent LSTM models and large-scale Transformer models for NLP and discrete-action space tasks for RL. The results suggested that the lottery ticket hypothesis is not restricted to supervised learning of natural images, but rather represents a broader phenomenon in deep neural networks.

What Can Learned Intrinsic Rewards Capture?

In this paper, the authors explored if the reward function itself can be a good locus of learned knowledge. They proposed a scalable framework for learning useful intrinsic reward functions across multiple lifetimes of experience and showed that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. 

AutoML-Zero

The progress of AutoML has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks, or similarly restrictive search spaces. In this paper, the authors showed that AutoML can go further: AutoML-Zero automatically discovers complete machine learning algorithms using just basic mathematical operations as building blocks. The researchers demonstrated this by introducing a novel framework that significantly reduces human bias through a generic search space.

Rethinking Batch Normalization for Meta-Learning

Batch normalization is an essential component of meta-learning pipelines, but it poses several challenges there. In this paper, the authors evaluated a range of approaches to batch normalization for meta-learning scenarios and developed a novel approach, TaskNorm. Experiments demonstrated that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for gradient-based and gradient-free meta-learning approaches alike, with TaskNorm consistently improving performance.

Meta-Learning without Memorisation

Meta-learning algorithms need meta-training tasks to be mutually exclusive, such that no single model can solve all of the tasks at once. In this paper, the authors designed a meta-regularisation objective using information theory that successfully uses data from non-mutually-exclusive tasks to efficiently adapt to novel tasks.

Understanding the Effectiveness of MAML

Model-Agnostic Meta-Learning (MAML) consists of two optimisation loops, of which the inner loop can efficiently learn new tasks. In this paper, the authors demonstrated that feature reuse is the dominant factor, leading to the ANIL (Almost No Inner Loop) algorithm, a simplification of MAML in which the inner loop is removed for all but the (task-specific) head of the underlying neural network.

Your Classifier is Secretly an Energy-Based Model

This paper attempts to reinterpret a standard discriminative classifier as an energy-based model. In this setting, the authors write, the standard class probabilities can be easily computed. They demonstrated that energy-based training of the joint distribution improves calibration, robustness, and out-of-distribution detection, while also enabling the proposed model to generate samples rivalling the quality of recent GAN approaches. The work improves upon recently proposed techniques for scaling up the training of energy-based models, and is the first to achieve performance rivalling the state of the art in both generative and discriminative learning within one hybrid model.
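The core reinterpretation can be sketched in a few lines (a minimal numpy illustration, not the authors' training code): the same classifier logits yield p(y|x) via softmax and an unnormalized log p(x) via logsumexp, which plays the role of a negative energy.

```python
import numpy as np

def class_probs_and_logpx(logits):
    """Reinterpret classifier logits as an energy-based model: softmax gives
    the usual p(y|x), while logsumexp over the same logits gives log p(x)
    up to a normalizing constant (i.e. -E(x))."""
    m = logits.max()
    z = np.exp(logits - m)               # shift for numerical stability
    p_y_given_x = z / z.sum()            # standard classifier output
    log_p_x = m + np.log(z.sum())        # logsumexp(logits) = -E(x) + const
    return p_y_given_x, log_p_x

probs, log_px = class_probs_and_logpx(np.array([2.0, 0.5, -1.0]))
```

Training then adds a maximum-likelihood term on log p(x) alongside the usual cross-entropy on p(y|x).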

Reverse-Engineering Deep ReLU Networks

This paper investigates the commonly assumed notion that a neural network cannot be recovered from its outputs, because the outputs depend on the parameters in a highly nonlinear way. The authors claim that by observing only its output, one can identify the architecture, weights, and biases of an unknown deep ReLU network. By dissecting the set of region boundaries into components associated with particular neurons, the researchers show that it is possible to recover the weights of neurons and their arrangement within the network.

Cricket Analytics and Predictor

Authors: Suyash Mahajan, Salma Shaikh, Jash Vora, Gunjan Kandhari and Rutuja Pawar

Abstract: The paper embarks on predicting the outcomes of Indian Premier League (IPL) cricket matches using a supervised learning approach from a team-composition perspective. The study suggests that the relative strength of the competing teams forms a distinctive feature for predicting the winner. Modelling team strength boils down to modelling individual players' batting and bowling performances, which forms the basis of the approach.

Research Methodology: Two technologies are used in this paper: a MySQL database for storing the data and Java for the GUI. The prediction uses a clustering algorithm, with the following steps:

  • Begin by deciding on the value of k, the number of clusters.
  • Form any initial partition that classifies the data into k clusters.
  • Take each sample in sequence and compute its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster, and update the centroids of both the cluster gaining the sample and the cluster losing it.
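The steps above describe the classic k-means procedure; a minimal numpy sketch (the data and seeds are illustrative, and this batch variant recomputes centroids per pass rather than per sample):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means following the steps above: pick k initial centroids,
    assign every sample to its nearest centroid, then recompute the
    centroids of the affected clusters, and repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centroids) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated toy blobs; k-means should recover them as clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
labels, centroids = kmeans(X, k=2)
```
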

For the research paper, read here.

Real-Time Sleep/Drowsiness Detection – Project Report

Author: Roshan Tavhare

Institute: University of Mumbai

Abstract: The main idea behind this project is to develop a non-intrusive system which can detect fatigue in any human and issue a timely warning. Drivers who do not take regular breaks when driving long distances run a high risk of becoming drowsy, a state which they often fail to recognise early enough.

Research Methodology: The detector is trained on a set of images with manually labelled facial landmarks, specifying the (x, y)-coordinates of the regions surrounding each facial structure.

  • Priors, more specifically probabilities on the distances between pairs of input pixels, are also used. The pre-trained facial landmark detector inside the dlib library is used to estimate the locations of the 68 (x, y)-coordinates that map to facial structures on the face.
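Downstream of the landmark detector, a common drowsiness cue is the eye aspect ratio (EAR) computed from the six landmarks around each eye. The report does not spell out this formula, so the following is an illustrative sketch with hand-made landmark coordinates:

```python
import math

def eye_aspect_ratio(eye):
    """EAR = (|p2-p6| + |p3-p5|) / (2*|p1-p4|): the two vertical eye
    openings over the horizontal eye width. The ratio collapses toward 0
    as the eye closes, so a sustained low EAR signals drowsiness."""
    p1, p2, p3, p4, p5, p6 = eye
    return (math.dist(p2, p6) + math.dist(p3, p5)) / (2.0 * math.dist(p1, p4))

# Toy landmark coordinates for an open and a nearly closed eye.
open_eye   = [(0, 0), (1, 1),   (2, 1),   (3, 0), (2, -1),   (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
```

In practice the EAR is averaged over both eyes and thresholded over several consecutive frames before raising a warning.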

A Study of Various Text Augmentation Techniques for Relation Classification in Free Text

Authors: Chinmaya Mishra, Praveen Kumar, Reddy Kumar Moda, Syed Saqib Bukhari and Andreas Dengel

Institute: German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany

Abstract: In this paper, the researchers explore various text data augmentation techniques in text space and word embedding space. They studied the effect of various augmented datasets on the efficiency of different deep learning models for relation classification in text.

Research Methodology: The researchers implemented five text data augmentation techniques (similar word, synonym, interpolation, extrapolation and random noise) and explored ways to preserve the grammatical and contextual structures of the sentences while generating new sentences automatically through data augmentation.
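A minimal sketch of the first of these techniques, synonym replacement, with a toy hand-built synonym table (the paper's actual synonym sources and the other four techniques are not reproduced here):

```python
import random

def synonym_replace(tokens, synonyms, n, rng):
    """Swap up to n tokens for a listed synonym, leaving the sentence's
    grammatical structure intact - the 'similar word'/synonym flavour of
    text-space augmentation."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in synonyms]
    for i in rng.sample(candidates, min(n, len(candidates))):
        out[i] = rng.choice(synonyms[out[i]])
    return out

rng = random.Random(0)
synonyms = {"quick": ["fast", "rapid"], "jumps": ["leaps"]}
sentence = "the quick brown fox jumps over the lazy dog".split()
augmented = synonym_replace(sentence, synonyms, n=2, rng=rng)
```

Each augmented sentence keeps the original token positions, so relation labels attached to entity spans remain valid.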

Smart Health Monitoring and Management Using Internet of Things, Artificial Intelligence with Cloud Based Processing

Author: Prateek Kaushik

Institute: G D Goenka University, Gurugram

Abstract: This research paper describes a personalised smart health monitoring device using wireless sensors and the latest technology.

Research Methodology: Machine learning and deep learning techniques are discussed as catalysts for improving the performance of health monitoring systems, including supervised and unsupervised machine learning algorithms, auto-encoders, convolutional neural networks and restricted Boltzmann machines.

Internet of Things with Big Data Analytics – A Survey

Authors: A. Pavithra, C. Anandhakumar and V. Nithin Meenashisundharam

Institute: Sree Saraswathi Thyagaraja College

Abstract: This article discusses big data in IoT, how the two are interrelated, the necessity of implementing big data with IoT, its benefits, and the job market.

Research Methodology: Machine learning, deep learning and artificial intelligence are key technologies used to provide value-added applications along with IoT and big data, in addition to being used in stand-alone mode.

Single Headed Attention RNN: Stop Thinking With Your Head 

Author: Stephen Merity

In this work of art, the Harvard-grad author Stephen “Smerity” Merity investigates the current state of NLP and the models being used, along with alternate approaches. In the process, he tears down conventional methods from top to bottom, including their etymology.

The author also voices the need for a Moore's Law for machine learning that encourages a minicomputer future, while announcing his plans to rebuild the codebase from the ground up, both as an educational tool for others and as a strong platform for future work in academia and industry.

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Authors: Mingxing Tan and Quoc V. Le 

In this work, the authors propose a compound scaling method that tells when to increase or decrease the depth, width and resolution of a given network.

Convolutional Neural Networks (CNNs) are at the heart of many machine vision applications.

EfficientNets surpass state-of-the-art accuracy with up to 10x better efficiency (smaller and faster).
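The compound scaling rule can be written down directly. The default coefficients below are the ones reported in the paper, found by grid search under the constraint alpha * beta^2 * gamma^2 ≈ 2:

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """EfficientNet's compound scaling: network depth, width and input
    resolution grow together as alpha**phi, beta**phi and gamma**phi,
    so doubling the compute budget scales all three dimensions at once
    rather than just one."""
    return alpha ** phi, beta ** phi, gamma ** phi

# phi=0 is the EfficientNet-B0 baseline; larger phi gives the bigger variants.
depth_mult, width_mult, res_mult = compound_scale(phi=3)
```

Scaling all three dimensions jointly is what lets the family trade compute for accuracy more efficiently than depth-only or width-only scaling.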

Deep Double Descent

Authors: Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal

In this paper, an attempt has been made to reconcile classical understanding and modern practice within a unified performance curve.

The “double descent” curve subsumes the classic U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance.

The Lottery Ticket Hypothesis

Authors: Jonathan Frankle, Michael Carbin

Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. 

The authors find that a standard pruning technique naturally uncovers subnetworks whose initialisations made them capable of training effectively. Based on these results, they introduce the “lottery ticket hypothesis”: dense, randomly-initialised, feed-forward networks contain subnetworks (“winning tickets”) that, when trained in isolation, reach test accuracy comparable to that of the original network in a similar number of iterations.
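A minimal numpy sketch of the magnitude pruning step that underlies the hypothesis (the paper's iterative rewind-and-retrain loop is omitted):

```python
import numpy as np

def magnitude_prune_mask(weights, sparsity):
    """One-shot magnitude pruning: zero out the given fraction of
    smallest-magnitude weights. In the lottery-ticket procedure, the
    surviving weights are then reset to their original initialization
    and the sparse subnetwork is retrained."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)          # how many weights to remove
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k)[k]   # k-th smallest magnitude
    return np.abs(weights) >= threshold

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 100))
mask = magnitude_prune_mask(W, sparsity=0.9)   # keep only the top 10%
```
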

On The Measure Of Intelligence 

Author: François Chollet

This work summarizes and critically assesses the definitions of intelligence and evaluation approaches, while making apparent the historical conceptions of intelligence that have implicitly guided them.

The author, also the creator of Keras, introduces a formal definition of intelligence based on Algorithmic Information Theory and, using this definition, proposes a set of guidelines for what a general AI benchmark should look like.

Zero-Shot Word Sense Disambiguation Using Sense Definition Embeddings via IISc Bangalore & CMU

Authors: Sawan Kumar, Sharmistha Jat, Karan Saxena and Partha Talukdar

Word Sense Disambiguation (WSD) is a longstanding but open problem in Natural Language Processing (NLP). Current supervised WSD methods treat senses as discrete labels and also resort to predicting the Most-Frequent-Sense (MFS) for words unseen during training.

The researchers from IISc Bangalore, in collaboration with Carnegie Mellon University, propose Extended WSD Incorporating Sense Embeddings (EWISE), a supervised model that performs WSD by predicting over a continuous sense-embedding space as opposed to a discrete label space.
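The prediction step can be sketched as follows (the vectors here are hand-made stand-ins, not trained context or sense-definition encodings): scoring every candidate sense against the context representation is what lets the model handle senses never seen in training.

```python
import numpy as np

def disambiguate(context_vec, sense_embeddings):
    """Score every candidate sense, seen or unseen during training, by a
    dot product between the context representation and the embedding of
    the sense definition; return the best-scoring sense."""
    scores = {s: float(np.dot(context_vec, v))
              for s, v in sense_embeddings.items()}
    return max(scores, key=scores.get), scores

# Toy context vector and two candidate sense-definition embeddings.
context = np.array([1.0, 0.0, 0.5])
senses = {"bank.river":   np.array([0.9, 0.1, 0.4]),
          "bank.finance": np.array([-0.2, 0.8, 0.0])}
best, scores = disambiguate(context, senses)
```
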

Deep Equilibrium Models 

Authors: Shaojie Bai, J. Zico Kolter and Vladlen Koltun 

Motivated by the observation that the hidden layers of many existing deep sequence models converge towards some fixed point, the researchers at Carnegie Mellon University present a new approach to modelling sequential data through deep equilibrium models (DEQs).

Using this approach, training and prediction in these networks require only constant memory, regardless of the effective “depth” of the network.

IMAGENET-Trained CNNs are Biased Towards Texture

Authors: Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann and Wieland Brendel

Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. The authors in this paper , evaluate CNNs and human observers on images with a texture-shape cue conflict. They show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence.

A Geometric Perspective on Optimal Representations for Reinforcement Learning 

Authors: Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taïga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore and Clare Lyle

The authors propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. This work shows that adversarial value functions exhibit interesting structure and are good auxiliary tasks when learning a representation of an environment. The authors believe this work opens up the possibility of automatically generating auxiliary tasks in deep reinforcement learning.

Weight Agnostic Neural Networks 

Authors: Adam Gaier & David Ha

In this work, the authors explore whether neural network architectures alone, without learning any weight parameters, can encode solutions for a given task. They propose a search method for neural network architectures that can already perform a task without any explicit weight training.

Stand-Alone Self-Attention in Vision Models 

Authors: Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya and Jonathon Shlens

In this work, the Google researchers verified that content-based interactions can serve as a stand-alone primitive for vision models. The proposed stand-alone local self-attention layer achieves competitive predictive performance on ImageNet classification and COCO object detection while requiring fewer parameters and floating-point operations than the corresponding convolution baselines. Results show that attention is especially effective in the later parts of the network.

High-Fidelity Image Generation With Fewer Labels 

Authors: Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem and Sylvain Gelly

Modern-day models can produce high-quality, close-to-reality images when fed a vast quantity of labelled data. To reduce this dependency on large labelled datasets, researchers from Google released this work, demonstrating how one can benefit from recent work on self- and semi-supervised learning to outperform the state of the art on unsupervised ImageNet synthesis, as well as in the conditional setting.

The proposed approach is able to match the sample quality of the current state-of-the-art conditional model BigGAN on ImageNet using only 10% of the labels and outperform it using 20% of the labels.

ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma and Radu Soricut

The authors present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, addressing the challenges posed by increasing model size: GPU/TPU memory limitations, longer training times, and unexpected model degradation.

As a result, this proposed model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.

GauGAN: Semantic Image Synthesis with Spatially-Adaptive Normalization

Authors: Taesung Park, Ming-Yu Liu, Ting-Chun Wang and Jun-Yan Zhu

Nvidia in collaboration with UC Berkeley and MIT proposed a model which has a spatially-adaptive normalization layer for synthesizing photorealistic images given an input semantic layout.

The model retained visual fidelity and alignment with challenging input layouts while allowing the user to control both semantics and style.



Machine Learning: Algorithms, Real-World Applications and Research Directions

Iqbal H. Sarker

1 Swinburne University of Technology, Melbourne, VIC 3122 Australia

2 Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, 4349 Chattogram, Bangladesh

In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, the knowledge of artificial intelligence (AI), particularly machine learning (ML), is the key. Various types of machine learning algorithms, such as supervised, unsupervised, semi-supervised, and reinforcement learning, exist in the area. Besides, deep learning, which is part of a broader family of machine learning methods, can intelligently analyze data on a large scale. In this paper, we present a comprehensive view of these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study's key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals as well as for decision-makers in various real-world situations and application areas, particularly from the technical point of view.

Introduction

We live in the age of data, where everything around us is connected to a data source, and everything in our lives is digitally recorded [21, 103]. For instance, the current electronic world has a wealth of various kinds of data, such as Internet of Things (IoT) data, cybersecurity data, smart city data, business data, smartphone data, social media data, health data, COVID-19 data, and many more. The data can be structured, semi-structured, or unstructured, as discussed briefly in Sect. “Types of Real-World Data and Machine Learning Techniques”, and is increasing day by day. Extracting insights from these data can be used to build various intelligent applications in the relevant domains. For instance, the relevant cybersecurity data can be used to build a data-driven automated and intelligent cybersecurity system [105], and the relevant mobile data can be used to build personalized context-aware smart mobile applications [103], and so on. Thus, data management tools and techniques capable of extracting insights or useful knowledge from data in a timely and intelligent way, on which real-world applications are based, are urgently needed.

Artificial intelligence (AI), particularly machine learning (ML), has grown rapidly in recent years in the context of data analysis and computing, typically allowing applications to function in an intelligent manner [95]. ML usually provides systems with the ability to learn and improve from experience automatically without being specifically programmed, and is generally referred to as one of the most popular technologies of the fourth industrial revolution (4IR or Industry 4.0) [103, 105]. “Industry 4.0” [114] is typically the ongoing automation of conventional manufacturing and industrial practices, including exploratory data processing, using new smart technologies such as machine learning automation. Thus, to intelligently analyze these data and to develop the corresponding real-world applications, machine learning algorithms are key. The learning algorithms can be categorized into four major types: supervised, unsupervised, semi-supervised, and reinforcement learning [75], discussed briefly in Sect. “Types of Real-World Data and Machine Learning Techniques”. The popularity of these approaches to learning is increasing day by day, as shown in Fig. 1, based on data collected from Google Trends [4] over the last five years. The x-axis of the figure indicates the specific dates, and the corresponding popularity score, within the range of 0 (minimum) to 100 (maximum), is shown on the y-axis. According to Fig. 1, the popularity indication values for these learning types were low in 2015 and have been increasing day by day. These statistics motivate us to study machine learning in this paper, which can play an important role in the real world through Industry 4.0 automation.


The worldwide popularity score of various types of ML algorithms (supervised, unsupervised, semi-supervised, and reinforcement) in a range of 0 (min) to 100 (max) over time where x-axis represents the timestamp information and y-axis represents the corresponding score

In general, the effectiveness and efficiency of a machine learning solution depend on the nature and characteristics of the data and the performance of the learning algorithms. In the area of machine learning algorithms, classification analysis, regression, data clustering, feature engineering and dimensionality reduction, association rule learning, and reinforcement learning techniques exist to effectively build data-driven systems [41, 125]. Besides, deep learning, which originated from the artificial neural network and can intelligently analyze data, is known as part of a wider family of machine learning approaches [96]. Thus, selecting a learning algorithm that is suitable for the target application in a particular domain is challenging. The reason is that different learning algorithms serve different purposes, and even the outcomes of different learning algorithms in a similar category may vary depending on the data characteristics [106]. Thus, it is important to understand the principles of various machine learning algorithms and their applicability in various real-world application areas, such as IoT systems, cybersecurity services, business and recommendation systems, smart cities, healthcare and COVID-19, context-aware systems, sustainable agriculture, and many more, which are explained briefly in Sect. “Applications of Machine Learning”.

Based on the importance and potential of "Machine Learning" to analyze the data mentioned above, in this paper we provide a comprehensive view of the various types of machine learning algorithms that can be applied to enhance the intelligence and capabilities of an application. Thus, the key contribution of this study is to explain the principles and potential of different machine learning techniques, and their applicability in the various real-world application areas mentioned earlier. The purpose of this paper is, therefore, to provide a basic guide for academics and industry professionals who want to study, research, and develop data-driven automated and intelligent systems in the relevant areas based on machine learning techniques.

The key contributions of this paper are listed as follows:

  • To define the scope of our study by taking into account the nature and characteristics of various types of real-world data and the capabilities of various learning techniques.
  • To provide a comprehensive view on machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.
  • To discuss the applicability of machine learning-based solutions in various real-world application domains.
  • To highlight and summarize the potential research directions within the scope of our study for intelligent data analysis and services.

The rest of the paper is organized as follows. The next section presents the types of data and machine learning algorithms in a broader sense and defines the scope of our study. We briefly discuss and explain different machine learning algorithms in the subsequent section, after which various real-world application areas based on machine learning algorithms are discussed and summarized. In the penultimate section, we highlight several research issues and potential future directions, and the final section concludes this paper.

Types of Real-World Data and Machine Learning Techniques

Machine learning algorithms typically consume and process data to learn the related patterns about individuals, business processes, transactions, events, and so on. In the following, we discuss various types of real-world data as well as categories of machine learning algorithms.

Types of Real-World Data

Usually, the availability of data is considered key to constructing a machine learning model or a data-driven real-world system [103, 105]. Data can come in various forms, such as structured, semi-structured, or unstructured [41, 72]. Besides, "metadata" is another type, which typically represents data about data. In the following, we briefly discuss these types of data.

  • Structured: Structured data have a well-defined structure, conform to a data model following a standard order, are highly organized, and are easily accessed and used by an entity or a computer program. Structured data are typically stored in well-defined schemes, such as relational databases, i.e., in a tabular format. For instance, names, dates, addresses, credit card numbers, stock information, and geolocation are examples of structured data.
  • Unstructured: On the other hand, unstructured data have no pre-defined format or organization, making them much more difficult to capture, process, and analyze; they mostly consist of text and multimedia material. For example, sensor data, emails, blog entries, wikis, word processing documents, PDF files, audio files, videos, images, presentations, web pages, and many other types of business documents can be considered unstructured data.
  • Semi-structured: Semi-structured data are not stored in a relational database like the structured data mentioned above, but they do have certain organizational properties that make them easier to analyze. HTML, XML, JSON documents, and NoSQL databases are some examples of semi-structured data.
  • Metadata: Metadata are not a normal form of data, but "data about data". The primary difference between "data" and "metadata" is that data are simply the material that can classify, measure, or document something relative to an organization's data properties, whereas metadata describes the relevant data information, giving it more significance for data users. Basic examples of a document's metadata are the author, file size, creation date, and keywords that describe the document.
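To make the distinction concrete, the following is a minimal Python sketch contrasting a structured record, a semi-structured JSON document, and metadata. All names and values here are invented for illustration and are not taken from any dataset mentioned in this paper.

```python
import json

# Structured: fixed schema, like a relational table row (tabular format).
structured_row = {"name": "Alice", "date": "2021-05-01", "amount": 42.5}

# Semi-structured: organizational properties but no rigid schema; fields
# may vary from record to record (here, a JSON document).
semi_structured = json.loads('{"name": "Bob", "tags": ["ml", "data"]}')

# Metadata: data about the data itself, not its content.
metadata = {"author": "example_author", "size_bytes": 1024,
            "created": "2021-05-01", "keywords": ["report", "sales"]}

print(structured_row["amount"])        # fields accessed via a fixed schema
print(semi_structured.get("tags"))     # optional field; may be absent elsewhere
print(metadata["author"])              # describes the document, not its body
```

Unstructured data (free text, images, audio) would carry no such field structure at all, which is precisely what makes it harder to process.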

In the area of machine learning and data science, researchers use various widely used datasets for different purposes. These include, for example, cybersecurity datasets such as NSL-KDD [119], UNSW-NB15 [76], ISCX'12 [1], CIC-DDoS2019 [2], and Bot-IoT [59]; smartphone datasets such as phone call logs [84, 101], SMS logs [29], mobile application usage logs [117, 137], and mobile phone notification logs [73]; IoT data [16, 57, 62]; agriculture and e-commerce data [120, 138]; health data such as heart disease [92], diabetes mellitus [83, 134], and COVID-19 [43, 74]; and many more in various application domains. These data can be of the different types discussed above, which may vary from application to application in the real world. To analyze such data in a particular problem domain and to extract insights or useful knowledge from the data for building real-world intelligent applications, different types of machine learning techniques can be used according to their learning capabilities, as discussed in the following.

Types of Machine Learning Techniques

Machine learning algorithms are mainly divided into four categories: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning [75], as shown in Fig. 2. In the following, we briefly discuss each type of learning technique along with the scope of its applicability to solve real-world problems.

Fig. 2 Various types of machine learning techniques

  • Supervised: Supervised learning is typically the machine learning task of learning a function that maps an input to an output based on sample input-output pairs [41]. It uses labeled training data and a collection of training examples to infer a function. Supervised learning is carried out when certain goals are to be accomplished from a certain set of inputs [105], i.e., a task-driven approach. The most common supervised tasks are "classification", which separates the data, and "regression", which fits the data. For instance, predicting the class label or sentiment of a piece of text, like a tweet or a product review, i.e., text classification, is an example of supervised learning.
  • Unsupervised: Unsupervised learning analyzes unlabeled datasets without the need for human interference, i.e., a data-driven process [ 41 ]. This is widely used for extracting generative features, identifying meaningful trends and structures, groupings in results, and exploratory purposes. The most common unsupervised learning tasks are clustering, density estimation, feature learning, dimensionality reduction, finding association rules, anomaly detection, etc.
  • Semi-supervised: Semi-supervised learning can be defined as a hybridization of the above-mentioned supervised and unsupervised methods, as it operates on both labeled and unlabeled data [41, 105]. Thus, it falls between learning "without supervision" and learning "with supervision". In the real world, labeled data can be rare in several contexts while unlabeled data are numerous, which is where semi-supervised learning is useful [75]. The ultimate goal of a semi-supervised learning model is to provide a better prediction outcome than that produced using the labeled data alone. Some application areas where semi-supervised learning is used include machine translation, fraud detection, data labeling, and text classification.
  • Reinforcement: Reinforcement learning is a type of machine learning algorithm that enables software agents and machines to automatically evaluate the optimal behavior in a particular context or environment to improve its efficiency [52], i.e., an environment-driven approach. This type of learning is based on reward or penalty, and its ultimate goal is to use insights obtained from interaction with the environment to take actions that increase the reward or minimize the risk [75]. It is a powerful tool for training AI models that can help increase automation or optimize the operational efficiency of sophisticated systems such as robotics, autonomous driving, manufacturing, and supply chain logistics; however, it is not preferable for solving basic or straightforward problems.
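The reward-driven loop described above can be sketched with tabular Q-learning, one standard reinforcement learning algorithm. The environment below (a tiny 4-state chain with a reward only at the last state) and all hyperparameters are invented purely for illustration:

```python
import random

random.seed(0)

# A hypothetical 4-state chain environment (invented for illustration):
# the agent earns a reward of 1.0 only on reaching the last state.
N_STATES, ACTIONS = 4, [0, 1]          # action 0 = left, 1 = right
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics for the toy chain."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

for _ in range(300):                   # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Q-learning update: move Q(s, a) toward reward + discounted future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# The learned greedy policy should move right (toward the reward) in every state.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told the correct action; it discovers the reward-maximizing policy purely from the environment's reward signal, which is the defining trait of this learning category.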

Thus, to build effective models in various application areas, different types of machine learning techniques can play a significant role according to their learning capabilities, depending on the nature of the data discussed earlier and the target outcome. In Table 1, we summarize the various types of machine learning techniques with examples. In the following, we provide a comprehensive view of machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.

Table 1 Various types of machine learning techniques with examples

Learning type | Model building | Examples
Supervised | Algorithms or models learn from labeled data (task-driven approach) | Classification, regression
Unsupervised | Algorithms or models learn from unlabeled data (data-driven approach) | Clustering, associations, dimensionality reduction
Semi-supervised | Models are built using combined data (labeled + unlabeled) | Classification, clustering
Reinforcement | Models are based on reward or penalty (environment-driven approach) | Classification, control

Machine Learning Tasks and Algorithms

In this section, we discuss the various machine learning algorithms, including classification analysis, regression analysis, data clustering, association rule learning, feature engineering for dimensionality reduction, and deep learning methods. A general structure of a machine learning-based predictive model is shown in Fig. 3, where the model is trained from historical data in phase 1 and the outcome is generated for new test data in phase 2.

Fig. 3 A general structure of a machine learning-based predictive model considering both the training and testing phases

Classification Analysis

Classification is regarded as a supervised learning method in machine learning and also refers to a problem of predictive modeling, where a class label is predicted for a given example [41]. Mathematically, it learns a function f that maps input variables X to output variables Y, i.e., targets, labels, or categories. Classification can be carried out on structured or unstructured data to predict the class of given data points. For example, spam detection, with the classes "spam" and "not spam", in email service providers is a classification problem. In the following, we summarize the common classification problems.

  • Binary classification: It refers to the classification tasks having two class labels such as “true and false” or “yes and no” [ 41 ]. In such binary classification tasks, one class could be the normal state, while the abnormal state could be another class. For instance, “cancer not detected” is the normal state of a task that involves a medical test, and “cancer detected” could be considered as the abnormal state. Similarly, “spam” and “not spam” in the above example of email service providers are considered as binary classification.
  • Multiclass classification: Traditionally, this refers to those classification tasks having more than two class labels [ 41 ]. The multiclass classification does not have the principle of normal and abnormal outcomes, unlike binary classification tasks. Instead, within a range of specified classes, examples are classified as belonging to one. For example, it can be a multiclass classification task to classify various types of network attacks in the NSL-KDD [ 119 ] dataset, where the attack categories are classified into four class labels, such as DoS (Denial of Service Attack), U2R (User to Root Attack), R2L (Root to Local Attack), and Probing Attack.
  • Multi-label classification: In machine learning, multi-label classification is an important consideration where an example is associated with several classes or labels. Thus, it is a generalization of multiclass classification, where the classes involved in the problem are hierarchically structured, and each example may simultaneously belong to more than one class in each hierarchical level, e.g., multi-level text classification. For instance, Google news can be presented under the categories of a “city name”, “technology”, or “latest news”, etc. Multi-label classification includes advanced machine learning algorithms that support predicting various mutually non-exclusive classes or labels, unlike traditional classification tasks where class labels are mutually exclusive [ 82 ].
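The difference between these three problem settings shows up directly in how labels are represented. The following sketch uses invented example labels; only the NSL-KDD attack categories are taken from the text above:

```python
# Binary: two mutually exclusive labels per example.
binary_labels = ["spam", "not spam", "spam"]

# Multiclass: one label per example from more than two mutually exclusive
# classes, e.g. the NSL-KDD attack categories mentioned above.
multiclass_label = "DoS"   # exactly one of {"DoS", "U2R", "R2L", "Probing"}

# Multi-label: an example may belong to several classes at once; a common
# encoding is a 0/1 indicator vector over the whole label set.
label_set = ["city name", "technology", "latest news"]

def to_indicator(labels, label_set):
    """Encode a set of non-exclusive labels as a binary indicator vector."""
    return [1 if name in labels else 0 for name in label_set]

# A news article tagged with two non-exclusive categories:
print(to_indicator({"technology", "latest news"}, label_set))  # [0, 1, 1]
```

In binary and multiclass settings the target is a single value per example, whereas in the multi-label setting it is a whole vector, which is why multi-label classification requires algorithms that predict mutually non-exclusive outputs.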

Many classification algorithms have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the most common and popular methods that are used widely in various application areas.

  • Naive Bayes (NB): The naive Bayes algorithm is based on Bayes' theorem with the assumption of independence between each pair of features [51]. It works well and can be used for both binary and multi-class categories in many real-world situations, such as document or text classification, spam filtering, etc. The NB classifier can be used to effectively classify noisy instances in the data and to construct a robust prediction model [94]. The key benefit is that, compared to more sophisticated approaches, it needs only a small amount of training data to estimate the necessary parameters quickly [82]. However, its performance may be affected by its strong assumption of feature independence. Gaussian, Multinomial, Complement, Bernoulli, and Categorical are the common variants of the NB classifier [82].
  • Linear Discriminant Analysis (LDA): Linear Discriminant Analysis (LDA) is a linear decision boundary classifier created by fitting class conditional densities to data and applying Bayes' rule [51, 82]. This method is also known as a generalization of Fisher's linear discriminant, which projects a given dataset into a lower-dimensional space, i.e., a reduction of dimensionality that minimizes the complexity of the model or reduces the resulting model's computational costs. The standard LDA model fits each class with a Gaussian density, assuming that all classes share the same covariance matrix [82]. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which seek to express one dependent variable as a linear combination of other features or measurements.
  • Logistic regression (LR): Another common probabilistic statistical model used to solve classification problems in machine learning is logistic regression (LR) [64]. Logistic regression typically uses a logistic function to estimate the probabilities, also referred to as the sigmoid function, defined mathematically in Eq. (1). It works well when the dataset can be separated linearly, but it can overfit high-dimensional datasets; the regularization (L1 and L2) techniques [82] can be used to avoid over-fitting in such scenarios. The assumption of linearity between the dependent and independent variables is considered a major drawback of logistic regression. It can be used for both classification and regression problems, but it is more commonly used for classification.

    g(z) = 1 / (1 + exp(-z)).   (1)
  • K-nearest neighbors (KNN): K-Nearest Neighbors (KNN) [ 9 ] is an “instance-based learning” or non-generalizing learning, also known as a “lazy learning” algorithm. It does not focus on constructing a general internal model; instead, it stores all instances corresponding to training data in n -dimensional space. KNN uses data and classifies new data points based on similarity measures (e.g., Euclidean distance function) [ 82 ]. Classification is computed from a simple majority vote of the k nearest neighbors of each point. It is quite robust to noisy training data, and accuracy depends on the data quality. The biggest issue with KNN is to choose the optimal number of neighbors to be considered. KNN can be used both for classification as well as regression.
  • Support vector machine (SVM): In machine learning, another common technique that can be used for classification, regression, or other tasks is a support vector machine (SVM) [ 56 ]. In high- or infinite-dimensional space, a support vector machine constructs a hyper-plane or set of hyper-planes. Intuitively, the hyper-plane, which has the greatest distance from the nearest training data points in any class, achieves a strong separation since, in general, the greater the margin, the lower the classifier’s generalization error. It is effective in high-dimensional spaces and can behave differently based on different mathematical functions known as the kernel. Linear, polynomial, radial basis function (RBF), sigmoid, etc., are the popular kernel functions used in SVM classifier [ 82 ]. However, when the data set contains more noise, such as overlapping target classes, SVM does not perform well.
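As a concrete illustration of one of the classifiers above, the following is a minimal Gaussian naive Bayes sketch on invented toy data. It estimates per-class priors and per-feature Gaussian parameters, then predicts by maximizing log prior plus summed log likelihoods, i.e., the independence assumption described earlier. This is a from-scratch sketch, not the implementation used in any cited work:

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate per-class priors and per-feature Gaussian mean/variance."""
    params, by_class = {}, defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n = len(X)
    for c, rows in by_class.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                 for col, m in zip(zip(*rows), means)]
        params[c] = (len(rows) / n, means, vars_)
    return params

def predict_gnb(params, x):
    """Pick the class maximizing log prior + sum of per-feature log
    Gaussian likelihoods (the naive independence assumption)."""
    def log_posterior(c):
        prior, means, vars_ = params[c]
        ll = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                 for xi, m, v in zip(x, means, vars_))
        return math.log(prior) + ll
    return max(params, key=log_posterior)

# Tiny synthetic 2-feature data: class 0 near (0, 0), class 1 near (5, 5).
X = [[0.1, 0.2], [0.3, 0.1], [-0.2, 0.0], [5.1, 4.9], [4.8, 5.2], [5.0, 5.1]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gnb(X, y)
print(predict_gnb(model, [0.0, 0.1]), predict_gnb(model, [5.0, 5.0]))  # 0 1
```

Note the small variance floor (1e-9), a common practical safeguard against zero variance on degenerate features.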

Fig. 4 An example of a decision tree structure

Fig. 5 An example of a random forest structure considering multiple decision trees

  • Adaptive Boosting (AdaBoost): Adaptive Boosting (AdaBoost) is an ensemble learning process that employs an iterative approach to improve poor classifiers by learning from their errors. It was developed by Yoav Freund et al. [35] and is also known as "meta-learning". Unlike the random forest, which uses parallel ensembling, AdaBoost uses "sequential ensembling". It creates a powerful classifier of high accuracy by combining many poorly performing classifiers. In that sense, AdaBoost is called an adaptive classifier, as it significantly improves the efficiency of the classifier, but in some instances it can trigger overfitting. AdaBoost is best used to boost the performance of decision trees, its base estimator [82], on binary classification problems; however, it is sensitive to noisy data and outliers.
  • Extreme gradient boosting (XGBoost): Gradient boosting, like the random forest [19] above, is an ensemble learning algorithm that generates a final model based on a series of individual models, typically decision trees. The gradient is used to minimize the loss function, similar to how neural networks [41] use gradient descent to optimize weights. Extreme Gradient Boosting (XGBoost) is a form of gradient boosting that takes more detailed approximations into account when determining the best model [82]. It computes second-order gradients of the loss function to minimize loss and applies advanced regularization (L1 and L2) [82], which reduces over-fitting and improves model generalization and performance. XGBoost is fast to interpret and can handle large-sized datasets well.
  • Stochastic gradient descent (SGD): Stochastic gradient descent (SGD) [41] is an iterative method for optimizing an objective function with suitable smoothness properties, where the word "stochastic" refers to random probability. This reduces the computational burden, particularly in high-dimensional optimization problems, allowing faster iterations in exchange for a lower convergence rate. A gradient is the slope of a function, measuring one variable's degree of change in response to another variable's changes; gradient descent updates each parameter in the direction of the negative partial derivative of the objective with respect to that parameter. Let α be the learning rate and J_i the cost of the i-th training example; then Eq. (4) gives the stochastic gradient descent weight update at the j-th iteration:

    w_j := w_j - α ∂J_i / ∂w_j.   (4)

    In large-scale and sparse machine learning, SGD has been successfully applied to problems often encountered in text classification and natural language processing [82]. However, SGD is sensitive to feature scaling and needs a range of hyperparameters, such as the regularization parameter and the number of iterations.
  • Rule-based classification: The term rule-based classification can be used to refer to any classification scheme that makes use of IF-THEN rules for class prediction. Several classification algorithms exist with the ability to generate rules, such as Zero-R [125], One-R [47], decision trees [87, 88], DTNB [110], Ripple Down Rule learner (RIDOR) [125], and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [126]. The decision tree is one of the most common rule-based classification algorithms among these techniques because it has several advantages, such as being easier to interpret, the ability to handle high-dimensional data, simplicity and speed, good accuracy, and the capability to produce classification rules that are clear and understandable to humans [127, 128]. Decision tree-based rules also provide significant accuracy in a prediction model for unseen test cases [106]. Since the rules are easily interpretable, these rule-based classifiers are often used to produce descriptive models that can describe a system, including the entities and their relationships.
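The logistic function of Eq. (1) and the SGD weight update of Eq. (4) described above combine naturally into a minimal logistic-regression trainer. The toy data, learning rate, and step count below are invented for illustration; this is a sketch of the per-example update rule, not a production implementation:

```python
import math
import random

random.seed(1)

def sigmoid(z):
    """The logistic (sigmoid) function of Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Invented, linearly separable toy data: label 1 roughly when x1 + x2 > 1.
X = [[0.0, 0.0], [0.2, 0.3], [1.0, 1.0], [0.9, 0.8], [0.1, 0.4], [1.2, 0.7]]
y = [0, 0, 1, 1, 0, 1]

w, b, alpha = [0.0, 0.0], 0.0, 0.5     # weights, bias, learning rate

for _ in range(2000):                  # SGD: one randomly chosen example per step
    i = random.randrange(len(X))
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
    # For the log-loss, dJ_i/dw_j = (p - y_i) * x_j, so the update below is
    # exactly the rule of Eq. (4): w_j := w_j - alpha * dJ_i/dw_j
    err = p - y[i]
    w = [wj - alpha * err * xj for wj, xj in zip(w, X[i])]
    b -= alpha * err

preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0.5 else 0
         for xi in X]
print(preds)
```

On this separable toy set the trained model should recover all of the training labels; on real data one would add regularization and feature scaling, to which SGD is sensitive as noted above.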

Regression Analysis

Regression analysis includes several methods of machine learning that allow one to predict a continuous (y) result variable based on the value of one or more (x) predictor variables [41]. The most significant distinction between classification and regression is that classification predicts distinct class labels, while regression facilitates the prediction of a continuous quantity. Figure 6 shows an example of how classification differs from regression. Some overlap is often found between the two types of machine learning algorithms. Regression models are now widely used in a variety of fields, including financial forecasting, cost estimation, trend analysis, marketing, time series estimation, drug response modeling, and many more. Some of the familiar types of regression algorithms are linear, polynomial, lasso, and ridge regression, which are explained briefly in the following.

  • Simple and multiple linear regression: This is one of the most popular ML modeling techniques as well as a well-known regression technique. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the form of the regression line is linear. Linear regression creates a relationship between the dependent variable (Y) and one or more independent variables (X), known as the regression line, using the best-fit straight line [41]. It is defined by the following equations:

    y = a + b x + e,   (5)
    y = a + b_1 x_1 + b_2 x_2 + ... + b_n x_n + e,   (6)

    where a is the intercept, b is the slope of the line, and e is the error term. These equations can be used to predict the value of the target variable based on the given predictor variable(s). Multiple linear regression, defined in Eq. (6), is an extension of simple linear regression that allows two or more predictor variables to model a response variable y as a linear function [41], whereas simple linear regression, defined in Eq. (5), has only one independent variable.
  • Polynomial regression: Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is not linear but is modeled as a polynomial of degree n in x [82]. Its equation is derived from the linear regression equation (polynomial regression of degree 1) and is defined as follows:

    y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + ... + b_n x^n + e.   (7)

    Here, y is the predicted/target output, b_0, b_1, ..., b_n are the regression coefficients, and x is the independent/input variable. In simple words, if the data are not distributed linearly but follow an n-th degree polynomial, then we use polynomial regression to obtain the desired output.
  • LASSO and ridge regression: LASSO and ridge regression are well known as powerful techniques typically used for building learning models in the presence of a large number of features, due to their capability to prevent over-fitting and reduce the complexity of the model. The LASSO (least absolute shrinkage and selection operator) regression model uses the L1 regularization technique [82], which applies shrinkage by penalizing the "absolute value of the magnitude of the coefficients" (L1 penalty). As a result, LASSO tends to shrink coefficients to exactly zero. Thus, LASSO regression aims to find the subset of predictors that minimizes the prediction error for a quantitative response variable. On the other hand, ridge regression uses L2 regularization [82], which penalizes the "squared magnitude of the coefficients" (L2 penalty). Thus, ridge regression forces the weights to be small but never sets a coefficient value to zero, producing a non-sparse solution. Overall, LASSO regression is useful for obtaining a subset of predictors by eliminating less important features, while ridge regression is useful when a data set has "multicollinearity", i.e., predictors that are correlated with other predictors.
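The simple linear regression model of Eq. (5) has a closed-form ordinary least squares solution, sketched below on invented noiseless data so the fitted coefficients are exactly recoverable:

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares fit for y = a + b*x + e (Eq. 5)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: b = cov(x, y) / var(x); intercept: a = mean(y) - b * mean(x)
    b = (sum((x - mx) * (yv - my) for x, yv in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def predict(a, b, x):
    """Apply the fitted regression line of Eq. (5), ignoring the error term."""
    return a + b * x

# Noiseless toy data generated from y = 2 + 3x, so OLS recovers a=2, b=3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]
a, b = fit_simple_linear(xs, ys)
print(a, b)                 # 2.0 3.0
print(predict(a, b, 10.0))  # 32.0
```

Multiple linear regression (Eq. 6) generalizes the same idea to several predictors, where the closed-form solution is expressed with matrix algebra rather than the two scalar formulas above.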

Fig. 6 Classification vs. regression. In classification the dotted line represents a linear boundary that separates the two classes; in regression, the dotted line models the linear relationship between the two variables

Cluster Analysis

Cluster analysis, also known as clustering, is an unsupervised machine learning technique for identifying and grouping related data points in large datasets without concern for a specific outcome. It groups a collection of objects in such a way that objects in the same category, called a cluster, are in some sense more similar to each other than objects in other groups [41]. It is often used as a data analysis technique to discover interesting trends or patterns in data, e.g., groups of consumers based on their behavior. Clustering can be used in a broad range of application areas, such as cybersecurity, e-commerce, mobile data processing, health analytics, user modeling, and behavioral analytics. In the following, we briefly discuss and summarize various types of clustering methods.

  • Partitioning methods: Based on the features and similarities in the data, this clustering approach categorizes the data into multiple groups or clusters. Data scientists or analysts typically determine the number of clusters to produce, either dynamically or statically, depending on the nature of the target application. The most common clustering algorithms based on partitioning methods are K-means [69], K-medoids [80], CLARA [55], etc.
  • Density-based methods: To identify distinct groups or clusters, this approach uses the concept that a cluster in the data space is a contiguous region of high point density, isolated from other such clusters by contiguous regions of low point density. Points that are not part of any cluster are considered noise. The typical density-based clustering algorithms are DBSCAN [32], OPTICS [12], etc. Density-based methods typically struggle with clusters of varying density and with high-dimensional data.

Fig. 7 A graphical interpretation of the widely used hierarchical clustering (bottom-up and top-down) technique

  • Grid-based methods: To deal with massive datasets, grid-based clustering is especially suitable. To obtain clusters, the principle is first to summarize the dataset with a grid representation and then to combine grid cells. STING [ 122 ], CLIQUE [ 6 ], etc. are the standard algorithms of grid-based clustering.
  • Model-based methods: There are mainly two types of model-based clustering algorithms: one that uses statistical learning, and the other based on a method of neural network learning [130]. For instance, GMM [89] is an example of a statistical learning method, and SOM [22, 96] is an example of a neural network learning method.
  • Constraint-based methods: Constrained-based clustering is a semi-supervised approach to data clustering that uses constraints to incorporate domain knowledge. Application or user-oriented constraints are incorporated to perform the clustering. The typical algorithms of this kind of clustering are COP K-means [ 121 ], CMWK-Means [ 27 ], etc.

Many clustering algorithms have been proposed with the ability to grouping data in machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.

  • K-means clustering: K-means clustering [69] is a fast, robust, and simple algorithm that provides reliable results when data sets are well separated from each other. The data points are allocated to clusters in such a way that the sum of the squared distances between the data points and their centroid is as small as possible. In other words, the K-means algorithm identifies the k centroids and then assigns each data point to the nearest cluster while keeping the clusters as compact as possible. Since it begins with a random selection of cluster centers, the results can be inconsistent. Since extreme values can easily affect a mean, the K-means clustering algorithm is sensitive to outliers. K-medoids clustering [91] is a variant of K-means that is more robust to noise and outliers.
  • Mean-shift clustering: Mean-shift clustering [ 37 ] is a nonparametric clustering technique that does not require prior knowledge of the number of clusters or constraints on cluster shape. Mean-shift clustering aims to discover “blobs” in a smooth distribution or density of samples [ 82 ]. It is a centroid-based algorithm that works by updating centroid candidates to be the mean of the points in a given region. To form the final set of centroids, these candidates are filtered in a post-processing stage to remove near-duplicates. Cluster analysis in computer vision and image processing are examples of application domains. Mean Shift has the disadvantage of being computationally expensive. Moreover, in cases of high dimension, where the number of clusters shifts abruptly, the mean-shift algorithm does not work well.
  • DBSCAN: Density-based spatial clustering of applications with noise (DBSCAN) [32] is a base algorithm for density-based clustering which is widely used in data mining and machine learning. This is known as a non-parametric density-based clustering technique for separating high-density clusters from low-density clusters that is used in model building. DBSCAN's main idea is that a point belongs to a cluster if it is close to many points from that cluster. It can find clusters of various shapes and sizes in a vast volume of data that is noisy and contains outliers. Unlike k-means, DBSCAN does not require a priori specification of the number of clusters in the data and can find arbitrarily shaped clusters. Although k-means is much faster than DBSCAN, DBSCAN is efficient at finding high-density regions and outliers, i.e., it is robust to outliers.
  • GMM clustering: Gaussian mixture models (GMMs) are often used for data clustering, which is a distribution-based clustering algorithm. A Gaussian mixture model is a probabilistic model in which all the data points are produced by a mixture of a finite number of Gaussian distributions with unknown parameters [ 82 ]. To find the Gaussian parameters for each cluster, an optimization algorithm called expectation-maximization (EM) [ 82 ] can be used. EM is an iterative method that uses a statistical model to estimate the parameters. In contrast to k-means, Gaussian mixture models account for uncertainty and return the likelihood that a data point belongs to one of the k clusters. GMM clustering is more robust than k-means and works well even with non-linear data distributions.
  • Agglomerative hierarchical clustering: Agglomerative clustering is the most common method of hierarchical clustering used to group objects into clusters based on their similarity. It follows a bottom-up approach: each object is first treated as a singleton cluster, and pairs of clusters are then merged one by one until all objects belong to a single large cluster. The result is a dendrogram, a tree-based representation of the elements. Single linkage [ 115 ], complete linkage [ 116 ], and BOTS [ 102 ] are examples of such techniques. The main advantage of agglomerative hierarchical clustering over k-means is that the tree-structured hierarchy it generates is more informative than the unstructured collection of flat clusters returned by k-means, which can support better decisions in the relevant application areas.
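The core DBSCAN idea above (core points, reachable neighbors, and noise) can be illustrated with a minimal pure-Python sketch. This is a toy version for clarity, not the optimized algorithm of [ 32 ]; the point coordinates and parameter values in the usage note are arbitrary assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: label each point with a cluster id, or -1 for noise."""
    n = len(points)
    labels = [None] * n                      # None = not yet visited
    cluster = -1

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:             # not a core point
            labels[i] = -1                   # provisionally noise
            continue
        cluster += 1                         # start a new cluster from this core point
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:              # former noise absorbed as a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:      # j is also a core point: keep expanding
                queue.extend(j_seeds)
    return labels
```

For example, on two dense groups of points plus one far-away point, `dbscan(points, eps=0.5, min_pts=3)` assigns two distinct cluster labels and marks the isolated point as noise (-1), with no need to specify the number of clusters in advance.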

Dimensionality Reduction and Feature Learning

In machine learning and data science, high-dimensional data processing is a challenging task for both researchers and application developers. Dimensionality reduction, an unsupervised learning technique, is therefore important: it leads to better human interpretation, lowers computational cost, and avoids overfitting and redundancy by simplifying models. Both feature selection and feature extraction can be used for dimensionality reduction. The primary distinction between the two is that “feature selection” keeps a subset of the original features [ 97 ], while “feature extraction” creates brand new ones [ 98 ]. In the following, we briefly discuss these techniques.

  • Feature selection: Feature selection, also known as variable or attribute selection, is the process of choosing a subset of unique features (variables, predictors) to use in building a machine learning or data science model. It decreases a model’s complexity by eliminating irrelevant or less important features and allows for faster training of machine learning algorithms. A right and optimal subset of features in a problem domain can minimize the overfitting problem by simplifying and generalizing the model, and can also increase the model’s accuracy [ 97 ]. Thus, “feature selection” [ 66 , 99 ] is considered one of the primary concepts in machine learning, greatly affecting the effectiveness and efficiency of the target model. The chi-squared test, analysis of variance (ANOVA), Pearson’s correlation coefficient, and recursive feature elimination are some popular techniques that can be used for feature selection.
  • Feature extraction: In a machine learning-based model or system, feature extraction techniques usually provide a better understanding of the data, a way to improve prediction accuracy, and a way to reduce computational cost or training time. The aim of “feature extraction” [ 66 , 99 ] is to reduce the number of features in a dataset by generating new ones from the existing ones and then discarding the originals. The majority of the information in the original set of features can then be summarized by this new, reduced set. For instance, principal component analysis (PCA) is often used as a dimensionality-reduction technique that projects the data into a lower-dimensional space by creating brand-new components from the existing features [ 98 ].
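As a small illustration of how PCA extracts a new component from existing features, the sketch below estimates the first principal component of 2-D data via power iteration on the sample covariance matrix. The toy data and iteration count are assumptions chosen for demonstration; a practical implementation would use a linear-algebra library for the full eigendecomposition.

```python
import math

def first_principal_component(data, steps=200):
    """Toy PCA sketch for 2-D data: power iteration on the covariance matrix."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # entries of the 2x2 sample covariance matrix
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    v = (1.0, 0.0)                        # arbitrary initial direction
    for _ in range(steps):
        wx = cxx * v[0] + cxy * v[1]      # multiply direction by covariance matrix
        wy = cxy * v[0] + cyy * v[1]
        norm = math.hypot(wx, wy)
        v = (wx / norm, wy / norm)        # renormalize to a unit vector
    return v                              # converges to the dominant eigenvector (PC1)
```

On points lying roughly along the diagonal, the returned unit direction is close to (0.71, 0.71), i.e., PC1 captures the shared variance of the two original features in a single new component.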

Many algorithms have been proposed to reduce data dimensions in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.

  • Variance threshold: A simple baseline approach to feature selection is the variance threshold [ 82 ]. It excludes all features of low variance, i.e., all features whose variance does not exceed the threshold. By default it eliminates all zero-variance features, i.e., features that have the same value in all samples. This feature selection algorithm looks only at the input features (X), not the desired outputs (y), and can, therefore, be used for unsupervised learning.
  • Pearson correlation: Pearson’s correlation is another method to understand a feature’s relation to the response variable and can be used for feature selection [ 99 ]. It is also used to find the association between features in a dataset. The resulting value lies in [-1, 1], where -1 means perfect negative correlation, +1 means perfect positive correlation, and 0 means the two variables have no linear correlation. If X and Y are two random variables, the correlation coefficient between them is defined as [ 41 ]

    r(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}. \quad (8)
  • ANOVA: Analysis of variance (ANOVA) is a statistical tool used to verify whether the mean values of two or more groups differ significantly from each other. ANOVA assumes a linear relationship between the variables and the target, and normally distributed variables. To statistically test the equality of means, the ANOVA method utilizes F-tests. For feature selection, the resulting ‘ANOVA F value’ [ 82 ] of this test can be used to omit features that are independent of the target variable.
  • Chi square: The chi-square (χ²) statistic [ 82 ] estimates the difference between the observed and expected frequencies of a set of events or variables. The value of χ² depends on the magnitude of the difference between the observed and expected values, the degrees of freedom, and the sample size. The chi-square test is commonly used for testing relationships between categorical variables. If O_i represents an observed value and E_i the corresponding expected value, then

    \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}. \quad (9)
  • Recursive feature elimination (RFE): Recursive feature elimination (RFE) is a brute-force approach to feature selection. RFE [ 82 ] repeatedly fits the model and removes the weakest feature until the specified number of features is reached. Features are ranked by the model’s coefficients or feature importances. By recursively removing a small number of features per iteration, RFE aims to eliminate dependencies and collinearity in the model.
  • Model-based selection: To reduce the dimensionality of the data, linear models penalized with L1 regularization can be used. Least absolute shrinkage and selection operator (Lasso) regression is a type of linear regression that shrinks some of the coefficients exactly to zero [ 82 ]; the corresponding features can therefore be removed from the model. Thus, the penalized lasso regression method is often used in machine learning to select a subset of variables. The Extra Trees classifier [ 82 ] is an example of a tree-based estimator that can compute impurity-based feature importances, which can then be used to discard irrelevant features.
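Two of the filter methods above are simple enough to sketch directly in pure Python: the variance threshold and Pearson's r of Eq. (8). The helper names and toy inputs below are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples (Eq. 8)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - xbar) ** 2 for x in xs))
           * math.sqrt(sum((y - ybar) ** 2 for y in ys)))
    return num / den

def variance_threshold(columns, threshold=0.0):
    """Keep only feature columns whose sample variance exceeds the threshold."""
    def var(col):
        m = sum(col) / len(col)
        return sum((v - m) ** 2 for v in col) / (len(col) - 1)
    return [col for col in columns if var(col) > threshold]
```

For instance, `pearson_r([1, 2, 3], [2, 4, 6])` returns 1.0 (perfect positive correlation), while `variance_threshold` with the default threshold drops a constant feature column because its variance is zero.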

Fig. 8: An example of principal component analysis (PCA) and the created principal components PC1 and PC2 in a different dimension space

Association Rule Learning

Association rule learning is a rule-based machine learning approach for discovering interesting relationships, expressed as “IF-THEN” statements, between variables in large datasets [ 7 ]. One example is that “if a customer buys a computer or laptop (an item), s/he is likely to also buy anti-virus software (another item) at the same time”. Association rules are employed today in many application areas, including IoT services, medical diagnosis, usage behavior analytics, web usage mining, smartphone applications, cybersecurity applications, and bioinformatics. In comparison to sequence mining, association rule learning does not usually take into account the order of items within or across transactions. A common way of measuring the usefulness of association rules is through their two parameters, ‘support’ and ‘confidence’, introduced in [ 7 ].
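Support and confidence can be computed in a few lines; the toy transactions below, echoing the laptop/anti-virus example, are illustrative only.

```python
# A tiny illustrative transaction database (each transaction is a set of items).
transactions = [
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "mouse"},
    {"laptop", "mouse"},
    {"phone", "charger"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the rule A => C: support(A union C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)
```

Here the rule “laptop => antivirus” has support 0.5 (two of four transactions contain both items) and confidence 2/3 (two of the three laptop transactions also contain anti-virus software).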

In the data mining literature, many association rule learning methods have been proposed, such as logic dependent [ 34 ], frequent pattern based [ 8 , 49 , 68 ], and tree-based [ 42 ]. The most popular association rule learning algorithms are summarized below.

  • AIS and SETM: AIS is the first algorithm proposed by Agrawal et al. [ 7 ] for association rule mining. The AIS algorithm’s main downside is that too many candidate itemsets are generated, requiring more space and wasting a lot of effort, and it requires too many passes over the entire dataset to produce the rules. Another approach, SETM [ 49 ], exhibits good performance and stable behavior in terms of execution time; however, it suffers from the same flaw as AIS.
  • Apriori: For generating association rules from a given dataset, Agrawal et al. [ 8 ] proposed the Apriori, Apriori-TID, and Apriori-Hybrid algorithms. These algorithms outperform AIS and SETM, mentioned above, due to the Apriori property of frequent itemsets [ 8 ]. The term ‘Apriori’ refers to the use of prior knowledge of frequent itemset properties. Apriori uses a “bottom-up” approach to generate the candidate itemsets. To reduce the search space, it exploits the property that “all subsets of a frequent itemset must be frequent; and if an itemset is infrequent, then all its supersets must also be infrequent”. Another approach, predictive Apriori [ 108 ], can also generate rules; however, it may produce unexpected results as it combines both support and confidence. Apriori [ 8 ] is the most widely applied technique in mining association rules.
  • ECLAT: This technique was proposed by Zaki et al. [ 131 ] and stands for Equivalence Class Clustering and bottom-up Lattice Traversal. ECLAT uses a depth-first search to find frequent itemsets. In contrast to the Apriori [ 8 ] algorithm, which represents data in a horizontal pattern, it represents data vertically. Hence, the ECLAT algorithm is more efficient and scalable in the area of association rule learning. This algorithm is better suited for small and medium datasets whereas the Apriori algorithm is used for large datasets.
  • FP-Growth: Another common association rule learning technique, based on the frequent-pattern tree (FP-tree) proposed by Han et al. [ 42 ], is frequent pattern growth, known as FP-Growth. The key difference from Apriori is that while generating rules, the Apriori algorithm [ 8 ] generates frequent candidate itemsets, whereas the FP-Growth algorithm [ 42 ] avoids candidate generation altogether and instead builds a tree following a ‘divide and conquer’ strategy. Due to its sophistication, however, the FP-tree is challenging to use in an interactive mining environment [ 133 ], and for massive datasets it may not fit into memory, making big data challenging to process as well. Another solution, RARM (Rapid Association Rule Mining), proposed by Das et al. [ 26 ], faces a related FP-tree issue [ 133 ].
  • ABC-RuleMiner: ABC-RuleMiner is a rule-based machine learning method, recently proposed by Sarker et al. [ 104 ] in our earlier paper, to discover interesting non-redundant rules that provide real-world intelligent services. The algorithm effectively identifies redundancy in associations by taking into account the impact or precedence of the related contextual features and discovers a set of non-redundant association rules. It first constructs an association generation tree (AGT) in a top-down manner and then extracts the association rules by traversing the tree. Thus, ABC-RuleMiner is more potent than traditional rule-based methods in terms of both non-redundant rule generation and intelligent decision-making, particularly in a context-aware smart computing environment where human or user preferences are involved.

Among the association rule learning techniques discussed above, Apriori [ 8 ] is the most widely used algorithm for discovering association rules from a given dataset [ 133 ]. The main strength of the association learning technique is its comprehensiveness, as it generates all associations that satisfy the user-specified constraints, such as minimum support and confidence value. The ABC-RuleMiner approach [ 104 ] discussed earlier could give significant results in terms of non-redundant rule generation and intelligent decision-making for the relevant application areas in the real world.
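The Apriori pruning property quoted above ("if an itemset is infrequent, then all its supersets must also be infrequent") can be sketched in pure Python. This is a toy level-wise version for illustration, not the optimized algorithm of [ 8 ]; the transaction data in the usage note is an arbitrary assumption.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every frequent itemset (as a frozenset) mapped to its support."""
    n = len(transactions)
    # level 1: all single-item candidates
    level = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while level:
        # count the support of each candidate itemset at this level
        counts = {c: sum(c <= t for t in transactions) / n for c in level}
        survivors = {c: s for c, s in counts.items() if s >= min_support}
        frequent.update(survivors)
        # join step + Apriori pruning: a (k+1)-itemset is a candidate only
        # if every one of its k-subsets is frequent
        level = set()
        for a, b in combinations(list(survivors), 2):
            cand = a | b
            if len(cand) == k + 1 and all(
                    frozenset(sub) in survivors for sub in combinations(cand, k)):
                level.add(cand)
        k += 1
    return frequent
```

For example, on the transactions [{a, b}, {a, b, c}, {a, c}, {b, c}] with a minimum support of 0.5, all singletons and pairs are frequent, while {a, b, c} (support 0.25) is pruned.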

Reinforcement Learning

Reinforcement learning (RL) is a machine learning technique that allows an agent to learn by trial and error in an interactive environment, using feedback from its own actions and experiences. Unlike supervised learning, which is based on given sample data or examples, the RL method learns by interacting with the environment. The problem to be solved in RL is defined as a Markov decision process (MDP) [ 86 ], i.e., it is all about making decisions sequentially. An RL problem typically includes four elements: agent, environment, rewards, and policy.

RL can be split roughly into model-based and model-free techniques. Model-based RL infers optimal behavior from a model of the environment by performing actions and observing the results, which include the next state and the immediate reward [ 85 ]. AlphaZero and AlphaGo [ 113 ] are examples of model-based approaches. A model-free approach, on the other hand, does not use the transition probability distribution and reward function associated with the MDP. Q-learning, Deep Q Network, Monte Carlo control, and SARSA (State–Action–Reward–State–Action) are some examples of model-free algorithms [ 52 ]. The key difference between the two families is thus whether the agent relies on a model of the environment’s dynamics: model-based methods require (or learn) such a model, while model-free methods learn directly from experience. In the following, we discuss the popular RL algorithms.

  • Monte Carlo methods: Monte Carlo techniques, or Monte Carlo experiments, are a broad category of computational algorithms that rely on repeated random sampling to obtain numerical results [ 52 ]. The underlying concept is to use randomness to solve problems that are deterministic in principle. Optimization, numerical integration, and drawing samples from a probability distribution are the three problem classes where Monte Carlo techniques are most commonly used.
  • Q-learning: Q-learning is a model-free reinforcement learning algorithm for learning the quality of behaviors that tell an agent what action to take under what conditions [ 52 ]. It does not need a model of the environment (hence the term “model-free”), and it can deal with stochastic transitions and rewards without the need for adaptations. The ‘Q’ in Q-learning usually stands for quality, as the algorithm calculates the maximum expected rewards for a given behavior in a given state.
  • Deep Q-learning: Q-learning works well when we have a reasonably simple setting to deal with, but when the number of states and actions grows, a deep neural network can be used as a function approximator. The basic working step in deep Q-learning [ 52 ] is that the current state is fed into the neural network, which returns the Q-value of each possible action as output.
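The tabular Q-learning update described above, Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), can be demonstrated on a toy problem. The corridor environment, hyperparameters, exploring starts, and episode cap below are all illustrative assumptions, not part of any standard benchmark.

```python
import random

random.seed(0)

# A 1-D corridor MDP: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 yields reward +1 and ends the episode.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def choose(q_row):
    """Epsilon-greedy action selection with random tie-breaking."""
    if random.random() < EPS or q_row[0] == q_row[1]:
        return random.randrange(2)
    return 0 if q_row[0] > q_row[1] else 1

Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):                          # training episodes
    s = random.randrange(GOAL)                # exploring starts: random non-goal state
    for _ in range(100):                      # cap episode length
        a = choose(Q[s])
        s2, r, done = step(s, a)
        # Q-learning update toward the bootstrapped target
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

# greedy policy read off the learned Q-table
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N_STATES)]
```

After training, the greedy policy moves right in every non-goal state, and the learned values decay geometrically with distance from the goal (Q[3][1] near 1, Q[2][1] near 0.9, and so on), reflecting the discount factor.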

Reinforcement learning, along with supervised and unsupervised learning, is one of the basic machine learning paradigms. RL can be used to solve numerous real-world problems in various fields, such as game theory, control theory, operations analysis, information theory, simulation-based optimization, manufacturing, supply chain logistics, multi-agent systems, swarm intelligence, aircraft control, robot motion control, and many more.

Artificial Neural Network and Deep Learning

Deep learning is part of a wider family of artificial neural network (ANN)-based machine learning approaches with representation learning. Deep learning provides a computational architecture that combines several processing layers, such as input, hidden, and output layers, to learn from data [ 41 ]. The main advantage of deep learning over traditional machine learning methods is its better performance in several cases, particularly when learning from large datasets [ 105 , 129 ]. Figure 9 shows the general performance of deep learning versus traditional machine learning as the amount of data increases; however, the picture may vary depending on the data characteristics and experimental setup.

Fig. 9: Machine learning and deep learning performance in general with the amount of data

The most common deep learning algorithms are: Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN, or ConvNet), Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) [ 96 ]. In the following, we discuss various types of deep learning methods that can be used to build effective data-driven models for various purposes.

Fig. 10: A structure of an artificial neural network model with multiple processing layers

Fig. 11: An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers

  • LSTM-RNN: Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the area of deep learning [ 38 ]. LSTM has feedback links, unlike normal feed-forward neural networks. LSTM networks are well-suited for analyzing and learning sequential data, such as classifying, processing, and predicting data based on time series data, which differentiates it from other conventional networks. Thus, LSTM can be used when the data are in a sequential format, such as time, sentence, etc., and commonly applied in the area of time-series analysis, natural language processing, speech recognition, etc.
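The gating mechanism that gives the LSTM its memory can be sketched for a single scalar cell with the standard input, forget, and output gates plus the candidate update. The weight layout below (one input weight, one recurrent weight, and one bias per gate) is an assumption made for compactness; real implementations operate on vectors and weight matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w):
    """One LSTM cell step with scalar input x, hidden state h, and cell state c.

    `w` maps each gate name in {"i", "f", "o", "g"} to a tuple
    (input weight, recurrent weight, bias)."""
    def gate(name, squash):
        wx, wh, b = w[name]
        return squash(wx * x + wh * h + b)

    i = gate("i", sigmoid)          # input gate: how much new information to write
    f = gate("f", sigmoid)          # forget gate: how much old cell state to keep
    o = gate("o", sigmoid)          # output gate: how much cell state to expose
    g = gate("g", math.tanh)        # candidate cell update
    c_new = f * c + i * g           # cell state: gated mix of old and new content
    h_new = o * math.tanh(c_new)    # hidden state passed to the next time step
    return h_new, c_new
```

Saturating the gates shows the memory behavior: with the forget gate driven to 1 and the input gate to 0, the cell state passes through a time step essentially unchanged, which is how an LSTM carries information across long sequences.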

In addition to the most common deep learning methods discussed above, several other deep learning approaches [ 96 ] exist for various purposes. For instance, the self-organizing map (SOM) [ 58 ] uses unsupervised learning to represent high-dimensional data as a 2D grid map, thus achieving dimensionality reduction. The autoencoder (AE) [ 15 ] is another learning technique widely used for dimensionality reduction as well as feature extraction in unsupervised learning tasks. Restricted Boltzmann machines (RBMs) [ 46 ] can be used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. A deep belief network (DBN) is typically composed of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, and a backpropagation neural network (BPNN) [ 123 ]. A generative adversarial network (GAN) [ 39 ] is a form of deep learning network that can generate data with characteristics close to the actual input data. Transfer learning, the re-use of a pre-trained model on a new problem, is currently very common because it can train deep neural networks with comparatively little data [ 124 ]. A brief discussion of these artificial neural network (ANN) and deep learning (DL) models is given in our earlier paper, Sarker et al. [ 96 ].

Overall, based on the learning techniques discussed above, we can conclude that the various types of machine learning techniques, such as classification analysis, regression, data clustering, feature selection and extraction, dimensionality reduction, association rule learning, reinforcement learning, and deep learning, can each play a significant role for various purposes according to their capabilities. In the following section, we discuss several application areas based on machine learning algorithms.

Applications of Machine Learning

In the current age of the Fourth Industrial Revolution (4IR), machine learning has become popular in various application areas because of its capability to learn from past data and make intelligent decisions. In the following, we summarize and discuss ten popular application areas of machine learning technology.

  • Predictive analytics and intelligent decision-making: A major application field of machine learning is intelligent decision-making by data-driven predictive analytics [ 21 , 70 ]. The basis of predictive analytics is capturing and exploiting relationships between explanatory variables and predicted variables from previous events to predict the unknown outcome [ 41 ]. For instance, identifying suspects or criminals after a crime has been committed, or detecting credit card fraud as it happens, are such applications. In another application, machine learning algorithms can assist retailers in better understanding consumer preferences and behavior, managing inventory, avoiding out-of-stock situations, and optimizing logistics and warehousing in e-commerce. Various machine learning algorithms such as decision trees, support vector machines, and artificial neural networks [ 106 , 125 ] are commonly used in the area. Since accurate predictions provide insight into the unknown, they can improve the decisions of industries, businesses, and almost any organization, including government agencies, e-commerce, telecommunications, banking and financial services, healthcare, sales and marketing, transportation, social networking, and many others.
  • Cybersecurity and threat intelligence: Cybersecurity is one of the most essential areas of Industry 4.0 [ 114 ] and is typically the practice of protecting networks, systems, hardware, and data from digital attacks [ 114 ]. Machine learning has become a crucial cybersecurity technology that constantly learns by analyzing data to identify patterns, better detect malware in encrypted traffic, find insider threats, predict where bad neighborhoods are online, keep people safe while browsing, or secure data in the cloud by uncovering suspicious activity. For instance, clustering techniques can be used to identify cyber-anomalies, policy violations, etc. To detect various types of cyber-attacks or intrusions, machine learning classification models that take into account the impact of security features are useful [ 97 ]. Various deep learning-based security models can also be used on large-scale security datasets [ 96 , 129 ]. Moreover, security policy rules generated by association rule learning techniques can play a significant role in building a rule-based security system [ 105 ]. Thus, the various learning techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can enable cybersecurity professionals to be more proactive in efficiently preventing threats and cyber-attacks.
  • Internet of things (IoT) and smart cities: The Internet of Things (IoT) is another essential area of Industry 4.0 [ 114 ], which turns everyday objects into smart objects by allowing them to transmit data and automate tasks without the need for human interaction. IoT is, therefore, considered to be the big frontier that can enhance almost all activities in our lives, such as smart governance, smart home, education, communication, transportation, retail, agriculture, health care, business, and many more [ 70 ]. The smart city is one of IoT’s core fields of application, using technologies to enhance city services and residents’ living experiences [ 132 , 135 ]. As machine learning utilizes experience to recognize trends and create models that help predict future behavior and events, it has become a crucial technology for IoT applications [ 103 ]. For example, predicting traffic in smart cities, predicting parking availability, estimating citizens’ total energy usage for a particular period, and making context-aware and timely decisions for people are some tasks that can be solved using machine learning techniques according to the current needs of the people.
  • Traffic prediction and transportation: Transportation systems have become a crucial component of every country’s economic development. Nonetheless, several cities around the world are experiencing an excessive rise in traffic volume, resulting in serious issues such as delays, traffic congestion, higher fuel prices, increased CO 2 pollution, accidents, emergencies, and a decline in modern society’s quality of life [ 40 ]. Thus, an intelligent transportation system that predicts future traffic is important and is an indispensable part of a smart city. Accurate traffic prediction based on machine and deep learning modeling can help to minimize these issues [ 17 , 30 , 31 ]. For example, based on the travel history and trends of traveling through various routes, machine learning can assist transportation companies in predicting possible issues that may occur on specific routes and recommending that their customers take a different path. Ultimately, these learning-based data-driven models help improve traffic flow, increase the usage and efficiency of sustainable modes of transportation, and limit real-world disruption by modeling and visualizing future changes.
  • Healthcare and COVID-19 pandemic: Machine learning can help to solve diagnostic and prognostic problems in a variety of medical domains, such as disease prediction, medical knowledge extraction, detecting regularities in data, patient management, etc. [ 33 , 77 , 112 ]. Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus, according to the World Health Organization (WHO) [ 3 ]. Recently, learning techniques have become popular in the battle against COVID-19 [ 61 , 63 ]. For the COVID-19 pandemic, learning techniques are used to classify patients at high risk, their mortality rate, and other anomalies [ 61 ]. They can also be used to better understand the virus’s origin, to predict COVID-19 outbreaks, and for disease diagnosis and treatment [ 14 , 50 ]. With the help of machine learning, researchers can forecast where and when COVID-19 is likely to spread and notify those regions to make the required arrangements. Deep learning also provides exciting solutions to the problems of medical image processing and is seen as a crucial technique for potential applications, particularly for the COVID-19 pandemic [ 10 , 78 , 111 ]. Overall, machine and deep learning techniques can help to fight the COVID-19 virus and the pandemic, as well as support intelligent clinical decision-making in the domain of healthcare.
  • E-commerce and product recommendations: Product recommendation is one of the most well known and widely used applications of machine learning, and it is one of the most prominent features of almost any e-commerce website today. Machine learning technology can assist businesses in analyzing their consumers’ purchasing histories and making customized product suggestions for their next purchase based on their behavior and preferences. E-commerce companies, for example, can easily position product suggestions and offers by analyzing browsing trends and click-through rates of specific items. Using predictive modeling based on machine learning techniques, many online retailers, such as Amazon [ 71 ], can better manage inventory, prevent out-of-stock situations, and optimize logistics and warehousing. The future of sales and marketing is the ability to capture, evaluate, and use consumer data to provide a customized shopping experience. Furthermore, machine learning techniques enable companies to create packages and content that are tailored to the needs of their customers, allowing them to maintain existing customers while attracting new ones.
  • NLP and sentiment analysis: Natural language processing (NLP) involves the reading and understanding of spoken or written language through the medium of a computer [ 79 , 103 ]. Thus, NLP helps computers, for instance, to read a text, hear speech, interpret it, analyze sentiment, and decide which aspects are significant, where machine learning techniques can be used. Virtual personal assistants, chatbots, speech recognition, document description, and language or machine translation are some examples of NLP-related tasks. Sentiment analysis [ 90 ] (also referred to as opinion mining or emotion AI) is an NLP sub-field that seeks to identify and extract public mood and views within a given text through blogs, reviews, social media, forums, news, etc. For instance, businesses and brands use sentiment analysis to understand the social sentiment of their brand, product, or service through social media platforms or the web as a whole. Overall, sentiment analysis is considered a machine learning task that analyzes texts for polarity, such as “positive”, “negative”, or “neutral”, along with more intense emotions such as very happy, happy, sad, very sad, angry, interested, or not interested.
  • Image, speech and pattern recognition: Image recognition [ 36 ] is a well-known and widespread example of machine learning in the real world, which can identify an object in a digital image. Labeling an x-ray as cancerous or not, character recognition, face detection in an image, and tagging suggestions on social media, e.g., Facebook, are common examples of image recognition. Speech recognition [ 23 ], which typically uses sound and linguistic models, is also very popular, e.g., in Google Assistant, Cortana, Siri, Alexa, etc. [ 67 ], where machine learning methods are used. Pattern recognition [ 13 ] is defined as the automated recognition of patterns and regularities in data, e.g., image analysis. Several machine learning techniques such as classification, feature selection, clustering, or sequence-labeling methods are used in the area.
  • Sustainable agriculture: Agriculture is essential to the survival of all human activities [ 109 ]. Sustainable agriculture practices help to improve agricultural productivity while also reducing negative impacts on the environment [ 5 , 25 , 109 ]. The sustainable agriculture supply chains are knowledge-intensive and based on information, skills, technologies, etc., where knowledge transfer encourages farmers to enhance their decisions to adopt sustainable agriculture practices utilizing the increasing amount of data captured by emerging technologies, e.g., the Internet of Things (IoT), mobile technologies and devices, etc. [ 5 , 53 , 54 ]. Machine learning can be applied in various phases of sustainable agriculture, such as in the pre-production phase, for the prediction of crop yield, soil properties, irrigation requirements, etc.; in the production phase, for weather prediction, disease detection, weed detection, soil nutrient management, livestock management, etc.; in the processing phase, for demand estimation, production planning, etc.; and in the distribution phase, for inventory management, consumer analysis, etc.
  • User behavior analytics and context-aware smartphone applications: Context-awareness is a system’s ability to capture knowledge about its surroundings at any moment and modify behaviors accordingly [ 28 , 93 ]. Context-aware computing uses software and hardware to automatically collect and interpret data for direct responses. The mobile app development environment has been changed greatly with the power of AI, particularly, machine learning techniques through their learning capabilities from contextual data [ 103 , 136 ]. Thus, the developers of mobile apps can rely on machine learning to create smart apps that can understand human behavior, support, and entertain users [ 107 , 137 , 140 ]. To build various personalized data-driven context-aware systems, such as smart interruption management, smart mobile recommendation, context-aware smart searching, and decision-making that intelligently assists end mobile phone users in a pervasive computing environment, machine learning techniques are applicable. For example, context-aware association rules can be used to build an intelligent phone call application [ 104 ]. Clustering approaches are useful in capturing users’ diverse behavioral activities by taking into account data in time series [ 102 ]. To predict future events in various contexts, the classification methods can be used [ 106 , 139 ]. Thus, the various learning techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can help to build context-aware adaptive and smart applications according to the preferences of the mobile phone users.

In addition to these application areas, machine learning-based models can also be applied to several other domains such as bioinformatics, cheminformatics, computer networks, DNA sequence classification, economics and banking, robotics, advanced engineering, and many more.
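As a concrete illustration of one application area above, crop-yield prediction in the pre-production phase of sustainable agriculture, the following is a minimal sketch. The features, values, and units are invented for illustration, not drawn from the cited works.

```python
# Hypothetical sketch: crop-yield prediction (pre-production phase)
# using a random forest regressor on illustrative farm records.
from sklearn.ensemble import RandomForestRegressor

# Toy records: [rainfall_mm, mean_temp_c, fertilizer_kg_per_ha]
X = [[450, 18, 80], [500, 20, 90], [350, 24, 60],
     [600, 17, 100], [400, 22, 70], [550, 19, 95]]
y = [3.1, 3.6, 2.2, 4.0, 2.7, 3.8]  # yield in tonnes per hectare

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Estimate yield for a new season's conditions
predicted = model.predict([[480, 19, 85]])[0]
print(f"predicted yield: {predicted:.2f} t/ha")
```

In practice, such features would be captured by IoT sensors and weather services, and the model would be trained on many seasons of data rather than six rows.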

Challenges and Research Directions

Our study on machine learning algorithms for intelligent data analysis and applications opens several research issues in the area. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions.

In general, the effectiveness and efficiency of a machine learning-based solution depend on the nature and characteristics of the data and on the performance of the learning algorithms. Collecting data in a relevant domain, such as cybersecurity, IoT, healthcare, or agriculture discussed in Sect. “ Applications of Machine Learning ”, is not straightforward, even though today’s cyberspace produces huge amounts of data at very high frequency. Thus, collecting useful data for the target machine learning-based applications, e.g., smart city applications, and managing them properly are important for further analysis. A more in-depth investigation of data collection methods is therefore needed when working with real-world data. Moreover, historical data may contain many ambiguous values, missing values, outliers, and meaningless entries. The machine learning algorithms discussed in Sect. “ Machine Learning Tasks and Algorithms ” are highly sensitive to the quality and availability of the training data, and consequently so is the resultant model. Accurately cleaning and pre-processing the diverse data collected from diverse sources is thus a challenging task. Therefore, effectively modifying or enhancing existing pre-processing methods, or proposing new data preparation techniques, is required to use the learning algorithms effectively in the associated application domain.
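The cleaning steps mentioned above can be sketched minimally as follows; the sensor data and thresholds are illustrative assumptions, not from the paper.

```python
# Minimal pre-processing sketch: imputing missing values and filtering
# outliers before handing data to a learning algorithm.
import pandas as pd

raw = pd.DataFrame({
    "temperature": [21.5, None, 22.1, 300.0, 20.9],  # 300.0 is a sensor glitch
    "humidity":    [0.45, 0.50, None, 0.48, 0.47],
})

# 1. Impute missing values with each column's median
clean = raw.fillna(raw.median(numeric_only=True))

# 2. Drop rows whose temperature falls outside a plausible physical range
clean = clean[clean["temperature"].between(-40, 60)]

print(clean)
```

Real pipelines would add domain-specific checks (unit validation, deduplication, timestamp alignment), but the structure, impute then filter then train, is the same.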

To analyze the data and extract insights, many machine learning algorithms exist, as summarized in Sect. “ Machine Learning Tasks and Algorithms ”. Selecting a learning algorithm suitable for the target application is thus challenging, because the outcomes of different learning algorithms may vary depending on the data characteristics [ 106 ]. Choosing the wrong algorithm can produce unexpected outcomes, wasting effort and degrading the model’s effectiveness and accuracy. In terms of model building, the techniques discussed in Sect. “ Machine Learning Tasks and Algorithms ” can be used directly to solve many real-world issues in diverse domains, such as cybersecurity, smart cities, and healthcare, summarized in Sect. “ Applications of Machine Learning ”. However, hybrid learning models, e.g., ensembles of methods, modification or enhancement of existing learning techniques, or the design of new learning methods, could be potential future work in the area.
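A common way to handle the algorithm-selection problem above is to compare candidates empirically by cross-validation, and a simple voting ensemble is one form of the hybrid model mentioned. The sketch below uses a synthetic dataset purely for illustration.

```python
# Sketch: compare candidate learners by cross-validation, then combine
# them into a simple hybrid (ensemble) model by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}
# Empirical selection: mean 5-fold cross-validation accuracy per candidate
for name, clf in candidates.items():
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")

# Hybrid model: hard-voting ensemble of all candidates
ensemble = VotingClassifier(list(candidates.items())).fit(X, y)
print(f"ensemble train accuracy: {ensemble.score(X, y):.3f}")
```

On real data, the cross-validation scores, not training accuracy, should drive the final choice, since data characteristics determine which learner wins.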

Thus, the ultimate success of a machine learning-based solution and its corresponding applications depends mainly on both the data and the learning algorithms. If the data are unsuitable for learning, e.g., non-representative, of poor quality, containing irrelevant features, or insufficient in quantity for training, the resulting models may become useless or produce low accuracy. Therefore, effectively processing the data and handling the diverse learning algorithms are both important for a machine learning-based solution and, eventually, for building intelligent applications.

In this paper, we have conducted a comprehensive overview of machine learning algorithms for intelligent data analysis and applications. According to our goal, we have briefly discussed how various types of machine learning methods can be used to solve various real-world issues. A successful machine learning model depends on both the data and the performance of the learning algorithms. Sophisticated learning algorithms must be trained on collected real-world data and knowledge related to the target application before a system can assist with intelligent decision-making. We have also discussed several popular application areas based on machine learning techniques to highlight their applicability to various real-world issues. Finally, we have summarized and discussed the challenges faced, along with potential research opportunities and future directions in the area. The challenges identified create promising research opportunities in the field, which must be addressed with effective solutions across application areas. Overall, we believe that our study on machine learning-based solutions opens up a promising direction and can serve, from a technical point of view, as a reference guide for potential research and applications for academia, industry professionals, and decision-makers.

Declaration

The author declares no conflict of interest.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


eAppendix 1. Search Strategies Used Up to July 7, 2023

eAppendix 2. Selection Process

eAppendix 3. Online Questionnaire for Information From Authors and Commercial Product Owners

eTable 1. The Evidence Requirements Established per Life Cycle Phase as Described in the Dutch AIPA Guideline

eFigure. Flowchart of Literature Inclusion for Assessment of the Six Phases

eTable 2. Overview of Publication Characteristics per Predictive ML Algorithm

eTable 3. Overview of the Availability of Evidence per Predictive ML Algorithm

Data Sharing Statement



Rakers MM, van Buchem MM, Kucenko S, et al. Availability of Evidence for Predictive Machine Learning Algorithms in Primary Care: A Systematic Review. JAMA Netw Open. 2024;7(9):e2432990. doi:10.1001/jamanetworkopen.2024.32990


Availability of Evidence for Predictive Machine Learning Algorithms in Primary Care: A Systematic Review

  • 1 Department of Public Health and Primary Care, Leiden University Medical Centre, ZA Leiden, the Netherlands
  • 2 National eHealth Living Lab, Leiden University Medical Centre, ZA Leiden, the Netherlands
  • 3 Department of Information Technology and Digital Innovation, Leiden University Medical Center, ZA Leiden, the Netherlands
  • 4 Hamburg University of Applied Sciences, Department of Health Sciences, Ulmenliet 20, Hamburg, Germany
  • 5 Department of Digital Health, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, CG Utrecht, the Netherlands
  • 6 Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, CG Utrecht, the Netherlands
  • Invited Commentary The Need for Continuous Evaluation of AI Prediction Algorithms Nigam H. Shah, MBBS, PhD; Michael A. Pfeffer, MD; Marzyeh Ghassemi, PhD JAMA Network Open

Question   Which machine learning (ML) predictive algorithms have been implemented in primary care, and what evidence is publicly available for supporting their quality?

Findings   In this systematic review of 43 predictive ML algorithms in primary care from scientific literature and the registration databases of the US Food and Drug Administration and Conformité Européene, there was limited publicly available evidence across all artificial intelligence life cycle phases from development to implementation. While the development phase (phase 2) was most frequently reported, most predictive ML algorithms did not meet half of the predefined requirements of the Dutch artificial intelligence predictive algorithm guideline.

Meaning   Findings of this study underscore the urgent need to facilitate transparent and consistent reporting of the quality criteria in literature, which could build trust among end users and facilitate large-scale implementation.

Importance   The aging and multimorbid population and health personnel shortages pose a substantial burden on primary health care. While predictive machine learning (ML) algorithms have the potential to address these challenges, concerns include transparency and insufficient reporting of model validation and effectiveness of the implementation in the clinical workflow.

Objectives   To systematically identify predictive ML algorithms implemented in primary care from peer-reviewed literature and US Food and Drug Administration (FDA) and Conformité Européene (CE) registration databases and to ascertain the public availability of evidence, including peer-reviewed literature, gray literature, and technical reports across the artificial intelligence (AI) life cycle.

Evidence Review   PubMed, Embase, Web of Science, Cochrane Library, Emcare, Academic Search Premier, IEEE Xplore, ACM Digital Library, MathSciNet, AAAI.org (Association for the Advancement of Artificial Intelligence), arXiv, Epistemonikos, PsycINFO, and Google Scholar were searched for studies published between January 2000 and July 2023, with search terms that were related to AI, primary care, and implementation. The search extended to CE-marked or FDA-approved predictive ML algorithms obtained from relevant registration databases. Three reviewers gathered subsequent evidence involving strategies such as product searches, exploration of references, manufacturer website visits, and direct inquiries to authors and product owners. The extent to which the evidence for each predictive ML algorithm aligned with the Dutch AI predictive algorithm (AIPA) guideline requirements was assessed per AI life cycle phase, producing evidence availability scores.

Findings   The systematic search identified 43 predictive ML algorithms, of which 25 were commercially available and CE-marked or FDA-approved. The predictive ML algorithms spanned multiple clinical domains, but most (27 [63%]) focused on cardiovascular diseases and diabetes. Most (35 [81%]) were published within the past 5 years. The availability of evidence varied across different phases of the predictive ML algorithm life cycle, with evidence being reported the least for phase 1 (preparation) and phase 5 (impact assessment) (19% and 30%, respectively). Twelve (28%) predictive ML algorithms achieved approximately half of their maximum individual evidence availability score. Overall, predictive ML algorithms from peer-reviewed literature showed higher evidence availability compared with those from FDA-approved or CE-marked databases (45% vs 29%).

Conclusions and Relevance   The findings indicate an urgent need to improve the availability of evidence regarding the predictive ML algorithms’ quality criteria. Adopting the Dutch AIPA guideline could facilitate transparent and consistent reporting of the quality criteria, which could foster trust among end users and facilitate large-scale implementation.

In most high-income countries, primary health care is affected by the increasing burden of illness experienced by aging and multimorbid populations along with personnel shortages. 1 Primary care generates large amounts of routinely collected coded and free-text clinical data, which can be used by flexible and powerful machine learning (ML) techniques to facilitate early diagnosis, enhance treatment, and prevent adverse effects and outcomes. 2 - 5 Therefore, primary care is a highly interesting domain for implementing predictive ML algorithms in daily clinical practice. 6 - 8

Nevertheless, scientific literature describes the implementation of artificial intelligence (AI), especially predictive ML algorithms, as limited and far behind other sectors in data-driven technology. Predictive ML algorithms in health care often face criticism regarding the lack of comprehensibility and transparency for health care professionals and patients as well as lack of explainability and interpretability. 8 - 11 Additionally, the reporting of peer-reviewed evidence is limited, and the utility of predictive ML algorithms in clinical workflows is often unclear. 8 , 12 - 14 In response to these challenges, the Dutch Ministry of Health, Welfare, and Sports commissioned the development and validation of a Dutch guideline for high-quality diagnostic and prognostic applications of AI in health care. Published in 2022, the Dutch Artificial Intelligence Predictive Algorithm (AIPA) guideline is applicable to predictive ML algorithms. 15 , 16 The guideline encourages the collection of data and evidence consistent with the 6 phases and criteria outlined in the AI life cycle (requirements), providing a comprehensive overview of existing research guideline aspects across the AI life cycle.

In this systematic review, we aimed to (1) systematically identify predictive ML algorithms implemented in primary care from peer-reviewed literature and US Food and Drug Administration (FDA) and Conformité Européene (CE) registration databases and (2) ascertain the public availability of evidence, including peer-reviewed literature, gray literature, and technical reports, across the AI life cycles. For this purpose, the Dutch AIPA guideline was adapted into a practical evaluation tool to assess the quality criteria of each predictive ML algorithm.

We conducted the systematic review in 2 steps. First, we systematically identified predictive ML algorithms by searching peer-reviewed literature and FDA and CE registration databases. Second, we ascertained the availability of evidence for the identified algorithms across the AI life cycle by systematically searching literature databases and technical reports, examining references in relevant studies, conducting product searches, visiting manufacturer websites, and contacting authors and product owners. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses ( PRISMA ) reporting guideline. 17

Peer-reviewed studies were included if they met all of the following eligibility criteria: (1) published between January 2000 and July 2023; (2) written in English; (3) published as original results; (4) concerned a predictive ML algorithm intended for primary care; and (5) focused on the implementation of the predictive ML algorithm in a research setting or clinical practice with, for example, pilot, feasibility, implementation, or clinical validation study designs. This review examined ML techniques (eg, [deep] neural networks, support vector machines, and random forests) developed for tasks such as computer vision and natural language processing that generated the prediction of health outcomes in individuals. We classified a study as applying ML if it used a nonregression statistical technique to develop or validate a prediction model, similar to Andaur Navarro et al, 9 excluding traditional statistical approaches, such as expert systems and decision trees based on expert knowledge. Studies that addressed predictive ML algorithm development or external validation without implementation in primary care were excluded. Given that over 60% of CE-marked AI tools are not found in electronic research databases, 12 this review included CE-marked or FDA-approved predictive ML algorithms published in FDA and CE registration databases. 18 - 20

Searches were conducted using the following electronic databases: PubMed, Embase, Web of Science, Cochrane Library, Emcare, Academic Search Premier, IEEE Xplore, ACM Digital Library, MathSciNet, AAAI.org (Association for the Advancement of Artificial Intelligence), arXiv, Epistemonikos, PsycINFO, and Google Scholar. All databases were searched in July 2023 for entries from January 2000 to July 2023. The search terms, derived from the National Library of Medicine MeSH (Medical Subject Headings) Tree Structures and the review team’s expertise, formed a combination related to AI, primary care, 21 and implementation 22 (defined in Box 1 ). The full search strategy is provided in eAppendix 1 in Supplement 1 .

Definitions

Predictive Algorithm

“An algorithm leading to a prediction of a health outcome in individuals. This includes, but is not limited to, predicting the probability or classification of having (diagnostic or screening predictive algorithm) or developing over time (prognostic or prevention predictive algorithm) desirable or undesirable health outcomes.” 16

Implementation

“An intentional effort designed to change, adapt, or uptake interventions into routines, including pilot and feasibility studies.” 22 Implemented predictive ML algorithms can be placed in phases 5 and 6 of the AI life cycle, as defined by the Dutch AIPA guideline. 16

Primary Care

“Universal access to essential healthcare in communities, facilitated by practical, scientifically sound, socially acceptable methods and technology, sustainably affordable at all developmental stages, fostering self-reliance and self-determination.” 21

Abbreviations: AI, artificial intelligence; AIPA, Artificial Intelligence Predictive Algorithm; ML, machine learning.

Three of us (M.M.R., M.M.vB., and S.K.) conducted an independent review of the selection process, resolving disagreements among us through discussion with a senior reviewer (H.J.A.vO.). The full selection process is detailed in eAppendix 2 in Supplement 1 .

Five strategies were used to gather publicly available evidence for all identified predictive ML algorithms: (1) searches of PubMed and Google Scholar using product, company, and author names; (2) searches of technical reports from the FDA and CE registration online databases; (3) exploration of references within selected studies; (4) visits to predictive ML algorithm manufacturer websites; and (5) solicitation of information from authors and product owners via email or telephone, with a request to complete an online questionnaire about the reported evidence (eAppendix 3 in Supplement 1 ). Accepted data sources included original, peer-reviewed articles in English as well as posters, conference papers, and data management plans (DMPs).

The availability of evidence was categorized according to the life cycle phases ( Box 2 ) and the requirements per phase set forth by the Dutch AIPA guideline. 15 , 16 These requirements ( Table 1 ) are defined as aspects necessary to address during the AI predictive algorithm life cycle. Therefore, developers, researchers, or owners of predictive ML algorithms should ideally provide data and evidence regarding these aspects.

Summary of the 6 Life Cycle Phases a 

Phase 1: Preparation and Verification of the Data

A DMP should be used to prepare for the collection and management of the data needed in phases 2 to 5. This plan captures the agreements and established procedures for collecting, processing, storing, and managing the data. Any changes during execution of the DMP should be updated continuously.

Phase 2: Development of the AI Model

The development of the AI model, which results from the analysis of the training data, entails the development of the algorithm and the set of algorithm-specific data structures.

Phase 3: Validation of the AI Model

The AI model undergoes external validation, which involves evaluating its performance using data not used in phase 2. The validation process assesses the statistical or predictive value of the model and examines issues related to fairness and algorithmic bias.

Phase 4: Development of the Necessary Software Tool

The focus shifts to developing the necessary software tool around the AI model. This phase encompasses designing, developing, conducting user testing, and defining the system requirements for the software.

Phase 5: Impact Assessment of the AI Model in Combination With the Software

This phase determines the impact or added value of integrating the AI model and software within the intended medical practice or context. It evaluates how these advancements affect medical actions and the health outcomes of the target group, such as patients, clients, or citizens. Additionally, conducting a health technology assessment is part of this phase.

Phase 6: Implementation and Use of the AI Model With Software in Daily Practice

The AI model and software are implemented, monitored, and incorporated into daily practice. Efforts are made to ensure smooth integration, continuous monitoring, and appropriate education and training related to their use.

Abbreviations: AI, artificial intelligence; DMP, data management plan.

a As established in the Dutch AIPA guideline. 15 , 16

The extent to which the evidence of each predictive ML algorithm aligned with the requirements of the Dutch AIPA guideline was assessed per life cycle phase ( Table 1 ; Box 2 ; eTable 1 in Supplement 1 ), using availability scores (2 for complete, 1 for partial, and 0 for none). Two analyses were conducted. First, availability of evidence per requirement was represented as a percentage, considering the requirements per life cycle phase. The availability of evidence per life cycle phase was reported as a percentage and calculated by dividing the sum of scores of a specific life cycle phase by the maximum possible score. Second, evidence availability per predictive ML algorithm was calculated as the sum of values for all requirements divided by the total applicable requirements, excluding the requirements that were not applicable because of the life cycle phase of the algorithm (eTable 1 in Supplement 1 ; requirements are shaded in orange); the denominator value was 48. These availability scores aimed to provide an overview of implemented predictive ML algorithms and evidence per life cycle phase.
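The scoring scheme described above can be sketched as follows; the function name and example values are illustrative, not taken from the study, but the arithmetic mirrors the stated rule (2 for complete, 1 for partial, 0 for none, with a maximum of 2 points per applicable requirement).

```python
# Sketch of the evidence availability score: sum of per-requirement
# scores divided by the maximum attainable (2 points per requirement).
def availability_pct(scores):
    """Percentage availability; None marks a not-applicable requirement."""
    applicable = [s for s in scores if s is not None]
    return 100 * sum(applicable) / (2 * len(applicable))

# Illustrative algorithm: 4 applicable requirements, 1 not applicable
example_scores = [2, 1, 0, 2, None]
print(availability_pct(example_scores))  # 5 of 8 possible points -> 62.5
```

The same function applies at both levels in the study: per life cycle phase (scores pooled across algorithms for one phase) and per algorithm (scores pooled across all applicable requirements).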

The analysis was conducted independently by 3 of us (M.M.R., M.M.vB., and S.K.), who resolved discrepancies through discussion with another author (H.J.A.vO.). Data were analyzed using Microsoft Excel for Windows 11 (Microsoft Corp).

Of the 5994 studies identified initially, 20 (comprising 19 predictive ML algorithms) met the inclusion criteria and were included in this systematic review. One algorithm was excluded after personal communication confirmed that the tool used was a rule-based expert system. 23 Additionally, 25 commercially available CE-marked or FDA-approved predictive ML algorithms in primary care were included. 18 Only 2 predictive ML algorithms were found in the FDA or CE registration databases and the literature databases searched. 24 - 27 Forty-three AIPAs were included in the analysis ( Figure 1 ). 24 - 65

Table 2 provides an overview of the key characteristics of the 43 predictive ML algorithms included in this review. 24 - 67 Most studies (35 [81%]) were published in the past 5 years (2018-2023). 24 , 26 - 36 , 40 - 46 , 50 - 55 , 57 - 62 , 64 - 66 , 68 Most predictive ML algorithms (36 [84%]) fit under the category of clinical decision support systems for either diagnosis or treatment indication ( Figure 2 ). 24 - 29 , 31 - 35 , 37 - 42 , 44 , 46 , 47 , 49 , 51 - 53 , 55 - 61 , 63 - 67 Twenty-seven predictive ML algorithms (63%) focused on cardiovascular diseases and diabetes ( Figure 2 ). 24 - 28 , 32 - 34 , 37 - 47 , 50 , 57 , 60 , 62 , 64 , 66 , 68 Furthermore, 9 AIPAs (21%) mentioned the use of AI in their product descriptions but did not offer any specific details about the AI technique applied to develop the model. 37 - 39 , 42 , 45 , 47 , 48 , 50 , 54 These 9 predictive ML algorithms were FDA approved or CE marked. Twelve of 43 (28%) predictive ML algorithms were implemented (life cycle phase 6) in a research setting but not in practice. 26 , 31 - 34 , 58 , 61 - 64 , 66 , 67

The search for public availability of evidence on the 43 predictive ML algorithms resulted in 1541 hits, of which 33 were duplicates. Eighty-two publications met the inclusion criteria. Additionally, 80 publications were provided by the product owners or authors or obtained from vendors’ websites. A total of 162 publications were included in the study, which included peer-reviewed articles, technical reports, posters, conference papers, and DMPs (eFigure in Supplement 1 ). Nine authors and product owners responded and completed the online questionnaire about the reported evidence of the predictive ML algorithm. An overview of the publication characteristics per predictive ML algorithm is provided in eTable 2 in Supplement 1 . An overview of the availability of evidence per predictive ML algorithm is provided in eTable 3 in Supplement 1 .

An overview of the availability of evidence per requirement and according to the life cycle phase is shown in Figure 3 . The 3 most commonly available types of evidence per requirement were a clear definition of the target use of the AI model, evaluation of its statistical characteristics, and adherence to software standards and regulations (78%, 47%, and 66% availability, respectively). Conversely, the least available evidence pertained to the implementation plan, monitoring, and health technology assessment (2%, 2%, and 14% availability, respectively), largely due to a lack of information from FDA-approved and CE-marked predictive ML algorithms.

The life cycle phase with the most comprehensive evidence was phase 2 (development), where 46% evidence availability for the relevant requirements was identified. This finding was followed by life cycle phase 3 (validation), with a 39% evidence availability. The life cycle phases with the most limited availability of evidence were phase 1 (preparation) at 19% and phase 5 (impact assessment) at 30%. Commercially available CE-marked and FDA-approved AIPAs offered less evidence across all life cycle phases compared with AIPAs found in the literature database search (29% vs 48% of the maximum attainable score).

For 5 predictive ML algorithms (12%), evidence was available only for 2 requirements: definition of the target use and required standards and regulations (availability per predictive ML algorithm score: 4 of 48 possible points). 37 , 38 , 47 , 48 , 69 Twelve (28%) predictive ML algorithms obtained approximately half of their individual maximum attainable evidence availability score. 24 , 26 - 28 , 30 , 31 , 33 , 34 , 36 , 41 , 43 , 65 Twelve (28%) did not reach life cycle phase 6 (implementation). 26 , 31 - 34 , 36 , 58 , 61 - 64 , 66 The predictive ML algorithms that reported the highest availability of evidence per predictive ML algorithm score were a risk-prediction algorithm for identifying undiagnosed atrial fibrillation 28 (36 of 48 possible points) and an AI-powered clinical decision support tool that enabled early diagnosis of low ejection fraction 34 (37 of 42 possible points). Both predictive ML algorithms were neither CE-marked nor FDA-approved at the time of publication. Overall, predictive ML algorithms identified through the peer-reviewed literature database search yielded more publicly available evidence 24 , 26 , 28 - 36 , 58 , 61 - 66 compared with the predictive ML algorithms identified solely from FDA-approved or CE-marked databases 25 , 27 , 37 - 55 , 57 , 59 , 60 , 68 (45% vs 29%) (eTable 2 in Supplement 1 ).

To our knowledge, this systematic review provides the most comprehensive overview of predictive ML algorithms implemented in primary care to date and reveals insufficient public availability of evidence of a broad set of predictive ML algorithm quality criteria. The availability of evidence was highly inconsistent across the included predictive ML algorithms, life cycles, and individual quality criteria. Predictive ML algorithms identified from peer-reviewed literature generally provided more publicly available evidence compared with predictive ML algorithms identified solely from FDA or CE registration databases.

The results align with those of previously published research. The scarcity of evidence is particularly pronounced among predictive ML algorithms that have received FDA approval or CE marking. 9 , 12 , 13 , 70 - 72 Many AI developers in the health care sector are known not to disclose information in the literature about the development, validation, evaluation, or implementation of AI tools. 12 , 19 , 73 There may be tension between protecting intellectual property and being transparent. 74 Moreover, not all evidence requires peer review, including regulatory processes such as obtaining a CE mark, where notified bodies assess the high-risk medical devices’ evidence for compliance. However, concerns may arise regarding the complexity of methodologies when reporting on effectiveness in a clinical setting. In such cases, there might be a preference for a peer-reviewed process to ensure that evaluation does not rely solely on notified bodies. 75 Although the FDA, the European Union Medical Device Regulation, and, more recently, the AI Act have released new initiatives to enhance transparency, disclosure of evidence was not mandatory at the time of writing this systematic review. 76 - 80 It would be interesting to assess the impact of new regulation in the future.

The availability of evidence fosters transparency and trust among end users, allowing other investigators to scrutinize the data and methods used and thus ensuring ethical and unbiased research and development practices. 81 , 82 Researchers can build on previous work, advancing scientific knowledge by making evidence available. If studies lack the necessary details, subsequent researchers may be more likely to create a new AI model instead of validating or updating an existing one. In addition, transparent reporting of predictive ML algorithms encourages vigilance among users, increasing the level of trust humans have in AI, as shown by human factors research. 82 On the other hand, failing to provide evidence can hamper patient safety due to, for example, algorithmically generated outcomes, interpretations, and recommendations that exhibit unfair advantages or disadvantages for specific individuals or groups. 83

The results show that evidence was the most scarce regarding the availability of, or reference to, a DMP. The DMP, while not necessarily required to be publicly accessible, is critical to preparing for collecting, managing, and processing data. The DMP plays an overarching role in the entire trajectory toward structurally implementing and using the AI model in daily practice. 84 It forms an essential component for every stage of the predictive ML algorithm life cycle and can ensure and safeguard data quality, reproducibility, and transparency while striving for Findable, Accessible, Interoperable, and Reusable (FAIR) data. 85 - 88 The FAIR principles aim to support the reuse of scholarly data, including algorithms, and to focus on making data findable and reusable by humans and machines. 87 Although FAIR principles have been widely adopted in academic contexts, the response of the industry has been less consistent. 89

Evidence was also limited regarding the impact and health technology assessments of predictive ML algorithms. The lack of accessible evaluations of the outcome and implementation in everyday clinical practice may hinder the translation of research findings into practical applications in health care. 90 Lack of such information may also impede adoption, as medical professionals need robust evidence to gain trust in these technologies and consistently integrate them into their everyday workflow. Medical professionals stress the importance of adhering to legal, ethical, and quality standards. They voice the need to be trained to interpret the availability of evidence supporting the safety of AI systems, including predictive ML algorithms and their effectiveness. 8 , 81 Without this information, it is challenging to ascertain whether the success of a predictive ML algorithm model is attributable to the model itself, the elements of its implementation, or both. As a result, it can be challenging to inform stakeholders about which, how, and for whom predictive ML algorithms are most effective.

Applying the Dutch AIPA guideline requirements to structure the availability of evidence, as demonstrated in this systematic review, can serve as a blueprint for showcasing to policymakers, primary care practitioners, and patients the reliability, transparency, and advancement of predictive ML algorithms. The guideline also has the potential to accelerate the process of complying with regulations. 16 Although not legally binding, the guideline can be used by developers and researchers as the basis for self-assessment. Furthermore, in the context of Dutch primary care, wherein general practitioners often operate within smaller organizations, limited resources may impede their ability to evaluate complex AI models effectively. 91 Therefore, comprehensive tools for assessing the availability of evidence on predictive ML algorithms, such as the Dutch AIPA guideline and the practical evaluation tool we developed, are valuable to primary care professionals and may aid large-scale adoption of predictive ML algorithms in practice. Since primary care worldwide is under substantial pressure, from a health systems perspective, it is essential to remove barriers to implementing innovation such as predictive ML algorithms.

This study has methodological limitations that should be taken into account. First, the scope of the systematic review excluded regression-based predictive models and simple rule-based systems. Although these approaches can be of substantial value in primary care, the focus on predictive ML algorithms enabled us to provide an in-depth overview of the aspects of model complexity and interpretability. Although most accepted definitions of ML do not exclude simple regression, the scope was beneficial for maintaining a manageable overview of more complex models that pose unique challenges for standardized reporting of model development, validation, and implementation. Second, we restricted the systematic review to articles published in English. We believe this restriction does not substantially affect the generalizability of the results since previous research has found no evidence of systematic bias due to English-language limitations. 92 Third, we could not formally compare the predictive validity across predictive ML algorithms due to the substantial variations in the types of AI models and heterogeneous methods between studies. Additionally, the availability scores presented in this study should be seen as an approximation of the degree of public availability of evidence, in line with the objectives described. Fourth, the Dutch AIPA guideline is a local norm, which is not legally binding.
Several international AI guidelines exist that apply to predictive ML algorithms, such as TRIPOD+AI (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) for development and validation, DECIDE-AI (Developmental and Exploratory Clinical Investigations of Decision Support Systems Driven by Artificial Intelligence) for feasibility studies, and SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence) and CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) for impact assessments. 93 - 97 These international guidelines, however, are aimed primarily at researchers. We chose to build on the Dutch AIPA guideline because it provides complete, structured, and pragmatic quality assessment derived from existing guidelines across the entire AI life cycle, and it is specific for predictive algorithms, which are considered to be of great potential in the medical field. 15 It specifically emphasizes implementation aspects and practical applications in clinical practice and may therefore be useful for primary care professionals.

In this systematic review, we comprehensively identified the availability of evidence of predictive ML algorithms in primary care, using the Dutch AIPA guideline as a reference. We found a scarcity of evidence across the AI life cycle phases for implemented predictive ML algorithms, particularly from algorithms published in FDA-approved or CE-marked databases. Adopting guidelines such as the Dutch AIPA guideline can improve the availability of evidence regarding the predictive ML algorithms’ quality criteria. It could facilitate transparent and consistent reporting of the quality criteria in literature, potentially fostering trust among end users and facilitating large-scale implementation.

Accepted for Publication: July 10, 2024.

Published: September 12, 2024. doi:10.1001/jamanetworkopen.2024.32990

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2024 Rakers MM et al. JAMA Network Open.

Corresponding Author: Margot M. Rakers, MD, Department of Public Health and Primary Care, Leiden University Medical Centre, 2333 ZA Leiden, the Netherlands ( [email protected] ).

Author Contributions: Dr van Os had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Rakers, van Buchem, Kant, van Smeden, Moons, Chavannes, van Os.

Acquisition, analysis, or interpretation of data: Rakers, van Buchem, Kucenko, de Hond, van Smeden, Moons, Leeuwenberg, Villalobos-Quesada, van Os.

Drafting of the manuscript: Rakers, van Buchem, Moons, Leeuwenberg, Villalobos-Quesada, van Os.

Critical review of the manuscript for important intellectual content: All authors.

Statistical analysis: van Buchem, Kucenko, van Smeden, Moons, van Os.

Obtained funding: van Os.

Administrative, technical, or material support: Rakers, Moons, Chavannes.

Supervision: de Hond, Kant, van Smeden, Moons, Chavannes, Villalobos-Quesada, van Os.

Conflict of Interest Disclosures: None reported.

Funding/Support: This work was supported by grant LSHM21009 from Innovative Medical Devices Initiative (Dr van Os).

Role of the Funder/Sponsor: The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data Sharing Statement: See Supplement 2 .



  • Open access
  • Published: 02 September 2024

Integrating machine and deep learning technologies in green buildings for enhanced energy efficiency and environmental sustainability

  • Shahid Mahmood 1 ,
  • Huaping Sun 1 , 2 ,
  • El-Sayed M. El-kenawy 3 , 6 ,
  • Asifa Iqbal 4 ,
  • Amal H. Alharbi 5 &
  • Doaa Sami Khafaga 5  

Scientific Reports volume 14, Article number: 20331 (2024)


  • Energy and society
  • Sustainability

A green building (GB) is a design idea that integrates environmentally conscious technology and sustainable procedures throughout the building’s life cycle. However, because different green requirements and performances are integrated into the building design, the GB design procedure typically takes longer than that of conventional structures. Machine learning (ML) and other advanced artificial intelligence (AI) techniques, such as deep learning (DL), are frequently utilized to assist designers in completing their work more quickly and precisely. Therefore, this study aims to develop a GB design predictive model utilizing ML and DL techniques to optimize resource consumption, improve occupant comfort, and lessen the environmental effect of the built environment in the GB design process. The ASHRAE RP-884 dataset is applied to the suggested models. An Exploratory Data Analysis (EDA) is applied, which involves cleaning, sorting, and converting the categorical data into numerical values utilizing label encoding. In data preprocessing, the Z-score normalization technique is applied to normalize the data. After data analysis and preprocessing, the preprocessed data is used as input for ML techniques such as random forest (RF), decision tree (DT), Extreme Gradient Boosting (XGBoost), and stacking, and DL techniques such as graph neural network (GNN), long short-term memory (LSTM), and recurrent neural network (RNN) models for green building design, to enhance environmental sustainability by addressing different criteria of the GB design process. The performance of the proposed models is assessed using different evaluation metrics such as accuracy, precision, recall, and F1-score. The experimental results indicate that the proposed GNN and LSTM models function more accurately and efficiently than conventional DL techniques for environmental sustainability in green buildings.


Introduction

The building and construction industry is recognized for using excessive amounts of natural resources, which has a detrimental impact on the environment 1 . Buildings and construction consume the most energy (36%) and emit the most CO2 (37%) globally 2 , 3 , 4 . The Architecture, Engineering, and Construction (AEC) industry faces a significant sustainability challenge, which encompasses resource efficiency. Researchers, specialists, and practitioners in the building and construction sector have endeavored to identify substitute methods for implementing energy conservation throughout the building life cycle. The green building (GB) idea is being implemented as one of these initiatives 5 , 6 .

GB is being endorsed worldwide as an approach to enhancing the building industry’s sustainability 7 . The Green Building Concept pertains to using environmentally friendly and sustainable concepts throughout the building life cycle, starting with the first stages of project development and continuing through to the decommissioning phase. It is frequently considered a method to reduce energy consumption in the building and construction industry 8 , 9 , 10 . At every project step, the contractor, the architects, the engineers, and the client must all support this. The concerns about the economy, well-being, utility, and durability associated with traditional design are lessened by green building practices. More and more people are embracing the notion of green building since it is advantageous from a financial, health, and, most importantly, environmental standpoint 11 .

The GB paradigm reduces the adverse environmental consequences that buildings and human activity within them have by introducing concepts and technology into buildings at every stage of their existence 12 . The environment may be greatly impacted by decisions made at the first stages of building design 13 . However, the design of GB is typically more difficult than conventional structures because of the different design components and building performances that need to be adjusted to accomplish sustainability ideally 14 , 15 . As a result, the GB design process may take longer than expected since it requires a multidisciplinary team project in which team members must elaborate on every GB component of the design 16 .

Technological advancements in construction have enabled digitization, automation, and integration, improving decision making and productivity in GB initiatives 17 , 18 , 19 . This has led to global interest and numerous empirical studies on GB’s benefits 20 , 21 . AI, which encompasses intelligent systems capable of learning and problem-solving, further enhances communication and productivity in GB design 17 , 18 . AI includes machine learning (ML) as a subset: a method that gives a system the capacity to grow and learn from its own experiences without explicit programming 22 . It has been thoroughly studied and used throughout the construction process 23 and has been applied during the building design phase to maximize the performance of GB designs. Earlier research on ML use in the building design process, carried out in the last few years, has greatly aided the adoption of digital technology and has transformed the whole design process 24 . For example, in research 25 , an ML model to forecast dependable energy performance in office buildings was developed using the artificial neural network (ANN) approach. This model required 50 times less computing time than the industry standard building performance simulation tools.

However, it has been demonstrated that the Statistical Neural Network and Gaussian Regression techniques used by Rahman and Smith 26 to create an ML model to forecast fuel usage in a commercial building were even more accurate. In addition, Geyer and Singaravel 27 created a component-based machine-learning model to forecast the thermal energy performance of office buildings by employing the ANN approach. With an error of less than 3.9%, the forecast could be generated with a significantly smaller computing time. This earlier research shows that the suggested predictive models utilizing machine learning techniques might drastically reduce the time needed for calculation during the design phase, improving the efficiency of engineers and architects creating GB. Though there have been several advancements, further proof is still required to support the use of ML in creating a prediction model for GB design. This study attempts to create a design prediction model for GB using ML and DL classifier methods as one of the techniques to address this gap. This paper is expected to provide references and give insights to building practitioners regarding the utilization of the ML and DL approach to optimize resource consumption, improve occupant comfort, and lessen the environmental effect of the built environment in the GB design process, which can significantly contribute to accelerating technology-based development in the building and construction sector.

Research contribution

The following are this paper’s primary contributions:

We propose a technique for green building design by applying machine and deep learning techniques that can optimize resource use, minimize energy consumption, and reduce the impact of the built environment to enhance environmental sustainability.

We apply the suggested models to the ASHRAE RP-884 dataset. We perform data preprocessing by applying Exploratory Data Analysis (EDA), which involves cleaning, sorting, and converting the categorical data into numerical values utilizing label encoding. The Z-score normalization technique is also applied to normalize the data.

The experimental findings show that the suggested GNN and LSTM models achieve better accuracy and efficiency for environmental sustainability in green buildings than traditional DL methodologies.

Research organization

This paper is structured as follows: section “Literature review” presents the background and relevant works. Section “Proposed methodology” introduces the proposed machine learning and deep learning approach for green building design to enhance environmental sustainability. Section “Results” assesses the performance of our technique and contrasts it with the baseline methods. Section “Conclusion” concludes the paper and provides future directions.

Literature review

This section examines previous studies on environmental sustainability and artificial intelligence (AI) techniques for green buildings to pinpoint research gaps and support the necessity of the suggested strategy.

Environmental sustainability for green buildings

Sustainability is growing in the building and construction sector as a significant driver of social, economic, and environmental benefits with fewer adverse environmental effects. Green and sustainable practices must be established to increase energy efficiency in the building and construction sector, especially when applying the most recent green technologies. The study 28 aims to find the most applicable techniques for employment in green building, assess the advantages of implementing green building, and examine the best practices of green building attributes. The results of this study demonstrated that green buildings can be created with less energy consumption and a lower long-term operating and maintenance cost by utilizing sustainable practices and energy-efficient systems. According to the study’s author 29 , achieving the objectives of resource protection, pollution reduction, and ecological environment improvement requires careful consideration of the environmental benefit analysis of green buildings, including efficient energy and resource utilization. The findings demonstrate that the incremental environmental advantages in terms of land saving, energy saving, water saving, material saving, indoor environmental quality, and operation management are increased by 23.15%, 10.37%, 19.30%, 18.25%, and 22.53%, respectively. In contrast, the measured incremental environmental costs of the office building in Taiyuan City are decreased by 13.56%, 11.02%, 25.17%, 14.43%, and 15.25%, respectively. Accordingly, BIM technology used in the full life cycle cost assessment of green buildings can evaluate, quantify, and direct every stage of building design, construction, and upkeep while largely satisfying resource and energy conservation requirements.

Green buildings are seen as a vital aspect of attaining sustainability since they incorporate green and natural components that reduce pollution and resource usage. The goal of study 30 is to define the terms “sustainability” and “green building” as they relate to residential building design, since it is important to comprehend how sustainable design principles can help mitigate negative effects on the environment and society. A case-study methodology is employed, focusing on the innovative and sustainable design elements utilized in three case studies of green buildings, one each in China, Indonesia, and Dubai. The results showed that these nations seek to encourage the construction of environmentally friendly structures, especially homes, to achieve maximum social, economic, and environmental sustainability. In 31 , a study was conducted to determine the stages of action and challenges involved in implementing sustainable building practices, as well as to assess the extent to which these techniques are integrated into professional practice. The study aimed to ensure the best possible outcomes, and both qualitative and quantitative methodologies were used; for the quantitative approach, it employed a descriptive analysis based on a survey. The results showed that stakeholders have a respectable degree of awareness and knowledge. The results of this study may close a significant knowledge gap about the benefits of green building practices that exists in the absence of empirical research in developing nations.

The author of article 12 suggested a practical mapping tool that assesses how much a GB contributes to the Sustainable Development Goals (SDGs) by applying green building rating tools (GBRTs), and then used the analytic hierarchy technique to examine this contribution quantitatively. The findings demonstrated that GBRTs greatly assist SDGs 3, 7, 11, and 12, with SDG 12 benefiting the most. SDG Target 7.3, on the other hand, is the most notable since it offers the most significant avenue for GBRTs to contribute to the SDGs. The study 32 investigates how the information modules suggested by EN 15978 and the three dimensions of sustainability (environmental, social, and economic) are covered by the indicators in GBRS as they move through the life cycle phases of building development. The 387 sustainability indicators that were part of the eight chosen GBRS were examined and grouped based on three distinct classification criteria: the sustainability dimension, information modules, and stage of the construction process life cycle. A panel of diverse academic and professional experts in the subject of study conducted four rounds and meetings of an iterative process of indicator analysis and clustering, leading to a consensus on the results. According to the analysis’s findings, the environmental dimension is the most valued among the instruments, and to strike a fair balance, more focus needs to be paid to both the social and economic dimensions.

Recent advances in green building technologies (GBTs) have increased significantly due to environmental, economic, and societal benefits. The primary goal of GBTs is to use resources like water, energy, and other materials sparingly and in a balanced manner, resulting in a better environment. GBs improve productivity and health, reduce maintenance and operating costs, and reduce energy use and emissions. The goal of study 33 is to identify important concerns in green building research that pertain to sustainable building practices that are environmentally low-impact, economical, and long-term in nature, while also taking future developments into account. To ensure a sustainable future, this article analyzes the current status of green building construction and recommends more research and development, also suggesting a few potential paths for sustainable development research to stimulate further investigation. The author of study 34 creates a set of valid and reliable social sustainability metrics for evaluating green buildings in China. Indicators of green buildings are required to support practitioners in comprehending social sustainability indicators and to validate different studies; therefore, the fuzzy Delphi approach is applied to examine these indicators. The findings indicate that the most crucial elements of green building social sustainability are durability and safety, with health, comfort, accessibility, and convenience as further important factors. To attain social sustainability, this set of indicators helps practitioners make decisions and offers appropriate, useful guidance for many stakeholders.

Artificial intelligence techniques for green buildings

In particular, for sustainable projects (i.e., green buildings), a precise expense forecast is essential. In the construction sector, where stakeholders require more knowledge in contract cost estimating, green building construction contracts are still relatively new. Green buildings, in contrast to conventional construction, are made to use innovative technology to lessen the negative effects that their operations have on society and the environment. To anticipate the costs of green buildings, the author of paper 35 proposes machine learning-based techniques such as random forest (RF), deep neural network (DNN), and extreme gradient boosting (XGBoost). The impact of both hard and soft cost-related attributes is taken into account in the construction of the suggested models. The accuracy of the created algorithms is assessed and compared using evaluation measures: XGBoost achieved the highest accuracy at 0.96, followed by the DNN at 0.91 and RF at 0.87.

An Artificial Intelligence-based Energy Management Method (AI-EMM) for green buildings is recommended in article 36 . It can respond intelligently to improve user comfort, safety, and energy efficiency in response to human choices. The AI-EMM model includes subsystems for smart user identification and monitoring of the interior and external surroundings, as well as a universal infrared communication system. Energy usage is improved using Long Short-Term Memory (LSTM) models. The recommended methodology is applied to analyzing energy usage data from green buildings, and the proposed method for investigating the interaction between Heating, Ventilation, and Air Conditioning (HVAC) systems emphasizes airside design optimization for an improved interior climate. The results show that environmentally friendly buildings and financial rewards are compatible. The AI-EMM’s experimental results showed a 94.3% high-performance ratio, a 15.7% lower energy consumption ratio, a 97.4% accuracy ratio, a 95.7% energy management level, and a 97.1% prediction ratio. In research 37 , the author examined how AI and DL have recently advanced and been applied to promote sustainability in a number of areas, such as attaining the SDGs, energy efficiency, healthy environments, and energy management for smart buildings. AI has the potential to support 134 of the 169 SDG targets, making it a valuable instrument for encouraging sustainable practices. However, considering the rapid pace at which these technologies are developing, extensive regulatory control is required to guarantee ethical standards, safety, and transparency. AI and DL have been successfully applied in the renewable energy sector to optimize power grid stability, fault detection, and energy management.

The objective of study 38 is to create a mathematical model that investigates the supply and demand balance for external green construction support and the related spending adjustment procedures in a deflationary environment. To determine the key parameters influencing the green building cost prediction process, the most recent datasets from 3578 green projects in North America were gathered, preprocessed, analyzed, postprocessed, and evaluated using state-of-the-art ML techniques. The results indicate that green building costs are expected to decline due to governmental and private expenditures in green development. Moreover, public and private investment has a greater effect on reducing the cost of green construction during deflation than during inflation. Consequently, the proposed approach can be employed by decision-makers to oversee and assess the annual ideal external investment in the development of green buildings.

Proposed methodology

This section explains the whole procedure of the suggested approach. The proposed method comprises multiple steps: obtaining the dataset, preparing the data, and making model predictions. Figure 1 provides a graphic representation of the suggested architecture. The ASHRAE dataset is explored first, and then data preprocessing, such as label encoding and Z-score normalization, is applied. After that, ML and DL classifiers are trained. The ensemble model forecasts energy use, combining models for increased accuracy. Metrics are used to evaluate performance, and the most effective model is ultimately selected before the process is complete.

Figure 1. Proposed architecture.
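The pipeline sketched in Figure 1 (preprocess, train base ML classifiers, combine them via stacking, evaluate) can be illustrated with scikit-learn. This is a minimal sketch under stated assumptions: the features are synthetic stand-ins for the ASHRAE RP-884 data, the base learners and hyperparameters are illustrative, and the deep learning branch (GNN, LSTM, RNN) is omitted.

```python
# Minimal sketch of the described ML pipeline; synthetic data stands in
# for the ASHRAE RP-884 features, and model choices are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                    # stand-in sensor features
y = rng.choice(["UC", "N", "UW"], size=500)      # comfort classes from the text

y_enc = LabelEncoder().fit_transform(y)          # label-encoding step
X_std = StandardScaler().fit_transform(X)        # Z-score normalization step

X_tr, X_te, y_tr, y_te = train_test_split(X_std, y_enc, random_state=0)

# Stacking combines the base ML classifiers, as the text describes.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(),
)
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("macro F1:", f1_score(y_te, pred, average="macro"))
```

With random labels the scores are near chance; on the real dataset the same skeleton applies, with the DL models trained separately on the same preprocessed inputs.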

Experimental dataset

The dataset was acquired through field investigations of 160 distinct building sites worldwide. It is provided in support of the ASHRAE initiative to create a model of preferred thermal comfort. ASHRAE maintains an accumulation of data from several investigations by various investigators as a component of the publicly accessible RP-884 repository. Data files come from numerous climate zones dispersed throughout various regions 39 .

The dataset was selected because occupant turnover depends on a pleasant temperature: residents of a building may experience discomfort due to an absence of temperature regulation. The ASHRAE RP-884 dataset used for this research contains 56 features and 12,595 records. The objective of this dataset is to support the creation of an adaptive classifier. It comprises over 20,000 occupant comfort scores from 52 surveys conducted across 10 climatic regions. The dataset’s primary identifiers are the reading day, age, year, subject, building code, and time of day. The features include a thermal survey submitted by residents, covering the ASHRAE thermal sensation scale (ash), clothing insulation, comfort level, metabolic rate, and high air temperature; indices calculated from existing data, such as average air temperature, average radiant temperature, operative temperature, average airspeed at three heights, average turbulence at three heights, air pressure, relative humidity, the new standard effective temperature index, the two-node disc index, predicted mean vote, and predicted percentage dissatisfied; perceived control over the thermal environment (PCC) and PCED rated from 1 to 7, together with other aspects of personal environmental control; and outdoor climate information, including the outdoor average of the min/max air temperature and the outdoor maximum humidity percentage.

Dataset preprocessing

Data preprocessing is important since it enhances the model’s effectiveness and produces more precise attributes. In this step, data preprocessing is carried out using Exploratory Data Analysis (EDA), which involves cleaning, sorting, and converting the categorical data into numerical values using label encoding. The present study utilized Z-score normalization for cleaning the data.

Exploratory data analysis

An essential component of any research endeavor is exploratory data analysis (EDA). Finding patterns and abnormalities in the data that can be utilized to focus the hypothesis testing is the primary objective of exploratory analysis. It also provides resources for data visualization and evaluation, usually through graphical representation, to help in hypothesis generation. After data collection, EDA is completed. Without making any assumptions, the data is effectively evaluated, plotted, and updated to assess the quality of the data and build models 40 . The ASHRAE dataset was subjected to the EDA approach. The dataset consists of one categorical and 55 numeric features, with a normal distribution for most. UW (Uncomfortable Warm), N (Neutral), and UC (Uncomfortable Cold) are the three classes that make up the categorical target attributes. 12,595 records of these classes are found in the dataset; the UW class has 1029, the N class has 10,061, and the UC has 1505 data entries. Figure  2 illustrates the unbalanced quality of the dataset.

figure 2

Graphical visualization of dataset classes neutral (N), uncomfortable cold (UC), uncomfortable warm (UW).

Skewness measures the asymmetry of a distribution, i.e., its divergence from a symmetric, normal-like pattern. To determine the direction of outliers, this study measures the skewness of the data. After determining the kurtosis of the dataset and verifying its skewness, a positive skewness value indicates an asymmetric distribution with an extended tail on the right side. Kurtosis is the total weight of the distribution’s tails expressed proportionately to the distribution’s center. In this work, the kurtosis of the data is analyzed before a log transformation of skewed features is carried out; the log transformation can be used to bring skewed data closer to normality. The reference value is the excess kurtosis of the normal distribution, which is 0.
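The skewness check and log transformation described above can be sketched in NumPy. The paper does not give its code; the sample-moment formulas and the lognormal toy feature below are illustrative assumptions:

```python
import numpy as np

def skewness(x):
    """Third standardized moment: 0 for symmetric data, > 0 for a right tail."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    return np.mean((x - mu) ** 3) / sigma ** 3

def excess_kurtosis(x):
    """Fourth standardized moment minus 3: 0 for a normal distribution."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    return np.mean((x - mu) ** 4) / sigma ** 4 - 3.0

# Hypothetical right-skewed feature (long right tail, positive skew) ...
feature = np.random.default_rng(0).lognormal(size=5000)
print(skewness(feature) > 0)              # positive: right-tailed
# ... approaches symmetry and zero excess kurtosis after a log transform.
print(round(skewness(np.log(feature)), 2))
```

A feature whose log is normally distributed ends up with skewness and excess kurtosis near 0, which is the target the text describes.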

Encoding categorical variables

Categorical variables challenge certain ML techniques. The classification variable must be transformed into numerical data, which is crucial for the designed algorithms to function as intended. How the categorical variables are encoded affects the way various algorithms function. A feature in a dataset can hold one or more labels in word or numerical form. This makes it simpler for people to evaluate the data, but it is incomprehensible to machines 41 . We utilize an encoding that renders these labels interpretable by machines. Other encoding techniques exist, such as one-hot and hash encoding. The label encoding approach is used in this work to encode categorical information.

Label encoder

Label encoding makes numerical label input possible in an ML model. The label encoder assigns a number to each label, replacing the label values in the dataset. The labels can be employed when they have divergent priorities. This step is crucial in the data preparation process for supervised learning methods 41 . Usually, this technique replaces each value in a categorical column with a number between 0 and N − 1. In this study, the label encoder assigns a value of 0, 1, or 2 to each category.
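A minimal sketch of this step with scikit-learn's `LabelEncoder`, using the three comfort classes named earlier; the exact mapping to 0, 1, and 2 follows alphabetical class order and is an assumption, since the paper does not state it:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target labels drawn from the three comfort classes.
labels = ["N", "UW", "UC", "N", "N", "UC"]

le = LabelEncoder()
encoded = le.fit_transform(labels)            # each class mapped to 0..N-1
print(dict(zip(le.classes_, range(len(le.classes_)))))
# {'N': 0, 'UC': 1, 'UW': 2} -- classes are ordered alphabetically
print(list(le.inverse_transform(encoded)))    # round-trips to the originals
```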

Data cleaning

The Z-score normalization technique is used in this study to identify and eliminate anomalies while cleaning the data. This study cleans the data after converting the category variable to a numerical value. The data distribution was 0.5 and 0.98 before the cleanup. Figure 3 represents the distribution of features before data cleaning.

figure 3

Distribution of features before data cleaning.

Z-Score: A Z-score measures how far a value lies from the average of a group of values, expressed in units of standard deviation. A data point’s Z-score of zero indicates that it has the same value as the average, as represented in Eq. ( 1 ). The data distribution is 0.5, 0.98 after the Z-score is applied and the outliers are eliminated. Figure  4 represents the distribution of features after data cleaning.

figure 4

Distribution of features after data cleaning.
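The Z-score cleaning step described above can be sketched as follows; the 3-standard-deviation cutoff and the synthetic data are assumptions, since the paper does not state its threshold:

```python
import numpy as np

def remove_outliers_zscore(X, threshold=3.0):
    """Keep rows whose every feature lies within `threshold` standard
    deviations of that feature's mean (|z| < threshold)."""
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    mask = (z < threshold).all(axis=1)
    return X[mask], mask

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
X[0] = [15.0, 0.0, 0.0]                 # inject one obvious outlier
cleaned, kept = remove_outliers_zscore(X)
print(kept[0], cleaned.shape)           # False -- the outlier row is dropped
```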

After the completion of EDA, label encoding, and data cleaning, the label data comprise class 0 with 6919 values, class 1 with 1014 values, and class 2 with 651. After preprocessing, the modeling phase is carried out. The data is divided into 70% training data and 30% testing data to increase accuracy and efficacy in this phase. The machine and deep learning models are trained after splitting.
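The 70/30 split can be reproduced with scikit-learn; the feature matrix below is a synthetic stand-in mirroring the 8584 cleaned records, and the stratification and fixed seed are assumptions added for reproducibility:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 8584 cleaned records (6919 / 1014 / 651 per class).
rng = np.random.default_rng(2)
X = rng.normal(size=(8584, 55))
y = np.repeat([0, 1, 2], [6919, 1014, 651])

# 70% training / 30% testing, as used in the study; stratification
# preserves the class imbalance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)      # (6008, 55) (2576, 55)
```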

Model selection

Model selection involves choosing the appropriate hyperparameters, optimization strategies, and neural network architecture. Hyperparameters such as batch size, learning rate, number of layers, activation functions, and dropout rate are configured; hyperparameter tuning can significantly impact the model’s performance. Each model architecture is trained on the training set with a distinct set of hyperparameters and assessed using the appropriate evaluation criteria. This study utilized multiple ML and DL models (RF, DT, XGB, Stacking, GNN, LSTM, and RNN) for a sustainable green building environment.

Random forest

Random Forest is an ensemble learning technique that combines several decision trees to produce a more reliable and accurate model. It is a popular model for regression and classification applications and belongs to the class of tree-based models. Using random feature selection, Random Forest builds several decision trees independently, each trained on a bootstrapped sample of the dataset. The model is less likely to overfit than individual trees because it aggregates the individual trees’ predictions by majority voting or averaging, which lowers variance and improves generalization performance.

Decision tree

Decision trees are supervised learning algorithms that partition the feature space into regions by making decisions based on the supplied attribute values. At each node, the tree finds the attribute that best separates the data by optimizing a chosen criterion, such as information gain or Gini impurity. This process repeats recursively until a stopping condition is met, such as reaching a maximum depth or a minimum number of samples per leaf. Decision trees are widely used in many sectors because of their ease of interpretation and ability to handle numerical and categorical data. However, on large, complex datasets they may overfit and have difficulty generalizing to new data if proper regularization procedures are not used.

Extreme gradient boosting

XGBoost (Extreme Gradient Boosting) is a potent ensemble learning technique built on gradient boosting. Gradient descent is used to optimize the weak learners, usually decision trees, which are constructed sequentially so that each corrects the errors of its predecessors. XGBoost is well known for rapid processing of structured/tabular data, efficiency, and scalability. It has built-in functionality for handling missing values and uses regularization techniques to avoid overfitting. Furthermore, XGBoost provides sophisticated functionality such as cross-validation and early stopping to optimize model performance.
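As an illustration of the boosting principle, the sketch below uses scikit-learn's `GradientBoostingClassifier` on synthetic data; in practice the `xgboost` package's `XGBClassifier` exposes the same fit/predict interface with added regularization and early stopping. The hyperparameters here are assumptions, not the study's settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic 3-class stand-in for the preprocessed comfort data.
X, y = make_classification(n_samples=1500, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Trees are fitted sequentially, each one correcting the residual errors
# of the ensemble built so far (the core gradient-boosting idea).
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=0)
gb.fit(X_tr, y_tr)
print(round(gb.score(X_te, y_te), 2))
```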

Stacking

Stacking, also called stacked generalization, is an ensemble learning method that uses a meta-model (logistic regression) to aggregate predictions from several base models (RF, DT). After the base models are trained on the dataset, a meta-model is trained using their predictions as input features. By learning how best to combine the outputs of the several models, stacking attempts to enhance overall forecasting accuracy by exploiting the strengths of each model. It is a popular option in machine learning competitions and ensemble learning contexts since it frequently performs better than individual models and conventional ensemble techniques.
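A sketch of this setup with scikit-learn, using RF and DT base learners and a logistic-regression meta-model as described above; the synthetic data and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

# RF and DT base learners; their out-of-fold predictions become the
# input features of a logistic-regression meta-model.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X, y)
print(round(stack.score(X, y), 3))    # training accuracy of the ensemble
```

Cross-validated (out-of-fold) base predictions keep the meta-model from simply memorizing the base models' training-set fits.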

Graph neural network

Graph Neural Networks (GNNs) were developed to process data arranged as graphs or networks. Graphs are made up of vertices (nodes) joined by edges, in this case links. GNNs are used in data mining to acquire information and insights 42 . A GNN’s function is to process structured data. In a graph, each node is associated with a feature vector. The initial embedding of node i is represented by \(g_{i}^{0}\) .

Here m represents the GNN layer and K ( i ) indicates the neighborhood of node i . An aggregation function called Agg collects data from neighboring nodes. The edge attribute between node i and node n in layer m is \(c_{in}^{m}\) , and the non-linear activation function is denoted as h . To generate a graph-level representation, data from each node is combined after multiple layers.

b is the final layer, and an aggregation function called readout is used to determine the representation at the graph scale. A loss function, usually a measurement of the discrepancy between the true and predicted categories, is minimized to train the GNN.

The GNN model output is denoted by Prediction ( f j ). Training uses backpropagation and optimization algorithms to update the model parameters θ .

where η represents the learning rate.
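The display equations referenced in this subsection did not survive extraction; using the symbols defined in the text, a standard message-passing formulation consistent with the description would read:

```latex
% Layer-wise node update: aggregate neighbor embeddings g_n^m together with
% edge attributes c_{in}^m over the neighborhood K(i), then apply h(.)
g_i^{m+1} = h\!\left(\mathrm{Agg}\left(\left\{\left(g_n^{m},\, c_{in}^{m}\right) : n \in K(i)\right\}\right)\right)

% Graph-level representation from the final layer b
z = \mathrm{Readout}\!\left(\left\{ g_i^{b} : i \in V \right\}\right)

% Training: gradient step on the loss \mathcal{L} with learning rate \eta
\theta \leftarrow \theta - \eta\, \nabla_{\theta}\, \mathcal{L}\!\left(y,\ \mathrm{Prediction}(f_j)\right)
```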

Long short-term memory (LSTM)

Long short-term memory is an enhanced form of the recurrent neural network (RNN). Memory blocks with long-term memory, rather than normal RNN units, are proposed to address the exploding and vanishing gradient problems. The main distinction between LSTM and RNN is that LSTM incorporates a cell state to preserve long-term states. An LSTM network can retrieve and link previous data to present information 43 . The architecture of long short-term memory uses three distinct gates: the input, forget, and output gates. The cell’s new and prior states are designated by n t and n t −1 , respectively, while the current and previous outputs are indicated by o t and o t −1 , respectively. The current input is represented by c t . The following equations provide the rules for the LSTM’s input gate.

In Eq. ( 6 ), the input gate is represented by g i , and the preceding output v t −1 and current input p t are passed through the sigmoid layer to decide which portion of the information needs to be added.

After transferring the old information p t −1 and the current information c t through the tanh layer using the input gate a i , Eq. ( 7 ) is utilized to obtain the updated information U t . Equation ( 8 ) integrates the long-term memory S t −1 into S t together with the present state information C t . The sigmoid output is denoted by D i , while S t denotes the tanh output. D i represents the weight matrices, and a i represents the LSTM’s input gate. Using the dot product of the input information J t , the current state information s t , and the sigmoid layer, the forget gate of the LSTM then enables the selective transmission of the data.

Equation ( 9 ) is utilized to determine the probability of deleting the linked data in the final cell. The weight matrix is represented by D f , the offset is a f , and the sigmoid function is σ .

Equations ( 10 ) and ( 11 ) describe the output gate T t of the LSTM, which establishes the states to be carried forward based on the previous and current outputs, o t −1 and p t , respectively. The final output F t is obtained by multiplying by the decision vector of the tanh layer that transmits the new information N t .

where a o denotes the bias of the LSTM’s output gate and D o is its weight matrix 44 .

The model contains three gates: input, forget, and output. The first gate, the forget gate ( h t ), uses a sigmoid function σ to take the previous output o t −1 and the current input c t from the prior state s t −1 . The input gate uses the sigmoid function σ and the tanh layer to take in the input information J t after adding the prior data. The information obtained from the input gate is fed into the output gate, which uses the sigmoid function σ to compute all the information and deliver the current state where the output is kept.
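The gate equations (6)–(11) are described above with somewhat inconsistent symbols; as a concrete reference, the following NumPy sketch implements one step of a standard LSTM cell (sigmoid input/forget/output gates, tanh candidate), with hypothetical weight names rather than the paper's notation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W (4H x D), U (4H x H), and b (4H,) hold the
    [input, forget, candidate, output] parameters stacked along axis 0."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: what new info to admit
    f = sigmoid(z[H:2*H])        # forget gate: what old state to keep
    g = np.tanh(z[2*H:3*H])      # candidate cell update
    o = sigmoid(z[3*H:4*H])      # output gate: what state to expose
    c_t = f * c_prev + i * g     # new cell (long-term) state
    h_t = o * np.tanh(c_t)       # new hidden state / output
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 8, 4
h, c = np.zeros(H), np.zeros(H)
W, U, b = rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H)
h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(h.shape, c.shape)          # (4,) (4,)
```

The cell state `c_t` is the additive long-term memory path; the gates only scale it, which is what mitigates vanishing gradients.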

Recurrent neural network

An artificial neural network that processes sequential data by preserving a hidden state that records details about earlier inputs is called a recurrent neural network (RNN). Because RNNs feature connections that allow information to persist over time, they are excellent for tasks involving sequences or time series, in contrast to feedforward neural networks, which analyze each input independently.

Recurrent connections, which enable information to move from one step to the next, distinguish an RNN. At each time step t , the RNN receives an input i t , processes it to generate an output o t , and updates its hidden state h t , which stores data from earlier time steps. Mathematically, the hidden state at time t is calculated as a function of the input i t and the preceding hidden state h t −1 . An RNN can learn to analyze sequences of varied lengths since an identical set of parameters (weights and biases) is used at each time step. This parameter sharing simplifies the design, enabling the RNN to generalize across multiple time steps.
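In the notation of this paragraph (input i_t, hidden state h_t, output o_t), the recurrence being described takes the standard form, under the usual assumptions of a tanh activation and shared weight matrices W, U, V:

```latex
h_t = \tanh\!\left( U\, h_{t-1} + W\, i_t + b_h \right), \qquad
o_t = V\, h_t + b_o
```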

This section examines the suggested model’s effectiveness. Employing the ASHRAE RP-884 dataset, the suggested model applies various ML and DL classifiers. The parameters used to assess the model are F1-score, recall, accuracy, and precision. These criteria assess the proposed model’s performance by comparison with current methods.

Evaluation metrics

This study assesses the framework’s efficacy using a comprehensive set of evaluation criteria, each offering a valuable perspective on the model’s operation. The first metric, accuracy, is typically used as the standard measure of performance. It is computed as the proportion of correctly recognized samples out of the total number of samples, as expressed in Eq. ( 12 ).

Precision is the ratio of correct positive predictions to all positive forecasts the model produces; it is a crucial assessment metric utilized in performance evaluation. Equation ( 13 ) proportionally illustrates this value.

In other words, a model’s precision indicates how reliably it forecasts the positive class. It represents the degree of confidence in the model’s ability to produce good positive predictions, as shown proportionately in Eq. ( 13 ).

Recall, also called sensitivity, is an evaluation metric that measures the proportion of actual positive instances that the model correctly predicts as positive, as the computation in Eq. ( 14 ) demonstrates.

The F1-score functions as a balance of recall and precision, effectively communicating the essence of a balanced performance. Combining these two metrics yields the F1-score, a popular estimate of model performance that is especially useful for evaluation on imbalanced data. Equation ( 15 ) describes this estimate.
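Equations (12)–(15) did not survive extraction; in terms of the confusion-matrix counts TP, TN, FP, and FN, their standard forms are:

```latex
\text{Accuracy}  = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}

\text{Precision} = \frac{TP}{TP + FP} \tag{13}

\text{Recall}    = \frac{TP}{TP + FN} \tag{14}

\text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}
                 {\text{Precision} + \text{Recall}} \tag{15}
```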

One significant indicator used in the evaluation process is the Confusion Matrix (CM), which is designed to provide precise data regarding the efficacy of the classification model. This essential tool illustrates the model’s efficacy by comparing the predicted and actual data. The CM displays four values: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). The rows of the matrix correspond to the actual class designations and the columns to the predicted ones. The correctly recognized samples are arranged along the diagonal, while the incorrectly categorized cases occupy the off-diagonal cells. The CM values are an essential assessment tool that can highlight the strengths and weaknesses of the model and provide insightful data for improving it.

ML models result analysis

The outcomes of the ML models (RF, DT, XGB, and Stacking) on the ASHRAE RP-884 dataset are displayed in Table 1 . The XGB, RF, and Stacking models achieve the best accuracy (0.84), while the DT model trails at 0.76. The XGB model has the highest precision (0.83), with the RF model slightly lower (0.82). The RF and XGB models have the highest recall (0.84), while the DT model has the poorest (0.76). The F1-score considers both the precision and recall of a model since it is their harmonic mean. The XGB, RF, and Stacking models achieve the highest F1-score (0.80), followed by the DT model (0.77).

The graphical representations of the ML models’ confusion matrices are displayed in Fig.  5 . Figure  5 a shows the confusion matrix of the RF model. The rows of the table indicate the actual classes of the occurrences, and the columns indicate the predicted classes. The diagonal cells of the matrix show the number of occurrences correctly identified, and the off-diagonal cells show the instances that were wrongly classified. The class labels 0, 1, and 2 represent the predicted classes. For the RF model, 1997 instances were correctly predicted for class 0, 92 for class 1, and 24 for class 2. Figure  5 b shows the confusion matrix of the DT model: 1744 instances were correctly predicted for class 0, 124 for class 1, and 59 for class 2.

figure 5

Performance visualization of ML model results.

Figure  5 c shows the confusion matrix of the XGB model: 1992 instances were correctly predicted for class 0, 100 for class 1, and 34 for class 2. Figure  5 d shows the confusion matrix of the Stacking model: 1997 instances were correctly predicted for class 0, 92 for class 1, and 18 for class 2.

DL models result analysis on training and testing data

The outcomes of the deep learning models (GNN, LSTM, RNN) on the ASHRAE RP-884 dataset are displayed in Table 2 . The models employ various building energy consumption parameters from the dataset to forecast the energy utilization of a building. The DL models are compared according to their evaluation on the training and test datasets. The outcomes demonstrate that all three models performed better on the training dataset than on the test dataset. Overall, the GNN model performed exceptionally well, with an accuracy of 0.83 on the test dataset and 0.85 on the training dataset. Accuracy for the LSTM and RNN models was approximately 0.81 on the training dataset and 0.79 on the test dataset. Compared to the LSTM and RNN models, the GNN model achieves higher accuracy on both the training and test datasets, which implies that it is better at capturing the relationships between the various features in the data. For all three models, accuracy is higher than precision and recall, which shows that false positive predictions are more common than false negatives. The F1-score of all three models declines from the training dataset to the test dataset.

Figure  6 shows the performance visualization of the GNN model in terms of accuracy, loss, and ROC curve. Figure  6 a demonstrates the training and testing accuracy of the GNN model. The x-axis displays the number of epochs, i.e., the number of passes the model has made over the training data. The y-axis displays accuracy, the proportion of correct predictions the model generates. The blue line represents the training accuracy, i.e., the model’s accuracy on the training data set, and the green line represents the test accuracy. In the figure, the training accuracy starts at about 0.80 and rises to about 0.84, while the test accuracy rises from about 0.80 to approximately 0.82, leaving a generalization gap of about 0.02.

figure 6

Performance visualization of GNN model results.

Figure  6 b shows the training and testing loss of the model. The training loss starts at about 0.625 and decreases to about 0.450, while the test loss begins around 0.600 and gradually drops to about 0.500. The training loss is consistently lower than the test loss, but the gap between the two lines is small, which implies that the model generalizes effectively to new data. Both losses decrease over time, indicating that the model continues to improve, and the small gap between them suggests no significant overfitting. The ROC curve is a visual tool used to assess a classification model’s efficacy: the y-axis displays the true positive rate (TPR), and the x-axis displays the false positive rate (FPR). In Fig. 6 c, the train ROC curve has an AUC of 0.82 and the test ROC curve an AUC of 0.71.

The confusion matrices of the GNN model on the training and test data are shown graphically in Fig.  7 . A confusion matrix provides an overview of the operation of a classification algorithm; the suggested strategy performs better because it produces fewer false positives and negatives and consistently higher true positive and true negative counts. The rows of the table indicate the actual classes of the occurrences, and the columns indicate the predicted classes. The diagonal cells of the matrix in Fig.  7 a show the number of occurrences correctly identified, and the off-diagonal cells show the instances that were wrongly classified. The class labels 0, 1, and 2 represent the predicted classes. On the training data, 7020 instances were correctly predicted for class 0, 497 for class 1, and 114 for class 2.

figure 7

Graphical visualization of GNN model results.

The diagonal cells of the matrix in Fig.  7 b show the number of occurrences correctly identified, and the off-diagonal cells show the instances that were wrongly classified. On the test data, 1982 instances were correctly predicted for class 0, 83 for class 1, and 19 for class 2.

Figure  8 shows the performance visualization of the LSTM model in terms of accuracy, loss, and ROC curve. Figure  8 a demonstrates the training and testing accuracy curve of the LSTM model. The training accuracy starts at about 0.81 and rises to about 0.82, while the test accuracy rises from about 0.78 to approximately 0.82.

figure 8

Performance visualization of LSTM model results.

Figure  8 b shows the training and testing loss of the LSTM model. The training loss starts at about 0.645 and decreases to about 0.44, while the test loss begins around 0.62 and gradually drops to about 0.57. The training loss is consistently lower than the test loss, but the gap between the two lines is small, which implies that the model generalizes effectively to new data. Both losses decrease over time, indicating that the model continues to improve, and the small gap suggests no significant overfitting. The ROC curve is a visual tool used to assess a classification model’s efficacy: the y-axis displays the true positive rate (TPR), and the x-axis displays the false positive rate (FPR). In Fig. 8 c, the train ROC curve has an AUC of 0.68 and the test ROC curve an AUC of 0.65.

The confusion matrices of the LSTM model on the training and test data are shown graphically in Fig.  9 . The rows of the table indicate the actual classes of the occurrences, and the columns indicate the predicted classes. The diagonal cells of the matrix in Fig.  9 a show the number of occurrences correctly identified, and the off-diagonal cells show the instances that were wrongly classified. On the training data, 7997 instances were correctly predicted for class 0, 212 for class 1, and 15 for class 2.

figure 9

Graphical visualization of LSTM model results.

The diagonal cells of the matrix in Fig.  9 b show the number of occurrences correctly identified, and the off-diagonal cells show the instances that were wrongly classified. On the test data, 2025 instances were correctly predicted for class 0, 30 for class 1, and 2 for class 2.

Figure  10 shows the performance visualization of the RNN model in terms of accuracy, loss, and ROC curve. Figure  10 a demonstrates the training and testing accuracy of the RNN model. The training accuracy starts at about 0.79 and rises to about 0.81, while the test accuracy starts at about 0.82 and settles at approximately 0.806. Figure  10 b shows the training and testing loss of the RNN model. The training loss starts at about 0.65 and decreases to about 0.54, while the test loss begins around 0.62 and gradually drops to about 0.57. The ROC curve is a visual tool used to assess a classification model’s efficacy: the y-axis displays the true positive rate (TPR), and the x-axis displays the false positive rate (FPR). In Fig.  10 c, the train ROC curve has an AUC of 0.77 and the test ROC curve an AUC of 0.66.

figure 10

Graphical visualization of RNN model results.

The confusion matrices of the RNN model on the training and test data are shown graphically in Fig.  11 . The diagonal cells of the matrix in Fig.  11 a show the number of occurrences correctly identified, and the off-diagonal cells show the instances that were wrongly classified. On the training data, 7985 instances were correctly predicted for class 0, 212 for class 1, and 0 for class 2. The diagonal cells of the matrix in Fig.  11 b show the correctly identified occurrences on the test data: 2027 instances were correctly predicted for class 0, 38 for class 1, and 0 for class 2.

figure 11

Graphical visualization of RNN model results.

Findings and discussion

This study utilized the ASHRAE RP-884 dataset on green building energy consumption to predict the temperature and environmental sustainability of a smart building. In particular, for green buildings that depend significantly on local climate circumstances, the resolution of the climate statistics was not precise enough to capture localized temperature fluctuations, which are critical for accurately projecting energy usage. The dataset may also not span enough time to include long-term trends or variations in climatic patterns, which are crucial for comprehending energy usage patterns and producing precise forecasts. When combined with climatic data, historical energy consumption data makes it possible for researchers to validate and calibrate forecasting models successfully. Validated models can support evidence-based choices in green building design and operations and increase trust in the reliability of energy consumption projections.

There are advantages and disadvantages to using real-world datasets for DL and ML in the design of green buildings. The complexity and volume of the data, which demand substantial processing power, as well as problems with data quality and consistency, such as inconsistent or incomplete records, are challenges. Concerns about data security, privacy, and regulatory compliance complicate matters further, and the dynamic, heterogeneous nature of the data, together with the absence of uniformity across sources, makes integration and benchmarking challenging. Nonetheless, there are numerous opportunities: better occupant comfort and health through optimized indoor environments, enhanced predictive modeling for energy efficiency and sustainability, and better design and operation through data-driven insights. Optimization of operations and resource allocation can lead to cost savings, and regulatory adherence can be streamlined with automated compliance and sustainability reporting. Moreover, real-world data may encourage inventiveness and give early adopters of ML and DL technology a competitive advantage. Applying machine and deep learning techniques to green buildings can maximize resource use, enhance occupant comfort, and reduce the impact of the built environment, thereby enhancing environmental sustainability.

Building energy modeling technologies can benefit from DL and ML approaches by increasing forecasting accuracy and calibrating models using actual data. Deep learning algorithms can discover links between building parameters and energy usage, making possible more realistic simulations that account for intricate interconnections and uncertainties. The effectiveness of the suggested model is evaluated using indicators suitable for statistical analysis, which assesses the effectiveness, standardization potential, and usefulness of DL models. The complexity of a deep learning model reflects both the degree to which it can identify patterns and correlations in data and the intricacy of its structure. Multiple architectural features determine a DL model’s complexity. A model becomes more complex as its number of parameters increases; although sophisticated models can capture complicated relationships in the data, they are increasingly prone to overfitting when improperly regularized. The quantity and kind of features a model uses also affect its complexity: adding more features can make the model more complex, yet not all features significantly impact its performance. Various regularization approaches lower the complexity of the model by adding penalty terms to the loss function, and avoiding extremely complex models reduces the likelihood of overfitting. The study uses the DL-based GNN, LSTM, and RNN models to tackle green building environmental sustainability. The experimental results indicate that the proposed GNN and LSTM architectures function more accurately and efficiently than conventional DL techniques for environmental sustainability in green buildings.

Developers often use machine learning and other advanced artificial intelligence techniques, such as deep learning, to complete their tasks more quickly and accurately. The goal of this research is to create a predictive model for GB design using ML and DL approaches that maximizes resource usage, enhances occupant comfort, and decreases the environmental impact of the built environment throughout the GB design process. The proposed models are applied to the ASHARE-884 dataset. Exploratory data analysis (EDA) and data preprocessing techniques are applied, including cleaning, sorting, converting categorical data into numerical form, and normalizing the data. ML and DL techniques are then applied to green building design to enhance environmental sustainability. DL models such as GNN and LSTM are the most accurate and efficient of all the models evaluated and outperform conventional DL techniques for environmental sustainability in green buildings. However, since this research is limited by its dataset, the study can be extended with additional feature-rich datasets in further work.
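The preprocessing steps listed above (cleaning, categorical-to-numerical conversion, normalization) can be sketched as follows. The column names and values are hypothetical illustrations, not records from the ASHARE-884 dataset:

```python
import numpy as np

def encode_categorical(values):
    """Map category labels to integer codes (simple label encoding)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return np.array([codes[v] for v in values]), codes

def min_max_normalize(x):
    """Scale a numeric feature into the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span else np.zeros_like(x)

# Hypothetical records: (building type, indoor temperature in deg C);
# cleaning drops rows with a missing temperature reading.
records = [("office", 21.0), ("residential", 24.5),
           ("office", None), ("retail", 19.0)]
clean = [(b, t) for b, t in records if t is not None]
types, codes = encode_categorical([b for b, _ in clean])
temps = min_max_normalize([t for _, t in clean])
print(types, codes)
print(temps)
```

In practice library routines (e.g. scikit-learn's encoders and scalers) would replace these hand-rolled helpers, but the sequence — clean, encode, normalize — mirrors the pipeline the paragraph describes.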

This study encourages future work to develop more robust ML and DL models with improved accuracy, and other data preprocessing techniques may further enhance model performance. Future directions will concentrate on addressing the limits of climate data resolution and coverage period to increase the accuracy of energy consumption estimates. Extending the ASHARE-884 dataset with longer periods and higher temporal and spatial resolution of climate information will improve its ability to capture long-term climatic trends as well as localized temperature changes. Moreover, integrating meteorological data with large historical energy usage datasets considerably improves forecasting model calibration and validation. Future research can leverage this foundation to enhance the precision, dependability, and adaptability of models for predicting energy consumption and understanding the interaction between climate and energy use.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

2021 global status report for buildings and construction | unep-un environment programme. https://www.unep.org/resources/report/2021-global-status-report-buildings-and-construction (Accessed on 02/11/2024).

Peck, D. Buildings. In Handbook of Recycling 235–247 (Elsevier, 2024).

Nur-E-Alam, M. et al. Machine learning-enhanced all-photovoltaic blended systems for energy-efficient sustainable buildings. Sustain. Energy Technol. Assess. 62 , 103636 (2024).

Miraj, P., Berawi, M. A. & Utami, S. R. Economic feasibility of green office building: Combining life cycle cost analysis and cost–benefit evaluation. Build. Res. Inf. 49 , 624–638 (2021).

Basher, M. K., Nur-E-Alam, M., Rahman, M. M., Alameh, K. & Hinckley, S. Aesthetically appealing building integrated photovoltaic systems for net-zero energy buildings. Current status, challenges, and future developments—A review. Buildings 13 , 863 (2023).

Basher, M. K., Nur-E-Alam, M., Rahman, M. M., Hinckley, S. & Alameh, K. Design, development, and characterization of highly efficient colored photovoltaic module for sustainable buildings applications. Sustainability 14 , 4278 (2022).

Zuo, J. & Zhao, Z.-Y. Green building research—Current status and future agenda: A review. Renew. Sustain. Energy Rev. 30 , 271–281. https://doi.org/10.1016/j.rser.2013.10.021 (2014).

Venkataraman, V. & Cheng, J. C. Critical success and failure factors for managing green building projects. J. Archit. Eng. 24 , 04018025 (2018).

Vasiliev, M., Nur-E-Alam, M. & Alameh, K. Initial field testing results from building-integrated solar energy harvesting windows installation in Perth, Australia. Appl. Sci. 9 , 4002 (2019).

Sharanya, B. et al. Green and sustainable building practices for museums. AIP Conf. Proc. 2039 , 20010 (2018).

Wen, B. et al. The role and contribution of green buildings on sustainable development goals. Build. Environ. 185 , 107091 (2020).

Basbagill, J., Flager, F., Lepech, M. & Fischer, M. Application of life-cycle assessment to early stage building design for reduced embodied environmental impacts. Build. Environ. 60 , 81–92. https://doi.org/10.1016/j.buildenv.2012.11.009 (2013).

Svalestuen, F., Knotten, V., Lædre, O. & Lohne, J. Planning the building design process according to level of development (2018).

Thomas, J. A., Vasiliev, M., Nur-E-Alam, M. & Alameh, K. Increasing the yield of Lactuca sativa , L. in glass greenhouses through illumination spectral filtering and development of an optical thin film filter. Sustainability 12 , 3740 (2020).

Li, Y., Song, H., Sang, P., Chen, P.-H. & Liu, X. Review of critical success factors (CSFs) for green building projects. Build. Environ. https://doi.org/10.1016/j.buildenv.2019.05.020 (2019).

Oesterreich, T. D. & Teuteberg, F. Understanding the implications of digitisation and automation in the context of industry 4.0: A triangulation approach and elements of a research agenda for the construction industry. Comput. Ind. 83 , 121–139 (2016).

Berawi, M. A. Managing artificial intelligence technology for added value. Int. J. Technol. 11 , 1–4 (2020).

Hwang, B.-G., Zhu, L. & Ming, J. T. T. Factors affecting productivity in green building construction projects: The case of singapore. J. Manag. Eng. 33 , 04016052 (2017).

Darko, A., Chan, A. P., Owusu-Manu, D.-G. & Ameyaw, E. E. Drivers for implementing green building technologies: An international survey of experts. J. Clean. Prod. 145 , 386–394. https://doi.org/10.1016/j.jclepro.2017.01.043 (2017).

Chan, A. P. C., Darko, A., Olanipekun, A. O. & Ameyaw, E. E. Critical barriers to green building technologies adoption in developing countries: The case of Ghana. J. Clean. Prod. 172 , 1067–1079. https://doi.org/10.1016/j.jclepro.2017.10.235 (2018).

Sahli, H. An introduction to machine learning. In TORUS 1—Toward an Open Resource Using Services: Cloud Computing for Environmental Data 61–74 (2020).

Khean, N., Fabbri, A. & Haeusler, M. H. Learning machine learning as an architect, how to. In Proceedings of the 36th eCAADe Conference , vol. 1, 95–102 (2018).

Karan, E. & Asadi, S. Intelligent designer: A computational approach to automating design of windows in buildings. Autom. Constr. 102 , 160–169 (2019).

Ascione, F., Bianco, N., De Stasio, C., Mauro, G. M. & Vanoli, G. P. Artificial neural networks to predict energy performance and retrofit scenarios for any member of a building category: A novel approach. Energy 118 , 999–1017 (2017).

Rahman, A. & Smith, A. D. Predicting fuel consumption for commercial buildings with machine learning algorithms. Energy Build. 152 , 341–358 (2017).

Geyer, P. & Singaravel, S. Component-based machine learning for performance prediction in building design. Appl. Energy 228 , 1439–1453 (2018).

Sapuan, N. M., Haron, N. F., Kumaran, V. V., Saudi, N. S. & Ridzuan, A. R. Green building best practices in achieving energy and environmental sustainability. Environ. Manag. Sustain. Dev. https://doi.org/10.5296/emsd.v11i4.21052 (2022).

Jiang, L. Environmental benefits of green buildings with BIM technology. Ecol. Chem. Eng. S 30 , 191–199 (2023).

Boshi, A. A. Sustainable design and green building for the design of residential buildings with high environmental value. Tex. J. Eng. Technol. 17 , 7–14 (2023).

Jaradat, H., Alshboul, O. A. M., Obeidat, I. M. & Zoubi, M. K. Green building, carbon emission, and environmental sustainability of construction industry in Jordan: Awareness, actions and barriers. Ain Shams Eng. J. 15 , 102441 (2024).

Braulio-Gonzalo, M., Jorge-Ortiz, A. & Bovea, M. D. How are indicators in green building rating systems addressing sustainability dimensions and life cycle frameworks in residential buildings?. Environ. Impact Assess. Rev. 95 , 106793 (2022).

Meena, C. S. et al. Innovation in green building sector for sustainable future. Energies 15 , 6631 (2022).

Tseng, M.-L., Li, S.-X., Lin, C.-W.R. & Chiu, A. S. Validating green building social sustainability indicators in china using the fuzzy Delphi method. J. Ind. Prod. Eng. 40 , 35–53 (2023).

Alshboul, O., Shehadeh, A., Almasabha, G. & Almuflih, A. S. Extreme gradient boosting-based machine learning approach for green building cost prediction. Sustainability 14 , 6651 (2022).

Xiang, Y., Chen, Y., Xu, J. & Chen, Z. Research on sustainability evaluation of green building engineering based on artificial intelligence and energy consumption. Energy Rep. 8 , 11378–11391 (2022).

Fan, Z., Yan, Z. & Wen, S. Deep learning and artificial intelligence in sustainability: A review of SDGs, renewable energy, and environmental health. Sustainability 15 , 13493 (2023).

Alshboul, O., Shehadeh, A., Almasabha, G., Mamlook, R. E. A. & Almuflih, A. S. Evaluating the impact of external support on green building construction cost: A hybrid mathematical and machine learning prediction approach. Buildings 12 , 1256 (2022).

De Dear, R. & Schiller Brager, G. The adaptive model of thermal comfort and energy conservation in the built environment. Int. J. Biometeorol. 45 , 100–108 (2001).

Komorowski, M., Marshall, D. C., Salciccioli, J. D. & Crutain, Y. Exploratory data analysis. In Secondary Analysis of Electronic Health Records 185–203 (2016).

Sharma, N., Bhandari, H. V., Yadav, N. S. & Shroff, H. Optimization of ids using filter-based feature selection and machine learning algorithms. Int. J. Innov. Technol. Explor. Eng. 10 , 96–102 (2020).

Zhou, J. et al. Graph neural networks: A review of methods and applications. AI Open 1 , 57–81 (2020).

Chen, G. A gentle tutorial of recurrent neural network with error backpropagation. arXiv preprint arXiv:1610.02583 (2016).

Islam, M. Z., Islam, M. M. & Asraf, A. A combined deep CNN-LSTM network for the detection of novel coronavirus (Covid-19) using x-ray images. Inform. Med. Unlocked 20 , 100412 (2020).

Acknowledgements

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R120), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Author information

Authors and Affiliations

School of Finance and Economics, Jiangsu University, Zhenjiang, China

Shahid Mahmood & Huaping Sun

School of Economics and Management, University of Science and Technology Beijing, Beijing, 100083, China

Huaping Sun

Department of Communications and Electronics, Delta Higher Institute of Engineering and Technology, Mansoura, 35111, Egypt

El-Sayed M. El-kenawy

School of International Studies, Zhengzhou University, Zhengzhou, China

Asifa Iqbal

Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia

Amal H. Alharbi & Doaa Sami Khafaga

MEU Research Unit, Middle East University, Amman, Jordan

Contributions

Shahid Mahmood: Writing—original draft, Project administration, Conceptualization. Huaping Sun: Supervision, Investigation. El-Sayed M. El-kenawy: Visualization, Software, Project administration. Asifa Iqbal: Writing—review and editing. Amal H. Alharbi: Software, Data curation. Doaa Sami Khafaga: Visualization, Formal analysis.

Corresponding author

Correspondence to Shahid Mahmood .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article

Mahmood, S., Sun, H., El-kenawy, ES.M. et al. Integrating machine and deep learning technologies in green buildings for enhanced energy efficiency and environmental sustainability. Sci Rep 14 , 20331 (2024). https://doi.org/10.1038/s41598-024-70519-y

Received : 19 February 2024

Accepted : 19 August 2024

Published : 02 September 2024

DOI : https://doi.org/10.1038/s41598-024-70519-y

  • Green building
  • Environmental sustainability
  • Artificial intelligence
  • Machine learning
  • Deep learning

JMLR Papers

Select a volume number to see its table of contents with links to the papers.

Volume 25 (January 2024 - Present)

Volume 24 (January 2023 - December 2023)

Volume 23 (January 2022 - December 2022)

Volume 22 (January 2021 - December 2021)

Volume 21 (January 2020 - December 2020)

Volume 20 (January 2019 - December 2019)

Volume 19 (August 2018 - December 2018)

Volume 18 (February 2017 - August 2018)

Volume 17 (January 2016 - January 2017)

Volume 16 (January 2015 - December 2015)

Volume 15 (January 2014 - December 2014)

Volume 14 (January 2013 - December 2013)

Volume 13 (January 2012 - December 2012)

Volume 12 (January 2011 - December 2011)

Volume 11 (January 2010 - December 2010)

Volume 10 (January 2009 - December 2009)

Volume 9 (January 2008 - December 2008)

Volume 8 (January 2007 - December 2007)

Volume 7 (January 2006 - December 2006)

Volume 6 (January 2005 - December 2005)

Volume 5 (December 2003 - December 2004)

Volume 4 (Apr 2003 - December 2003)

Volume 3 (Jul 2002 - Mar 2003)

Volume 2 (Oct 2001 - Mar 2002)

Volume 1 (Oct 2000 - Sep 2001)

Special Topics

Bayesian Optimization

Learning from Electronic Health Data (December 2016)

Gesture Recognition (May 2012 - present)

Large Scale Learning (Jul 2009 - present)

Mining and Learning with Graphs and Relations (February 2009 - present)

Grammar Induction, Representation of Language and Language Learning (Nov 2010 - Apr 2011)

Causality (Sep 2007 - May 2010)

Model Selection (Apr 2007 - Jul 2010)

Conference on Learning Theory 2005 (February 2007 - Jul 2007)

Machine Learning for Computer Security (December 2006)

Machine Learning and Large Scale Optimization (Jul 2006 - Oct 2006)

Approaches and Applications of Inductive Programming (February 2006 - Mar 2006)

Learning Theory (Jun 2004 - Aug 2004)

Special Issues

In Memory of Alexey Chervonenkis (Sep 2015)

Independent Components Analysis (December 2003)

Learning Theory (Oct 2003)

Inductive Logic Programming (Aug 2003)

Fusion of Domain Knowledge with Data for Decision Support (Jul 2003)

Variable and Feature Selection (Mar 2003)

Machine Learning Methods for Text and Images (February 2003)

Eighteenth International Conference on Machine Learning (ICML2001) (December 2002)

Computational Learning Theory (Nov 2002)

Shallow Parsing (Mar 2002)

Kernel Methods (December 2001)

COMMENTS

  1. The latest in Machine Learning

    answerdotai/rerankers • 30 Aug 2024. This paper presents rerankers, a Python library which provides an easy-to-use interface to the most commonly used re-ranking approaches.

  2. Journal of Machine Learning Research

    The Journal of Machine Learning Research (JMLR), established in 2000, provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online. JMLR has a commitment to rigorous yet rapid reviewing.

  3. Machine Learning: Algorithms, Real-World Applications and Research

    To discuss the applicability of machine learning-based solutions in various real-world application domains. To highlight and summarize the potential research directions within the scope of our study for intelligent data analysis and services. The rest of the paper is organized as follows.

  4. Machine learning

    Machine learning is the ability of a machine to improve its performance based on previous results. Machine learning methods enable computers to learn without being explicitly programmed and have ...

  5. The Journal of Machine Learning Research

    The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning.JMLR seeks previously unpublished papers that contain:new algorithms with empirical, theoretical, psychological, or biological justification; experimental and/or theoretical studies yielding new insight into ...

  6. The Journal of Machine Learning Research

    Benjamin Recht. Article No.: 20, Pages 724-750. This paper provides elementary analyses of the regret and generalization of minimum-norm interpolating classifiers (MNIC). The MNIC is the function of smallest Reproducing Kernel Hilbert Space norm that perfectly interpolates a label pattern on a finite ...

  7. Journal of Machine Learning Research

    Finite-time Koopman Identifier: A Unified Batch-online Learning Framework for Joint Learning of Koopman Structure and Parameters. Majid Mazouchi, Subramanya Nageshrao, Hamidreza Modares; (336):1−35, 2023. [abs] [pdf] [bib] The Art of BART: Minimax Optimality over Nonhomogeneous Smoothness in High Dimension.

  8. Journal of Machine Learning Research

    Journal of Machine Learning Research. JMLR Volume 21. A Low Complexity Algorithm with O (√T) Regret and O (1) Constraint Violations for Online Convex Optimization with Long Term Constraints. Hao Yu, Michael J. Neely; (1):1−24, 2020. [abs] [pdf] [bib] A Statistical Learning Approach to Modal Regression.

  9. Journal of Machine Learning Research

    Journal of Machine Learning Research. The Journal of Machine Learning Research (JMLR), established in 2000, provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning.All published papers are freely available online. News. 2022.02.18: New blog post: Retrospectives from 20 Years of JMLR .

  10. Journal of Machine Learning Research

    JMLR Volume 23. Joint Estimation and Inference for Data Integration Problems based on Multiple Multi-layered Gaussian Graphical Models. Subhabrata Majumdar, George Michailidis; (1):1−53, 2022. [abs] [pdf] [bib] [code] Debiased Distributed Learning for Sparse Partial Linear Models in High Dimensions. Shaogao Lv, Heng Lian; (2):1−32, 2022.

  11. Home

    Improves how machine learning research is conducted. Prioritizes verifiable and replicable supporting evidence in all published papers. Editor-in-Chief. Hendrik Blockeel; Journal Impact Factor 4.3 (2023) 5-year Journal Impact Factor 5.8 (2023) Submission to first decision (median) 16 days.

  12. Machine Learning

    Title:Planning In Natural Language Improves LLM Search For Code Generation. Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang. Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

  13. Machine Learning

    Title:Maximum a Posteriori Estimation for Linear Structural Dynamics Models Using Bayesian Optimization with Rational Polynomial Chaos Expansions. Felix Schneider, Iason Papaioannou, Bruno Sudret, Gerhard Müller. Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG) Total of 74 entries : 1-25 26-50 51-74.

  14. Forecasting the future of artificial intelligence with machine learning

    Specifically, in the field of artificial intelligence (AI) and machine learning (ML), the number of papers every month is growing exponentially with a doubling rate of roughly 23 months (Fig. 1 ...

  15. Machine learning

    Weather and climate predicted accurately — without using a supercomputer. A cutting-edge global model of the atmosphere combines machine learning with a numerical model based on the laws of ...

  16. Machine learning-based approach: global trends, research directions

    Since ML appeared in the 1990s, all published documents (i.e., journal papers, reviews, conference papers, preprints, code repositories and more) related to this field from 1990 to 2020 have been selected, and specifically, within the search fields, the following keywords were used: "machine learning" OR "machine learning-based approach" OR ...

  17. Journal of Machine Learning Research

    MushroomRL: Simplifying Reinforcement Learning Research Carlo D'Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters; (131):1−5, 2021. (Machine Learning Open Source Software Paper) Locally Differentially-Private Randomized Response for Discrete Distribution Learning

  18. Machine Learning: Models, Challenges, and Research Directions

    Machine learning techniques have emerged as a transformative force, revolutionizing various application domains, particularly cybersecurity. The development of optimal machine learning applications requires the integration of multiple processes, such as data pre-processing, model selection, and parameter optimization. While existing surveys have shed light on these techniques, they have mainly ...

  19. [2104.05314] Machine learning and deep learning

    Today, intelligent systems that offer artificial intelligence capabilities often rely on machine learning. Machine learning describes the capacity of systems to learn from problem-specific training data to automate the process of analytical model building and solve associated tasks. Deep learning is a machine learning concept based on artificial neural networks. For many applications, deep ...

  20. Interpretable machine learning for weather and climate prediction: A

    In this paper, we review current interpretable machine learning approaches applied to meteorological predictions. We categorize methods into two major paradigms: (1) Post-hoc interpretability techniques that explain pre-trained models, such as perturbation-based, game theory based, and gradient-based attribution methods.

  21. Full article: Explaining machine learning practice: findings from an

    From XAI to machine learning practice. XAI is a fast-growing and vibrant field concerned with making ML models interpretable, intelligible or explainable (Doshi-Velez & Kim, Citation 2017; Gunning et al., Citation 2019).This includes attention to transparency - rendering the internal working of models open to inspection - and/or post hoc explanations, explaining how models have reached a ...

  22. Performance and reliability evaluation of an improved machine learning

    Although recent technological innovation in clinical applications through artificial intelligence and machine learning (ML) is starting to have a widespread impact in different clinical settings, including the detection of retinal diseases 5, 6 and medical imaging, 7, 8 they remain rarely applied in the hearing healthcare system (e.g., speech ...

  24. Top Machine Learning Research Papers 2024

    Machine learning and deep learning have accomplished various astounding feats, and key research articles have resulted in technical advances used by billions of people. The research in this sector is advancing at a breakneck pace and assisting you to keep up. Here is a collection of the most important scientific study papers in machine learning.

  26. Evidence for Predictive Machine Learning Algorithms in Primary Care

    This systematic review assesses the quality of evidence from scientific literature and registration databases for machine learning algorithms implemented ... Kortekaas MF, Rutten FH, et al. Routine primary care data for scientific research, quality of care programs and educational purposes: the Julius General Practitioners' Network ...

  27. Integrating machine and deep learning technologies in green buildings

    To anticipate the costs of green buildings, the author of the 35 paper proposes machine learning-based techniques such as random forest (RF), deep neural network (DNN), and extreme gradient ...

  28. Enhancing Cybersecurity: A Comprehensive Analysis of Machine Learning

    Machine learning is a powerful tool used to strive against phishing attacks. This paper surveys the features used for detection and detection techniques using machine learning.

  29. JMLR Papers

    JMLR Papers. Select a volume number to see its table of contents with links to the papers. Volume 25 (January 2024 - Present) . Volume 24 (January 2023 - December 2023) . Volume 23 (January 2022 - December 2022) . Volume 22 (January 2021 - December 2021) . Volume 21 (January 2020 - December 2020) . Volume 20 (January 2019 - December 2019) ...
