Machine learning (ML) is a subfield of computer science that grew out of the study of pattern recognition and computational learning theory in artificial intelligence; the term dates back to 1959. Machine learning and artificial intelligence are often considered together. In some cases the terms are used interchangeably, but they do not mean the same thing. An important distinction is that all machine learning solutions are artificial intelligence, but not all artificial intelligence solutions are machine learning.
ML studies the design and construction of algorithms that can learn from data and make predictions about it. Such algorithms work by building a model from sample inputs in order to make data-driven predictions and decisions, rather than strictly following static program instructions. Some researchers define machine learning as follows:
Kevin Murphy defines machine learning “as a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty”.
Thomas W. Edgar and David O. Manz give another definition: machine learning is a field of study that looks at using computational algorithms to turn empirical data into usable models.
TYPES OF MACHINE LEARNING
Machine learning types vary according to the intended use and the target outputs. As shown in Figure 1, ML has three main categories: Supervised Learning, Unsupervised Learning and Reinforcement Learning.
Supervised learning trains a model on a dataset that includes input (X) variables and labeled output (Y) variables. The trained model is then used to estimate the output for new input data. Supervised learning is further divided into Classification and Regression, as shown in Figure 2.
The most commonly used supervised learning algorithms are: Linear Regression; Logistic Regression; Random Forest; Gradient Boosted Trees; Support Vector Machines (SVM); Neural Networks; Decision Trees; Naive Bayes; Nearest Neighbor.
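The supervised workflow described above (fit on labeled pairs, then estimate outputs for new inputs) can be sketched with the simplest algorithm in the list, nearest neighbor. This is a minimal illustration only: the data values, the labels and the function name `nearest_neighbor_predict` are made up for the example and are not taken from the thesis tools.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (1-nearest neighbor)."""
    best_index = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best_index]

# Labeled training data: inputs X with known outputs Y (made-up values).
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = ["small", "small", "large", "large"]

# Estimate the output for new, previously unseen inputs.
new_inputs = [(1.2, 1.4), (5.5, 5.0)]
predictions = [nearest_neighbor_predict(train_X, train_y, x) for x in new_inputs]
print(predictions)  # ['small', 'large']
```

Because the outputs here are categories, this is a classification task; replacing the labels with numbers and averaging the nearest neighbors would turn the same idea into regression.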
Unsupervised learning is a learning method in which the dataset has no labeled outputs. Its purpose is to model the structure or distribution that underlies the data in order to learn more about the data. Unsupervised learning algorithms perform descriptive modelling and use unlabeled data.
Clustering: grouping the given data according to the similarity and relationships among the data points.
The most commonly used unsupervised learning algorithms are: K-means clustering; t-SNE (t-Distributed Stochastic Neighbor Embedding); PCA (Principal Component Analysis); Association rule learning.
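As an illustration of clustering on unlabeled data, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It assumes fixed starting centroids and a fixed number of iterations to keep the result reproducible; the data points and the function name `k_means` are invented for the example.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain K-means with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Unlabeled data: two visually separate groups (made-up values).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```

No output labels are involved at any point: the grouping is derived only from the distances between the inputs.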
Reinforcement learning is a machine learning approach inspired by behaviorism that deals with which actions an agent must take in order to obtain the highest cumulative reward in an environment. Due to its generality, this problem is also studied in many other fields, such as game theory, control theory, operations research, information theory, simulation-based optimization, and statistics. In machine learning, the environment is often modeled as a Markov Decision Process (MDP). In this context, many reinforcement learning algorithms use dynamic programming techniques.
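To make the link between an MDP and dynamic programming concrete, the sketch below runs value iteration on a toy environment. Everything in it is an assumption chosen for illustration: a deterministic chain of four states, two actions, a reward of 1 for reaching the last state, and a discount factor of 0.9.

```python
GAMMA = 0.9                          # discount factor (assumed)
STATES = [0, 1, 2, 3]                # state 3 is the terminal goal
GOAL = 3
ACTIONS = {"left": -1, "right": 1}

def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def value_iteration(sweeps=50):
    """Repeatedly apply the Bellman optimality update to every state."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s == GOAL:
                continue  # the terminal state keeps value 0
            outcomes = [step(s, a) for a in ACTIONS]
            values[s] = max(r + GAMMA * values[n] for n, r in outcomes)
    return values

def greedy_policy(values):
    """Pick, in each state, the action with the best one-step lookahead."""
    def lookahead(s, a):
        n, r = step(s, a)
        return r + GAMMA * values[n]
    return {s: max(ACTIONS, key=lambda a: lookahead(s, a)) for s in STATES if s != GOAL}

values = value_iteration()
policy = greedy_policy(values)
print(policy)  # {0: 'right', 1: 'right', 2: 'right'}
```

Value iteration needs the full transition model (`step`); reinforcement learning methods such as Q-learning estimate the same quantities from sampled experience when that model is not available.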
SEMI-SUPERVISED MACHINE LEARNING METHODS
Semi-supervised methods were introduced to address two major disadvantages of supervised and unsupervised methods: the high cost of labeling large volumes of data for supervised learning, and the limited range of applications of unsupervised learning. Semi-supervised methods need only a small amount of labeled data; the rest of the data is unlabeled. A typical approach is to cluster the larger, unlabeled part of the data with an unsupervised method and then to use the small labeled part to assign labels to the remaining data.
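One possible reading of this cluster-then-label idea is sketched below: all points are clustered with a small K-means loop, and each cluster then receives the majority label of its few labeled members. The data, the starting centroids and all function names are hypothetical, and this is only one of several ways to combine the two kinds of method.

```python
import math
from collections import Counter

def assign_clusters(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]

def update_centroids(points, assignment, centroids):
    """Move each centroid to the mean of its members (empty clusters stay put)."""
    updated = []
    for i, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignment) if a == i]
        updated.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
    return updated

def cluster_then_label(points, known_labels, centroids, iterations=10):
    """known_labels maps a point index to its label for the few labeled points."""
    for _ in range(iterations):
        assignment = assign_clusters(points, centroids)
        centroids = update_centroids(points, assignment, centroids)
    assignment = assign_clusters(points, centroids)
    # Majority vote among the labeled members of each cluster.
    votes = {}
    for index, label in known_labels.items():
        votes.setdefault(assignment[index], Counter())[label] += 1
    cluster_label = {c: counter.most_common(1)[0][0] for c, counter in votes.items()}
    # A cluster without any labeled member stays unlabeled (None).
    return [cluster_label.get(c) for c in assignment]

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.3)]
known_labels = {0: "A", 3: "B"}          # only two of the six points are labeled
labels = cluster_then_label(points, known_labels, centroids=[points[0], points[3]])
print(labels)  # ['A', 'A', 'A', 'B', 'B', 'B']
```

Only two labels had to be supplied by hand; the other four were inferred from the cluster structure.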
MACHINE LEARNING PYTHON LIBRARIES
The programming language Python was chosen for all tools implemented in this thesis. This decision was made because the language is easy to use and freely available, offers many useful libraries for this application area, and is becoming a default language for scientific computing. Because Python is a high-level, interpreted scripting language, the tools do not depend on the type of operating system. Most well-known machine learning libraries provide a Python interface. Scikit-learn is an open-source machine learning library. It provides many models for supervised and unsupervised machine learning, together with tools for model fitting, data pre-processing, and so on. It is the most suitable library for this thesis.
Scikit-learn is built on top of the NumPy (scientific computing) and SciPy (mathematics, science, and engineering) libraries. It offers model persistence, so a trained model can easily be saved to a file and loaded later for prediction. There are some security and dependency limitations, but it is beneficial to save a trained model and reuse it when it is needed.
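The save-and-load cycle can be sketched with Python's standard `pickle` module, one of the mechanisms the scikit-learn documentation describes for model persistence. To keep the example runnable without scikit-learn installed, a plain dictionary of made-up line parameters stands in for a trained estimator; the file name `model.pkl` and the `predict` helper are likewise hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: parameters of a fitted line y = slope * x + intercept.
trained_model = {"slope": 2.0, "intercept": 1.0}

def predict(model, x):
    """Apply the stored parameters to a new input."""
    return model["slope"] * x + model["intercept"]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pkl")

    # Save the trained model to a file.
    with open(path, "wb") as f:
        pickle.dump(trained_model, f)

    # Later: load it again. Only unpickle files from a trusted source,
    # because loading a pickle can execute arbitrary code.
    with open(path, "rb") as f:
        restored_model = pickle.load(f)

print(predict(restored_model, 3.0))  # 7.0
```

The security limitation comes from the pickle format itself. The dependency limitation is that a pickled estimator should be loaded with the same library versions that were used to save it.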
CNTK is a unified deep-learning toolkit from Microsoft Corporation. It has been open source since April 2015 and provides a well-documented Python API. Keras is a high-level neural networks API with a focus on fast experimentation. It is capable of running on top of TensorFlow, CNTK, and Theano. PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment.