Deep learning makes it possible to build complex concepts out of simpler ones. For example, it resolves the difficulty of describing an image by breaking the task down into a series of simpler nested mappings, starting from the raw pixel inputs.
One of the greatest dreams of inventors throughout history has been to build machines that can think. Today's computer technology makes it possible to realize this dream. The idea has become an actively researched branch of science known in the literature as “Artificial Intelligence”, on which many applications are built. Thanks to these developments, intelligent software is used in many areas, from automation systems to image recognition, from speech understanding to natural language processing, and even medical diagnosis.
In the early days of artificial intelligence, problems that were difficult for humans but could be described by formal or mathematical rules were solved easily. However, it became clear that artificial intelligence struggles with more intuitive problems that cannot be expressed formally or mathematically, such as face recognition and voice recognition, which people handle constantly in daily life.
Many approaches have been studied to solve such intuitive problems. The “Deep Learning” approach, which we also use in our thesis, is one of the most widely used machine learning approaches for solving intuitive problems that are difficult to express mathematically.
Deep learning relates each concept to more basic concepts without requiring a human to specify the formal or mathematical rules a computer would otherwise need. It enables the computer to learn the resulting hierarchy of concepts. With the help of this hierarchy, the computer builds complex concepts starting from simpler ones. The hierarchy forms a multi-layered structure that gains depth because the layers are stacked on top of one another. Because of this depth, the approach is called “Deep Learning”.
The Challenges of Machine Learning
Machine learning algorithms work well on a wide variety of important problems. However, these algorithms have not been successful in many central artificial intelligence problems such as speech recognition and object recognition.
Classical machine learning algorithms fail to learn complex functions in high-dimensional spaces. Deep learning is designed to eliminate these and similar problems.
The ability to perform well on previously unobserved inputs is called generalization. One of the most fundamental challenges of machine learning is developing an algorithm that performs well not only on the training set but also on new inputs. Many strategies used in machine learning aim to reduce the test error, possibly at the cost of a higher training error. Modifications intended to reduce the test error rather than the training error are called regularization. Many forms of regularization are available to deep learning practitioners.
The correct operation of machine learning algorithms depends on constructing the right features and data representations from the raw data. Problem solving becomes easier if the algorithm learns the features of the raw data automatically. Deep learning addresses this challenge by learning features automatically from raw data during training.
In deep learning, there is no need to describe the items given as input to the algorithm by hand-defined criteria, because the algorithm can process large amounts of data and identify details without outside intervention. For this reason, large datasets are needed to train deep learning networks. Instead of relying on manually specified characteristics and criteria that define the data, the system learns directly from the millions of examples in the large datasets provided to it.
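As a rough illustration of regularization, the sketch below fits a degree-9 polynomial to a handful of noisy points with and without an L2 penalty (weight decay). The choice of weight decay, the synthetic sine data, and all names such as `fit_ridge` are assumptions made for this example, not a method taken from the text.

```python
import numpy as np

def poly_features(x, degree):
    """Map a 1-D input to the features [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, weight_decay):
    """Least squares with an L2 penalty, solved as an augmented least-squares problem."""
    n_features = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(weight_decay) * np.eye(n_features)])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data (assumption): noisy samples of a sine curve.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, size=15)
x_test = rng.uniform(-1, 1, size=200)
y_train = np.sin(np.pi * x_train) + 0.2 * rng.normal(size=x_train.shape)
y_test = np.sin(np.pi * x_test) + 0.2 * rng.normal(size=x_test.shape)

X_train = poly_features(x_train, degree=9)
X_test = poly_features(x_test, degree=9)

w_plain = fit_ridge(X_train, y_train, weight_decay=0.0)   # no regularization
w_reg = fit_ridge(X_train, y_train, weight_decay=1e-2)    # with weight decay

for name, w in [("plain", w_plain), ("regularized", w_reg)]:
    print(name,
          "train MSE:", mse(X_train, y_train, w),
          "test MSE:", mse(X_test, y_test, w),
          "weight norm:", float(np.linalg.norm(w)))
```

The regularized fit can never have a lower training error than the unregularized one, and its weights are smaller. Whether the test error improves depends on the data and on the strength of the penalty, which is why the penalty is usually tuned on held-out data.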
Convolutional Neural Networks (CNN)
A Convolutional Neural Network (CNN) is a deep learning model that is frequently used today to solve natural language processing and computer vision problems. It is a specialized neural network for processing data with a grid-like structure.
Convolution is a mathematical operation that combines two signals, here written f and g, to produce a third signal. Its mathematical representation is shown in Equation 3.6.
$$[f * g](t) \overset{\text{def}}{=} \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau \tag{3.6}$$
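For sampled (discrete) signals, convolution is computed as a sum of shifted products rather than over a continuous variable. The following is a minimal sketch of that discrete form; the function name `convolve_1d` and the example signal and kernel are illustrative assumptions.

```python
import numpy as np

def convolve_1d(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for k in range(len(f)):
            if 0 <= n - k < len(g):   # skip terms where g is undefined
                out[n] += f[k] * g[n - k]
    return out

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5]                   # two-point moving average

result = convolve_1d(signal, kernel)
print(result)                         # [0.5 1.5 2.5 3.5 2. ]
print(np.allclose(result, np.convolve(signal, kernel)))  # True
```

The explicit loops are only meant to mirror the definition; in practice `np.convolve` computes the same result.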
Convolution in neural networks can be viewed as a filtering operation. The aim is to extract features or information from the raw data (text, image, etc.) given as input. Each input is treated as a matrix of values. In convolutional neural networks, a filter is represented by a vector of weights. The filter measures how similar a patch of the input is to a feature.
Convolutional Neural Networks are very similar to ordinary neural networks. They also consist of neurons with learnable weights and biases. The most important difference is how the layers are connected. A CNN is essentially several layers of convolutions with nonlinear activation functions applied to the results. In a conventional neural network, each input neuron is connected to each output neuron in the next layer. This is called a fully connected layer. CNNs instead use convolutions over the input layer to compute the output. This results in local connections, where each region of the input is connected to a neuron in the output. Each layer usually applies hundreds or thousands of different filters and combines their results.
A key feature of Convolutional Neural Networks is the use of pooling layers, which are typically applied after the convolutional layers. Pooling layers subsample their input. One property of pooling is that it can provide a fixed-size output matrix, which is usually required for classification. Pooling also reduces the output dimensionality while preserving the most salient information. Each filter can be thought of as detecting a certain feature.
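The idea of a filter as a similarity measure can be sketched by sliding a small weight matrix over an image and summing the elementwise products at each position. The 4x4 image, the hand-written vertical-edge filter, and the function name `filter_response` are assumptions for illustration. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def filter_response(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # high value = patch resembles the filter
    return out

# Dark left half, bright right half: one vertical edge in the middle.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hand-written filter that responds to a dark-to-bright transition.
vertical_edge = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

response = filter_response(image, vertical_edge)
print(response)
# [[0. 2. 0.]
#  [0. 2. 0.]
#  [0. 2. 0.]]
```

The response is largest exactly where the patch contains the edge and zero over the uniform regions. In a trained CNN the filter weights are learned rather than written by hand.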
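One practical consequence of local connections is a much smaller number of learnable parameters. The arithmetic below compares a fully connected layer with a convolutional layer; the layer sizes (a 28x28 single-channel input, 100 output units or filters, 3x3 kernels) are hypothetical values chosen only for this sketch.

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

def conv2d_param_count(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter; independent of the input height and width."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

dense_params = dense_param_count(28 * 28, 100)
conv_params = conv2d_param_count(3, 3, 1, 100)

print(dense_params)  # 78500
print(conv_params)   # 1000
```

Because the same filter weights are reused at every position of the input, the convolutional layer's parameter count does not grow with the image size.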
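The two roles of pooling described above can be sketched in a few lines. The function names and the example feature map are assumptions: `max_pool_2d` shows subsampling with non-overlapping 2x2 windows, and `global_max_pool` shows how keeping one value per filter gives a fixed-size output regardless of input size.

```python
import numpy as np

def max_pool_2d(feature_map, pool_size=2):
    """Non-overlapping max pooling; height and width must be divisible by pool_size."""
    h, w = feature_map.shape
    if h % pool_size or w % pool_size:
        raise ValueError("feature map size must be divisible by pool_size")
    blocks = feature_map.reshape(h // pool_size, pool_size, w // pool_size, pool_size)
    return blocks.max(axis=(1, 3))    # maximum inside each pool_size x pool_size block

def global_max_pool(feature_maps):
    """Keep one value per filter output, whatever the size of each feature map."""
    return np.array([np.max(fm) for fm in feature_maps])

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
], dtype=float)

pooled = max_pool_2d(feature_map)
print(pooled)
# [[4. 2.]
#  [2. 8.]]

# Two filter outputs of different lengths still give a vector of length 2.
fixed = global_max_pool([np.arange(5), np.arange(9)])
print(fixed)  # [4 8]
```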
Recurrent Neural Networks (RNN)
Recurrent Neural Networks are a family of neural networks specialized for processing sequential data. RNNs can model relationships between past and future information, which is a characteristic of human memory. Traditional neural networks assume that all inputs and outputs are independent of each other. This assumption does not hold for sequential data; for example, remembering the previous words is essential when we want to predict the next word of a sentence.
Unlike feedforward neural networks, where information flows in one direction from layer to layer, in Recurrent Neural Networks information travels in cycles, so that the output can be affected by previous states. RNNs also have a memory that allows the model to store information about its past computations. This enables recurrent neural networks to exhibit dynamic temporal behavior.
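The recurrence can be sketched with a simple (Elman-style) recurrent layer, in which the hidden state at each step depends on the current input and on the previous hidden state. The tanh activation, the weight names `W_xh`, `W_hh`, `b_h`, and the random values are assumptions for illustration, not a specific model from the text.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a simple recurrent layer over a sequence.

    inputs has shape (time_steps, input_dim); returns all hidden states
    with shape (time_steps, hidden_dim).
    """
    h = np.zeros(W_hh.shape[0])       # initial hidden state (the "memory")
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, time_steps = 3, 4, 5
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

sequence = rng.normal(size=(time_steps, input_dim))
states = rnn_forward(sequence, W_xh, W_hh, b_h)

# Change only the first element of the sequence and rerun.
altered = sequence.copy()
altered[0] += 1.0
states_altered = rnn_forward(altered, W_xh, W_hh, b_h)

print(states.shape)                                   # (5, 4)
print(np.abs(states[-1] - states_altered[-1]).max())  # nonzero: the past still matters
```

The final hidden state changes even though only the first input was modified, which is the sense in which the network carries information about past computations.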
Recurrent neural networks have had great success in natural language processing (NLP) applications because they can model temporal sequences of input-output pairs.
Long Short-Term Memory Networks (LSTM)
Long Short-Term Memory networks, or LSTMs for short, are a type of RNN developed to avoid the long-term dependency problem encountered in recurrent neural networks. RNNs work very well at modeling short-term memory. However, they are not effective at capturing long-term dependencies. This is because, after a certain number of time steps, the model's gradients either grow toward infinity (explode) or shrink toward zero (vanish).
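The gradient behavior can be illustrated with a deliberately simplified case: a scalar linear recurrence h_t = w * h_{t-1}, for which the gradient of the last state with respect to the first is w multiplied by itself once per time step. The scalar setting and the function name are assumptions made to keep the sketch short; real RNNs involve matrices and nonlinearities, but the same repeated multiplication is at work.

```python
def gradient_through_time(recurrent_weight, time_steps):
    """d h_T / d h_0 for the linear recurrence h_t = recurrent_weight * h_{t-1}."""
    grad = 1.0
    for _ in range(time_steps):
        grad *= recurrent_weight      # one factor per time step (chain rule)
    return grad

vanishing = gradient_through_time(0.9, 100)   # weight slightly below 1
exploding = gradient_through_time(1.1, 100)   # weight slightly above 1

print(vanishing)   # on the order of 1e-5
print(exploding)   # on the order of 1e4
```

A weight only slightly below or above 1 is enough to make the gradient negligible or very large after 100 steps.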
RNNs use distributed internal memory to map real-valued input sequences to real-valued output sequences. However, when trained with gradient descent, RNNs can run into difficulties, particularly on long and noisy sequences. Noise is a factor that makes learning difficult. The long short-term memory (LSTM) method is designed to be far less affected by this problem.
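A single LSTM time step can be sketched as follows, using the commonly described formulation with input, forget, and output gates and a separate cell state. The gate ordering inside the stacked weight matrices, the names `W`, `U`, `b`, and the hand-set biases are assumptions chosen for illustration; in a trained network these values are learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,).

    Assumed gate order in the stacked parameters: input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])               # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])           # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])       # candidate values
    o = sigmoid(z[3 * H:4 * H])       # output gate
    c_t = f * c_prev + i * g          # additive update of the cell state
    h_t = o * np.tanh(c_t)
    return h_t, c_t

hidden_dim, input_dim = 2, 3
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(4 * hidden_dim, input_dim))
U = np.zeros((4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)
b[0:hidden_dim] = -20.0               # hand-set: input gate nearly closed
b[hidden_dim:2 * hidden_dim] = 20.0   # hand-set: forget gate nearly open

c0 = np.array([0.7, -0.3])
h, c = np.zeros(hidden_dim), c0.copy()
for _ in range(100):                  # 100 steps of noisy input
    x_t = rng.normal(size=input_dim)
    h, c = lstm_step(x_t, h, c, W, U, b)

print(c0, c)                          # the cell state is almost unchanged
```

With the forget gate close to 1 and the input gate close to 0, the cell state passes through 100 noisy steps nearly intact, which is the mechanism that lets an LSTM keep information over long spans.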
Deep Learning Development Environments
Anaconda is a data science platform that has become widely used by data scientists and programmers as interest in machine learning and deep learning has grown in recent years. Anaconda is a distribution of the Python and R programming languages. It also includes an open source package manager. Designed for machine learning and data science, it is often used for large-scale data processing, scientific computing, and predictive analytics. It offers a collection of over 1,000 data science packages.
TensorFlow is an open source machine learning library based on the concept of data flow graphs, released by Google in 2015. It performs the computation and differentiation of the data given to the system. TensorFlow uses objects called tensors to represent deep learning data. Tensors are multidimensional data arrays that allow us to represent high-dimensional data.
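The notion of a tensor as a multidimensional array can be illustrated with NumPy arrays, which are used here only as a stand-in so that the sketch has no TensorFlow dependency. The shapes and their interpretations in the comments are hypothetical examples.

```python
import numpy as np

scalar = np.array(3.0)                  # rank 0: a single value
vector = np.zeros(20)                   # rank 1: one sample with 20 features
matrix = np.zeros((1000, 20))           # rank 2: (samples, features)
sequences = np.zeros((10, 100, 64))     # rank 3: (samples, time steps, features)
images = np.zeros((32, 28, 28, 3))      # rank 4: (batch, height, width, channels)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("sequences", sequences), ("images", images)]:
    print(name, "rank:", t.ndim, "shape:", t.shape)
```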
Keras is an open source high-level neural network library that can run on top of TensorFlow, CNTK, Theano, and MXNet. In addition to standard neural networks, Keras also supports convolutional and recurrent neural networks. It supports other general helper layers such as dropout, batch normalization, and pooling. Keras is an easy-to-use library that offers a simple and intuitive syntax.
Keras uses the Sequential class for a neural network model consisting of a linear stack of layers. The add() method adds a new layer while building the network, and the compile(), fit() and evaluate() methods are used to train and test the resulting network.
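import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
# random dataset generation: integer token ids in [0, 1000), sequences of length 100
X_train = np.random.randint(1000, size=(1000, 100))
Y_train = np.random.randint(2, size=(1000, 1))
X_test = np.random.randint(1000, size=(100, 100))
Y_test = np.random.randint(2, size=(100, 1))
# creation of artificial neural network
model_lstm = Sequential()
# vocabulary of 1000 tokens, each mapped to a 64-dimensional vector;
# the sequence length (100) is inferred from the input data
model_lstm.add(Embedding(1000, 64))
model_lstm.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2))
# single sigmoid unit for binary classification, as binary_crossentropy expects
model_lstm.add(Dense(1, activation='sigmoid'))
# compiling the model
model_lstm.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# training of the model (batch_size and epochs are illustrative values)
model_lstm.fit(X_train, Y_train, batch_size=128, epochs=5)
# testing the model
score = model_lstm.evaluate(X_test, Y_test, batch_size=128)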
Above is an example of a simple model created in Keras. The compile() method used to compile the model takes the following arguments:
• optimizer specifies the algorithm used to update the parameters of the neural network. It does this based on the value of the loss function and the data.
• loss specifies the function that measures how far the model's predictions are from the targets; the optimizer minimizes it during training.
• metrics lists the measures computed over the data set in the training and testing steps. “Accuracy” measures the fraction of samples that are classified correctly.
The layers of the model architecture and their parameter counts can be displayed with the model.summary() method. To train the model, the training set and the number of epochs are specified in the fit() method. The evaluate() method is used for post-training evaluation.