UNIT-1
Syllabus:
Introduction - Well-posed learning problems, designing a learning system,
Perspectives and issues in machine learning.
Concept learning and the general to specific ordering – introduction, a concept learning
task, concept learning as search, find-S: finding a maximally specific
hypothesis, version spaces and the candidate elimination algorithm, remarks on
version spaces and candidate elimination, inductive bias, Gradient Descent
Algorithm and its variants.
In the real world, we are surrounded by humans who can learn from their experiences, while computers and machines work only on our instructions. But can a machine also learn from experience or past data the way a human does? This is where Machine Learning comes in.
What is learning?
Learning is any process
by which a system improves performance from experience.
· Identifying patterns.
· Recognizing those patterns when you see them again.
Well-defined learning problem or well-posed learning problem:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
"Learning is any process by which a system improves performance from experience." - Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
· improve their performance P
· at some task T
· with experience E.
A well-defined learning problem or well-posed learning problem is given by <P, T, E>.
Defining the Learning Task:
Improve on task T, with respect to performance metric P, based on experience E.
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver
T: Categorizing email messages as spam or legitimate
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels
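As a tiny illustration of a performance measure P, the percentage of messages correctly classified for the spam task can be computed directly (the labels and predictions below are made up for the example):

```python
# Illustrative computation of P for the spam task: the percentage of
# email messages whose predicted label matches the true label.
true_labels = ["spam", "legit", "spam", "legit", "legit", "spam"]
predicted   = ["spam", "legit", "legit", "legit", "legit", "spam"]

correct = sum(t == p for t, p in zip(true_labels, predicted))
accuracy = 100 * correct / len(true_labels)
print(f"P = {accuracy:.1f}% correctly classified")  # P = 83.3% correctly classified
```

As experience E grows (more labeled emails) and the learner improves, this measured P should rise.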
Machine
learning (ML) is defined as a discipline of
artificial intelligence (AI) that provides machines the ability to automatically learn from data and past
experiences to identify patterns and make predictions with minimal human
intervention.
Machine Learning is an application of Artificial
Intelligence that enables systems to learn from vast volumes of data and solve
specific problems. It uses computer algorithms that improve their efficiency
automatically through experience.
How does Machine
Learning work?
A Machine Learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it.
The accuracy of the predicted output depends on the amount of data: a larger amount of data helps build a better model, which predicts the output more accurately.
Suppose we have a complex problem where we need to make predictions. Instead of writing code for it, we just need to feed the data to generic algorithms; with their help, the machine builds the logic from the data and predicts the output. Machine learning has changed our way of thinking about such problems. The below block diagram explains the working of a Machine Learning algorithm:
The need for machine learning is increasing day by day. The reason is that machine learning is capable of doing tasks that are too complex for a person to implement directly. As humans, we have limitations: we cannot manually process huge amounts of data, so we need computer systems, and this is where machine learning makes things easy for us.
We can train machine learning algorithms by providing them huge amounts of data, letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be measured by a cost function. With the help of machine learning, we can save both time and money.
The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, and more. Various top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interests and recommend products accordingly.
How to get started
with Machine Learning?
To get started, let’s take a look at some of the important terminologies.
Terminology:
· Model:
Also known as “hypothesis”, a machine learning model is the mathematical
representation of a real-world process. A machine
learning algorithm along
with the training
data builds a machine
learning model.
· Feature: A feature is a measurable property or parameter of the data-set.
· Feature Vector:
It is a set of multiple numeric features. We use it as an input to the machine
learning model for training and prediction purposes.
· Training:
An algorithm takes a set of data known as “training data” as input. The
learning algorithm finds patterns in the input data and trains the model for
expected results (target). The output of the training process is the machine
learning model.
· Prediction: Once the machine learning model is ready, it can be fed input data to produce a predicted output.
· Target (Label): The value that the machine learning model has to predict is called the target or label.
· Overfitting: When a model performs very well on training data but poorly on test data (new data), it is said to be overfitting. In this case, the model learns the details and noise in the training data to the extent that its performance on test data is negatively affected. Overfitting happens due to low bias and high variance.
· Underfitting: When a model has not learned the patterns in the training data well and is unable to generalize to new data, it is said to be underfitting. An underfit model performs poorly even on the training data and will produce unreliable predictions. Underfitting occurs due to high bias and low variance.
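A minimal numeric sketch of these two failure modes, using polynomial fits of different degrees on noisy data (the dataset, seed, and degrees are arbitrary choices for illustration):

```python
import numpy as np

# Noisy samples of a sine curve: train and test sets drawn the same way.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.linspace(0.025, 0.975, 20)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 20)

def errors(degree):
    """Train/test mean squared error of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in (1, 3, 15):
    train_mse, test_mse = errors(degree)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Degree 1 underfits (high error on both sets: high bias); degree 15 typically drives the training error near zero while doing worse on test data (high variance); a moderate degree balances the two.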
Machine learning Life cycle:
Machine learning has
given the computer systems the abilities to automatically learn without being
explicitly programmed. But how does a machine learning system work? So, it can
be described using the life cycle of machine learning. Machine learning life
cycle is a cyclic process to build an efficient machine learning project. The
main purpose of the life cycle is to find a solution to the problem or project.
Machine learning
life cycle involves
seven major steps, which are given below:
1. Gathering data
2. Preparing that data
3. Choosing a model
4. Training
5. Evaluation
6. Hyperparameter tuning
7. Prediction
Machine Learning Steps
The task of imparting intelligence to machines seems daunting, but it becomes manageable when broken down into 7 major steps:
1. Collecting Data: -
The quality and quantity of your data directly impact the performance and reliability of your machine learning model. Poor-quality data will lead to poor model performance, regardless of the sophistication of the algorithms used.
Key Considerations in Data Collection:
- Data Sources: Identify and access relevant data sources (e.g., databases, APIs, sensors, public datasets).
- Data Quality: Ensure data accuracy, completeness, consistency, and relevance.
- Data Volume: Collect enough data to train a robust and reliable model.
- Data Diversity: Collect data that represents the real-world scenarios the model will encounter.
- Data Bias: Be mindful of potential biases in the data and take steps to mitigate them.
Figure 2: Collecting Data
2. Preparing the Data: -
After you have your data, you have to prepare it. You can do this by:
· Putting together all the data you have and randomizing it. This helps ensure that the data is evenly distributed and that its ordering does not affect the learning process.
· Cleaning the data to remove unwanted entries, missing values, duplicate values, and unneeded rows and columns, and to perform data type conversions. You might even have to restructure the dataset and change the rows and columns or the index of rows and columns.
· Visualizing the data to understand how it is structured and to understand the relationships between the various variables and classes present.
· Splitting the cleaned data into two sets - a training set and a testing set. The training set is the set your model learns from. The testing set is used to check the accuracy of your model after training.
Figure
3: Cleaning and Visualizing Data
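The randomizing and splitting steps above can be sketched as follows (the 80/20 split ratio and fixed seed are common but arbitrary choices):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data, then split it into train and test sets."""
    shuffled = data[:]                      # copy so the original order is kept
    random.Random(seed).shuffle(shuffled)   # randomize to remove ordering effects
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

examples = list(range(100))                 # stand-in for 100 cleaned records
train_set, test_set = train_test_split(examples)
print(len(train_set), len(test_set))        # 80 20
```

Fixing the seed makes the split reproducible, so later evaluation runs compare models on the same held-out data.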
3. Choosing a Model: -
A machine learning model determines the output you get after running a machine learning algorithm on the collected data. It is important to choose a model that is relevant to the task at hand. Over the years, scientists and engineers have developed various models suited for different tasks like speech recognition, image recognition, prediction, etc. Apart from this, you also have to see whether your model is suited for numerical or categorical data and choose accordingly.
Figure 4: Choosing a model
4. Training the Model: -
Training is the most important step in machine learning. In training, you pass the prepared data to your machine learning model so it can find patterns and make predictions. The model learns from the data so that it can accomplish the task set for it. Over time, with training, the model gets better at predicting.
Figure
5: Training a model
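As a minimal sketch of what "finding patterns" means, the loop below fits a line y = w·x + b to four made-up points by gradient descent (the data, learning rate, and iteration count are illustrative):

```python
# Tiny training loop: fit y = w*x + b by gradient descent on made-up data
# generated from y = 2x + 1 (all values here are illustrative).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    # gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

Each pass nudges the parameters to reduce the error, which is exactly the sense in which the model "gets better at predicting over time."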
5. Evaluating the Model: -
After training your model, you have to check how it is performing. This is done by testing the model's performance on previously unseen data - the testing set that you split your data into earlier. If testing were done on the same data used for training, you would not get an accurate measure: the model is already familiar with that data and finds the same patterns in it as before, giving you disproportionately high accuracy. Evaluating on the testing data gives you an accurate measure of how your model will perform, and of its speed.
Figure 6: Evaluating a model
6. Parameter Tuning: -
Once you have created and evaluated your model, see if its accuracy can be improved in any way. This is done by tuning the parameters present in your model - variables whose values the programmer generally decides. At particular values of these parameters, the accuracy will be maximal; parameter tuning refers to finding those values.
Figure 7: Parameter Tuning
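A hypothetical sketch of tuning: try several values of one such parameter (here, a polynomial degree) and keep the one with the lowest error on held-out validation data (the dataset and candidate values are made up):

```python
import numpy as np

# Made-up data: noisy sine samples, split into training and validation halves.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 40)
x_tr, y_tr = x[::2], y[::2]
x_va, y_va = x[1::2], y[1::2]

def val_error(degree):
    """Validation mean squared error of a polynomial fit of this degree."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)

# "Tuning": evaluate each candidate value, keep the one with the lowest error.
best_degree = min(range(1, 8), key=val_error)
print("best degree:", best_degree)
```

Using a validation set rather than the training set for this choice keeps the tuning honest: otherwise the largest (most overfit) degree would always win.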
7. Making Predictions: -
In the end, you can use your model on unseen data to make predictions accurately.
Examples of Machine Learning
Applications:
Machine learning is a buzzword in today's technology, and it is growing very rapidly day by day. We use machine learning in our daily lives, often without knowing it, through tools such as Google Maps, Google Assistant, Alexa, etc.
Given below are some real examples of ML:
If you have used Netflix, you know that it recommends movies or shows based on what you have watched earlier. Machine Learning is used to make these recommendations and to select the content that matches your taste, using your earlier viewing data.
The second example is Facebook. When you upload a photo on Facebook, it can recognize a person in that photo and suggest mutual friends. ML is used for these predictions, drawing on data such as your friend list and the photos available, and making predictions based on that.
The third example is software that shows how you will look when you get older. This image processing also uses machine learning.
Below are some most trending real-world applications of
Machine Learning:
1. Image Recognition: -
Image recognition is
one of the most common applications of machine learning. It is used to identify
objects, persons, places, digital images, etc. The popular use case of image
recognition and face detection is, Automatic
friend tagging suggestion.
Facebook provides us with a feature of automatic friend tagging suggestions. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names; the technology behind this is machine learning's face detection and recognition algorithm.
It is based on the
Facebook project named "Deep Face,"
which is responsible for face recognition and person identification in the
picture.
2. Speech Recognition: -
While using Google, we get an
option of "Search by voice,"
it comes under speech recognition, and it's a popular application of machine
learning.
Speech recognition is the process of converting voice instructions into text; it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in various speech recognition applications.
Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction: -
If we want to visit a new place,
we take help of Google Maps, which shows us the correct path with the shortest
route and predicts the traffic conditions.
It predicts traffic conditions - whether traffic is clear, slow-moving, or heavily congested - in two ways:
· The real-time location of vehicles from the Google Maps app and from sensors
· The average time taken on past days at the same time of day
Everyone who uses Google Maps is helping to make the app better. It takes information from users and sends it back to its database to improve performance.
4. Product recommendations: -
Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendations. Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet in the same browser - this is because of machine learning.
Google understands user interests using various machine learning algorithms and suggests products matching those interests.
Similarly, when we use Netflix, we find recommendations for series, movies, etc., and this too is done with the help of machine learning.
5. Self-driving cars: -
One of the most exciting applications of machine learning is self-driving cars, in which machine learning plays a significant role. Tesla, the most popular car manufacturer in this space, is working on self-driving cars, training car models to detect people and objects while driving.
6. Email Spam and Malware Filtering: -
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We receive important mail in our inbox, marked with the important symbol, and spam emails in our spam box; the technology behind this is machine learning. Below are some spam filters used by Gmail:
· Content Filter
Machine learning algorithms such as the Multi-Layer Perceptron, decision trees, and the Naïve Bayes classifier are used for email spam filtering and malware detection.
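The Naïve Bayes idea can be sketched in a few lines: score a message under each class by multiplying per-word probabilities estimated from labeled examples (the messages and words below are made up for illustration):

```python
# Toy sketch of Naive Bayes spam scoring (illustrative data and vocabulary).
from collections import Counter
import math

spam_msgs = ["win money now", "free money offer", "win free prize"]
ham_msgs  = ["meeting at noon", "project status update", "lunch at noon"]

def word_counts(msgs):
    return Counter(w for m in msgs for w in m.split())

spam_wc, ham_wc = word_counts(spam_msgs), word_counts(ham_msgs)
spam_total, ham_total = sum(spam_wc.values()), sum(ham_wc.values())
vocab = set(spam_wc) | set(ham_wc)

def log_prob(msg, wc, total):
    # Laplace smoothing so unseen words do not zero out the probability
    return sum(math.log((wc[w] + 1) / (total + len(vocab))) for w in msg.split())

def classify(msg):
    # equal class priors here, so only the word likelihoods matter
    spam_score = log_prob(msg, spam_wc, spam_total)
    ham_score = log_prob(msg, ham_wc, ham_total)
    return "spam" if spam_score > ham_score else "ham"

print(classify("free money"))      # spam
print(classify("status meeting"))  # ham
```

A real content filter trains on far more mail and adds class priors and richer features, but the scoring principle is the same.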
7. Virtual Personal Assistants: -
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using voice instructions. These assistants can help us in various ways just through voice instructions - playing music, calling someone, opening an email, scheduling an appointment, etc. Machine learning algorithms are an important part of these virtual assistants: they record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.
8. Online Fraud Detection: -
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can take place in various ways, such as fake accounts, fake IDs, and money stolen in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.
For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Genuine transactions follow a specific pattern, which changes for a fraudulent transaction; the network detects this change and thereby makes our online transactions more secure.
9. Stock Market Trading: -
Machine learning is widely used in stock market trading. In the stock market there is always a risk of ups and downs in share prices, so machine learning's long short-term memory (LSTM) neural network is used for the prediction of stock market trends.
10. Medical Diagnosis: -
In medical science, machine learning is used for disease diagnosis. With it, medical technology is growing fast and is able to build 3D models that can predict the exact position of lesions in the brain. This helps in finding brain tumors and other brain-related diseases easily.
11. Automatic Language Translation: -
Nowadays, if we visit a new place and do not know the language, it is not a problem at all: machine learning helps us by converting text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation system that translates text into our familiar language, and this is called automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used together with image recognition to translate text from one language to another.
12. Analyzing User Feedback Using Sentiment Analysis: -
Sentiment analysis is a prominent machine learning application covering sentiment classification, opinion mining, and emotion analysis. Using such models, machines learn to analyze sentiment based on words: they can identify whether words are said in a positive, negative, or neutral tone, and they can also estimate the intensity of those words.
With the help of Natural Language Processing (NLP), data miners automatically extract and summarize opinions, using both supervised and unsupervised machine learning algorithms. Companies that deal with customers use such models to improve the customer experience based on feedback.
Another machine learning example is music applications. Apps like Gaana and JioSaavn also suggest music based on user sentiment, analyzing the history of songs played, favorite playlists, and even the times at which music is listened to.
Types of Machine Learning: -
Machine learning is a subset of AI that enables machines to automatically learn from data, improve performance from past experiences, and make predictions. Machine learning comprises a set of algorithms that work on huge amounts of data. Data is fed to these algorithms to train them; on the basis of this training, they build a model and perform a specific task.
These ML algorithms help solve different business problems such as regression, classification, forecasting, clustering, association, etc.
Based on the methods and way of learning, machine learning is divided into mainly four types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
The table below compares supervised, unsupervised, and reinforcement learning:
| | Supervised learning | Unsupervised learning | Reinforcement learning |
|---|---|---|---|
| Definition | Makes predictions from data | Segments and groups data | Reward-punishment system and interactive environment |
| Types of data | Labelled data | Unlabeled data | Acts according to a policy with a final goal to reach (no predefined data) |
| Commercial value | High commercial and business value | Medium commercial and business value | Little commercial use yet |
| Types of problems | Regression and classification | Association and clustering | Exploitation or exploration |
| Supervision | Extra supervision | No supervision | No supervision |
| Algorithms | Linear Regression, Logistic Regression, SVM, KNN, and so forth | K-Means clustering, C-Means, Apriori | Q-Learning, SARSA |
| Aim | Calculate outcomes | Discover underlying patterns | Learn a series of actions |
| Application | Risk evaluation, sales forecasting | Recommendation systems, anomaly detection | Self-driving cars, gaming, healthcare |
Design a Learning System in Machine Learning
When we feed training data to a machine learning algorithm, the algorithm produces a mathematical model; with the help of this model, the machine makes predictions and takes decisions without being explicitly programmed. Also, the more the machine works with the training data, the more experience it gains and the more efficient the results it produces.
Designing a Learning System in Machine Learning:
According to Tom Mitchell, "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Example: In spam e-mail detection,
· Task, T: To classify mails as Spam or Not Spam.
· Performance measure, P: Total percentage of mails correctly classified as "Spam" or "Not Spam".
· Experience, E: A set of mails with the label "Spam" or "Not Spam".
Steps for Designing a Learning System:
Step 1) Choosing the Training Experience: The first and very important task is to choose the training data or training experience that will be fed to the machine learning algorithm. It is important to note that the data or experience we feed to the algorithm has a significant impact on the success or failure of the model, so the training data or experience should be chosen wisely.
Below are the attributes of the training experience that impact the success or failure of the model:
· Whether the training experience provides direct or indirect feedback regarding choices. For example, while playing chess, the training data can provide feedback such as: instead of this move, if that move is chosen, the chances of success increase.
· The degree to which the learner controls the sequence of training examples. For example, when training data is first fed to the machine its accuracy is very low, but as it gains experience by playing again and again with itself or an opponent, the algorithm receives feedback and controls the chess game accordingly.
· How well the training experience represents the distribution of examples over which the final performance will be measured. A machine learning algorithm gains experience by going through many different cases and examples; by passing through more and more examples, it gains more experience and its performance increases.
· Step 2) Choosing the Target Function: The next important step is choosing the target function. This means that, based on the knowledge fed to the algorithm, the machine will learn a NextMove function that describes which legal move should be taken. For example, while playing chess with an opponent, when the opponent plays, the machine learning algorithm decides which of the possible legal moves to take in order to succeed.
· Step 3) Choosing a Representation for the Target Function: Once the machine knows all the possible legal moves, the next step is to choose a representation for scoring them - for example, linear equations, a hierarchical graph representation, tabular form, etc. Using this representation, the NextMove function selects the target move: out of the available moves, the one that offers the highest success rate. For example, if the machine has 4 possible chess moves, it will choose the optimized move most likely to bring it success.
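One classic representation (following Tom Mitchell's checkers example; the feature set and weights below are illustrative) is a linear combination of board features, V̂(b) = w0 + w1·x1 + … + wn·xn, whose weights the learner adjusts during training:

```python
# Sketch of a linear representation of the target function for a board game.
def v_hat(board_features, weights):
    """Approximate board value: w0 + w1*x1 + ... + wn*xn."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(ws, board_features))

# x1..x6: e.g. counts of own/opponent pieces, kings, and threatened pieces.
features = [12, 12, 0, 0, 0, 0]                    # an illustrative opening position
weights  = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5]  # illustrative weight values
print(v_hat(features, weights))  # 0.0 for this symmetric position
```

The NextMove function can then score each board reachable by a legal move with v_hat and pick the highest-scoring one.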
· Step 4) Choosing a Function Approximation Algorithm: An optimized move cannot be chosen from the training data alone. The algorithm has to go through a set of examples; from these examples it approximates which steps to choose, and the outcomes provide feedback. For example, when training data for playing chess is fed to the algorithm, it is not known in advance whether a move will fail or succeed; from each failure or success, the algorithm learns which step should be chosen for the next move and what its success rate is.
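For the linear representation, one standard approximation algorithm is the LMS (least mean squares) weight-update rule Mitchell describes, w_i ← w_i + η·(V_train(b) − V̂(b))·x_i; the sketch below applies it to one made-up training example:

```python
# Sketch of the LMS rule: nudge each weight in proportion to the prediction
# error and its feature value (training example and eta are illustrative).
def lms_update(weights, features, v_train, eta=0.01):
    v_hat = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    error = v_train - v_hat
    new_w = [weights[0] + eta * error]                             # w0 uses x0 = 1
    new_w += [w + eta * error * x for w, x in zip(weights[1:], features)]
    return new_w

weights = [0.0, 0.0, 0.0]
# Repeatedly nudge the weights toward a training value of +1 for features [1, 2].
for _ in range(200):
    weights = lms_update(weights, [1.0, 2.0], 1.0)
print(round(weights[0] + weights[1] * 1.0 + weights[2] * 2.0, 2))  # close to 1.0
```

Each update shrinks the error on the example, so the approximation converges toward the training value supplied by the game outcomes.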
· Step 5) Final Design: The final design is created at the end, once the system has gone through many examples, failures and successes, and correct and incorrect decisions, and has learned what the next step should be. Example: Deep Blue, an intelligent computer, won a chess match against the chess expert Garry Kasparov and became the first computer to beat a human chess expert.
PERSPECTIVES AND COMMON ISSUES IN MACHINE LEARNING
Although machine learning is used in every industry and helps organizations make more informed, data-driven choices that are more effective than classical methodologies, it still has many problems that cannot be ignored. Here are some common issues in Machine Learning that professionals face when building ML skills and creating an application from scratch.
1. Inadequate Training Data
A major issue when using machine learning algorithms is a lack of both quality and quantity of data. Although data plays a vital role in machine learning, many data scientists find that inadequate, noisy, and unclean data severely hampers machine learning algorithms. For example, a simple task may require thousands of samples, while an advanced task such as speech or image recognition may need millions. Furthermore, good data quality is essential for the algorithms to work ideally, yet poor data quality is common in machine learning applications.
2. Poor quality of data
As we have discussed above, data plays a
significant role in machine learning, and it must be of good quality as well.
Noisy data, incomplete data, inaccurate data, and unclean data lead to less
accuracy in classification and low-quality results. Hence, data quality can
also be considered as a major common problem while processing machine learning
algorithms.
Data quality can be affected by factors such as:
- Noisy data: responsible for inaccurate predictions, affecting decisions as well as accuracy in classification tasks.
- Incorrect data: responsible for faulty results from machine learning models; incorrect data can therefore also affect the accuracy of the results.
3. Non-representative training data
To make sure our trained model generalizes well, we have to ensure that the training data is representative of the new cases to which we need to generalize. The training data must cover the cases that have already occurred as well as those that will occur. If we use non-representative training data, the model gives less accurate predictions. A machine learning model is ideal when it predicts well for generalized cases and provides accurate decisions. If there is too little training data, there will be sampling noise in the model; such a non-representative training set will not yield accurate predictions and will be biased toward one class or group.
4. Overfitting and Underfitting
Overfitting:
Overfitting is one of the most common issues faced by machine learning engineers and data scientists. When a machine learning model is trained on a huge amount of data, it starts capturing noise and inaccurate entries in the training data set, which negatively affects the model's performance. Consider a simple example where the training data contains 1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. There is then a considerable probability of identifying an apple as a papaya, because the training set contains a massive amount of biased data, and predictions are negatively affected. A main cause of overfitting is the use of highly flexible non-linear methods, which can build unrealistic data models; overfitting can be reduced by using linear and parametric algorithms in the machine learning models.
Underfitting:
Underfitting is just the opposite of overfitting. When a machine learning model is trained on too little data, it produces incomplete and inaccurate predictions, and the accuracy of the model suffers. Underfitting occurs when our model is too simple to capture the underlying structure of the data - like an undersized pair of pants. This generally happens when we have limited data in the data set and try to build a linear model from non-linear data. In such scenarios the model lacks the necessary complexity, its rules become too simple to apply to the data set, and it starts making wrong predictions.
5. Monitoring and maintenance
Generalized output is mandatory for any machine learning model, so regular monitoring and maintenance are compulsory. Different results for different actions require data changes, so editing the code, as well as the resources for monitoring it, also becomes necessary.
6. Getting bad recommendations
A machine learning model operates in a specific context, which can result in bad recommendations and concept drift. For example, at one point in time a customer is looking for certain gadgets; the customer's requirements change over time, but the machine learning model keeps showing the same recommendations even though the customer's expectations have changed. This phenomenon is called data drift. It generally occurs when new data is introduced or the interpretation of the data changes. We can overcome it by regularly updating and monitoring the data according to expectations.
7. Lack of skilled resources
Although machine learning and artificial intelligence are continuously growing in the market, these industries are still younger than others. The absence of skilled manpower is also an issue: we need people with in-depth knowledge of mathematics, science, and technology to develop and manage scientific substance for machine learning.
8. Process complexity of machine learning
The machine learning process is very complex, which is another major issue faced by machine learning engineers and data scientists. Machine learning and artificial intelligence are new technologies, still partly experimental and continuously changing over time, and much of the work proceeds by trial and error, so the probability of error is higher than expected. The process also includes analyzing the data, removing data bias, training the data, applying complex mathematical calculations, etc., which makes it more complicated and quite tedious.
9. Lack of Explainability
This basically means that the outputs cannot be easily comprehended, as the model is programmed in specific ways to deliver outputs under certain conditions. This lack of explainability in machine learning algorithms reduces their credibility.
10. Slow implementations and results
This issue is also very commonly seen in machine learning models. Machine learning models can be highly accurate, but producing their results is time-consuming: slow programs, excessive requirements, and overloaded data take more time than expected to provide accurate results. This calls for continuous maintenance and monitoring of the model to keep its results accurate.
11. Irrelevant features
Although machine learning models are intended to give the best possible outcome, feeding garbage data as input yields garbage as output. Hence we should use relevant features in our training sample. A machine learning model is considered good when its training data has a good set of features with few or no irrelevant ones.
CONCEPT
LEARNING
Definition: Concept learning
- Inferring a Boolean-valued function
from training examples
of its input and output
·
Learning
involves acquiring general concepts from specific training examples. Example: People
continually learn general
concepts or categories such as "bird," "car," "situations in which I should study more
in order to pass the exam," etc.
·
Each such concept can be viewed
as describing some subset of objects or events defined over a larger set.
·
Alternatively,
each concept can be thought of as a Boolean-valued function defined over this larger set. (Example: A function defined
over all animals,
whose value is true
for birds and false for other animals).
A CONCEPT LEARNING
TASK:
Consider the example task of learning
the target concept
"Days on which Aldo enjoys his favourite water sport"
Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes
Table: Positive and negative training examples for the target concept EnjoySport.
The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.
What hypothesis representation is provided to the learner?
·
Let’s consider
a simple representation in which each hypothesis consists
of a conjunction of
constraints on the instance attributes.
·
Let each hypothesis be a vector
of six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind,
Water, and Forecast.
For each attribute, the hypothesis
will either
·
Indicate by a "?" that any value is acceptable for this attribute,
·
Specify a single required
value (e.g., Warm) for the attribute, or
·
Indicate by a "Φ" that no value is acceptable
If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive
example (h(x) = 1).
The hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity is represented by the expression
(?,
Cold, High, ?, ?, ?)
The most general
hypothesis-that every day is a positive
example-is represented by
(?,
?, ?, ?, ?, ?)
The most specific possible
hypothesis-that no day is a positive
example-is represented by
(Φ,
Φ, Φ, Φ, Φ, Φ)
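These three kinds of constraints can be sketched as a small evaluation function (an illustrative Python sketch; the function name and the tuple encoding are our own, not from the text):

```python
def h_evaluate(constraints, x):
    """Evaluate a conjunctive hypothesis on instance x:
    'Φ' accepts no value, '?' accepts any value, and any other
    constraint must match the attribute value exactly."""
    if 'Φ' in constraints:          # any Φ constraint rejects every instance
        return 0
    return int(all(c == '?' or c == v for c, v in zip(constraints, x)))

# A cold, humid day:
day = ('Sunny', 'Cold', 'High', 'Weak', 'Warm', 'Same')
print(h_evaluate(('?', 'Cold', 'High', '?', '?', '?'), day))  # 1: satisfied
print(h_evaluate(('?', '?', '?', '?', '?', '?'), day))        # 1: most general
print(h_evaluate(('Φ',) * 6, day))                            # 0: most specific
```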
Notation
·
The set of
items over which the concept
is defined is called the set of instances, which is denoted by X.
Example: X is the set of all possible days, each represented by the attributes: Sky,
AirTemp,
Humidity, Wind,
Water, and Forecast
·
The concept
or function to be learned
is called the target concept,
which is denoted
by
c. c can be a Boolean-valued function defined over the instances X:
c: X → {0, 1}
Example: The target
concept corresponds to the
value of the attribute EnjoySport
(i.e., c(x) =
1 if EnjoySport = Yes,
and c(x) = 0 if EnjoySport =
No).
·
Instances for which c(x) = 1 are called
positive examples, or members
of the target concept.
·
Instances for which c(x) = 0 are called
negative examples, or non-members of the target concept.
·
The ordered pair (x, c(x)) to describe the training example consisting of
the instance x and its target concept value c(x).
·
D to denote the set of available
training examples
·
The
symbol H to denote the set of all possible hypotheses that the learner
may consider regarding the identity
of the target concept. Each hypothesis h in H represents a Boolean valued function defined over X
· h: X → {0, 1}
·
The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.
Ø
Given:
·
Instances X: Possible days,
each described by the
attributes
o
Sky (with possible values Sunny, Cloudy, and Rainy),
o
AirTemp (with values
Warm and Cold),
o
Humidity (with values
Normal and High),
o
Wind (with values Strong and Weak),
o
Water (with values Warm and Cool),
o
Forecast (with values
Same and Change).
· Hypotheses
H: Each
hypothesis is described by a conjunction of constraints on the attributes Sky,
AirTemp,
Humidity,
Wind,
Water,
and Forecast.
The constraints may be "?" (any value is acceptable), "Φ" (no value is acceptable), or a specific value.
· Target concept
c: EnjoySport : X → {0, l}
· Training examples
D: Positive and negative examples
of the target function
Ø Determine:
· A hypothesis h in H such that h(x) = c(x)
for all x in X.
The inductive
learning hypothesis
Any
hypothesis found to approximate the target function well over a sufficiently
large set of training examples will also approximate the target function well
over other unobserved examples.
CONCEPT LEARNING AS SEARCH
·
Concept learning
can be viewed as the task of searching through
a large space of hypotheses
implicitly defined by the hypothesis representation.
·
The goal of
this search is to find the hypothesis
that best fits the training
examples.
Example:
Consider the instances X and hypotheses H in the EnjoySport learning task. The attribute Sky has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two possible values, so the instance space X contains exactly 3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances. Allowing each attribute in a hypothesis to also take the value "?" or "Φ", there are 5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses within H.
Consider the two
hypotheses
h1 = (Sunny,
?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
· Consider
the sets of instances that are classified positive by ℎ1 and by ℎ2.
·
ℎ2
imposes fewer constraints on the instance, it classifies more instances as
positive. So, any instance classified positive by ℎ1 will also be
classified positive by ℎ2. Therefore, h2 is more general than ℎ1.
Given hypotheses ℎ𝑗 and ℎ𝑘, ℎ𝑗 is more-general-than-or-equal-to ℎ𝑘 if and only if any instance that satisfies ℎ𝑘 also satisfies ℎ𝑗.
Definition: Let 𝒉𝒋 and 𝒉𝒌 be Boolean-valued functions defined over X. Then 𝒉𝒋 is more general than-or-equal-to 𝒉𝒌 (written 𝒉𝒋
≥ 𝒉𝒌) if and only if
(∀𝒙∈ 𝑿)[(𝒉𝒌(𝒙) = 𝟏) → (𝒉𝒋(𝒙) = 𝟏)]
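For conjunctive hypotheses, this relation can be checked attribute by attribute (an illustrative Python sketch; the function name is our own):

```python
def more_general_or_equal(hj, hk):
    """hj >=_g hk iff every instance satisfying hk also satisfies hj.
    For conjunctive hypotheses this reduces to an attribute-wise check:
    each constraint of hj must be '?' or equal the corresponding
    constraint of hk."""
    if 'Φ' in hk:        # the empty hypothesis satisfies no instance,
        return True      # so the implication holds vacuously
    return all(a == '?' or a == b for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True:  h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False
```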
·
In
the figure, the box on the left represents the
set X of all instances, the box on the right
the set H of all hypotheses.
·
Each hypothesis corresponds to some subset of X -the subset of instances that it
classifies positive.
·
The arrows
connecting hypotheses represent
the more - general -than relation, with the
arrow pointing toward the less general hypothesis.
·
Note the subset of instances characterized by h2 subsumes the subset characterized by h1; hence h2 is more-general-than h1.
1. FIND-S: FINDING
A MAXIMALLY SPECIFIC
HYPOTHESIS
FIND-S Algorithm
1. Initialize h to
the most specific
hypothesis in H
2. For each positive training
instance x
For each attribute constraint 𝑎𝑖 in h
If the constraint ai is satisfied by x
Then do nothing
Else replace 𝑎𝑖 in
h by the next more general constraint that is satisfied by x
3.
Output hypothesis h
To illustrate this algorithm, assume
the learner is given the sequence of training examples from the EnjoySport task
· The first
step of FIND-S is to
initialize h to the most
specific hypothesis in H
h ← (Ø, Ø, Ø, Ø, Ø, Ø)
· Consider the first training
example
x1 = <Sunny Warm Normal Strong Warm Same>, +
Observing
the first training example, it is clear that hypothesis h is too specific. None
of the "Ø" constraints in h are satisfied by this example,
so each is replaced by the next more general constraint that fits the example
h1 = <Sunny Warm Normal Strong Warm Same>
· Consider the second training example
x2 = <Sunny, Warm, High, Strong,
Warm, Same>, +
The second
training example forces
the algorithm to further generalize h, this time substituting
a "?" in place of any attribute value in h that is not satisfied by
the new example
h2 = <Sunny Warm ?
Strong Warm Same>
· Consider the third training example
x3 = <Rainy, Cold,
High, Strong, Warm,
Change>, -
Upon encountering the third training example, the algorithm makes no change to h. The FIND-S algorithm simply ignores every negative example.
h3 = < Sunny Warm ? Strong Warm Same>
· Consider the fourth training example
x4 = <Sunny Warm High
Strong Cool Change>, +
The fourth
example leads to a further
generalization of h
h4 = <
Sunny Warm ? Strong ? ? >
The key property of the
FIND-S algorithm
·
FIND-S
is guaranteed to output the most specific hypothesis within H that is
consistent with the positive training examples
·
FIND-S
algorithm’s final hypothesis will also be consistent with the negative examples
provided the correct target concept is contained in H, and provided the
training examples are correct.
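The trace above can be sketched in Python (an illustrative sketch; the function name and the use of None to encode the Φ constraint are our own):

```python
def find_s(examples):
    """FIND-S: start with the most specific hypothesis (all Φ, encoded
    as None) and minimally generalize it on each positive example;
    negative examples are simply ignored."""
    h = [None] * len(examples[0][0])
    for x, positive in examples:
        if not positive:          # FIND-S ignores negative examples
            continue
        for i, v in enumerate(x):
            if h[i] is None:      # Φ -> the first observed value
                h[i] = v
            elif h[i] != v:       # conflicting values -> '?'
                h[i] = '?'
    return h

# The four EnjoySport training examples:
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
print(find_s(examples))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```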
VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM
KEY TERMS:
Definition: consistent- A hypothesis h is consistent with a set of training examples D if and
only if h(x) = c(x) for each
example (x, c(x)) in D.
𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒉, 𝑫) ≡ (∀(𝒙, 𝒄(𝒙)) ∈ 𝑫)𝒉(𝒙) = 𝒄(𝒙)
Note difference between definitions of consistent and satisfies
·
An
example x is said to satisfy hypothesis h when h(x) = 1, regardless of
whether x is a positive or negative example of the target concept.
·
An example
x is said to consistent
with hypothesis h iff h(x) = c(x)
Definition: version space - The version space, denoted 𝑽𝑺𝑯,𝑫, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D
𝑽𝑺𝑯,𝑫 ≡ {𝒉 ∈ 𝑯 | 𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒉, 𝑫)}
The LIST-THEN-ELIMINATE algorithm
The LIST-THEN-ELIMINATE algorithm first initializes the version space to
contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.
The LIST-THEN-ELIMINATE Algorithm
1.
VersionSpace ← a list containing every hypothesis in H
2.
For each training example, (x, c(x))
remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3.
Output the list of hypotheses
in VersionSpace
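Brute-force enumeration is feasible here only because H is tiny: the 5120 syntactically distinct hypotheses collapse to 973 semantically distinct ones, since every hypothesis containing a Φ classifies all instances as negative. A minimal Python sketch on the EnjoySport data (helper names are our own):

```python
from itertools import product

def satisfies(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(domains, examples):
    # 1. Initialize the version space to every (semantically distinct)
    #    hypothesis: each attribute is a value or '?', plus the all-Φ one.
    H = list(product(*[list(d) + ['?'] for d in domains]))
    H.append(('Φ',) * len(domains))
    classify = lambda h, x: 0 if 'Φ' in h else int(satisfies(h, x))
    # 2. Eliminate any hypothesis inconsistent with some training example.
    return [h for h in H if all(classify(h, x) == c for x, c in examples)]

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 1),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 1),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 0),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 1),
]
vs = list_then_eliminate(domains, examples)
print(len(vs))  # 6: the EnjoySport version space contains six hypotheses
```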
A More Compact Representation for Version Spaces
The
version space is represented by its most general and least general members.
These members form general and specific boundary sets that delimit the version
space within the partially ordered hypothesis space.
Definition: The
general boundary G, with respect to hypothesis space
H and training data D, is the set of maximally general members
of H consistent with D
𝑮 ≡ {𝒈 ∈ 𝑯| 𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕 (𝒈, 𝑫) ∧ (¬∃𝒈′ ∈ 𝑯)[(𝒈′ >𝒈 𝒈) ∧ 𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒈′, 𝑫)]}
Definition: The specific boundary
S, with respect
to hypothesis space H and training data D,
is the set of minimally general (i.e., maximally specific) members of H
consistent with D.
𝑺 ≡ {𝒔 ∈ 𝑯|𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒔, 𝑫) ∧ (¬∃𝒔′ ∈ 𝑯)[(𝒔 >𝒈 𝒔′) ∧ 𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒔′, 𝑫)]}
Theorem: Version
Space representation theorem
Theorem:
Let X be an arbitrary set of instances and let H be a set of Boolean-valued hypotheses defined over X. Let c: X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {⟨x, c(x)⟩}. For all X, H, c, and D such that S and G are well defined,
𝑽𝑺𝑯,𝑫 = {𝒉 ∈ 𝑯|(∃𝒔 ∈ 𝑺)(∃𝒈 ∈ 𝑮)(𝒈 ≥𝒈 𝒉 ≥𝒈 𝒔)}
Let g, h, s be arbitrary members of G, H, S respectively, with 𝑔 ≥𝑔 ℎ ≥𝑔 𝑠.
·
By
the definition of S, s must be satisfied by all positive examples in D.
·
By the definition of G, g cannot be satisfied by any negative
example in D.
A version
space with its general and specific boundary
sets. The version
space includes all six
hypotheses shown here, but can be represented more simply by S and G. Arrows
indicate instances of the more_general_than relation. This is the version space
for the EnjoySport concept learning problem and training example.
CANDIDATE-ELIMINATION Learning
Algorithm
The key idea in the CANDIDATE-ELIMINATION algorithm is to output
a description of the
set of all hypotheses consistent with the
training examples
The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
• If d is a
positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
•
Remove s from
S
•
Add to S all minimal generalizations h of s such that
• h
is consistent with d, and some member of G is more general than h
• Remove from S any hypothesis that is more general than another
hypothesis in S
• If d is a
negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
•
Remove g from G
•
Add to G all minimal specializations h of g such that
• h
is consistent with d, and some member of S is more specific than h
• Remove
from G any hypothesis that is less general than another
hypothesis in G
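The pseudocode above can be sketched in Python for conjunctive hypotheses (an illustrative sketch: the helper names are our own, Φ is encoded as None, and we assume the first training example is positive, as in the EnjoySport trace):

```python
def satisfies(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(a == '?' or a == b for a, b in zip(hj, hk))

def min_generalize(s, x):
    # Minimal generalization of s that covers x: Φ (None) becomes the
    # observed value; a conflicting specific value becomes '?'.
    return tuple(v if c is None else (c if c == v else '?')
                 for c, v in zip(s, x))

def min_specializations(g, x, domains):
    # Minimal specializations of g that exclude x: replace one '?' with
    # any attribute value other than x's value there.
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    G = [('?',) * n]                 # maximally general boundary
    S = [(None,) * n]                # maximally specific boundary (all Φ)
    for x, label in examples:
        if label:                    # positive example
            G = [g for g in G if satisfies(g, x)]
            S = [s if satisfies(s, x) else min_generalize(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:                        # negative example
            S = [s for s in S if not satisfies(s, x)]
            newG = []
            for g in G:
                if satisfies(g, x):  # g misclassifies the negative example
                    newG += [h for h in min_specializations(g, x, domains)
                             if any(more_general_or_equal(h, s) for s in S)]
                else:
                    newG.append(g)
            # drop any member of G less general than another member of G
            G = [g for g in newG
                 if not any(g2 != g and more_general_or_equal(g2, g)
                            for g2 in newG)]
    return S, G

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 1),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 1),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 0),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 1),
]
S_final, G_final = candidate_elimination(examples, domains)
print(S_final)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G_final)  # (Sunny, ?, ?, ?, ?, ?) and (?, Warm, ?, ?, ?, ?)
```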
• An Illustrative Example
The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H;
Initializing the G boundary
set to contain the most
general hypothesis in H
𝑮𝟎 ← ⟨?, ?, ?, ?, ?, ?⟩
Initializing the S boundary
set to contain the most specific (least
general) hypothesis
𝑺𝟎 ← ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
·
When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.
·
The boundary is therefore revised by moving it to the least general hypothesis that covers this new example.
·
No update of the G boundary is needed in response to this training example, because G0 correctly covers this example.
·
Consider the third training
example. This negative
example reveals that the G boundary of the version space is overly
general, that is, the hypothesis in G incorrectly predicts that this new example is a positive example.
·
The hypothesis in the G boundary must therefore be specialized until it correctly
classifies this new negative example
Given that there are six attributes that could be specified to specialize G2, why are there
only three new hypotheses in G3?
For
example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a minimal specialization
of G2 that correctly labels the new example
as a negative example, but it is not included
in G3. The reason this
hypothesis is excluded is that it is inconsistent with the previously
encountered positive examples
• Consider the fourth training example.
•
This positive
example further generalizes the S boundary
of the version space. It also results
in removing one member
of the G boundary, because
this member fails to cover
the new positive example
After processing these
four examples, the boundary sets S4 and G4 delimit the version space of all
hypotheses consistent with the set of incrementally observed training examples
Gradient Descent is a fundamental algorithm in machine learning and optimization. It is used for tasks like training neural networks, fitting regression lines, and minimizing cost functions in models.
Learning Rate:
The learning rate is an important hyperparameter in gradient descent that controls how big or small a step to take down the gradient when updating the model's parameters. It determines how quickly or slowly the algorithm converges toward the minimum of the cost function.
1. If the learning rate is too small: the algorithm takes tiny steps at each iteration and converges very slowly. This can significantly increase training time and computational cost, especially for large datasets. (Note: this slow convergence is distinct from the vanishing gradient problem, which refers to gradients shrinking as they propagate through the layers of a deep network.)
Learning rate with small
steps
2. If the learning rate is too big: the algorithm may take huge steps, overshooting the minimum of the cost function without settling. It can fail to converge, causing the parameters to oscillate or diverge. (Note: this is distinct from the exploding gradient problem, which refers to gradients growing as they propagate through a deep network.)
Learning rate with big
steps
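Both failure modes can be demonstrated on the simplest possible cost function, f(w) = w², whose gradient is 2w (an illustrative sketch; the step counts and learning rates are our own choices):

```python
def minimize(lr, steps=25, w0=5.0):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w.
    The minimum is at w = 0."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w      # w <- w - lr * f'(w)
    return w

print(minimize(lr=0.01))  # too small: still far from 0 after 25 steps
print(minimize(lr=0.1))   # reasonable: close to 0
print(minimize(lr=1.1))   # too big: each step overshoots, |w| blows up
```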
For simplicity, let's consider a linear regression model with a single input feature x and target y. The loss function (or cost function) over the n data points is the Mean Squared Error (MSE):
J(w, b) = (1/n) · Σᵢ (yᵢ − ypᵢ)²
Here:
·
yp=x⋅w+b: The predicted value.
·
w:
Weight (slope of the line).
·
b: Bias
(intercept).
·
n:
Number of data points.
To optimize the model parameter w, we compute the gradient of the loss function with respect to w (and likewise for b):
∂J/∂w = −(2/n) · Σᵢ xᵢ · (yᵢ − ypᵢ)
∂J/∂b = −(2/n) · Σᵢ (yᵢ − ypᵢ)
Gradient descent is a
mathematical technique that iteratively finds the weights and bias that produce
the model with the lowest loss. Gradient descent finds the best weight and bias
by repeating the following process for a number of user-defined iterations.
The model begins training
with randomized weights and biases near zero, and then repeats the following
steps:
1.
Calculate the loss
with the current weight and bias.
2.
Determine the
direction to move the weights and bias that reduce loss.
3.
Move the weight and
bias values a small amount in the direction that reduces loss.
4.
Return to step one
and repeat the process until the model can't reduce the loss any further.
The diagram below outlines
the iterative steps gradient descent performs to find the weights and bias that
produce the model with the lowest loss.
Working of Gradient Descent
·
Step 1: we first initialize the
parameters of the model randomly
·
Step 2: Compute the gradient of
the cost function with respect to each parameter. It involves making partial
differentiation of cost function with respect to the parameters.
·
Step 3: Update the parameters of the model by taking steps in the direction opposite to the gradient. Here we choose a hyperparameter, the learning rate, denoted by γ; it determines the step size of each update.
·
Step 4: Repeat steps 2 and 3
iteratively to get the best parameter for the defined model.
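The four steps above can be sketched for the single-feature linear model yp = w·x + b with MSE loss (an illustrative sketch; the dataset and hyperparameter values are our own choices):

```python
def gradient_descent(xs, ys, lr=0.05, epochs=500):
    """Fit y ~ w*x + b by repeatedly stepping opposite the MSE gradient:
    dJ/dw = -(2/n) * sum(x_i * (y_i - yp_i))
    dJ/db = -(2/n) * sum(y_i - yp_i)
    """
    w, b = 0.0, 0.0                          # step 1: start near zero
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]      # current predictions
        # step 2: gradient of the loss w.r.t. each parameter
        dw = -(2 / n) * sum(x * (y - yp) for x, y, yp in zip(xs, ys, preds))
        db = -(2 / n) * sum(y - yp for y, yp in zip(ys, preds))
        # step 3: move a small amount opposite the gradient
        w -= lr * dw
        b -= lr * db
    return w, b                              # step 4: after many repeats

# Data generated from y = 2x + 1, so w and b should approach 2 and 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```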