UNIT-1
Syllabus:
Introduction - Well-posed learning problems, designing a learning system,
Perspectives and issues in machine learning.
Concept learning and the general to specific ordering – introduction, a concept learning
task, concept learning as search, find-S: finding a maximally specific
hypothesis, version spaces and the candidate elimination algorithm, remarks on
version spaces and candidate elimination, inductive bias, Gradient Descent
Algorithm and its variants.
In the real world, we are surrounded by humans who can learn from their experiences, while computers and machines work only on our instructions. But can a machine also learn from experience or past data the way a human does? This is where Machine Learning comes in.
What is learning?
Learning is any process
by which a system improves performance from experience.
· Identifying patterns.
· Recognizing those patterns when you see them again.
Well-defined learning problem or well-posed learning problem:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
"Learning is any process by which a system improves performance from experience." - Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
· improve their performance P
· at some task T
· with experience E.
A well-defined learning problem or well-posed learning problem is given by <P, T, E>.
Defining the Learning Task:
Improve on task T, with respect to performance metric P, based on experience E.
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver
T: Categorizing email messages as spam or legitimate
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels
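As a tiny illustration of a performance measure P, the percentage of messages correctly classified for the spam task can be computed directly (the labels and predictions below are made up for the example):

```python
# Illustrative computation of P for the spam task: the percentage of
# email messages whose predicted label matches the true label.
true_labels = ["spam", "legit", "spam", "legit", "legit", "spam"]
predicted   = ["spam", "legit", "legit", "legit", "legit", "spam"]

correct = sum(t == p for t, p in zip(true_labels, predicted))
accuracy = 100 * correct / len(true_labels)
print(f"P = {accuracy:.1f}% correctly classified")  # P = 83.3% correctly classified
```

As experience E grows (more labeled emails) and the learner improves, this measured P should rise.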
Machine
learning (ML) is defined as a discipline of
artificial intelligence (AI) that provides machines the ability to automatically learn from data and past
experiences to identify patterns and make predictions with minimal human
intervention.
Machine Learning is an application of Artificial
Intelligence that enables systems to learn from vast volumes of data and solve
specific problems. It uses computer algorithms that improve their efficiency
automatically through experience.
How does Machine
Learning work?
A Machine Learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it.
The accuracy of the predicted output depends on the amount of data: a larger amount of data helps build a better model, which predicts the output more accurately.
Suppose we have a complex problem where we need to make predictions. Instead of writing code for it, we just need to feed the data to generic algorithms; with their help, the machine builds the logic from the data and predicts the output. Machine learning has changed our way of thinking about such problems. The below block diagram explains the working of a Machine Learning algorithm:
The need for machine learning is increasing day by day. The reason is that machine learning is capable of doing tasks that are too complex for a person to implement directly. As humans, we have limitations: we cannot manually process huge amounts of data, so we need computer systems, and this is where machine learning makes things easy for us.
We can train machine learning algorithms by providing them huge amounts of data, letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be measured by a cost function. With the help of machine learning, we can save both time and money.
The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, and more. Various top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interests and recommend products accordingly.
How to get started
with Machine Learning?
To get started, let’s take a look at some of the important terminologies.
Terminology:
· Model:
Also known as “hypothesis”, a machine learning model is the mathematical
representation of a real-world process. A machine
learning algorithm along
with the training
data builds a machine
learning model.
· Feature: A feature is a measurable property or parameter of the data-set.
· Feature Vector:
It is a set of multiple numeric features. We use it as an input to the machine
learning model for training and prediction purposes.
· Training:
An algorithm takes a set of data known as “training data” as input. The
learning algorithm finds patterns in the input data and trains the model for
expected results (target). The output of the training process is the machine
learning model.
· Prediction: Once the machine learning model is ready, it can be fed input data to produce a predicted output.
· Target (Label): The value that the machine learning model has to predict is called the target or label.
· Overfitting: When a model performs very well on training data but poorly on test data (new data), it is said to be overfitting. In this case, the model learns the details and noise in the training data to the extent that its performance on test data is negatively affected. Overfitting happens due to low bias and high variance.
· Underfitting: When a model has not learned the patterns in the training data well and is unable to generalize to new data, it is said to be underfitting. An underfit model performs poorly even on the training data and will produce unreliable predictions. Underfitting occurs due to high bias and low variance.
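A minimal numeric sketch of these two failure modes, using polynomial fits of different degrees on noisy data (the dataset, seed, and degrees are arbitrary choices for illustration):

```python
import numpy as np

# Noisy samples of a sine curve: train and test sets drawn the same way.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.linspace(0.025, 0.975, 20)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 20)

def errors(degree):
    """Train/test mean squared error of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in (1, 3, 15):
    train_mse, test_mse = errors(degree)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Degree 1 underfits (high error on both sets: high bias); degree 15 typically drives the training error near zero while doing worse on test data (high variance); a moderate degree balances the two.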
Machine learning Life cycle:
Machine learning has
given the computer systems the abilities to automatically learn without being
explicitly programmed. But how does a machine learning system work? So, it can
be described using the life cycle of machine learning. Machine learning life
cycle is a cyclic process to build an efficient machine learning project. The
main purpose of the life cycle is to find a solution to the problem or project.
Machine learning
life cycle involves
seven major steps, which are given below:
1. Gathering data
2. Preparing that data
3. Choosing a model
4. Training
5. Evaluation
6. Hyperparameter tuning
7. Prediction
Machine Learning Steps
The task of imparting intelligence to machines seems daunting, but it becomes manageable when broken down into 7 major steps:
1. Collecting Data: -
The quality and quantity of your data directly impact the performance and reliability of your machine learning model. Poor-quality data will lead to poor model performance, regardless of the sophistication of the algorithms used.
Key Considerations in Data Collection:
- Data Sources: Identify and access relevant data sources (e.g., databases, APIs, sensors, public datasets).
- Data Quality: Ensure data accuracy, completeness, consistency, and relevance.
- Data Volume: Collect enough data to train a robust and reliable model.
- Data Diversity: Collect data that represents the real-world scenarios the model will encounter.
- Data Bias: Be mindful of potential biases in the data and take steps to mitigate them.
Figure 2: Collecting Data
2. Preparing the Data: -
After you have your data, you have to prepare it. You can do this by:
· Putting together all the data you have and randomizing it. This helps ensure that the data is evenly distributed and that its ordering does not affect the learning process.
· Cleaning the data to remove unwanted entries, missing values, duplicate values, and unneeded rows and columns, and to perform data type conversions. You might even have to restructure the dataset and change the rows and columns or the index of rows and columns.
· Visualizing the data to understand how it is structured and to understand the relationships between the various variables and classes present.
· Splitting the cleaned data into two sets - a training set and a testing set. The training set is the set your model learns from. The testing set is used to check the accuracy of your model after training.
Figure
3: Cleaning and Visualizing Data
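The randomizing and splitting steps above can be sketched as follows (the 80/20 split ratio and fixed seed are common but arbitrary choices):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data, then split it into train and test sets."""
    shuffled = data[:]                      # copy so the original order is kept
    random.Random(seed).shuffle(shuffled)   # randomize to remove ordering effects
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

examples = list(range(100))                 # stand-in for 100 cleaned records
train_set, test_set = train_test_split(examples)
print(len(train_set), len(test_set))        # 80 20
```

Fixing the seed makes the split reproducible, so later evaluation runs compare models on the same held-out data.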
3. Choosing a Model: -
A machine learning model determines the output you get after running a machine learning algorithm on the collected data. It is important to choose a model that is relevant to the task at hand. Over the years, scientists and engineers have developed various models suited for different tasks like speech recognition, image recognition, prediction, etc. Apart from this, you also have to see whether your model is suited for numerical or categorical data and choose accordingly.
Figure 4: Choosing a model
4. Training the Model: -
Training is the most important step in machine learning. In training, you pass the prepared data to your machine learning model so it can find patterns and make predictions. The model learns from the data so that it can accomplish the task set for it. Over time, with training, the model gets better at predicting.
Figure
5: Training a model
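As a minimal sketch of what "finding patterns" means, the loop below fits a line y = w·x + b to four made-up points by gradient descent (the data, learning rate, and iteration count are illustrative):

```python
# Tiny training loop: fit y = w*x + b by gradient descent on made-up data
# generated from y = 2x + 1 (all values here are illustrative).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    # gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

Each pass nudges the parameters to reduce the error, which is exactly the sense in which the model "gets better at predicting over time."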
5. Evaluating the Model: -
After training your model, you have to check how it is performing. This is done by testing the model's performance on previously unseen data - the testing set that you split your data into earlier. If testing were done on the same data used for training, you would not get an accurate measure: the model is already familiar with that data and finds the same patterns in it as before, giving you disproportionately high accuracy. Evaluating on the testing data gives you an accurate measure of how your model will perform, and of its speed.
Figure 6: Evaluating a model
6. Parameter Tuning: -
Once you have created and evaluated your model, see if its accuracy can be improved in any way. This is done by tuning the parameters present in your model - variables whose values the programmer generally decides. At particular values of these parameters, the accuracy will be maximal; parameter tuning refers to finding those values.
Figure 7: Parameter Tuning
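A hypothetical sketch of tuning: try several values of one such parameter (here, a polynomial degree) and keep the one with the lowest error on held-out validation data (the dataset and candidate values are made up):

```python
import numpy as np

# Made-up data: noisy sine samples, split into training and validation halves.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 40)
x_tr, y_tr = x[::2], y[::2]
x_va, y_va = x[1::2], y[1::2]

def val_error(degree):
    """Validation mean squared error of a polynomial fit of this degree."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)

# "Tuning": evaluate each candidate value, keep the one with the lowest error.
best_degree = min(range(1, 8), key=val_error)
print("best degree:", best_degree)
```

Using a validation set rather than the training set for this choice keeps the tuning honest: otherwise the largest (most overfit) degree would always win.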
7. Making Predictions: -
In the end, you can use your model on unseen data to make predictions accurately.
Examples of Machine Learning
Applications:
Machine learning is a buzzword in today's technology, and it is growing very rapidly day by day. We use machine learning in our daily lives, often without knowing it, through tools such as Google Maps, Google Assistant, Alexa, etc.
Given below are some real examples of ML:
If you have used Netflix, you know that it recommends movies or shows based on what you have watched earlier. Machine Learning is used to make these recommendations and to select the content that matches your taste, using your earlier viewing data.
The second example is Facebook. When you upload a photo on Facebook, it can recognize a person in that photo and suggest mutual friends. ML is used for these predictions, drawing on data such as your friend list and the photos available, and making predictions based on that.
The third example is software that shows how you will look when you get older. This image processing also uses machine learning.
Below are some most trending real-world applications of
Machine Learning:
1. Image Recognition: -
Image recognition is
one of the most common applications of machine learning. It is used to identify
objects, persons, places, digital images, etc. The popular use case of image
recognition and face detection is, Automatic
friend tagging suggestion.
Facebook provides us with a feature of automatic friend tagging suggestions. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names; the technology behind this is machine learning's face detection and recognition algorithm.
It is based on the
Facebook project named "Deep Face,"
which is responsible for face recognition and person identification in the
picture.
2. Speech Recognition: -
While using Google, we get an
option of "Search by voice,"
it comes under speech recognition, and it's a popular application of machine
learning.
Speech recognition is the process of converting voice instructions into text; it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in various speech recognition applications.
Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction: -
If we want to visit a new place,
we take help of Google Maps, which shows us the correct path with the shortest
route and predicts the traffic conditions.
It predicts traffic conditions - whether traffic is clear, slow-moving, or heavily congested - in two ways:
· The real-time location of vehicles from the Google Maps app and from sensors
· The average time taken on past days at the same time of day
Everyone who uses Google Maps is helping to make the app better. It takes information from users and sends it back to its database to improve performance.
4. Product recommendations: -
Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendations. Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet in the same browser - this is because of machine learning.
Google understands user interests using various machine learning algorithms and suggests products matching those interests.
Similarly, when we use Netflix, we find recommendations for series, movies, etc., and this too is done with the help of machine learning.
5. Self-driving cars: -
One of the most exciting applications of machine learning is self-driving cars, in which machine learning plays a significant role. Tesla, the most popular car manufacturer in this space, is working on self-driving cars, training car models to detect people and objects while driving.
6. Email Spam and Malware Filtering: -
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We receive important mail in our inbox, marked with the important symbol, and spam emails in our spam box; the technology behind this is machine learning. Below are some spam filters used by Gmail:
· Content Filter
Machine learning algorithms such as the Multi-Layer Perceptron, decision trees, and the Naïve Bayes classifier are used for email spam filtering and malware detection.
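The Naïve Bayes idea can be sketched in a few lines: score a message under each class by multiplying per-word probabilities estimated from labeled examples (the messages and words below are made up for illustration):

```python
# Toy sketch of Naive Bayes spam scoring (illustrative data and vocabulary).
from collections import Counter
import math

spam_msgs = ["win money now", "free money offer", "win free prize"]
ham_msgs  = ["meeting at noon", "project status update", "lunch at noon"]

def word_counts(msgs):
    return Counter(w for m in msgs for w in m.split())

spam_wc, ham_wc = word_counts(spam_msgs), word_counts(ham_msgs)
spam_total, ham_total = sum(spam_wc.values()), sum(ham_wc.values())
vocab = set(spam_wc) | set(ham_wc)

def log_prob(msg, wc, total):
    # Laplace smoothing so unseen words do not zero out the probability
    return sum(math.log((wc[w] + 1) / (total + len(vocab))) for w in msg.split())

def classify(msg):
    # equal class priors here, so only the word likelihoods matter
    spam_score = log_prob(msg, spam_wc, spam_total)
    ham_score = log_prob(msg, ham_wc, ham_total)
    return "spam" if spam_score > ham_score else "ham"

print(classify("free money"))      # spam
print(classify("status meeting"))  # ham
```

A real content filter trains on far more mail and adds class priors and richer features, but the scoring principle is the same.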
7. Virtual Personal Assistants: -
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using voice instructions. These assistants can help us in various ways just through voice instructions - playing music, calling someone, opening an email, scheduling an appointment, etc. Machine learning algorithms are an important part of these virtual assistants: they record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.
8. Online Fraud Detection: -
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can take place in various ways, such as fake accounts, fake IDs, and money stolen in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.
For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Genuine transactions follow a specific pattern, which changes for a fraudulent transaction; the network detects this change and thereby makes our online transactions more secure.
9. Stock Market Trading: -
Machine learning is widely used in stock market trading. In the stock market there is always a risk of ups and downs in share prices, so machine learning's long short-term memory (LSTM) neural network is used for the prediction of stock market trends.
10. Medical Diagnosis: -
In medical science, machine learning is used for disease diagnosis. With it, medical technology is growing fast and is able to build 3D models that can predict the exact position of lesions in the brain. This helps in finding brain tumors and other brain-related diseases easily.
11. Automatic Language Translation: -
Nowadays, if we visit a new place and do not know the language, it is not a problem at all: machine learning helps us by converting text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation system that translates text into our familiar language, and this is called automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used together with image recognition to translate text from one language to another.
12. Analyzing User Feedback Using Sentiment Analysis: -
Sentiment analysis is a prominent machine learning application covering sentiment classification, opinion mining, and emotion analysis. Using such models, machines learn to analyze sentiment based on words: they can identify whether words are said in a positive, negative, or neutral tone, and they can also estimate the intensity of those words.
With the help of Natural Language Processing (NLP), data miners automatically extract and summarize opinions, using both supervised and unsupervised machine learning algorithms. Companies that deal with customers use such models to improve the customer experience based on feedback.
Another machine learning example is music applications. Apps like Gaana and JioSaavn also suggest music based on user sentiment, analyzing the history of songs played, favorite playlists, and even the times at which music is listened to.
Types of Machine Learning: -
Machine learning is a subset of AI that enables machines to automatically learn from data, improve performance from past experiences, and make predictions. Machine learning comprises a set of algorithms that work on huge amounts of data. Data is fed to these algorithms to train them; on the basis of this training, they build a model and perform a specific task.
These ML algorithms help solve different business problems such as regression, classification, forecasting, clustering, association, etc.
Based on the methods and way of learning, machine learning is divided into mainly four types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
The table below compares supervised, unsupervised, and reinforcement learning:
| | Supervised learning | Unsupervised learning | Reinforcement learning |
|---|---|---|---|
| Definition | Makes predictions from data | Segments and groups data | Reward-punishment system and interactive environment |
| Types of data | Labelled data | Unlabeled data | Acts according to a policy with a final goal to reach (no predefined data) |
| Commercial value | High commercial and business value | Medium commercial and business value | Little commercial use yet |
| Types of problems | Regression and classification | Association and clustering | Exploitation or exploration |
| Supervision | Extra supervision | No supervision | No supervision |
| Algorithms | Linear Regression, Logistic Regression, SVM, KNN, and so forth | K-Means clustering, C-Means, Apriori | Q-Learning, SARSA |
| Aim | Calculate outcomes | Discover underlying patterns | Learn a series of actions |
| Application | Risk evaluation, sales forecasting | Recommendation systems, anomaly detection | Self-driving cars, gaming, healthcare |
Design a Learning System in Machine Learning
When we feed training data to a machine learning algorithm, the algorithm produces a mathematical model; with the help of this model, the machine makes predictions and takes decisions without being explicitly programmed. Also, the more the machine works with the training data, the more experience it gains and the more efficient the results it produces.
Designing a Learning System in Machine Learning:
According to Tom Mitchell, "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Example: In spam e-mail detection,
· Task, T: To classify mails as Spam or Not Spam.
· Performance measure, P: Total percentage of mails correctly classified as "Spam" or "Not Spam".
· Experience, E: A set of mails with the label "Spam" or "Not Spam".
Steps for Designing a Learning System:
Step 1) Choosing the Training Experience: The first and very important task is to choose the training data or training experience that will be fed to the machine learning algorithm. It is important to note that the data or experience we feed to the algorithm has a significant impact on the success or failure of the model, so the training data or experience should be chosen wisely.
Below are the attributes of the training experience that impact the success or failure of the model:
· Whether the training experience provides direct or indirect feedback regarding choices. For example, while playing chess, the training data can provide feedback such as: instead of this move, if that move is chosen, the chances of success increase.
· The degree to which the learner controls the sequence of training examples. For example, when training data is first fed to the machine its accuracy is very low, but as it gains experience by playing again and again with itself or an opponent, the algorithm receives feedback and controls the chess game accordingly.
· How well the training experience represents the distribution of examples over which the final performance will be measured. A machine learning algorithm gains experience by going through many different cases and examples; by passing through more and more examples, it gains more experience and its performance increases.
· Step 2) Choosing the Target Function: The next important step is choosing the target function. This means that, based on the knowledge fed to the algorithm, the machine will learn a NextMove function that describes which legal move should be taken. For example, while playing chess with an opponent, when the opponent plays, the machine learning algorithm decides which of the possible legal moves to take in order to succeed.
· Step 3) Choosing a Representation for the Target Function: Once the machine knows all the possible legal moves, the next step is to choose a representation for scoring them - for example, linear equations, a hierarchical graph representation, tabular form, etc. Using this representation, the NextMove function selects the target move: out of the available moves, the one that offers the highest success rate. For example, if the machine has 4 possible chess moves, it will choose the optimized move most likely to bring it success.
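One classic representation (following Tom Mitchell's checkers example; the feature set and weights below are illustrative) is a linear combination of board features, V̂(b) = w0 + w1·x1 + … + wn·xn, whose weights the learner adjusts during training:

```python
# Sketch of a linear representation of the target function for a board game.
def v_hat(board_features, weights):
    """Approximate board value: w0 + w1*x1 + ... + wn*xn."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(ws, board_features))

# x1..x6: e.g. counts of own/opponent pieces, kings, and threatened pieces.
features = [12, 12, 0, 0, 0, 0]                    # an illustrative opening position
weights  = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5]  # illustrative weight values
print(v_hat(features, weights))  # 0.0 for this symmetric position
```

The NextMove function can then score each board reachable by a legal move with v_hat and pick the highest-scoring one.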
· Step 4) Choosing a Function Approximation Algorithm: An optimized move cannot be chosen from the training data alone. The algorithm has to go through a set of examples; from these examples it approximates which steps to choose, and the outcomes provide feedback. For example, when training data for playing chess is fed to the algorithm, it is not known in advance whether a move will fail or succeed; from each failure or success, the algorithm learns which step should be chosen for the next move and what its success rate is.
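For the linear representation, one standard approximation algorithm is the LMS (least mean squares) weight-update rule Mitchell describes, w_i ← w_i + η·(V_train(b) − V̂(b))·x_i; the sketch below applies it to one made-up training example:

```python
# Sketch of the LMS rule: nudge each weight in proportion to the prediction
# error and its feature value (training example and eta are illustrative).
def lms_update(weights, features, v_train, eta=0.01):
    v_hat = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    error = v_train - v_hat
    new_w = [weights[0] + eta * error]                             # w0 uses x0 = 1
    new_w += [w + eta * error * x for w, x in zip(weights[1:], features)]
    return new_w

weights = [0.0, 0.0, 0.0]
# Repeatedly nudge the weights toward a training value of +1 for features [1, 2].
for _ in range(200):
    weights = lms_update(weights, [1.0, 2.0], 1.0)
print(round(weights[0] + weights[1] * 1.0 + weights[2] * 2.0, 2))  # close to 1.0
```

Each update shrinks the error on the example, so the approximation converges toward the training value supplied by the game outcomes.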
· Step 5) Final Design: The final design is created at the end, once the system has gone through many examples, failures and successes, and correct and incorrect decisions, and has learned what the next step should be. Example: Deep Blue, an intelligent computer, won a chess match against the chess expert Garry Kasparov and became the first computer to beat a human chess expert.
PERSPECTIVES AND COMMON ISSUES IN MACHINE LEARNING
Although machine learning is used in every industry and helps organizations make more informed, data-driven choices that are more effective than classical methodologies, it still has many problems that cannot be ignored. Here are some common issues in Machine Learning that professionals face when building ML skills and creating an application from scratch.
1. Inadequate Training Data
A major issue when using machine learning algorithms is a lack of both quality and quantity of data. Although data plays a vital role in machine learning, many data scientists find that inadequate, noisy, and unclean data severely hampers machine learning algorithms. For example, a simple task may require thousands of samples, while an advanced task such as speech or image recognition may need millions. Furthermore, good data quality is essential for the algorithms to work ideally, yet poor data quality is common in machine learning applications.
2. Poor quality of data
As we have discussed above, data plays a
significant role in machine learning, and it must be of good quality as well.
Noisy data, incomplete data, inaccurate data, and unclean data lead to less
accuracy in classification and low-quality results. Hence, data quality can
also be considered as a major common problem while processing machine learning
algorithms.
Data quality can be affected by factors such as:
- Noisy data: responsible for inaccurate predictions, affecting decisions as well as accuracy in classification tasks.
- Incorrect data: responsible for faulty results from machine learning models; incorrect data can therefore also affect the accuracy of the results.
3. Non-representative training data
To make sure our trained model generalizes well, we have to ensure that the training data is representative of the new cases to which we need to generalize. The training data must cover the cases that have already occurred as well as those that will occur. If we use non-representative training data, the model gives less accurate predictions. A machine learning model is ideal when it predicts well for generalized cases and provides accurate decisions. If there is too little training data, there will be sampling noise in the model; such a non-representative training set will not yield accurate predictions and will be biased toward one class or group.
4. Overfitting and Underfitting
Overfitting:
Overfitting is one of the most common issues faced by machine learning engineers and data scientists. When a machine learning model is trained on a huge amount of data, it starts capturing noise and inaccurate entries in the training data set, which negatively affects the model's performance. Consider a simple example where the training data contains 1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. There is then a considerable probability of identifying an apple as a papaya, because the training set contains a massive amount of biased data, and predictions are negatively affected. A main cause of overfitting is the use of highly flexible non-linear methods, which can build unrealistic data models; overfitting can be reduced by using linear and parametric algorithms in the machine learning models.
Underfitting:
Underfitting is just the opposite of overfitting. When a machine learning model is trained on too little data, it produces incomplete and inaccurate predictions, and the accuracy of the model suffers. Underfitting occurs when our model is too simple to capture the underlying structure of the data - like an undersized pair of pants. This generally happens when we have limited data in the data set and try to build a linear model from non-linear data. In such scenarios the model lacks the necessary complexity, its rules become too simple to apply to the data set, and it starts making wrong predictions.
5. Monitoring and maintenance
Generalized output is mandatory for any machine learning model, so regular monitoring and maintenance are compulsory. Different results for different actions require data changes, so editing the code, as well as the resources for monitoring it, also becomes necessary.
6. Getting bad recommendations
A machine learning model operates in a specific context, which can result in bad recommendations and concept drift. For example, at one point in time a customer is looking for certain gadgets; the customer's requirements change over time, but the machine learning model keeps showing the same recommendations even though the customer's expectations have changed. This phenomenon is called data drift. It generally occurs when new data is introduced or the interpretation of the data changes. We can overcome it by regularly updating and monitoring the data according to expectations.
7. Lack of skilled resources
Although machine learning and artificial intelligence are continuously growing in the market, these industries are still younger than others. The absence of skilled manpower is also an issue: we need people with in-depth knowledge of mathematics, science, and technology to develop and manage scientific substance for machine learning.
8. Process complexity of machine learning
The machine learning process is very complex, which is another major issue faced by machine learning engineers and data scientists. Machine learning and artificial intelligence are new technologies, still partly experimental and continuously changing over time, and much of the work proceeds by trial and error, so the probability of error is higher than expected. The process also includes analyzing the data, removing data bias, training the data, applying complex mathematical calculations, etc., which makes it more complicated and quite tedious.
9. Lack of Explainability
This basically means that the outputs cannot be easily comprehended, as the model is programmed in specific ways to deliver outputs under certain conditions. This lack of explainability in machine learning algorithms reduces their credibility.
10. Slow implementations and results
This issue is also very commonly seen in machine learning models. Machine learning models can be highly accurate, but producing their results is time-consuming: slow programs, excessive requirements, and overloaded data take more time than expected to provide accurate results. This calls for continuous maintenance and monitoring of the model to keep its results accurate.
11. Irrelevant features
Although machine learning models are intended to give the best possible outcome, feeding garbage data as input yields garbage as output. Hence we should use relevant features in our training sample. A machine learning model is considered good when its training data has a good set of features with few or no irrelevant ones.
CONCEPT
LEARNING
Definition: Concept learning
- Inferring a Boolean-valued function
from training examples
of its input and output
·
Learning
involves acquiring general concepts from specific training examples. Example: People
continually learn general
concepts or categories such as "bird," "car," "situations in which I should study more
in order to pass the exam," etc.
·
Each such concept can be viewed
as describing some subset of objects or events defined over a larger set.
·
Alternatively,
each concept can be thought of as a Boolean-valued function defined over this larger set. (Example: A function defined
over all animals,
whose value is true
for birds and false for other animals).
A CONCEPT LEARNING
TASK:
Consider the example task of learning
the target concept
"Days on which Aldo enjoys his favourite water sport"
Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes
Table: Positive and negative training examples for the target concept EnjoySport.
The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.
What hypothesis representation is provided to the learner?
·
Let’s consider
a simple representation in which each hypothesis consists
of a conjunction of
constraints on the instance attributes.
·
Let each hypothesis be a vector
of six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind,
Water, and Forecast.
For each attribute, the hypothesis
will either
·
Indicate by a "?" that any value is acceptable for this attribute,
·
Specify a single required
value (e.g., Warm) for the attribute, or
·
Indicate by a "Φ" that no value is acceptable
If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive
example (h(x) = 1).
The hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity is represented by the expression
(?,
Cold, High, ?, ?, ?)
The most general
hypothesis-that every day is a positive
example-is represented by
(?,
?, ?, ?, ?, ?)
The most specific possible
hypothesis-that no day is a positive
example-is represented by
(Φ,
Φ, Φ, Φ, Φ, Φ)
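These three kinds of constraints can be sketched as a small evaluation function (an illustrative Python sketch; the function name and the tuple encoding are our own, not from the text):

```python
def h_evaluate(constraints, x):
    """Evaluate a conjunctive hypothesis on instance x:
    'Φ' accepts no value, '?' accepts any value, and any other
    constraint must match the attribute value exactly."""
    if 'Φ' in constraints:          # any Φ constraint rejects every instance
        return 0
    return int(all(c == '?' or c == v for c, v in zip(constraints, x)))

# A cold, humid day:
day = ('Sunny', 'Cold', 'High', 'Weak', 'Warm', 'Same')
print(h_evaluate(('?', 'Cold', 'High', '?', '?', '?'), day))  # 1: satisfied
print(h_evaluate(('?', '?', '?', '?', '?', '?'), day))        # 1: most general
print(h_evaluate(('Φ',) * 6, day))                            # 0: most specific
```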
Notation
·
The set of
items over which the concept
is defined is called the set of instances, which is denoted by X.
Example: X is the set of all possible days, each represented by the attributes: Sky,
AirTemp,
Humidity, Wind,
Water, and Forecast
·
The concept
or function to be learned
is called the target concept,
which is denoted
by
c. c can be a Boolean-valued function defined over the instances X:
c: X → {0, 1}
Example: The target
concept corresponds to the
value of the attribute EnjoySport
(i.e., c(x) =
1 if EnjoySport = Yes,
and c(x) = 0 if EnjoySport =
No).
·
Instances for which c(x) = 1 are called
positive examples, or members
of the target concept.
·
Instances for which c(x) = 0 are called
negative examples, or non-members of the target concept.
·
The ordered pair (x, c(x)) to describe the training example consisting of
the instance x and its target concept value c(x).
·
D to denote the set of available
training examples
·
The
symbol H to denote the set of all possible hypotheses that the learner
may consider regarding the identity
of the target concept. Each hypothesis h in H represents a Boolean valued function defined over X
· h: X → {0, 1}
·
The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.
Ø
Given:
·
Instances X: Possible days,
each described by the
attributes
o
Sky (with possible values Sunny, Cloudy, and Rainy),
o
AirTemp (with values
Warm and Cold),
o
Humidity (with values
Normal and High),
o
Wind (with values Strong and Weak),
o
Water (with values Warm and Cool),
o
Forecast (with values
Same and Change).
· Hypotheses
H: Each
hypothesis is described by a conjunction of constraints on the attributes Sky,
AirTemp,
Humidity,
Wind,
Water,
and Forecast.
The constraints may be "?" (any value is acceptable), "Φ" (no value is acceptable), or a specific value.
· Target concept
c: EnjoySport : X → {0, l}
· Training examples
D: Positive and negative examples
of the target function
Ø Determine:
· A hypothesis h in H such that h(x) = c(x)
for all x in X.
The inductive
learning hypothesis
Any
hypothesis found to approximate the target function well over a sufficiently
large set of training examples will also approximate the target function well
over other unobserved examples.
CONCEPT LEARNING AS SEARCH
·
Concept learning
can be viewed as the task of searching through
a large space of hypotheses
implicitly defined by the hypothesis representation.
·
The goal of
this search is to find the hypothesis
that best fits the training
examples.
Example:
Consider the instances X and hypotheses H in the EnjoySport learning task. The attribute Sky has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two possible values, so the instance space X contains exactly 3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances. Allowing each attribute in a hypothesis to also take the value "?" or "Φ", there are 5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses within H.
Consider the two
hypotheses
h1 = (Sunny,
?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
· Consider
the sets of instances that are classified positive by ℎ1 and by ℎ2.
·
ℎ2
imposes fewer constraints on the instance, it classifies more instances as
positive. So, any instance classified positive by ℎ1 will also be
classified positive by ℎ2. Therefore, h2 is more general than ℎ1.
Given hypotheses ℎ𝑗 and ℎ𝑘, ℎ𝑗 is more-general-than-or-equal-to ℎ𝑘 if and only if any instance that satisfies ℎ𝑘 also satisfies ℎ𝑗.
Definition: Let 𝒉𝒋 and 𝒉𝒌 be Boolean-valued functions defined over X. Then 𝒉𝒋 is more general than-or-equal-to 𝒉𝒌 (written 𝒉𝒋
≥ 𝒉𝒌) if and only if
(∀𝒙∈ 𝑿)[(𝒉𝒌(𝒙) = 𝟏) → (𝒉𝒋(𝒙) = 𝟏)]
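For conjunctive hypotheses, this relation can be checked attribute by attribute (an illustrative Python sketch; the function name is our own):

```python
def more_general_or_equal(hj, hk):
    """hj >=_g hk iff every instance satisfying hk also satisfies hj.
    For conjunctive hypotheses this reduces to an attribute-wise check:
    each constraint of hj must be '?' or equal the corresponding
    constraint of hk."""
    if 'Φ' in hk:        # the empty hypothesis satisfies no instance,
        return True      # so the implication holds vacuously
    return all(a == '?' or a == b for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True:  h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False
```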
·
In
the figure, the box on the left represents the
set X of all instances, the box on the right
the set H of all hypotheses.
·
Each hypothesis corresponds to some subset of X -the subset of instances that it
classifies positive.
·
The arrows
connecting hypotheses represent
the more - general -than relation, with the
arrow pointing toward the less general hypothesis.
·
Note the subset of instances characterized by h2 subsumes the subset characterized by h1; hence h2 is more-general-than h1.
1. FIND-S: FINDING
A MAXIMALLY SPECIFIC
HYPOTHESIS
FIND-S Algorithm
1. Initialize h to
the most specific
hypothesis in H
2. For each positive training
instance x
For each attribute constraint 𝑎𝑖 in h
If the constraint ai is satisfied by x
Then do nothing
Else replace 𝑎𝑖 in
h by the next more general constraint that is satisfied by x
3.
Output hypothesis h
To illustrate this algorithm, assume
the learner is given the sequence of training examples from the EnjoySport task
· The first
step of FIND-S is to
initialize h to the most
specific hypothesis in H
h ← (Ø, Ø, Ø, Ø, Ø, Ø)
· Consider the first training
example
x1 = <Sunny Warm Normal Strong Warm Same>, +
Observing
the first training example, it is clear that hypothesis h is too specific. None
of the "Ø" constraints in h are satisfied by this example,
so each is replaced by the next more general constraint that fits the example
h1 = <Sunny Warm Normal Strong Warm Same>
· Consider the second training example
x2 = <Sunny, Warm, High, Strong,
Warm, Same>, +
The second
training example forces
the algorithm to further generalize h, this time substituting
a "?" in place of any attribute value in h that is not satisfied by
the new example
h2 = <Sunny Warm ?
Strong Warm Same>
· Consider the third training example
x3 = <Rainy, Cold,
High, Strong, Warm,
Change>, -
Upon encountering the third training example, the algorithm makes no change to h. The FIND-S algorithm simply ignores every negative example.
h3 = < Sunny Warm ? Strong Warm Same>
· Consider the fourth training example
x4 = <Sunny Warm High
Strong Cool Change>, +
The fourth
example leads to a further
generalization of h
h4 = <
Sunny Warm ? Strong ? ? >
The key property of the
FIND-S algorithm
·
FIND-S
is guaranteed to output the most specific hypothesis within H that is
consistent with the positive training examples
·
FIND-S
algorithm’s final hypothesis will also be consistent with the negative examples
provided the correct target concept is contained in H, and provided the
training examples are correct.
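The trace above can be sketched in Python (an illustrative sketch; the function name and the use of None to encode the Φ constraint are our own):

```python
def find_s(examples):
    """FIND-S: start with the most specific hypothesis (all Φ, encoded
    as None) and minimally generalize it on each positive example;
    negative examples are simply ignored."""
    h = [None] * len(examples[0][0])
    for x, positive in examples:
        if not positive:          # FIND-S ignores negative examples
            continue
        for i, v in enumerate(x):
            if h[i] is None:      # Φ -> the first observed value
                h[i] = v
            elif h[i] != v:       # conflicting values -> '?'
                h[i] = '?'
    return h

# The four EnjoySport training examples:
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
print(find_s(examples))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```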
VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM
KEY TERMS:
Definition: consistent- A hypothesis h is consistent with a set of training examples D if and
only if h(x) = c(x) for each
example (x, c(x)) in D.
𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒉, 𝑫) ≡ (∀(𝒙, 𝒄(𝒙)) ∈ 𝑫)𝒉(𝒙) = 𝒄(𝒙)
Note difference between definitions of consistent and satisfies
·
An
example x is said to satisfy hypothesis h when h(x) = 1, regardless of
whether x is a positive or negative example of the target concept.
·
An example
x is said to consistent
with hypothesis h iff h(x) = c(x)
Definition: version space - The version space, denoted 𝑽𝑺𝑯,𝑫, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D
𝑽𝑺𝑯,𝑫 ≡ {𝒉 ∈ 𝑯 | 𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒉, 𝑫)}
The LIST-THEN-ELIMINATE algorithm
The LIST-THEN-ELIMINATE algorithm first initializes the version space to
contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.
The LIST-THEN-ELIMINATE Algorithm
1.
VersionSpace ← a list containing every hypothesis in H
2.
For each training example, (x, c(x))
remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3.
Output the list of hypotheses
in VersionSpace
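Brute-force enumeration is feasible here only because H is tiny: the 5120 syntactically distinct hypotheses collapse to 973 semantically distinct ones, since every hypothesis containing a Φ classifies all instances as negative. A minimal Python sketch on the EnjoySport data (helper names are our own):

```python
from itertools import product

def satisfies(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(domains, examples):
    # 1. Initialize the version space to every (semantically distinct)
    #    hypothesis: each attribute is a value or '?', plus the all-Φ one.
    H = list(product(*[list(d) + ['?'] for d in domains]))
    H.append(('Φ',) * len(domains))
    classify = lambda h, x: 0 if 'Φ' in h else int(satisfies(h, x))
    # 2. Eliminate any hypothesis inconsistent with some training example.
    return [h for h in H if all(classify(h, x) == c for x, c in examples)]

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 1),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 1),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 0),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 1),
]
vs = list_then_eliminate(domains, examples)
print(len(vs))  # 6: the EnjoySport version space contains six hypotheses
```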
A More Compact Representation for Version Spaces
The
version space is represented by its most general and least general members.
These members form general and specific boundary sets that delimit the version
space within the partially ordered hypothesis space.
Definition: The
general boundary G, with respect to hypothesis space
H and training data D, is the set of maximally general members
of H consistent with D
𝑮 ≡ {𝒈 ∈ 𝑯| 𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕 (𝒈, 𝑫) ∧ (¬∃𝒈′ ∈ 𝑯)[(𝒈′ >𝒈 𝒈) ∧ 𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒈′, 𝑫)]}
Definition: The specific boundary
S, with respect
to hypothesis space H and training data D,
is the set of minimally general (i.e., maximally specific) members of H
consistent with D.
𝑺 ≡ {𝒔 ∈ 𝑯|𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒔, 𝑫) ∧ (¬∃𝒔′ ∈ 𝑯)[(𝒔 >𝒈 𝒔′) ∧ 𝑪𝒐𝒏𝒔𝒊𝒔𝒕𝒆𝒏𝒕(𝒔′, 𝑫)]}
Theorem: Version
Space representation theorem
Theorem:
Let X be an arbitrary set of instances and let H be a set of Boolean-valued hypotheses defined over X. Let c: X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {⟨x, c(x)⟩}. For all X, H, c, and D such that S and G are well defined,
𝑽𝑺𝑯,𝑫 = {𝒉 ∈ 𝑯|(∃𝒔 ∈ 𝑺)(∃𝒈 ∈ 𝑮)(𝒈 ≥𝒈 𝒉 ≥𝒈 𝒔)}
Let g, h, s be arbitrary members of G, H, S respectively, with 𝑔 ≥𝑔 ℎ ≥𝑔 𝑠.
·
By
the definition of S, s must be satisfied by all positive examples in D.
·
By the definition of G, g cannot be satisfied by any negative
example in D.
A version
space with its general and specific boundary
sets. The version
space includes all six
hypotheses shown here, but can be represented more simply by S and G. Arrows
indicate instances of the more_general_than relation. This is the version space
for the EnjoySport concept learning problem and training example.
CANDIDATE-ELIMINATION Learning
Algorithm
The key idea in the CANDIDATE-ELIMINATION algorithm is to output
a description of the
set of all hypotheses consistent with the
training examples
The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
• If d is a
positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
•
Remove s from
S
•
Add to S all minimal generalizations h of s such that
• h
is consistent with d, and some member of G is more general than h
• Remove from S any hypothesis that is more general than another
hypothesis in S
• If d is a
negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
•
Remove g from G
•
Add to G all minimal specializations h of g such that
• h
is consistent with d, and some member of S is more specific than h
• Remove
from G any hypothesis that is less general than another
hypothesis in G
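The pseudocode above can be sketched in Python for conjunctive hypotheses (an illustrative sketch: the helper names are our own, Φ is encoded as None, and we assume the first training example is positive, as in the EnjoySport trace):

```python
def satisfies(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(a == '?' or a == b for a, b in zip(hj, hk))

def min_generalize(s, x):
    # Minimal generalization of s that covers x: Φ (None) becomes the
    # observed value; a conflicting specific value becomes '?'.
    return tuple(v if c is None else (c if c == v else '?')
                 for c, v in zip(s, x))

def min_specializations(g, x, domains):
    # Minimal specializations of g that exclude x: replace one '?' with
    # any attribute value other than x's value there.
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    G = [('?',) * n]                 # maximally general boundary
    S = [(None,) * n]                # maximally specific boundary (all Φ)
    for x, label in examples:
        if label:                    # positive example
            G = [g for g in G if satisfies(g, x)]
            S = [s if satisfies(s, x) else min_generalize(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:                        # negative example
            S = [s for s in S if not satisfies(s, x)]
            newG = []
            for g in G:
                if satisfies(g, x):  # g misclassifies the negative example
                    newG += [h for h in min_specializations(g, x, domains)
                             if any(more_general_or_equal(h, s) for s in S)]
                else:
                    newG.append(g)
            # drop any member of G less general than another member of G
            G = [g for g in newG
                 if not any(g2 != g and more_general_or_equal(g2, g)
                            for g2 in newG)]
    return S, G

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 1),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 1),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 0),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 1),
]
S_final, G_final = candidate_elimination(examples, domains)
print(S_final)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G_final)  # (Sunny, ?, ?, ?, ?, ?) and (?, Warm, ?, ?, ?, ?)
```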
• An Illustrative Example
The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H;
Initializing the G boundary
set to contain the most
general hypothesis in H
𝑮𝟎 ← ⟨?, ?, ?, ?, ?, ?⟩
Initializing the S boundary
set to contain the most specific (least
general) hypothesis
𝑺𝟎 ← ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
·
When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.
·
The boundary is therefore revised by moving it to the least general hypothesis that covers this new example.
·
No update of the G boundary is needed in response to this training example, because G0 correctly covers this example.
·
Consider the third training
example. This negative
example reveals that the G boundary of the version space is overly
general, that is, the hypothesis in G incorrectly predicts that this new example is a positive example.
·
The hypothesis in the G boundary must therefore be specialized until it correctly
classifies this new negative example
Given that there are six attributes that could be specified to specialize G2, why are there
only three new hypotheses in G3?
For
example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a minimal specialization
of G2 that correctly labels the new example
as a negative example, but it is not included
in G3. The reason this
hypothesis is excluded is that it is inconsistent with the previously
encountered positive examples
• Consider the fourth training example.
•
This positive
example further generalizes the S boundary
of the version space. It also results
in removing one member
of the G boundary, because
this member fails to cover
the new positive example
After processing these
four examples, the boundary sets S4 and G4 delimit the version space of all
hypotheses consistent with the set of incrementally observed training examples
Gradient Descent is a fundamental algorithm in machine learning and optimization. It is used for tasks like training neural networks, fitting regression lines, and minimizing cost functions in models.
Learning Rate:
The learning rate is an important hyperparameter in gradient descent that controls how big or small a step to take down the gradient when updating the model's parameters. It determines how quickly or slowly the algorithm converges toward the minimum of the cost function.
1. If the learning rate is too small: the algorithm takes tiny steps at each iteration and converges very slowly. This can significantly increase training time and computational cost, especially for large datasets. (Note: this slow convergence is distinct from the vanishing gradient problem, which refers to gradients shrinking as they propagate through the layers of a deep network.)
Learning rate with small
steps
2. If the learning rate is too big: the algorithm may take huge steps, overshooting the minimum of the cost function without settling. It can fail to converge, causing the parameters to oscillate or diverge. (Note: this is distinct from the exploding gradient problem, which refers to gradients growing as they propagate through a deep network.)
Learning rate with big
steps
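Both failure modes can be demonstrated on the simplest possible cost function, f(w) = w², whose gradient is 2w (an illustrative sketch; the step counts and learning rates are our own choices):

```python
def minimize(lr, steps=25, w0=5.0):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w.
    The minimum is at w = 0."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w      # w <- w - lr * f'(w)
    return w

print(minimize(lr=0.01))  # too small: still far from 0 after 25 steps
print(minimize(lr=0.1))   # reasonable: close to 0
print(minimize(lr=1.1))   # too big: each step overshoots, |w| blows up
```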
For simplicity, let's consider a linear regression model with a single input feature x and target y. The loss function (or cost function) over the n data points is the Mean Squared Error (MSE):
J(w, b) = (1/n) · Σᵢ (yᵢ − ypᵢ)²
Here:
·
yp=x⋅w+b: The predicted value.
·
w:
Weight (slope of the line).
·
b: Bias
(intercept).
·
n:
Number of data points.
To optimize the model parameter w, we compute the gradient of the loss function with respect to w (and likewise for b):
∂J/∂w = −(2/n) · Σᵢ xᵢ · (yᵢ − ypᵢ)
∂J/∂b = −(2/n) · Σᵢ (yᵢ − ypᵢ)
Gradient descent is a
mathematical technique that iteratively finds the weights and bias that produce
the model with the lowest loss. Gradient descent finds the best weight and bias
by repeating the following process for a number of user-defined iterations.
The model begins training
with randomized weights and biases near zero, and then repeats the following
steps:
1.
Calculate the loss
with the current weight and bias.
2.
Determine the
direction to move the weights and bias that reduce loss.
3.
Move the weight and
bias values a small amount in the direction that reduces loss.
4.
Return to step one
and repeat the process until the model can't reduce the loss any further.
The diagram below outlines
the iterative steps gradient descent performs to find the weights and bias that
produce the model with the lowest loss.
Working of Gradient Descent
·
Step 1: we first initialize the
parameters of the model randomly
·
Step 2: Compute the gradient of
the cost function with respect to each parameter. It involves making partial
differentiation of cost function with respect to the parameters.
·
Step 3: Update the parameters of the model by taking steps in the direction opposite to the gradient. Here we choose a hyperparameter, the learning rate, denoted by γ; it determines the step size of each update.
·
Step 4: Repeat steps 2 and 3
iteratively to get the best parameter for the defined model.
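The four steps above can be sketched for the single-feature linear model yp = w·x + b with MSE loss (an illustrative sketch; the dataset and hyperparameter values are our own choices):

```python
def gradient_descent(xs, ys, lr=0.05, epochs=500):
    """Fit y ~ w*x + b by repeatedly stepping opposite the MSE gradient:
    dJ/dw = -(2/n) * sum(x_i * (y_i - yp_i))
    dJ/db = -(2/n) * sum(y_i - yp_i)
    """
    w, b = 0.0, 0.0                          # step 1: start near zero
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]      # current predictions
        # step 2: gradient of the loss w.r.t. each parameter
        dw = -(2 / n) * sum(x * (y - yp) for x, y, yp in zip(xs, ys, preds))
        db = -(2 / n) * sum(y - yp for y, yp in zip(ys, preds))
        # step 3: move a small amount opposite the gradient
        w -= lr * dw
        b -= lr * db
    return w, b                              # step 4: after many repeats

# Data generated from y = 2x + 1, so w and b should approach 2 and 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```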