
1 - Fundamental Concepts

Classification

Classification is a fundamental task in Machine Learning, where the objective is to assign a label or category to an input example based on its features. It is a form of supervised learning: the model learns from a dataset in which every example carries a known label or category.

The classification process generally follows these steps:

  1. Data collection and preparation: Gather a labeled dataset, where each example has a feature vector and a corresponding label. Split the data into training and test sets.
  2. Algorithm choice: Select a classification algorithm suitable for the problem, such as Logistic Regression, Decision Trees, Random Forests, Support Vector Machines (SVM), or Artificial Neural Networks.
  3. Model training: Feed the training set to the chosen algorithm so it can learn the patterns and relationships between the features and labels. The model adjusts its internal parameters to minimize classification error.
  4. Model evaluation: Use the test set to evaluate the performance of the trained model. Common metrics include accuracy, precision, recall, and F1-score.
  5. Model tuning: If necessary, adjust the model's hyperparameters or try different algorithms to improve performance.
  6. Inference: Once the model is trained and validated, it can be used to make predictions (classify) new, unlabeled examples.
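The steps above can be sketched end to end with scikit-learn. This is a minimal sketch, assuming scikit-learn is installed; the Iris dataset and Logistic Regression are illustrative choices, not the only option.

```python
# Minimal classification sketch with scikit-learn (illustrative choices).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection and preparation: a labeled dataset, split into train/test.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# 2-3. Algorithm choice and training: Logistic Regression as one option.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# 4. Evaluation: accuracy on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.2f}")

# 6. Inference: classify a new, unlabeled example.
prediction = model.predict([[5.1, 3.5, 1.4, 0.2]])
```

Steps 5 (tuning) would loop back over the algorithm and hyperparameter choices before moving on to inference.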

Some examples of classification problems include spam detection (spam vs. not spam), handwritten digit recognition, medical diagnosis from exam results, and sentiment analysis of reviews.

Classification can be binary (two classes) or multiclass (three or more classes), and some algorithms also support multi-label classification, where an example can belong to multiple classes simultaneously.

Regression

Regression is another fundamental task in Machine Learning, where the goal is to predict a continuous numerical value based on a set of input variables (features). Unlike classification, which deals with categorical output variables (classes), regression deals with continuous numerical output variables.

The regression process generally follows these steps:

  1. Data collection and preparation: Gather a dataset with input variables and their corresponding continuous output values. Split the data into training and test sets.
  2. Algorithm choice: Select a regression algorithm suitable for the problem, such as Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression, Support Vector Regression (SVR), or Artificial Neural Networks.
  3. Model training: Feed the training set to the chosen algorithm so it can learn the patterns and relationships between the input variables and the output variable. The model adjusts its internal parameters to minimize prediction error.
  4. Model evaluation: Use the test set to assess the performance of the trained model. Common metrics include Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²).
  5. Model tuning: If necessary, adjust the model's hyperparameters or try different algorithms to improve performance.
  6. Inference: Once the model is trained and validated, it can be used to make predictions on new examples.
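The same workflow can be sketched for regression with NumPy alone: an ordinary least-squares fit of a line, evaluated with MSE and R². The data here is synthetic and purely illustrative.

```python
# Minimal linear-regression sketch using NumPy least squares (synthetic data).
import numpy as np

rng = np.random.default_rng(0)

# 1. Data: a noisy linear relationship y = 2x + 1 (made up for illustration).
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=100)

# Split into training and test sets.
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# 2-3. Fit slope and intercept by ordinary least squares.
A = np.column_stack([x_train, np.ones_like(x_train)])
(slope, intercept), *_ = np.linalg.lstsq(A, y_train, rcond=None)

# 4. Evaluation on the test set: MSE and R².
y_pred = slope * x_test + intercept
mse = np.mean((y_test - y_pred) ** 2)
r2 = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"slope={slope:.2f} intercept={intercept:.2f} mse={mse:.3f} r2={r2:.3f}")
```

With a good fit, the estimated slope and intercept land close to the true values used to generate the data, MSE approaches the noise variance, and R² approaches 1.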

Some examples of regression problems include predicting house prices from their characteristics, forecasting product demand, and estimating a person's salary from experience and education.

In summary, regression is a powerful technique for predicting continuous numerical values based on a set of input variables, with applications in various areas such as finance, marketing, and engineering.

Types of variables

Independent Variables: the input variables (features) the model uses to make a prediction, such as a house's size and location when estimating its price.

Dependent Variables: the output variable the model tries to predict; its value depends on the independent variables (the class label in classification, the numerical target in regression).

Continuous Data (you can measure): data that can take any value within a range, such as a person's height (1.75 m, 1.803 m, etc.) or the temperature of a room.

In contrast, discrete data (you can count) are those that can only take specific values, usually integers, such as the number of children in a family (0, 1, 2, etc.) or the number of cars a person owns.
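A small illustration of the distinction (the record and its field names are hypothetical): continuous fields hold measurements on a real-valued scale, while discrete fields hold counts.

```python
# Hypothetical record mixing continuous (measured) and discrete (counted) data.
person = {
    "height_m": 1.75,    # continuous: any value in a range
    "weight_kg": 68.4,   # continuous
    "num_children": 2,   # discrete: countable integers only
    "num_cars": 1,       # discrete
}

# Separate the fields by whether they are measured (float) or counted (int).
continuous = {k: v for k, v in person.items() if isinstance(v, float)}
discrete = {k: v for k, v in person.items() if isinstance(v, int)}
```

In a real dataset this distinction guides preprocessing: continuous features are often scaled, while discrete counts may be used as-is or binned.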