In our last post, we covered the introduction to web scraping, why we need it, and the web scraping process. In this post, we will look at the Python library used for scraping and write our first scraping code. We are going to scrape the Premier League Table 2019-20 […]
Category Archives: Data Science
What is Web Scraping? Everything you need to know before writing your first scraping code
In this post, we are going to cover an introduction to web scraping. The aim of this post is to answer questions like: What is web scraping? When should you use web scraping? What is the web scraping process, and which Python web scraping library can you use? What is Web Scraping? The process of web scraping […]
What is the Confusion Matrix in Machine Learning? What is Type 1 and Type 2 Error?
Why do we need the confusion matrix? Well, if you don’t know, let me put it plainly for now. It is one of the techniques for measuring the performance of your model. By creating a confusion matrix, you can calculate recall, precision, F-measure, and accuracy as well. It is really simple […]
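As a rough illustration of the metrics this excerpt names, here is a minimal sketch in plain Python. The function names `confusion_counts` and `summarize` and the sample labels are made up for this example, and it assumes a binary problem where `1` is the positive class. It is not the code from the post itself.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # true positive
        elif predicted == positive:
            fp += 1  # false positive (Type 1 error)
        elif actual == positive:
            fn += 1  # false negative (Type 2 error)
        else:
            tn += 1  # true negative
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Derive precision, recall, F-measure, and accuracy from the counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical labels, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
scores = summarize(*counts)
print(counts)
print(scores)
```

The false positives correspond to Type 1 errors and the false negatives to Type 2 errors, which is the link the post title draws.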
What are the different types of Clustering Algorithms? Their Applications and Usage
What is Clustering? Clustering is a technique in which unlabeled data are grouped together based on similarities. These groups are mutually exclusive. Clustering Algorithms: Partition-based Clustering (1. K-Means, 2. K-Median, 3. Fuzzy C-means); Hierarchical Clustering (4. Agglomerative, 5. Divisive); Density-based Clustering (6. DBSCAN). Why Clustering? Exploratory Data Analysis (EDA), Summary Generation, Outlier Detection, Finding Duplicates, Pre-processing Step. Applications of […]
3 Metrics to evaluate the accuracy of a KNN Model
After building the model, it is also important to decide which metrics are most suitable for evaluating it. For simple linear regression, where we have just one dependent and one independent variable, the correlation between them is enough to indicate how accurate the model can be. But the […]
Difference between Label Encoding and One-Hot Encoding | Pre-processing | Ordinal vs Nominal Data
No matter what programming language you use to write your code logic, machines understand the binary language of 1s and 0s. Similarly, it is easier for machines to deal with IP addresses than hostnames, whereas humans prefer to deal with hostnames. The encoding logic in machine learning is more or less based […]
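To make the distinction in the post title concrete, here is a small plain-Python sketch of the two encodings. The helper names `label_encode` and `one_hot_encode` and the colour values are assumptions for illustration only. Label encoding here assigns integers in sorted order, which is one common convention and not necessarily the one the post uses.

```python
def label_encode(values):
    """Map each category to an integer, using sorted order of the categories."""
    categories = sorted(set(values))
    mapping = {category: index for index, category in enumerate(categories)}
    return [mapping[value] for value in values], categories


def one_hot_encode(values):
    """Map each category to a 0/1 vector with a single 1 in its own column."""
    labels, categories = label_encode(values)
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[label] = 1
        rows.append(row)
    return rows, categories


# Hypothetical nominal feature, invented for this example.
colors = ["red", "green", "blue", "green"]

labels, categories = label_encode(colors)
one_hot, _ = one_hot_encode(colors)
print(categories)
print(labels)
print(one_hot)
```

The integer labels imply an order (blue < green < red), which suits ordinal data. The one-hot vectors carry no order, which is usually the safer choice for nominal data.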
Types of Machine Learning Systems
Machine Learning Systems can be broadly classified into 3 categories. Let us discuss them in detail. Category 1: Whether or not they are trained with human supervision. This category is sub-divided into: Supervised Learning. In a supervised learning method, the training data consists of labels. The target variable is trained using these labels to predict […]
Properties of OLS estimators and the fitted regression model
The simple linear model equation is denoted by [equation]. Ordinary Least Squares is the most common method to estimate the parameters in a linear regression model, regardless of the form of distribution of the error ε. Least squares stands for the minimum squared error, or SSE (Sum of Squared Error). A lower error results in a better […]
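The idea of choosing parameters that minimise the sum of squared errors can be sketched in a few lines of plain Python. This uses the standard closed-form solution for a single predictor. The names `fit_ols` and `sum_squared_error` and the data points are invented for this example and are not taken from the post.

```python
def fit_ols(x, y):
    """Closed-form OLS estimates (intercept, slope) for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_squared_error(x, y, intercept, slope):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical observations, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

intercept, slope = fit_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

best = sum_squared_error(x, y, intercept, slope)
worse = sum_squared_error(x, y, intercept, slope + 0.1)

print(intercept, slope)
print(best, worse)
print(sum(residuals))
```

Two properties of the fitted model show up here: nudging the slope away from the OLS estimate increases the error, and the residuals sum to zero up to floating-point rounding.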
Data Analyst vs Data Engineer vs Data Scientist
Do you feel like companies want a Super-Human when you read their job descriptions? It is very important to know about the profile that you are applying for. Role names differ from company to company. For that reason, one should not fall for the role names but instead insist on getting information about […]
Measures of relationship between variables | Correlation and Co-variance coefficient
While performing EDA (Exploratory Data Analysis), the most crucial step is to find the relationship between two or more variables to understand how one behaves when the other changes. This helps us to figure out the significance of each independent variable on the target and thus, create a model with a reduced […]
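The two measures named in the title can be computed by hand with a short plain-Python sketch. The function names `covariance` and `correlation` and the sample values are assumptions for illustration. The sketch uses the sample (n - 1) form of covariance and the Pearson correlation coefficient, which may differ from the exact definitions used in the post.

```python
import math


def covariance(x, y):
    """Sample covariance: average co-movement of x and y around their means."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance rescaled to the range [-1, 1]."""
    std_x = math.sqrt(covariance(x, x))  # covariance of x with itself is its variance
    std_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (std_x * std_y)


# Hypothetical variables, invented for illustration.
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]    # rises with hours
score_down = [10, 8, 6, 4, 2]  # falls as hours rise

print(covariance(hours, score_up), correlation(hours, score_up))
print(covariance(hours, score_down), correlation(hours, score_down))
```

Covariance depends on the units of the variables, while correlation is unit-free, which is why correlation is easier to compare across pairs of variables.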