In this global epidemic, people are dying and suffering, not only from the coronavirus itself but also from the many other problems it causes. I made this tutorial to encourage people to build on my project and create a better machine learning model…
In this tutorial, I build a machine learning model that predicts whether a person is a corona patient based on the given data.
What will you learn from this tutorial?
- Building a corona dataset with the best features
- Applying machine learning models to predict covid-19 patients
- Tuning machine learning model hyperparameters
- Deploying our models through Flask
I took random data and arranged it properly by applying different conditions. The result is my dataset.
How to arrange this random data to predict properly – covid19
I am using 3 conditions; you can use more to predict more accurately. The idea behind them is that people who have fever, body pains, a runny nose and breathing difficulty almost certainly have corona, while people without these symptoms do not.
Condition – 1:
If someone satisfies all of these conditions, they are labeled as infected:
# Fever > 100 (and) # Bodypains = 1 (and) # runnyNose = 1 (and) # breath = 1 return - 1
I applied the condition as shown below. The conditions are simple, so you can understand them easily.
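# Rows that satisfy condition 1 are labeled as infected
cond_1 = (df['Fever'] > 100) & (df['BodyPains'] == 1) & (df['RunnyNose'] == 1) & (df['Difficulty_in_Breath'] == 1)
# Assign through .loc with the mask to avoid pandas chained assignment
df.loc[cond_1, 'infection_Probability'] = 1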
Condition – 2
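# Age >= 60 (and) # Fever > 99 (and) # runnyNose = 1 (and) ( # breath = 1 (or) # Bodypains = 1 ) return - 1
# & binds tighter than | in Python, so the (or) pair gets its own parentheses
# (assuming the (or) applies only to breath and body pains, as the rule reads)
cond_2 = (df['Age'] >= 60) & (df['Fever'] > 99) & (df['RunnyNose'] == 1) & ((df['Difficulty_in_Breath'] == 1) | (df['BodyPains'] == 1))
df.loc[cond_2, 'infection_Probability'] = 1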
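The rule for condition 2 mixes (and) with (or). In Python, `&` binds more tightly than `|`, so where the parentheses go changes which rows are selected. A minimal sketch on a hypothetical three-row frame (the rows are made up for illustration and are not taken from the tutorial's dataset):

```python
import pandas as pd

# Hypothetical toy rows, not taken from the tutorial's dataset
toy = pd.DataFrame({
    'Age': [30, 70, 70],
    'Difficulty_in_Breath': [0, 1, 0],
    'BodyPains': [1, 0, 1],
})

old = toy['Age'] >= 60
breath = toy['Difficulty_in_Breath'] == 1
pains = toy['BodyPains'] == 1

# Without extra parentheses this is evaluated as (old & breath) | pains
ungrouped = old & breath | pains
# With the "or" grouped first: old & (breath | pains)
grouped = old & (breath | pains)

print(ungrouped.tolist())  # [True, True, True]
print(grouped.tolist())    # [False, True, True]
```

In the ungrouped form, the 30-year-old with only body pains is selected as well, because body pains alone is enough to satisfy the mask.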
Condition – 3
# Fever > 99 (and) # runnyNose =0 (and) # breath = 0 (and) # Bodypains =0 return - 0
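cond_3 = (df['Fever'] > 99) & (df['BodyPains'] == 0) & (df['RunnyNose'] == 0) & (df['Difficulty_in_Breath'] == 0)
# Assign through .loc with the mask to avoid pandas chained assignment
df.loc[cond_3, 'infection_Probability'] = 0
# Inspect the rows labeled by condition 3
df[cond_3]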
Now we have labeled data points that a machine learning model can use to predict future infected patients.
I know the target variable is unbalanced. Don’t worry, we will take care of that when we build our machine learning model.

Data splitting to Train and Test Set
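A k-fold, stratified k-fold, or shuffle split can be used to split the data.
from sklearn.model_selection import StratifiedKFold, KFold, cross_val_score, ShuffleSplit, GridSearchCV
# X and Y are assumed to be NumPy arrays holding the features and the target
# Recent scikit-learn versions raise an error if random_state is set without shuffle=True
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=11)
#kf = KFold(n_splits=5, shuffle=True, random_state=100)
for train_index, test_index in cv.split(X, Y):
    #print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
# The split from the last fold is the one used by the models below
print(X_train.shape)
print(X_test.shape)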
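A stratified split is a natural fit for an unbalanced target because every fold keeps roughly the same share of positive samples. The following is a small sketch of that behavior on a hypothetical target with 80 negatives and 20 positives (made-up numbers, not the tutorial's data):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical unbalanced target: 80 negatives, 20 positives
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((len(y), 1))  # placeholder features; only y drives the stratification

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=11)

# Count the positive samples that land in each test fold
positive_counts = [int(y[test_idx].sum()) for _, test_idx in cv.split(X, y)]
print(positive_counts)
```

Each test fold holds 20 samples, and the 20 positives are spread evenly across the five folds instead of clustering in one of them.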
Building a corona patient prediction machine learning model
Now we are ready to build a machine learning model that predicts corona patients.
I used 2 classical machine learning algorithms that can handle unbalanced datasets well. They are:
- Logistic Regression
- Decision Tree classifier
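Both models in this tutorial are created with `class_weight='balanced'`, which is how they cope with the unbalanced target. As a rough illustration, scikit-learn computes these weights as n_samples / (n_classes * count of each class). The sketch below uses a made-up target of 8 negatives and 2 positives:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical target: 8 negatives and 2 positives
y = np.array([0] * 8 + [1] * 2)
classes = np.array([0, 1])

# 'balanced' weight = n_samples / (n_classes * samples_in_class)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y)
print(dict(zip(classes.tolist(), weights.tolist())))  # {0: 0.625, 1: 2.5}
```

The rare positive class gets a weight four times larger than the negative class, so mistakes on positive samples cost more during training.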
Corona Patient Prediction using Logistic regression model
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score,confusion_matrix
# Logistic Regression
lr = LogisticRegression(class_weight='balanced')
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)
lr_acc = accuracy_score(y_test, lr_pred)
print(lr_acc)
# Accuracy on each of the 5 cross-validation folds
cv_scores = cross_val_score(lr, X, Y, cv=cv)
print(cv_scores)
With the logistic regression model I got 80% accuracy on this data, which is really good. I also checked the training accuracy, and it was fine too, so we are not overfitting or underfitting.
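One simple way to run the overfitting check mentioned above is to compare the accuracy on the training split with the accuracy on the test split. This is only a sketch on synthetic stand-in data from `make_classification`, since the tutorial's own X and Y are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (roughly 80% negative, 20% positive)
X, y = make_classification(n_samples=500, n_features=5, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

lr = LogisticRegression(class_weight='balanced')
lr.fit(X_train, y_train)

train_acc = accuracy_score(y_train, lr.predict(X_train))
test_acc = accuracy_score(y_test, lr.predict(X_test))

# A training score far above the test score would hint at overfitting
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```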
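import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Rows are the true classes, columns are the predicted classes
lr_cm = confusion_matrix(y_test, lr_pred)
lr_df = pd.DataFrame(data=lr_cm, columns=['0','1'], index=['0','1'])
sns.heatmap(lr_df, annot=True, cbar=False)
plt.show()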
Confusion matrix – logistic regression
Here we need to focus on the positive samples, because we don’t want to miss a single corona patient. So instead of accuracy alone, we must consider precision, recall, or the F1-score.
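To see why accuracy alone can hide missed patients on an unbalanced dataset, here is a small sketch with ten hypothetical labels (made up for illustration). The model misses one of the two patients, yet its accuracy still looks high:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 8 healthy people (0) and 2 patients (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A model that finds only one of the two patients
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(acc)   # 0.9
print(prec)  # 1.0
print(rec)   # 0.5
print(f1)
```

Recall exposes the missed patient (only half of the positives were found), which the 90% accuracy does not.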
Corona Patient Prediction using Decision Tree classifier
Now it’s time to try the decision tree classifier on our data. You can try your favorite models too.
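# Decision Tree
tr = DecisionTreeClassifier(class_weight='balanced')
tr.fit(X_train, y_train)
tr_pred = tr.predict(X_test)
tr_acc = accuracy_score(y_test, tr_pred)
print(tr_acc)
# Cross-validation scores of the tree (the variable name typo printed the logistic regression scores before)
cv_scores = cross_val_score(tr, X, Y, cv=cv)
print(cv_scores)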
The decision tree classifier also gives the same accuracy.
tr_cm = confusion_matrix(y_test,tr_pred)
tr_df = pd.DataFrame(data=tr_cm,columns=['0','1'],index=['0','1'])
sns.heatmap(tr_df,annot=True,cbar=False)
plt.show()
Tuning machine learning model hyperparameters
Here is a super-simplified function that you can use wherever you need to tune the hyperparameters of your machine learning models.
def find_best_model(X, Y):
    algos = {
        'logistic_reg': {
            # liblinear supports both the l1 and l2 penalties searched below
            'model': LogisticRegression(class_weight='balanced', solver='liblinear'),
            'params': {
                'penalty': ['l1', 'l2'],
                'C': [0.0001, 0.001, 0.01, 0.1, 1.0, 10, 100, 1000]
            }
        },
        'DT_clf': {
            'model': DecisionTreeClassifier(),
            'params': {
                'criterion': ['gini', 'entropy'],
                'max_depth': [2, 4, 6, 8, 12]
            }
        }
    }
    scores = []
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=567)
    for algo_name, config in algos.items():
        # Exhaustive grid search over the parameters of each model
        gd = GridSearchCV(config['model'], param_grid=config['params'], cv=cv, return_train_score=False)
        gd.fit(X, Y)
        scores.append({
            'model': algo_name,
            'best_score': gd.best_score_,
            'best_params': gd.best_params_
        })
    return pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])
It returns a data frame that contains each model’s name, best score, and best parameters.
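By default, a grid search on a classifier ranks parameter sets by accuracy. Since the goal here is not to miss patients, one possible variation is to rank them by recall through the `scoring` argument of `GridSearchCV`. The sketch below assumes synthetic stand-in data and a smaller, hypothetical parameter grid; it is not the search used to produce the results in this tutorial:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (roughly 80% negative, 20% positive)
X, y = make_classification(n_samples=400, n_features=5, weights=[0.8, 0.2], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    DecisionTreeClassifier(class_weight='balanced', random_state=0),
    param_grid={'criterion': ['gini', 'entropy'], 'max_depth': [2, 4, 6]},
    scoring='recall',  # rank parameter sets by recall on the positive class
    cv=cv,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Other scorer names such as `'f1'` or `'precision'` can be passed the same way, depending on which mistake matters more.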
Final machine learning model – covid19
Here I build the final model, a decision tree classifier with good precision, recall, and F1-score.
dt_clf = DecisionTreeClassifier(criterion='gini',max_depth=6,class_weight='balanced')
dt_clf.fit(X_train,y_train)
y_pred = dt_clf.predict(X_test)
accuracy_score(y_test,y_pred)
# It gives 82% accuracy
Confusion matrix
tr_cm = confusion_matrix(y_test,y_pred)
tr_df = pd.DataFrame(data=tr_cm,columns=['0','1'],index=['0','1'])
sns.heatmap(tr_df,annot=True,cbar=False)
plt.show()
Precision, Recall and F1-score check:
We must never miss a patient who needs a medical diagnosis, so we must improve the F1-score.
from sklearn.metrics import f1_score,precision_score,recall_score
f1 = f1_score(y_test,y_pred)
print(f1)
pr = precision_score(y_test,y_pred)
print(pr)
rc = recall_score(y_test,y_pred)
print(rc)
# Output:
# 0.8631346578366447 (F1-score)
# 0.9949109414758269 (precision)
# 0.7621832358674464 (recall)
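Here we improved the precision and recall. Precision is about 0.99 and recall is about 0.76. Recall is the score that reflects false negatives, so it is the one to keep improving if we want to avoid them.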
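The three scores are tied together by simple formulas: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). As a worked check, the counts below reproduce the scores reported above. They are inferred from those scores and are an assumption, since the confusion matrix values are not listed in the text:

```python
# Counts inferred from the reported scores (an assumption, not listed in the tutorial)
tp, fp, fn = 391, 2, 122

precision = tp / (tp + fp)   # share of predicted patients that are real patients
recall = tp / (tp + fn)      # share of real patients that the model found
f1 = 2 * precision * recall / (precision + recall)

print(f1)
print(precision)
print(recall)
```

Under these assumed counts, only 2 healthy people are flagged by mistake, but 122 patients are missed, which is why recall is the lower of the two scores.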
Positive Prediction Check
dt_clf.predict([[60,100,1,1,1]])[0]
#Output : 1
Negative Prediction Check
dt_clf.predict([[60,100,0,0,0]])[0]
#Output : 0
Exporting Machine learning Model & Deploying
It’s time to export our machine learning model so it can be deployed online. Here we use the Python pickle module to export it.
import pickle
# Write the trained model to disk in binary mode
with open('corona.pkl', 'wb') as f:
    pickle.dump(dt_clf, f)
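Before deploying, it is worth checking that the exported file loads back and gives the same predictions. The sketch below does a round trip with a tiny stand-in model trained on four made-up rows. The column order [Age, Fever, BodyPains, RunnyNose, Difficulty_in_Breath] is an assumption based on the prediction checks above:

```python
import os
import pickle
import tempfile

from sklearn.tree import DecisionTreeClassifier

# Four made-up rows; assumed column order:
# [Age, Fever, BodyPains, RunnyNose, Difficulty_in_Breath]
X = [[60, 100, 1, 1, 1], [25, 98, 0, 0, 0], [70, 101, 1, 1, 1], [30, 99, 0, 0, 0]]
y = [1, 0, 1, 0]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Export to a temporary folder so the sketch does not overwrite a real file
path = os.path.join(tempfile.mkdtemp(), 'corona.pkl')
with open(path, 'wb') as f:
    pickle.dump(model, f)

# Load the model back, as a deployed app would do at start-up
with open(path, 'rb') as f:
    loaded = pickle.load(f)

same = bool((loaded.predict(X) == model.predict(X)).all())
print(same)  # True
```

Only load pickle files that you created yourself, because unpickling can run arbitrary code.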
Finally, we have built a machine learning model that predicts which patients are likely to be infected with corona in the future.
I deployed this machine learning model using Flask.
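The Flask code itself is not shown in this tutorial, so the following is only a minimal sketch of what such a deployment could look like. The `/predict` route, the JSON keys, the `create_app` helper, and the feature order are all hypothetical, and a tiny stand-in model is trained in place so that the example runs on its own:

```python
from flask import Flask, jsonify, request
from sklearn.tree import DecisionTreeClassifier


def create_app(model):
    """Build a Flask app around an already trained model (hypothetical helper)."""
    app = Flask(__name__)

    @app.route('/predict', methods=['POST'])  # hypothetical route name
    def predict():
        data = request.get_json()
        # Hypothetical JSON keys; the order must match the training columns
        row = [[data['age'], data['fever'], data['body_pains'],
                data['runny_nose'], data['breath']]]
        return jsonify({'infected': int(model.predict(row)[0])})

    return app


# Stand-in model trained on four made-up rows so the sketch is self-contained.
# With the exported file you would load it instead:
#     with open('corona.pkl', 'rb') as f: model = pickle.load(f)
X = [[60, 100, 1, 1, 1], [25, 98, 0, 0, 0], [70, 101, 1, 1, 1], [30, 99, 0, 0, 0]]
y = [1, 0, 1, 0]
app = create_app(DecisionTreeClassifier(random_state=0).fit(X, y))

# Exercise the route with Flask's built-in test client (no server needed)
client = app.test_client()
response = client.post('/predict', json={
    'age': 60, 'fever': 100, 'body_pains': 1, 'runny_nose': 1, 'breath': 1})
print(response.get_json())
```

To serve real requests you could call `app.run()` or use the `flask run` command, and an HTML form could post to the same route.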
User Interface

You can download the whole project (Python code, Flask, HTML file) from our store. Visit our store by clicking the button below.
This tutorial is meant to encourage you to modify the above model by adding more parameters and making it more efficient. Let’s make something great out of it.
Not to forget, this project will be a great attraction on your CV/resume.
If you liked this tutorial, please share it. See you all soon in the next great tutorial.