Using Text Features along with Categorical and Numerical Features

Param Saraf
1 min read · Mar 24, 2021

First Approach:
1. Vectorize the text with TF-IDF and add every resulting token column as a new feature alongside your existing features, then train your model on the combined set
2. TruncatedSVD can reduce the dimensionality of the TF-IDF matrix, but it only partly helps: you are still adding many columns to represent a single text feature
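
The first approach can be sketched roughly as follows, assuming scikit-learn, NumPy and SciPy are available. The toy `texts` and `numeric` arrays and the choice of `n_components=2` are made up for illustration only; real data would need a much larger component count.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data: one text column plus two numeric columns.
texts = [
    "great product fast delivery",
    "poor quality broke quickly",
    "fast delivery good price",
    "broke after one week",
]
numeric = np.array([[10.0, 1.0], [3.0, 0.0], [8.0, 1.0], [2.0, 0.0]])

# Step 1: one TF-IDF column per token, appended to the existing features.
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(texts)  # sparse matrix, shape (n_rows, n_tokens)
X_full = hstack([csr_matrix(numeric), X_text]).tocsr()

# Step 2: compress the TF-IDF columns with TruncatedSVD before appending.
svd = TruncatedSVD(n_components=2, random_state=0)
X_text_reduced = svd.fit_transform(X_text)  # dense array, shape (n_rows, 2)
X_reduced = np.hstack([numeric, X_text_reduced])

print(X_full.shape, X_reduced.shape)
```

Even after the reduction, the single text column still contributes `n_components` columns to the feature set.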

Second Approach:
1. Train a separate text model on the text feature alone. Use k-fold cross-validation and store the OOF (out-of-fold) predictions, i.e. the prediction for each training row made by a model that did not see that row during training
2. Add these OOF predictions as a single feature in your existing feature set
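
A minimal sketch of the second approach, assuming scikit-learn and a binary target. The TF-IDF plus logistic regression text model, the toy data, and the 4-fold split are illustrative assumptions, not something the approach prescribes.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: text column, two numeric columns, binary target.
texts = [
    "great product fast delivery", "poor quality broke quickly",
    "fast delivery good price", "broke after one week",
    "good quality great price", "poor packaging slow delivery",
    "great value works well", "slow delivery poor support",
]
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])
numeric = np.arange(16, dtype=float).reshape(8, 2)

# The text model is a pipeline, so the vectorizer is refit inside every fold
# and never sees the held-out rows.
text_model = make_pipeline(TfidfVectorizer(), LogisticRegression())

# Each row's prediction comes from a model trained on the other folds only.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
oof = cross_val_predict(text_model, texts, y, cv=cv, method="predict_proba")[:, 1]

# The whole text column is now summarized by a single extra feature.
X_final = np.column_stack([numeric, oof])
print(X_final.shape)
```

For new data there are no folds to hold out, so one common choice is to refit `text_model` on all training rows and use its predictions as the feature value.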

The advantage of the second approach is that it adds only one column per text model instead of one per token, and it gives you the flexibility to add several text models built on different text features to your final feature set

Using OOF predictions also ensures there is no target leakage, because no row's feature value comes from a model that was trained on that row's target. If you want to use other models such as BERT, the same OOF approach can be followed

Param Saraf

Data Scientist | Machine Learning Engineer | Power BI/ MSBI Expert