# Statistics and distance-based features

Groupby and nearest neighbor methods

#### Example: some data from a CTR task

• More features:
• How many pages the user visited
• Standard deviation of prices
• Most visited page
• Many, many more
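Here is a minimal pandas sketch of the groupby features listed above. It assumes a hypothetical impression log with columns `user_id`, `page_id` and `ad_price`; these names and the toy data are illustrative and do not come from the original task. "How many pages the user visited" is read here as the number of distinct pages.

```python
import pandas as pd

# Hypothetical CTR log: one row per ad impression (column names are assumptions)
df = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2],
    "page_id":  [10, 10, 11, 12, 12],
    "ad_price": [1.0, 3.0, 2.0, 5.0, 5.0],
})

grouped = df.groupby("user_id")
user_stats = pd.DataFrame({
    # number of distinct pages each user visited
    "pages_visited": grouped["page_id"].nunique(),
    # spread of the ad prices the user was shown
    "price_std": grouped["ad_price"].std(),
    # page the user visited most often
    "most_visited_page": grouped["page_id"].agg(lambda s: s.value_counts().idxmax()),
})

# Broadcast the per-user statistics back to every impression row
df = df.join(user_stats, on="user_id")
print(df)
```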

### Neighbors

• Explicit group is not needed
• More flexible
• Much harder to implement

Examples

• Number of houses in 500m, 1000m,..
• Average price per square meter in 500m, 1000m,..
• Number of schools/supermarkets/parking lots in 500m, 1000m,..
• Distance to the closest subway station
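The radius-count, radius-mean and nearest-distance features above can be sketched with plain NumPy. This assumes hypothetical planar coordinates already expressed in meters and a brute-force distance matrix, which is fine for small data; for large datasets a spatial index such as `sklearn.neighbors.BallTree` would be the usual choice.

```python
import numpy as np

# Hypothetical data: planar coordinates in meters (real lat/lon would need conversion)
houses = np.array([[0.0, 0.0], [300.0, 0.0], [0.0, 800.0], [2000.0, 2000.0]])
prices_per_m2 = np.array([100.0, 120.0, 90.0, 300.0])
subway_stations = np.array([[0.0, 100.0], [2100.0, 2000.0]])

def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

d_houses = pairwise_dist(houses, houses)
np.fill_diagonal(d_houses, np.inf)  # a house is not its own neighbor

features = {}
for radius in (500, 1000):
    mask = d_houses <= radius
    counts = mask.sum(axis=1)
    # number of other houses within the radius
    features[f"n_houses_{radius}m"] = counts
    # mean neighbor price per square meter; NaN when no neighbor is in the radius
    sums = (mask * prices_per_m2[None, :]).sum(axis=1)
    features[f"mean_price_m2_{radius}m"] = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

# distance from each house to its closest subway station
features["dist_to_closest_subway"] = pairwise_dist(houses, subway_stations).min(axis=1)
```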

#### KNN features in Springleaf

• Mean encode all the variables
• For every point, find the 2000 nearest neighbors using the Bray-Curtis metric
$$d(u, v) = \frac{\sum_i |u_i - v_i|}{\sum_i |u_i + v_i|}$$
• Calculate various features from those 2000 neighbors
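A small worked check of the Bray-Curtis formula, computed by hand and with SciPy's `scipy.spatial.distance.braycurtis`. For u = (1, 2, 3) and v = (2, 2, 1) the numerator is 1 + 0 + 2 = 3 and the denominator is 3 + 4 + 4 = 11, so the distance is 3/11.

```python
import numpy as np
from scipy.spatial.distance import braycurtis

def bray_curtis(u, v):
    """Bray-Curtis distance: sum |u_i - v_i| / sum |u_i + v_i|."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / np.abs(u + v).sum()

u, v = [1, 2, 3], [2, 2, 1]
print(bray_curtis(u, v))  # hand-written formula: 3/11
print(braycurtis(u, v))   # SciPy's implementation gives the same value
```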

Evaluate

• Mean target of the nearest 5, 10, 15, 500, 2000 neighbors
• Mean distance to 10 closest neighbors
• Mean distance to 10 closest neighbors with target 1
• Mean distance to 10 closest neighbors with target 0
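A sketch of the KNN features above using scikit-learn's `NearestNeighbors` with the Bray-Curtis metric. It assumes the inputs are already mean-encoded numeric arrays with a binary target, uses small `k` values instead of 2000 so it runs on toy data, and computes features only for query rows that are not in the fitted set. For training rows the features would have to be computed out-of-fold (for example with K-fold splits), otherwise a row sees its own target. The function name `knn_features` is hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_features(X_fit, y_fit, X_query, ks=(5, 10, 15), n_dist=10):
    """KNN features for X_query computed from the labelled set (X_fit, y_fit).

    X_query must not overlap X_fit, otherwise a row would see its own target.
    """
    y_fit = np.asarray(y_fit)
    nn = NearestNeighbors(metric="braycurtis", algorithm="brute").fit(X_fit)
    dist, idx = nn.kneighbors(X_query, n_neighbors=max(max(ks), n_dist))

    feats = {}
    for k in ks:
        # mean target of the k nearest neighbors
        feats[f"mean_target_{k}nn"] = y_fit[idx[:, :k]].mean(axis=1)
    # mean distance to the n_dist closest neighbors
    feats[f"mean_dist_{n_dist}nn"] = dist[:, :n_dist].mean(axis=1)

    for label in (0, 1):
        # search only among labelled points with the given target
        nn_c = NearestNeighbors(metric="braycurtis", algorithm="brute").fit(X_fit[y_fit == label])
        d_c, _ = nn_c.kneighbors(X_query, n_neighbors=n_dist)
        feats[f"mean_dist_{n_dist}nn_target{label}"] = d_c.mean(axis=1)
    return feats

rng = np.random.default_rng(0)
X_fit = rng.random((200, 5))              # stand-in for mean-encoded features
y_fit = (X_fit[:, 0] > 0.5).astype(int)   # synthetic binary target
X_query = rng.random((20, 5))
feats = knn_features(X_fit, y_fit, X_query)
```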

# Matrix factorizations for feature extraction

• Example of feature fusion

#### Notes about Matrix Factorization

• Can be applied to only some columns
• Can provide additional diversity
• Good for ensembles
• It is a lossy transformation. Its efficiency depends on:
• Number of latent factors
• Usually 5-100

#### Implementation

• Several MF methods can be found in sklearn
• SVD and PCA
• Standard tools for Matrix Factorization
• TruncatedSVD
• Works with sparse matrices
• Non-negative Matrix Factorization (NMF)
• Ensures that all latent factors are non-negative
• Good for count-like data
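A sketch of the scikit-learn tools listed above on a toy count matrix. The data is synthetic and `n_components=10` is an arbitrary choice within the usual range; the latent factors are then appended to other (hypothetical) features, as an example of feature fusion.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import NMF, PCA, TruncatedSVD

rng = np.random.default_rng(0)
# Synthetic count-like data (e.g. word counts), stored as a sparse matrix
counts = sparse.csr_matrix(rng.poisson(0.3, size=(100, 50)).astype(float))

# TruncatedSVD works directly on sparse input
svd_feats = TruncatedSVD(n_components=10, random_state=0).fit_transform(counts)
# PCA centers the data, so a dense copy is passed here
pca_feats = PCA(n_components=10, random_state=0).fit_transform(counts.toarray())
# NMF keeps all latent factors non-negative, which suits count data
nmf_feats = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0).fit_transform(counts)

# Feature fusion: append the latent factors to other features of the same rows
X_other = rng.random((100, 3))
X_fused = np.hstack([X_other, svd_feats, nmf_feats])
```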

#### NMF for tree-based methods

Non-negative matrix factorization (NMF) transforms data in a way that makes it more suitable for decision trees.

### Conclusion

• Matrix Factorization is a very general approach for dimensionality reduction and feature extraction
• It can be applied for transforming categorical features into real-valued ones
• Many tricks suitable for linear models can be useful for MF

## Feature interactions

• Example: banner selection

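| Ad category | Site category | ... | Target |
|---|---|---|---|
| auto_part | game_news | ... | 0 |
| music_tickets | music_news | ... | 1 |
| mobile_phones | auto_blog | ... | 0 |

Concatenating the two categories gives a single interaction feature:

| Ad category \| Site category | ... | Target |
|---|---|---|
| auto_part \| game_news | ... | 0 |
| music_tickets \| music_news | ... | 1 |
| mobile_phones \| auto_blog | ... | 0 |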

• Example of interactions
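One way to build the categorical interaction from the banner example: concatenate the two values into a single string feature and one-hot encode it. The column names `ad_category`, `site_category` and `target` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ad_category":   ["auto_part", "music_tickets", "mobile_phones"],
    "site_category": ["game_news", "music_news", "auto_blog"],
    "target": [0, 1, 0],
})

# Concatenate the two categorical values into one interaction feature
df["ad_site"] = df["ad_category"] + "|" + df["site_category"]

# One-hot encode the combined feature (e.g. as input for a linear model)
ohe = pd.get_dummies(df["ad_site"], prefix="ad_site")
```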

• A similar idea can be used for numeric variables

• Multiplication
• Sum
• Diff
• Division
• ..
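For numeric features the same idea means combining pairs of columns arithmetically. A minimal pandas sketch with hypothetical columns `a` and `b`; mapping zero denominators to NaN is one possible choice, not the only one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 0.0, 2.0]})

df["a_mul_b"] = df["a"] * df["b"]   # multiplication
df["a_add_b"] = df["a"] + df["b"]   # sum
df["a_sub_b"] = df["a"] - df["b"]   # difference
# division: zero denominators are turned into NaN instead of producing inf
df["a_div_b"] = df["a"] / df["b"].replace(0, np.nan)
```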

### Practical Notes

• There are a lot of possible interactions: N*N for N features.
• a. Even more if several types of interactions are used
• We need to reduce their number
• a. Dimensionality reduction
• b. Feature selection
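A sketch of generating many interactions and then reducing their number by feature selection. It uses synthetic data whose target depends on `f0 * f1`, generates all pairwise products, and ranks them by random forest feature importance; the choice of selector and the number of kept features are assumptions.

```python
import itertools

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((300, 4)), columns=["f0", "f1", "f2", "f3"])
# synthetic target driven by the f0*f1 interaction plus a little noise
y = X["f0"] * X["f1"] + 0.01 * rng.standard_normal(300)

# Generate all pairwise products: N*(N-1)/2 distinct pairs for one interaction type
inter = pd.DataFrame({f"{a}*{b}": X[a] * X[b] for a, b in itertools.combinations(X.columns, 2)})

# Feature selection: keep the interactions a random forest finds most important
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(inter, y)
importances = pd.Series(rf.feature_importances_, index=inter.columns).sort_values(ascending=False)
selected = importances.index[:2].tolist()
```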

### Interactions’ order

• We looked at 2nd order interactions.
• Such approach can be generalized for higher orders.
• It is hard to do generation and selection automatically.
• Manual building of high-order interactions is something of an art.

### Extract features from DT

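• How to use it: treat the index of the leaf that each sample falls into, in every tree, as a new categorical feature

In sklearn: `tree_model.apply()`

In xgboost: `booster.predict(dmatrix, pred_leaf=True)`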
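A sketch of leaf-index features with scikit-learn: `apply()` returns the leaf each sample lands in for every tree, and one-hot encoding those indices turns them into categorical features. The data and model settings are illustrative; with an xgboost `Booster`, `predict(..., pred_leaf=True)` returns the analogous leaf indices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
tree_model = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0).fit(X, y)

# For each sample, the index of the leaf it falls into in every tree
leaves = tree_model.apply(X)  # shape (n_samples, n_estimators)

# Treat each tree's leaf index as a categorical feature and one-hot encode it
encoder = OneHotEncoder(handle_unknown="ignore")
leaf_features = encoder.fit_transform(leaves)  # sparse matrix, one block per tree
```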

### Conclusion

• We looked at ways to build interactions of categorical features
• Extended this approach to real-valued features
• Learned how to extract features via decision trees

## t-SNE

### Practical Notes

• Results depend heavily on hyperparameters (perplexity)
• Good practice is to use several projections with different perplexities (5-100)
• Due to its stochastic nature, tSNE provides different projections even for the same data/hyperparameters
• Train and test should be projected together
• tSNE runs for a long time with a large number of features
• It is common to do dimensionality reduction before projection
• An implementation of tSNE can be found in the sklearn library
• But personally I prefer the stand-alone Python package tsne because it is faster
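A sketch following the notes above with scikit-learn's `TSNE`: train and test are stacked and projected together, PCA reduces the dimensionality first, and two perplexities are tried. The random data, `n_components=20` for PCA and the perplexity values are illustrative assumptions; fixing `random_state` only makes a single run reproducible, it does not remove the dependence on the seed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X_train = rng.random((150, 50))
X_test = rng.random((50, 50))

# Project train and test together so they share one embedding
X_all = np.vstack([X_train, X_test])
# Reduce dimensionality first to speed up t-SNE
X_reduced = PCA(n_components=20, random_state=0).fit_transform(X_all)

projections = {}
for perplexity in (5, 30):
    emb = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=0).fit_transform(X_reduced)
    # split the joint embedding back into train and test parts
    projections[perplexity] = (emb[: len(X_train)], emb[len(X_train):])
```

Each 2-D projection can then be plotted for inspection or appended to the feature matrix as extra columns.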

### Conclusion

• tSNE is a great tool for visualization
• It can be used as feature as well
• Be careful with interpretation of results
• Try different perplexities