This is an nn_module representing the TabNet architecture from the paper TabNet: Attentive Interpretable Tabular Learning.
Initial number of features.
Dimension of the network output. For example, 1 for regression, 2 for binary classification, and so on. Provide a vector of such dimensions in the case of multi-output.
Dimension of the prediction layer (usually between 4 and 64).
Dimension of the attention layer (usually between 4 and 64).
Number of successive steps in the network (usually between 3 and 10).
Scaling factor for attention updates (usually between 1 and 2).
Index of each categorical column in the dataset.
Number of categories in each categorical column.
Size of the embedding of categorical features. If a single integer, all categorical features will share the same embedding size; if a vector of integers, each corresponding feature will have its own embedding size (see the example below).
Number of independent GLU layers in each GLU block of the encoder.
Number of shared GLU layers in each GLU block of the encoder.
Small value added to avoid log(0); this should be kept very low.
Batch size for Ghost Batch Normalization.
Value between 0 and 1 used as the momentum in all batch normalization layers.
Either "sparsemax" or "entmax" : this is the masking function to use.