diff --git a/README.md b/README.md
index 91187a0..b8d4767 100644
--- a/README.md
+++ b/README.md
@@ -10,8 +10,8 @@ A unified, extensible framework for text classification with categorical variabl
 - **ValueEncoder**: Pass raw string categorical values and labels directly — no manual integer encoding required. Build a `ValueEncoder` from `DictEncoder` or sklearn `LabelEncoder` instances once, and the wrapper handles encoding at train time and label decoding after prediction automatically.
 - **Unified yet highly customizable**:
   - Use any tokenizer from HuggingFace or the original fastText's ngram tokenizer.
-  - Manipulate the components (`TextEmbedder`, `CategoricalVariableNet`, `ClassificationHead`) to easily create custom architectures - including **self-attention**. All of them are `torch.nn.Module` !
-  - The `TextClassificationModel` class combines these components and can be extended for custom behavior.
+  - Text embedding is split into two composable stages: **`TokenEmbedder`** (token → per-token vectors, with optional self-attention) and **`SentenceEmbedder`** (aggregation: mean / first / last / label attention). Combine them with `CategoricalVariableNet` and `ClassificationHead` — all are `torch.nn.Module`.
+  - The `TextClassificationModel` class assembles these components and can be extended for custom behavior.
 - **Multiclass / multilabel classification support**: Support for both multiclass (only one label is true) and multi-label (several labels can be true) classification tasks.
 - **PyTorch Lightning**: Automated training with callbacks, early stopping, and logging
 - **Easy experimentation**: Simple API for training, evaluating, and predicting with minimal code:
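For reviewers, the component split described in the added lines can be sketched in plain PyTorch. This is a hypothetical illustration, not the library's actual API: the class names come from the README, but every constructor signature, the dimensions, and the mean-pooling aggregation chosen here are assumptions.

```python
import torch
import torch.nn as nn


class TokenEmbedder(nn.Module):
    """Token ids -> per-token vectors (sketch; signatures are assumed)."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):          # (batch, seq)
        return self.emb(token_ids)         # (batch, seq, dim)


class SentenceEmbedder(nn.Module):
    """Aggregate per-token vectors; mean pooling stands in for the
    mean / first / last / label-attention options mentioned above."""

    def forward(self, token_vecs):         # (batch, seq, dim)
        return token_vecs.mean(dim=1)      # (batch, dim)


class CategoricalVariableNet(nn.Module):
    """Embed an integer-encoded categorical variable (sketch)."""

    def __init__(self, n_categories: int, dim: int):
        super().__init__()
        self.emb = nn.Embedding(n_categories, dim)

    def forward(self, cat_ids):            # (batch,)
        return self.emb(cat_ids)           # (batch, dim)


class ClassificationHead(nn.Module):
    def __init__(self, in_dim: int, n_labels: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, n_labels)

    def forward(self, x):
        return self.fc(x)                  # (batch, n_labels) logits


class TextClassificationModel(nn.Module):
    """Assembles the components, mirroring the README's description."""

    def __init__(self, vocab_size, n_categories, n_labels, dim=32):
        super().__init__()
        self.token_embedder = TokenEmbedder(vocab_size, dim)
        self.sentence_embedder = SentenceEmbedder()
        self.cat_net = CategoricalVariableNet(n_categories, dim)
        self.head = ClassificationHead(2 * dim, n_labels)

    def forward(self, token_ids, cat_ids):
        text_vec = self.sentence_embedder(self.token_embedder(token_ids))
        cat_vec = self.cat_net(cat_ids)
        return self.head(torch.cat([text_vec, cat_vec], dim=-1))


model = TextClassificationModel(vocab_size=100, n_categories=5, n_labels=3)
logits = model(torch.randint(0, 100, (4, 10)), torch.randint(0, 5, (4,)))
print(logits.shape)  # torch.Size([4, 3])
```

Because each stage is an `nn.Module`, swapping in an attention-based `SentenceEmbedder` or a pretrained `TokenEmbedder` is just replacing one attribute.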