Models¶
Approach¶
For a given token classification problem, an artificial intelligence engineer can
- Develop one token classification model per language model architecture, e.g., RoBERTa (Robustly Optimized BERT Pretraining Approach) or ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately). For each architecture, hyperparameter optimisation/tuning techniques and libraries aid the development of quite effective models, subject to constraints such as early stopping.
- Select the best model amongst the resulting set of models; a single model per architecture.
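The early-stopping constraint mentioned above can be sketched as follows. This is a minimal illustration, not the exercise's actual training code; the function name, the validation scores, and the `patience` value are all assumptions made for the example.

```python
def best_epoch_with_patience(val_scores, patience=2):
    """Index of the checkpoint early stopping would keep.

    Training halts once the validation metric has failed to improve
    for `patience` consecutive epochs; the best epoch seen so far
    is returned.
    """
    best_idx, waited = 0, 0
    for idx in range(1, len(val_scores)):
        if val_scores[idx] > val_scores[best_idx]:
            best_idx, waited = idx, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx

# Hypothetical per-epoch validation scores for illustration only.
chosen = best_epoch_with_patience([0.10, 0.30, 0.25, 0.20, 0.40], patience=2)
```

With `patience=2` the loop stops after epochs 2 and 3 fail to improve on epoch 1, so the epoch-1 checkpoint is kept; a larger patience would let training continue and reach the later, higher score.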
Selecting the best model¶
For this exercise, the best model was selected by comparing a test-phase metric, specifically the Matthews Correlation Coefficient (MCC):
\[MCC = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}}\]
\[MCC \in [-1, +1]\]
wherein tn, tp, fn, and fp denote true negative, true positive, false negative, and false positive, respectively.
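The formula above translates directly into code. The sketch below is illustrative only: the function name and the per-architecture confusion counts are assumptions, not the exercise's actual results, and a zero denominator falls back to the conventional MCC of 0.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (the conventional fallback).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical test-phase counts per architecture, for illustration only.
scores = {
    "RoBERTa": mcc(tp=90, tn=80, fp=10, fn=20),
    "ELECTRA": mcc(tp=85, tn=88, fp=5, fn=22),
}
best = max(scores, key=scores.get)
```

A perfect classifier yields \(MCC = +1\), a perfectly inverted one \(MCC = -1\), and chance-level predictions sit near 0, which is why a single MCC value suffices to rank the candidate models.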
Warning¶
Note that the best model of a set must undergo (a) mathematical evaluation and (b) business/cost evaluation. The latter is critical because an acceptable mathematical metric, e.g., \(precision > 0.9\), does not necessarily lead to excellent business/cost outcomes.