Models¶
Approach¶
For a given token classification problem, an artificial intelligence engineer can
- Develop one token classification model per language model architecture, e.g., RoBERTa (Robustly Optimized BERT Pretraining Approach) or ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately). For each architecture, hyperparameter optimisation/tuning techniques and libraries aid the development of quite effective models, subject to constraints such as early stopping.
- Select the best model amongst the resulting set of models; a single model per architecture.
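The early-stopping constraint mentioned above can be sketched as follows. This is a minimal illustration, not the exercise's actual training code; the function name, the validation scores, and the `patience` value are all assumptions made for the example.

```python
def best_epoch_with_patience(val_scores, patience=2):
    """Index of the checkpoint early stopping would keep.

    Training halts once the validation metric has failed to improve
    for `patience` consecutive epochs; the best epoch seen so far
    is returned.
    """
    best_idx, waited = 0, 0
    for idx in range(1, len(val_scores)):
        if val_scores[idx] > val_scores[best_idx]:
            best_idx, waited = idx, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx

# Hypothetical per-epoch validation scores for illustration only.
chosen = best_epoch_with_patience([0.10, 0.30, 0.25, 0.20, 0.40], patience=2)
```

With `patience=2` the loop stops after epochs 2 and 3 fail to improve on epoch 1, so the epoch-1 checkpoint is kept; a larger patience would let training continue and reach the later, higher score.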
Selecting the best model¶
For this exercise, the best model was selected by comparing a test-phase metric, specifically the Matthews Correlation Coefficient (MCC):
\[MCC = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}}\]
\[MCC \in [-1, +1]\]
wherein tn, tp, fn, and fp denote true negative, true positive, false negative, and false positive, respectively.
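The formula above translates directly into code. The sketch below is illustrative only: the function name and the per-architecture confusion counts are assumptions, not the exercise's actual results, and a zero denominator falls back to the conventional MCC of 0.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (the conventional fallback).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical test-phase counts per architecture, for illustration only.
scores = {
    "RoBERTa": mcc(tp=90, tn=80, fp=10, fn=20),
    "ELECTRA": mcc(tp=85, tn=88, fp=5, fn=22),
}
best = max(scores, key=scores.get)
```

A perfect classifier yields \(MCC = +1\), a perfectly inverted one \(MCC = -1\), and chance-level predictions sit near 0, which is why a single MCC value suffices to rank the candidate models.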
Warning¶
Note that the best model of a set must undergo (a) mathematical evaluation and (b) business/cost evaluation. The latter is critical because an acceptable mathematical metric, e.g., \(precision > 0.9\), does not necessarily lead to excellent business/cost outcomes.