Including ‘unpopular’ reagents and reaction conditions in datasets could lead to better machine-learning models
Scientists have identified human biases in the datasets used to train machine-learning models for computer-aided synthesis.1 They found that models trained on a small randomised sample of reactions outperformed those trained on larger, human-selected datasets. The results highlight the importance of including experimental results that people might dismiss as unimportant when developing computer programs for chemists.