Synthetic data is both an answer and a problem
To train the next generation of AI algorithms, tech companies need new and ever-larger datasets of human language and knowledge, and even the billions of words in published literature and on the internet are no longer enough to satisfy their voracious appetites. One solution many companies are therefore exploring is to have their polyphagic progeny feed themselves: machine-learning models can generate the datasets used to train their successors, creating an ‘infinite data generation’ machine, as one tech-company chief executive recently put it.