Straight Arrow News / VideoElephant
Elon Musk says artificial intelligence companies have run out of data to train their models on, having "exhausted" the sum of human knowledge.
As a result, the billionaire businessman - who also owns the AI firm xAI - reckons that, given the fast pace at which the technology is developing, "synthetic" data will need to be used to build and train new systems.
Data from the internet is used to train AI models such as GPT-4o, the model behind the ChatGPT chatbot. From this data the models learn to recognise patterns, which is what gives them the ability to make predictions.
Now Musk has suggested that training new models on synthetic data created by AI is the "only way" to get around this shortage.
“The only way to then supplement that is with synthetic data where … it will sort of write an essay or come up with a thesis and then will grade itself and … go through this process of self-learning," he said during a livestream on his social media platform X, formerly Twitter, as per The Guardian.
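The loop Musk sketches - write something, grade it yourself, learn from what passed - can be illustrated with a heavily simplified toy. Everything below is an illustrative stand-in, not xAI's actual pipeline: the "model" is just a skill number, the essay is a quality score, and the grading threshold is invented for the example.

```python
import random

random.seed(0)  # reproducible toy run

def generate_essay(skill):
    """Toy stand-in for a model writing an essay: returns a quality score."""
    return skill + random.uniform(-0.2, 0.2)

def grade(quality):
    """The model grades its own output; in this toy the grade IS the quality."""
    return quality

def self_learning_loop(rounds=5, keep_threshold=0.5):
    """Generate -> self-grade -> keep outputs that pass -> nudge the 'model' up."""
    skill, kept = 0.5, 0
    for _ in range(rounds):
        essay = generate_essay(skill)
        if grade(essay) >= keep_threshold:
            kept += 1
            skill = min(1.0, skill + 0.05)  # 'retrain' on the kept synthetic essay
    return skill, kept
```

The catch, as the rest of the piece explains, is that the grader is the same system that wrote the essay - so errors it cannot see are errors it will happily learn from.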
This isn't entirely a new thing as synthetic data is already being used by Facebook and Instagram owners, Meta, to train their Llama AI model. Similarly, Google and OpenAI (who created ChatGPT) are also using synthetic data in their AI endeavours, while AI-made data is being used in Microsoft's Phi-4 model too.
So what is the problem?
But there are concerns, voiced by Musk and others, about how synthetic data could be affected by AI "hallucinations" - the term for a model producing incorrect or misleading output.
During a livestream with Mark Penn, the chair of the advertising group Stagwell, Musk described this as "challenging", because with hallucinations "how do you know if it … hallucinated the answer or it's a real answer".
Musk's comments come as an academic paper estimates that publicly available data for training AI models could run out as early as 2026, according to Andrew Duncan, the director of foundational AI at the UK's Alan Turing Institute.
What could this mean for AI models in the future?
Duncan says that becoming too dependent on synthetic data could cause "model collapse", where the quality of a model's output gets progressively worse over time, with "diminishing returns".
Log on to an AI chatbot after that and it could serve up information that is biased, incorrect and uncreative - not exactly ideal for users.
He also noted that the rise of AI-generated content on the internet could contribute to the problem, as AI models take in that material during training.
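The "model collapse" Duncan warns about can be illustrated with a toy simulation: if each new "model" is fitted only to samples drawn from the previous one, the spread of what it can produce tends to shrink generation by generation. The Gaussian setup and all numbers here are illustrative assumptions, not a model of any real AI system.

```python
import random
import statistics

random.seed(1)  # reproducible toy run

def next_generation(mean, stdev, n=20):
    """'Train' the next model only on samples drawn from the current one."""
    samples = [random.gauss(mean, stdev) for _ in range(n)]
    return statistics.mean(samples), statistics.pstdev(samples)

def simulate_collapse(generations=100):
    mean, stdev = 0.0, 1.0  # generation 0 stands in for a model fit to real data
    for _ in range(generations):
        mean, stdev = next_generation(mean, stdev)
    return stdev  # spread of what the final model can produce
```

With small samples, each re-fit slightly underestimates the spread and adds noise, so diversity drains away over the generations - a toy version of output quality degrading as models feed on their own output.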