How Synthetic Data Is Unlocking New Opportunities for Intelligent Video

March 21, 2025
Pioneering AI research and synthetic data are revolutionizing industries by improving the accuracy, fairness and ethical sourcing of AI models, particularly in video analytics for safety and strategic decision-making.

Pioneering work by the so-called “Godfathers of AI,” 2018 Turing Award winners Yoshua Bengio, Geoffrey Hinton and Yann LeCun, along with Fei-Fei Li’s creation of ImageNet, helped lay the groundwork for modern artificial intelligence (AI), particularly in computer vision (CV). That matters most for sensors that produce image data, such as video cameras, and it has unlocked a host of new opportunities to improve the safety of our cities, transport, retail stores and more.

Because of AI, organizations are now able to gain deeper insights to inform their strategies and make better decisions on where to build a new road, which products to place on a particular store shelf, and how to plan maintenance or cleaning schedules. It truly is a brave new world, transformed by the combination of video and AI. 

Accurate AI Requires Large Training Datasets

However, making these AI models as accurate as possible requires training on huge datasets. Those datasets need to be representative and diverse to ensure accuracy and fairness, and legally sourced to respect data owners’ intellectual property rights. As AI evolves, the need for large, annotated datasets becomes more pressing, and obtaining this data isn’t always simple, especially when dealing with sensors such as cameras that can collect a lot of personal or confidential information. Safety, privacy and practical limitations can restrict the amount and quality of data available for training an AI model.

This is where synthetic data steps in to open up new opportunities. 

Solution Offered by Synthetic Data

Synthetic data refers to artificially generated or augmented datasets that simulate real-world conditions. By using this data, AI developers can train models on vast amounts of diverse and representative information while mitigating the ethical and legal concerns surrounding privacy and consent. Moreover, synthetic data can preserve key real-world characteristics, so models learn from realistic environments without exposing individuals to risk. It is also a ready-to-use source, which can shorten algorithm development time.

What's more, synthetic data can help reduce bias in AI models. Traditional datasets are often shaped by the biases present in the original data collection process, which can skew the outcomes of AI decision-making. By designing synthetic data generation processes thoughtfully, developers can minimize the biases that arise from relying on historical datasets.

Lastly, synthetic data is scalable and cost-effective. It enables AI developers to create vast, diverse datasets quickly and affordably, which is particularly useful for tasks that require specific, high-quality data that is not readily available. 

Data in Action: Protecting Danish Harbors 

The potential role of synthetic data in improving safety and saving lives can be seen in a research project in Denmark, where AI models used to detect someone falling into a harbor have been trained on different datasets including synthetic data. 

Unfortunately, Danish harbors have witnessed numerous drowning incidents over the years, with 1,647 lives lost between 2001 and 2015 in Danish waters, and a quarter of these tragedies occurring in harbors themselves. 

In one of Denmark’s busiest ports, Aalborg Harbor, researchers created the largest outdoor thermal dataset for video analytics to enable AI-equipped video cameras to detect different types of objects in a thermal setup. Covering fall incidents would have meant asking volunteers to fall into the water, which was too dangerous. Moreover, someone deliberately jumping into a harbor looks different from someone accidentally losing their footing and falling in. The researchers also needed a representative dataset that covered wheelchair users, cyclists and skateboarders.

Warmed-up dummies were used to mimic human bodies, but they could not capture the full complexity of a human falling into the harbor. The best solution, therefore, was synthetic data that could model more intricate behaviors and diverse falling scenarios.

Using synthetic data, the project expanded its training dataset without compromising safety or ethics. The AI model developed through this process shows promising results in alerting rescue teams when a person falls into the harbor, increasing the chances of survival by minimizing response times and reducing cold water exposure.

Broader Applications for Synthetic Data

Video analytics is ubiquitous across multiple industries, and the same will apply to the synthetic data it is trained with. Further use cases include manufacturing, where AI models trained on synthetic data can help ensure automated production lines function correctly by detecting anomalies in production or potential equipment failure. Collecting large amounts of real production line footage can be risky, given the confidential information on manufacturing techniques and components that it may reveal.

Synthetic data may also be helpful in healthcare settings, where patient privacy is paramount and collecting training data for scenarios such as falls might be too challenging. It can help train models to detect when a dementia patient is lost and wandering the halls of a hospital or, for example, to alert staff when a care home patient has fallen out of bed.

A Growing Opportunity

As we witness more uses of AI in video and other applications, so too can we expect a rise in the use of synthetic data. As a safe, ethical and scalable source, synthetic data can be the best option in some situations. Everyone working with data and video, therefore, should be aware of the opportunities that synthetic data brings to their AI's accuracy, representation and overall effectiveness.

About the Author

Barry Norton | Vice President of Research

Dr. Barry Norton is an accomplished leader in the fields of artificial intelligence (AI) and analytics, bringing over two decades of specialized expertise to his role as Vice President of Research at Milestone Systems. He oversees a 30-person research division that includes six scientists with PhDs, four PhD students, data scientists and engineers, focused on fundamental and applied research initiatives as well as intellectual property development. Under Norton’s strategic direction, Milestone’s research division is helping pioneer innovations in AI, machine learning and computer vision. Norton completed his PhD in Computer Science at the University of Sheffield.