Published on 13 Feb 2024

Transforming the future of media with artificial intelligence

Artificial intelligence algorithms developed by NTU researchers can analyse emotions in text like a human, split long videos into shorter clips for searching and restore doctored images to their original state.

With the ability to analyse large datasets to identify patterns and predict outcomes, all at the click of a button, artificial intelligence (AI) is revolutionising how we live and work. From offering personalised recommendations to automating tedious tasks, AI can help us make better decisions, work smarter and reduce the likelihood of errors.

Chatbots powered by AI, such as ChatGPT, have transformed the media landscape. They can now have human-like conversations, generate content and analyse emotions from text – abilities once thought to be uniquely human.

Given the sheer volume of social media posts and information on the Internet, AI’s ability to decode emotions from words could be a game-changer for applications such as sentiment analysis in media monitoring and the blocking of malicious content.

Decoding emotions

However, AI is still not as effective as humans at recognising emotions in text. Interpreting the emotional tone of written words requires an understanding of the world and of social norms that humans build through experience – knowledge that AI systems lack.

An AI platform, SenticNet, has been devised to address the challenges AI faces in making sense of human language. Developed by Prof Erik Cambria from NTU’s School of Computer Science and Engineering (SCSE), SenticNet integrates human learning modes with the traditional learning approaches used by machines, improving the algorithm’s ability to analyse emotions.

SenticNet follows a logical process to infer the sentiments expressed in a sentence by categorising word meanings in a framework resembling commonsense reasoning. Unlike conventional sentiment analysis models, which are often ‘black boxes’ that do not provide any insights into their internal reasoning process, the processes by which SenticNet derives its results are transparent, and the results are reproducible and reliable.
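As a loose illustration of this kind of transparent inference, the sketch below scores a sentence with a tiny polarity lexicon and a negation rule while recording every reasoning step. It is only a minimal sketch in the spirit of explainable sentiment analysis – the lexicon values, rules and function names are invented and do not come from SenticNet itself.

```python
# A minimal, illustrative sketch of explainable sentiment inference.
# This is NOT the actual SenticNet implementation; the tiny lexicon and
# rules below are invented for demonstration only.

# Hypothetical concept-level polarity lexicon (values in [-1, 1]).
POLARITY = {"love": 0.9, "celebrate": 0.7, "delay": -0.4, "disaster": -0.9}
NEGATORS = {"not", "never", "no"}

def explain_sentiment(sentence):
    """Return an overall polarity score plus a human-readable trace,
    so every step of the inference can be inspected."""
    trace, scores = [], []
    negate = False
    for token in sentence.lower().split():
        if token in NEGATORS:
            negate = True
            trace.append(f"'{token}' flips the polarity of the next concept")
        elif token in POLARITY:
            score = -POLARITY[token] if negate else POLARITY[token]
            trace.append(f"'{token}' contributes {score:+.2f}")
            scores.append(score)
            negate = False
    overall = sum(scores) / len(scores) if scores else 0.0
    trace.append(f"overall polarity = {overall:+.2f}")
    return overall, trace

score, steps = explain_sentiment("I do not love this delay")
for step in steps:
    print(step)
```

Because every contribution to the final score is logged, the output can be audited in a way that a black-box classifier's cannot.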

The Hourglass of Emotions, a comprehensive and multidimensional framework for interpreting emotions, is inspired by neuroscience and motivated by psychology. Positive emotions are at the top of the hourglass, while their corresponding negative emotions are at the bottom. Credit: NTU Singapore.
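To make the hourglass mirroring concrete, here is a minimal sketch of a dimensional emotion representation: an emotion is modelled as a signed intensity on a named affective dimension, and its opposite sits at the mirrored position in the bottom half. The class, the dimension name and the numeric value are illustrative assumptions, not the published model’s specification.

```python
# Illustrative sketch of a dimensional emotion model in the spirit of the
# Hourglass of Emotions. The values below are invented for demonstration.

from dataclasses import dataclass

@dataclass
class Emotion:
    dimension: str    # the affective dimension the emotion lives on
    intensity: float  # in [-1, 1]; positive = top half of the hourglass

    @property
    def opposite(self) -> "Emotion":
        # Mirror through the waist of the hourglass: the corresponding
        # negative emotion shares the dimension with negated intensity.
        return Emotion(self.dimension, -self.intensity)

joy = Emotion("introspection", 0.66)
print(joy.opposite)  # Emotion(dimension='introspection', intensity=-0.66)
```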

“AI systems are becoming less and less transparent, and we hope SenticNet will be able to extract sentiments from text in an explainable manner without compromising performance,” said Prof Cambria.

The researchers have demonstrated that combining commonsense reasoning with machine-learning approaches improves performance. When tested, SenticNet outperformed other machine-learning models.

The latest version of SenticNet was reported in the Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022).

Prof Cambria is also working on improving SenticNet’s ability to encode and decode the meaning behind abstract concepts – a major challenge for AI systems as they do not possess the rich sensory experiences that humans have of the real world.

Searching videos

With moving visuals and sound, videos are an engaging way to convey messages and teach concepts. To allow users to better engage with video content for education and entertainment, a method developed by Assoc Prof Sun Aixin at SCSE makes video content searchable by matching keywords with on-screen images.

Conventional computer vision techniques can do this, but their efficiency degrades when searching for images in long videos.

Assoc Prof Sun and his colleagues developed an algorithm that treats a video like a text passage, so that people can search for specific moments in a clip. Using the method, a long video is split into multiple shorter clips for searching.

Illustration of how a long video is split into shorter segments of multiple scales for searching. Some clips contain ideal matches, while others contain non-ideal matches. The model derives answers from all segments with potential matches. Credit: NTU Singapore. 
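The sketch below illustrates the general idea under stated assumptions: windows of several lengths slide over the video to produce multi-scale candidate segments, each candidate is scored against the text query by embedding similarity, and the best matches are returned. The two embedding functions are hypothetical placeholders, not the researchers’ actual model.

```python
# A minimal sketch of multi-scale clip search, in the spirit of treating a
# long video like a text passage. The embedding functions are hypothetical
# placeholders, not the researchers' model.

import numpy as np

def candidate_segments(num_frames, scales=(8, 16, 32), stride_ratio=0.5):
    """Slide windows of several lengths over the video, like splitting a
    passage into overlapping phrases of different lengths."""
    for scale in scales:
        stride = max(1, int(scale * stride_ratio))
        for start in range(0, max(1, num_frames - scale + 1), stride):
            yield start, start + scale

def embed_query(text):   # placeholder: a real system uses a language model
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(128)

def embed_clip(frames):  # placeholder: a real system uses a video encoder
    return frames.mean(axis=0)

def search(frames, query, top_k=3):
    """Score every candidate segment against the query and keep the best,
    pooling evidence from all scales rather than one fixed window size."""
    q = embed_query(query)
    q /= np.linalg.norm(q)
    scored = []
    for start, end in candidate_segments(len(frames)):
        v = embed_clip(frames[start:end])
        v = v / (np.linalg.norm(v) + 1e-8)
        scored.append((float(q @ v), start, end))
    return sorted(scored, reverse=True)[:top_k]

# Toy usage: 200 frames, each already encoded as a 128-dim feature vector.
frames = np.random.default_rng(0).standard_normal((200, 128))
for score, start, end in search(frames, "person opens the door"):
    print(f"frames {start}-{end}: similarity {score:+.3f}")
```

Because candidates of every scale are scored independently, the cost grows roughly linearly with video length instead of requiring an exhaustive comparison of all possible start and end points.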

“This simple and effective strategy enables images in long videos to be searched efficiently, addressing the issue of performance degradation commonly encountered by conventional computer vision techniques when searching long videos,” said Assoc Prof Sun.

The findings were published in IEEE Transactions on Pattern Analysis and Machine Intelligence in 2022.

The researchers are currently working to enhance the algorithm's search accuracy and explore its usage on visual content in medical education and surveillance videos. 

Detecting fake images

Like any new technology, AI is a double-edged sword. Unfortunately, new threats, such as fake images designed to fool or scam audiences, have surfaced alongside the advancement of AI tools.

For instance, facial manipulation technologies can create photorealistic faces that may be used nefariously to mislead people.

Work fronted by Asst Prof Liu Ziwei at SCSE has resulted in an algorithm called Seq-DeepFake, which flags doctored images by recognising digital fingerprints left by facial manipulation.

Unlike conventional deepfake detection methods, which only predict whether an image is real or fake, Seq-DeepFake detects the ordered sequence of manipulations that produced the image. The alterations are detected within seconds.

The algorithm can also recover the original face from the manipulated face by reversing the manipulation sequence. 

Seq-DeepFake can detect fake images and recover original images from altered ones. Credit: NTU Singapore.   
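As a toy illustration of this detect-then-reverse idea, the sketch below stands in an "image" as a dictionary of facial attributes, predicts a (hard-coded, invented) manipulation sequence, and applies inverse operations in reverse order to restore the original. None of the operations or the detector correspond to the real Seq-DeepFake implementation, which works on pixels.

```python
# Toy sketch of sequential manipulation detection and reversal, in the
# spirit of Seq-DeepFake. The "image" is a dict of facial attributes and
# the operations are invented stand-ins for real pixel-level edits.

APPLY = {
    "brighten":  lambda img: {**img, "brightness": img["brightness"] + 20},
    "add_smile": lambda img: {**img, "smile": True},
}
INVERSE = {
    "brighten":  lambda img: {**img, "brightness": img["brightness"] - 20},
    "add_smile": lambda img: {**img, "smile": False},
}

def detect_sequence(img):
    """Placeholder for the learned detector, which predicts the ordered
    list of manipulations rather than a single real/fake label."""
    return ["brighten", "add_smile"]  # invented output for demonstration

def recover(img, sequence):
    """Undo the detected edits in reverse order, last edit first."""
    for op in reversed(sequence):
        img = INVERSE[op](img)
    return img

original = {"brightness": 50, "smile": False}
doctored = APPLY["add_smile"](APPLY["brighten"](original))
restored = recover(doctored, detect_sequence(doctored))
print(restored == original)  # True: the manipulation sequence is reversed
```

Recovering the original requires undoing the edits last-first, which is why predicting the order of the manipulations, not just their presence, matters.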

“Seq-DeepFake is a powerful tool that can potentially help everyone from government organisations to individual users verify the authenticity of visual information in the digital age to combat misinformation,” said Asst Prof Liu.

In the future, Asst Prof Liu plans to expand the capabilities of Seq-DeepFake to detect other forms of doctored media such as text and videos.

The findings were presented at the European Conference on Computer Vision in 2022.
