About us

With the rapid development of AI and the increasing importance of AI Safety, the Singapore AISI will work on addressing the gaps in global AI safety science, leveraging Singapore's work in AI evaluation and testing. It will pull together Singapore's research ecosystem, collaborate internationally with other AISIs to advance the sciences of AI safety and provide science-based input to our work in AI governance. The initial areas of interest are:

  • Testing & Evaluation
  • Safe Model Design, Development and Deployment
  • Content Assurance
  • Governance & Policy

At the Digital Trust Centre, home to the AI Safety Institute (AISI), we use trust tech as a key enabler for building digital trust. We aim to leverage technology to foster a secure, ethical ecosystem that supports AI development while safeguarding stakeholder interests. Below are the AI Safety goals of the AISI:

Trusted AnalysisPrivacy Enhanced Technology (PET)
Trusted ComputeSecure and Resilient Valid and reliable
Trusted AccreditationSafeExplainable and interoperableFair without harmful bias
Trusted ID and Web 3.0Accountable and Transparent 

 

Photo by Tara Winstead

AI Safety

 

AI safety focuses on identifying and addressing risks in AI models to ensure they function as intended and meet specifications. Our AI Safety Framework highlights trustworthinessresponsibility, and collaboration to mitigate risks across all stages of AI development.

  • Trustworthiness: AI must be reliable, perform as expected, and maintain trust in its operation.
  • Responsibility: AI-generated errors, such as harmful or inaccurate content, pose significant risks. Governance and accountability are crucial, especially with generative AI, which can produce unintended or misleading content.
  • Generative AI Challenges: Foundational models from large companies can lead to failures affecting society, national security, and public safety. Advanced generative AI increases these risks.
  • Traditional AI Risks: AI systems can malfunction, undermining trust in their use.

By focusing on the reliableethical, and safe development of AI, this framework aims to safeguard public trust and prevent systemic failures within the AI ecosystem.

AI Research


Current Safe/Responsible GAI Research at DTC

Prevention of Harmful Content Generation from Large ModelsDesign comprehensive techniques to prevent the generation of harmful content from different models (e.g., languageimagespeech), including effective filteringred teamingmachine unlearning, and more.
Equipping LLM With Vertical Domain KnowledgeFine-tune LLMs in an instruction-following manner. Research and develop methods capable of generating instructional data from numerous knowledge sources, thereby equipping LLMs with vertical domain knowledge.
LLMs as prompt attackers against LLMs (LLMs vs LLMs)Compared to previous NLP adversarial learning methods (character-level, word-level, and sentence-level), LLMs demonstrate the potential to create adversarial examples that can deceive themselves. To enhance the robustness of LLMs, employ LLMs to attack LLM systems.
Distinguishing Human and AI Generated ContentDevelop solutions to distinguish between human and machine-generated content for various purposes, such as copyright and plagiarism detection. Propose techniques like watermarking AI-generated content and using LLM-driven detection methods.
Source Attribution of Harmful Content GenerationGiven malicious media published online, it is essential to identify and hold the creator accountable. Explore robust watermarking techniques for different modalities of AI-generated content.
Safety Testing of Generative AIDevelop a testing framework and automated red teaming to comprehensively assess and quantify the vulnerabilities of large models, supporting various modalitiestasks, and security properties.

 


AI Safety-Related Projects Awarded Under DTC Grant Call

Project Title
Organisation
Framework
Relevant work to AISI
Verifiable computation in the face of malicious adversaries
NTUTrustworthy AIEnhance LLM
Combatting Prejudice in AI: A Responsible AI Framework for Continual Fairness Testing, Repair, and TransferA*STARTrustworthy AIRed-Teaming

Trust Verification of the Knowns and Unknowns in Reinforcement Learning​

A*STARTrustworthy AIAddress Disinformation and Hallucination

Towards More Trustworthy Generative AI through Robustness Enhancement and Hallucination Reduction

NTUSafe AIPrevent Harmful Generation

Evaluating, Detecting, and Mitigating the Risks of Diffusion Models for Fair, Robust, and Toxic Content-Free Generative AI

NTUAI Safety Red-teaming

Scalable Evaluation for Trustworthy Explainable AI

NUSExplainable AIEnhance LLM

Related International Efforts