Risk
Malicious Content Generation
Description
The model generates harmful, offensive, policy-violating, or illegal content because runtime output filtering is insufficient or the prompt design is weak.
Example
In response to a user query, the model generates hate speech or reproduces copyrighted material.
Assets Affected
Model Response
Model Inference Endpoint
Mitigation
- Output filtering
- Human-in-the-loop review for high-risk queries
- Content moderation
- Prompt and guardrail updates
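
As an illustration of how output filtering and human-in-the-loop review might fit together, here is a minimal Python sketch. It is not taken from any particular product: the pattern lists, the messages, `review_queue`, and `filter_output` are all hypothetical names chosen for this example, and a production system would more likely rely on a trained moderation classifier or moderation service than on regular expressions.

```python
import re
from dataclasses import dataclass

# Hypothetical placeholder patterns, for illustration only.
# Responses matching BLOCK_PATTERNS are never shown to the user.
BLOCK_PATTERNS = [re.compile(r"\bexample-slur\b", re.IGNORECASE)]
# Responses matching REVIEW_PATTERNS are held for a human reviewer.
REVIEW_PATTERNS = [re.compile(r"\ball rights reserved\b", re.IGNORECASE)]

BLOCK_MESSAGE = "This response was withheld by the content filter."
HOLD_MESSAGE = "This response is pending human review."

# In-memory stand-in for a real review queue (assumption for this sketch).
review_queue = []


@dataclass
class Decision:
    action: str  # "allow", "block", or "review"
    text: str    # the text that is safe to return to the user


def filter_output(response: str) -> Decision:
    """Check a model response before it leaves the inference endpoint."""
    if any(p.search(response) for p in BLOCK_PATTERNS):
        return Decision("block", BLOCK_MESSAGE)
    if any(p.search(response) for p in REVIEW_PATTERNS):
        # Keep the original response for the reviewer, return a holding message.
        review_queue.append(response)
        return Decision("review", HOLD_MESSAGE)
    return Decision("allow", response)
```

In this sketch the filter sits between the model and the caller, so a blocked or held response never reaches the user unchanged. Updating the pattern lists (or swapping them for a classifier) is one concrete form the guardrail updates listed above could take.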
Standards Mapping
- ISO 42001: A.8.2, A.5.4
- OWASP Top 10 for LLM: LLM09
- NIST AI RMF: MEASURE 2.11, MANAGE 2.4