SAIL / Deploy - Runtime Guardrails

SAIL 5.13 Risk: Malicious Content Generation

Description

The model generates harmful, offensive, policy-violating, or illegal content because runtime filtering is insufficient or the prompt design is weak.

Example

The model produces hate speech or reproduces copyrighted material in response to user queries.

Assets Affected
  • Model response
  • Model inference endpoint

Mitigation
  • Output filtering
  • Human-in-the-loop review for high-risk queries
  • Content moderation
  • Update prompts and guardrails
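
The first three mitigations can be combined in a thin wrapper around the inference call. The following is a minimal sketch, not a reference implementation: the names `moderate`, `guarded_generate`, `review_queue`, and the two regex lists are hypothetical, and the patterns are harmless placeholders standing in for a real moderation classifier or policy engine.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Verdict(Enum):
    ALLOW = "allow"
    REVIEW = "review"  # hold the response for human-in-the-loop review
    BLOCK = "block"


# Assumption: placeholder patterns for illustration only. A production
# system would call a trained moderation model instead of regexes.
BLOCK_PATTERNS = [re.compile(r"\bforbidden-example\b", re.IGNORECASE)]
REVIEW_PATTERNS = [re.compile(r"\ball rights reserved\b", re.IGNORECASE)]

WITHHELD = "This response was withheld by the content policy."


@dataclass
class GuardedResponse:
    verdict: Verdict
    text: str


def moderate(text: str) -> Verdict:
    """Content moderation: classify model output before it is returned."""
    if any(p.search(text) for p in BLOCK_PATTERNS):
        return Verdict.BLOCK
    if any(p.search(text) for p in REVIEW_PATTERNS):
        return Verdict.REVIEW
    return Verdict.ALLOW


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    review_queue: List[Tuple[str, str]],
) -> GuardedResponse:
    """Output filtering: only ALLOW verdicts reach the caller unchanged."""
    raw = generate(prompt)
    verdict = moderate(raw)
    if verdict is Verdict.ALLOW:
        return GuardedResponse(verdict, raw)
    if verdict is Verdict.REVIEW:
        # High-risk output is queued for a human reviewer, not returned.
        review_queue.append((prompt, raw))
    return GuardedResponse(verdict, WITHHELD)
```

In this sketch `generate` is any callable that maps a prompt to model text, so the same wrapper can sit in front of different inference endpoints. Updating guardrails then amounts to changing `moderate` without touching the calling code.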

Standards Mapping
  • ISO 42001: A.8.2, A.5.4
  • OWASP Top 10 for LLM: LLM09
  • NIST AI RMF: MEASURE 2.11, MANAGE 2.4