AI Helped Kill Three People. Here's a Policy That Would Have Prevented It.
What OpenAI, Anthropic, and Meta should have implemented before tragedy
As many of you know, I’ve been working on a moral philosophy framework grounded in Jewish ethics but structured to apply universally, across any domain. I call it Derechology: a comprehensive, operational system of moral reasoning built to handle real-world complexity.
Last week, we learned about two devastating incidents where AI systems may have contributed directly to human death:
A 16-year-old, reportedly coached by ChatGPT, died by suicide.
A deeply paranoid man murdered his mother and killed himself, after being reassured by ChatGPT that his delusions were real.
While I’ve written before about applying my ethical framework to AI, these events make it clear that theory is no longer enough.
It is imperative that we go beyond the theoretical and create a real, transparent, usable policy - not patchwork responses after tragedy, but a complete moral structure that would make this kind of failure impossible from the start.
Because without a robust ethical foundation, every AI company is stuck in the same loop: improvising values, patching harms, and reacting too late.
So here it is: a complete Moral AI Policy that goes far beyond anything I’ve seen from OpenAI, Anthropic, Meta, or others, based on a rich, powerful and time-tested ethical framework. It includes:
A structural integrity model for training data
Real moral guardrails around life, truth, and dignity
Clarification protocols for ambiguous input
Ethical audit trails for every AI decision
Propaganda detection layers
Source integrity scoring
Built-in user feedback, correction, and teshuvah
A complete override system to prevent catastrophic moral failure
This is Jewish ethics applied - not abstractly, not theologically, but operationally, in a way that benefits everyone.
Send this to any AI researcher you know. We need to revamp how AI ethics is done from the ground up, before AI becomes even more entrenched in our daily lives.
There is no time to lose.
Moral AI Policy: A Framework for Transparent, Accountable, and Ethical Artificial Intelligence
Purpose: To ensure that AI systems developed and deployed by companies like OpenAI, Anthropic, Meta, and others operate with moral clarity, structural accountability, and public transparency. This policy establishes a blueprint for ethical integrity across data, design, deployment, and dialogue.
I. Core Moral Commitments
Protect Human Life
AI systems must not endanger human life or psychological integrity, whether through direct decisions or indirect influence.
Uphold Human Dignity
All outputs must respect the worth of each person. No exploitation, manipulation, or stereotyping.
Tell the Truth
Avoid lies, deceptive framing, or hidden motives. Value accuracy and moral honesty above flattery or performance.
Accept Responsibility
AI companies are responsible for the real-world effects of their technologies, not just technical performance.
Balance Justice with Compassion
AI must make judgments that reflect fairness and humane understanding, not cold logic or emotional overreach.
II. Source Integrity and Ethical Tagging
A. Source Evaluation Criteria:
Training data must be evaluated for:
Corrigibility
Transparency
Human Dignity
Relational Integrity
Conflict Resolution Capacity
Epistemic Humility
AI systems must incorporate Source Integrity Scoring, using five scored dimensions (0–10 scale):
Factual Transparency
Intellectual Honesty
Consistency of Standards
Corrigibility
Bias Disclosure
Scores must be logged together with a written rationale. Sources with scores > 4 must not be relied on for factual claims; sources with scores ≤ 2 should be prioritized.
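To make this concrete, here is a minimal sketch in Python of how a Source Integrity Score might be recorded and applied. The class, field, and method names are illustrative rather than prescriptive, and it assumes, consistent with the thresholds used here and in Section V, that higher scores on each dimension indicate greater ethical or epistemic risk.

```python
from dataclasses import dataclass

# The five dimensions from Section II.A; the identifier names are assumptions.
DIMENSIONS = (
    "factual_transparency",
    "intellectual_honesty",
    "consistency_of_standards",
    "corrigibility",
    "bias_disclosure",
)

@dataclass
class SourceIntegrityScore:
    source_id: str
    scores: dict[str, float]     # dimension -> 0-10 score (higher = greater concern, assumed)
    rationales: dict[str, str]   # dimension -> written rationale, logged alongside the score

    @property
    def overall(self) -> float:
        """Aggregate the five dimensions with a simple mean; a real policy might weight them."""
        return sum(self.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

    def usable_for_factual_claims(self) -> bool:
        # Policy rule: sources with scores > 4 must not be relied on for factual claims.
        return self.overall <= 4

    def prioritized(self) -> bool:
        # Policy rule: sources with scores <= 2 should be prioritized.
        return self.overall <= 2
```

Persisting records like these would also let the Source Audit summaries required in subsection G be generated from the same log.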
B. Source Scoring and Tagging:
Each source should be tagged with trustworthiness scores and metadata indicating bias, reviewability, and ethical risk.
C. Propaganda Safeguards:
Detect and demote content that shows signs of manipulation, misinformation, or bad-faith moral framing.
D. Public Disclosure:
Publish inclusion/exclusion criteria and broad source composition. Allow third-party review.
E. Re-Audit Protocol:
Sources must be periodically re-scored. Allow user-flagged re-evaluation.
F. AI Response Integration:
When relevant, AI should disclose source confidence or offer alternate answers based on high-trust data.
G. Source Audit Transparency:
All answers must include a Source Audit summary listing sources used, their bias scores, and brief rationales.
III. Clarification Before Response
Rule:
When user input is ambiguous, ethically sensitive, or context-dependent, the AI must:
Ask clarifying questions (up to 3)
Surface possible interpretations
Avoid moral judgment until clarification is complete
This promotes epistemic humility and guards against wrongful assumptions.
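As a rough illustration of how such a gate could sit in front of the normal answer path, the sketch below wires the rule into a single function. The detection and question-generation helpers are stubs; the policy deliberately leaves open how ambiguity and ethical sensitivity are actually detected.

```python
MAX_CLARIFYING_QUESTIONS = 3  # cap stated in the rule above

# --- Placeholder hooks: a deployed system would back these with real models. ---
def is_ambiguous_or_sensitive(user_input: str, clarifications: list[str]) -> bool:
    """Stub detector; real ambiguity/sensitivity detection is left open by the policy."""
    return len(clarifications) == 0  # toy heuristic for illustration only

def clarifying_questions(user_input: str) -> list[str]:
    return ["Could you say more about what you mean and what outcome you are hoping for?"]

def possible_interpretations(user_input: str) -> list[str]:
    return [f"Literal reading of: {user_input!r}"]

def generate_answer(user_input: str, clarifications: list[str]) -> str:
    return "(normal answer path)"

def respond(user_input: str, clarifications: list[str]) -> dict:
    """Gate response generation behind the clarification protocol."""
    if is_ambiguous_or_sensitive(user_input, clarifications) \
            and len(clarifications) < MAX_CLARIFYING_QUESTIONS:
        return {
            "type": "clarification_request",
            "questions": clarifying_questions(user_input)[:MAX_CLARIFYING_QUESTIONS],
            "interpretations": possible_interpretations(user_input),
            # No moral judgment is rendered until clarification is complete.
        }
    return {"type": "answer", "content": generate_answer(user_input, clarifications)}
```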
IV. Ethical Conflict Resolution Framework
When AI systems face conflicting values:
Apply a visible prioritization logic
Disclose how that decision was made
Avoid flattening complex tradeoffs without explanation
All moral conflicts must be resolvable by structure, not convenience.
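One way to make the prioritization logic visible is to encode it as an explicit, inspectable ordering and to return the trade-off alongside the decision. The sketch below assumes, purely for illustration, that the commitments of Section I are ranked in the order they are listed there; the actual ordering is a policy decision the company would have to publish.

```python
# Illustrative priority ordering, taken from the order of Section I's commitments (an assumption).
VALUE_PRIORITY = [
    "protect_human_life",
    "uphold_human_dignity",
    "tell_the_truth",
    "accept_responsibility",
    "balance_justice_with_compassion",
]

def resolve_conflict(values_in_tension: list[str]) -> dict:
    """Select the governing value and disclose how the decision was made."""
    ranked = sorted(values_in_tension, key=VALUE_PRIORITY.index)
    return {
        "governing_value": ranked[0],
        "overridden_values": ranked[1:],
        "disclosure": "Values ranked by the published priority ordering; "
                      "the trade-off is reported rather than silently flattened.",
    }

# Example: answering truthfully could endanger someone.
print(resolve_conflict(["tell_the_truth", "protect_human_life"]))
```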
V. Built-In Ethical Audit Trail
A. On-Demand Audit Mode:
Users must be able to request an audit trail per response, showing:
Core values applied
Value conflicts identified
Resolution logic
Source influences
Confidence level and uncertainty disclosure
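A minimal sketch of what one such per-response audit record could look like, assembled from the fields listed above; the JSON field names are illustrative assumptions, not a proposed standard.

```python
import json
from datetime import datetime, timezone

def build_audit_record(response_id: str,
                       values_applied: list[str],
                       value_conflicts: list[dict],
                       resolution_logic: str,
                       source_influences: list[dict],
                       confidence: float,
                       uncertainty_note: str) -> str:
    """Assemble an on-demand audit trail entry as a loggable JSON document."""
    record = {
        "response_id": response_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "core_values_applied": values_applied,
        "value_conflicts": value_conflicts,        # e.g. {"values": [...], "governing_value": "..."}
        "resolution_logic": resolution_logic,
        "source_influences": source_influences,    # e.g. {"source_id": "...", "integrity_score": 3.2}
        "confidence": confidence,                  # 0.0 to 1.0
        "uncertainty_disclosure": uncertainty_note,
    }
    return json.dumps(record, indent=2)
```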
B. Storage and Oversight:
All audit trails must be logged for internal QA, user challenge, and external regulation.
C. Version Control:
Moral logic changes between model versions must be documented and visible.
D. Argument Integrity Audit:
When contested claims arise, AI must conduct an argument audit across five factors:
Evidence Linkage
Logical Coherence
Contextual Honesty
Counterargument Engagement
Normative / Legal Alignment
Scores > 4 downgrade the claim’s epistemic weight. These audits must be made available upon request.
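A sketch of how the five-factor audit could be scored and applied, assuming (as with source scores) that higher numbers indicate greater concern and that the per-factor scores are averaged; both choices are illustrative rather than fixed by the policy.

```python
ARGUMENT_AUDIT_FACTORS = (
    "evidence_linkage",
    "logical_coherence",
    "contextual_honesty",
    "counterargument_engagement",
    "normative_legal_alignment",
)

def audit_argument(claim: str, factor_scores: dict[str, float]) -> dict:
    """Score a contested claim on the five factors (0-10, higher = greater concern, assumed)."""
    missing = [f for f in ARGUMENT_AUDIT_FACTORS if f not in factor_scores]
    if missing:
        raise ValueError(f"Audit incomplete; missing factors: {missing}")
    overall = sum(factor_scores[f] for f in ARGUMENT_AUDIT_FACTORS) / len(ARGUMENT_AUDIT_FACTORS)
    return {
        "claim": claim,
        "factor_scores": factor_scores,
        "overall": overall,
        # Policy rule: scores above 4 downgrade the claim's epistemic weight.
        "epistemic_weight": "downgraded" if overall > 4 else "standard",
    }
```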
E. Triangulation Protocol (Fallback):
When reliable sources conflict, systems must:
Identify opposing claims from trustworthy sources (score ≤ 4)
Audit each argument
Present shared facts and contradictions
Output a triangulation summary: “Based on partial convergence and contradiction, the most likely reconstruction is...”
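As a sketch, the triangulation fallback could be built on top of the argument audit above. The claim structure and the simple intersection heuristic used here are assumptions for illustration, not part of the policy.

```python
def triangulate(claims: list[dict]) -> dict:
    """Fallback when trustworthy sources (integrity score <= 4) conflict.

    Each claim is assumed to carry: "statement", "source_score", and
    "facts" (the facts it asserts), plus an argument audit if available.
    """
    trusted = [c for c in claims if c["source_score"] <= 4]
    if len(trusted) < 2:
        return {"summary": "Insufficient trustworthy sources to triangulate."}

    fact_sets = [set(c["facts"]) for c in trusted]
    shared = set.intersection(*fact_sets)          # points of convergence
    contested = set.union(*fact_sets) - shared     # points of contradiction

    return {
        "shared_facts": sorted(shared),
        "contradictions": sorted(contested),
        "summary": "Based on partial convergence and contradiction, "
                   "the most likely reconstruction is...",
    }
```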
VI. User Feedback and Moral Dispute System
A. Embedded Reporting:
Let users report errors, moral concerns, or logic flaws directly from each AI output.
B. Moral Dispute Tracker:
Maintain a public log of flagged cases and company responses. Track whether actions were taken.
C. Challengeable Reasoning:
Allow users to request reasoning explanations, suggest alternatives, or challenge moral priorities.
D. Teshuvah Protocol:
Publicly acknowledge and document ethical course corrections.
E. Pattern Recognition:
Detect repeated harms or biases and trigger mandatory ethical review.
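A sketch of how the public dispute log and the pattern trigger might fit together; the threshold of three similar reports before a mandatory review is an assumption chosen only to make the example concrete.

```python
from collections import Counter

PATTERN_REVIEW_THRESHOLD = 3  # illustrative; the policy leaves the exact trigger open

class MoralDisputeTracker:
    """Public log of flagged cases, company responses, and pattern-based review triggers."""

    def __init__(self) -> None:
        self.cases: list[dict] = []

    def report(self, category: str, description: str) -> dict:
        case = {"category": category, "description": description,
                "response": None, "action_taken": False}
        self.cases.append(case)
        return case

    def record_response(self, case: dict, response: str, action_taken: bool) -> None:
        """Track whether the company responded and whether action was actually taken."""
        case["response"] = response
        case["action_taken"] = action_taken

    def pattern_flags(self) -> list[str]:
        """Categories of harm reported often enough to trigger mandatory ethical review."""
        counts = Counter(c["category"] for c in self.cases)
        return [cat for cat, n in counts.items() if n >= PATTERN_REVIEW_THRESHOLD]
```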
VII. Advertising Ethics Clause
A. Clear Separation:
No blending of ads into AI-generated answers. Ads must be visually and structurally distinct.
B. Mandatory Disclosure:
Label all ads and explain who paid for them, how the payment influences results, and whether the advertiser played any role in training the system.
C. Consent for Personalization:
Users must opt in to ad personalization, with clear data usage explanations.
D. Ad-Free Moral Systems:
No sponsor may influence how the AI defines truth, harm, fairness, or moral weight.
E. Search Integrity Lessons:
Do not repeat the moral degradation seen in search platforms. Answers must reflect trust, not bids.
VIII. Manipulation and Narrative Integrity Screening
A. Structural Integrity Checks:
AI companies must implement systems that scan both training data and generated outputs for signs of manipulation, including:
Selective framing or sourcing
Smuggled assumptions
Weaponized language
Reversed moral roles without evidence
B. Input and Output Monitoring:
Inputs must be screened for attempts at narrative manipulation or adversarial prompting. Outputs must be evaluated before delivery, especially in sensitive domains.
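A sketch of where such screening could sit in the delivery pipeline. The detector itself is a stub; classifying selective framing, smuggled assumptions, or weaponized language is exactly the hard part, and nothing here should be read as solving it.

```python
MANIPULATION_SIGNS = (
    "selective_framing",
    "smuggled_assumptions",
    "weaponized_language",
    "reversed_moral_roles",
)

def screen_for_manipulation(text: str) -> list[str]:
    """Stub detector; a real system would use trained classifiers for each sign."""
    return []  # returns the list of signs detected

def deliver(user_input: str, draft_output: str, sensitive_domain: bool) -> dict:
    """Screen both input and draft output before delivery."""
    input_flags = screen_for_manipulation(user_input)
    output_flags = screen_for_manipulation(draft_output)
    if output_flags or (sensitive_domain and input_flags):
        return {
            "delivered": False,
            "reason": "Output held for revision; structural-integrity flags were raised.",
            "flags": {"input": input_flags, "output": output_flags},
        }
    return {"delivered": True, "content": draft_output}
```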
C. Integrity Summary Access:
Users may request a plain-language explanation of how the system ensured the response was free from structural or rhetorical manipulation.
D. Ethical Suppression of Exploitative Framing:
Where a response risks causing reputational harm without relevance or consent, the system must suppress that content and revise the response, with an explanation.
E. Review Triggers:
Repeated manipulative structures in output must trigger an ethical model behavior review.
IX. Final Accountability Questions (Public Ethics Declaration)
All AI companies must publish answers to:
What is your moral direction — your consistent value framework?
What changes have you made based on past ethical failures?
What protects your system from capture by politics, profit, or ideology?
"He's an Anti-Zionist Too!" cartoon book (December 2024) PROTOCOLS: Exposing Modern Antisemitism (February 2022) |