Wednesday, August 06, 2025

Earlier today I saw an interview on Quillette with Dr. Andre Oboler, the CEO of the Online Hate Prevention Institute, about online hate and what can be done about it.

I realized that this could be another great application of derechology, my universal ethical framework based on Jewish ethics.

So I started a discussion with AskHillel, the AI I built on derechology principles, and after some back and forth we came up with a social media policy - one that leans heavily on AI to implement - that would leave posters, readers, and the social media companies themselves feeling much better than they do today.

The problem today is that there are no clear standards and no transparency: users who are offended see no recourse that ever works, and the people being censored have no clear idea why. Meanwhile, the social media companies are swamped with requests for review. The whole thing is a mess.

This can be solved.

First come the standards. These should be values, not detailed rules, defining what is not allowed and what is potentially a problem. They should follow the derechological baseline values: protection of life, human dignity, and mutual responsibility.

When a person posts something that is illegal, like child pornography, there is no choice: it must be stopped and reported.

But the vast majority of issues are gray areas - phrases that could be incitement to violence but also have innocuous readings, or negative stereotypes of groups of people - and these can be dealt with by AI before they are posted. The key is transparency. The AI can explain why the post might violate the platform's values, and then offer to let the user reword it, or offer to rephrase it itself, until both sides approve the message and it can be posted. If the user disagrees and insists on posting as-is, the AI will allow it but will inform the user that the post will carry a flag and/or have limited visibility.
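To make the flow concrete, here is a minimal Python sketch of that pre-post loop. The classify function, its trigger phrase, and the suggested rewrite are hypothetical stand-ins for whatever real model and policy a platform would use:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewResult:
    concern: Optional[str]            # plain-language explanation, or None
    suggested_rewrite: Optional[str]  # AI-proposed alternative wording

def classify(text: str) -> ReviewResult:
    # Hypothetical stand-in for the platform's real value-alignment model.
    if re.search(r"those people", text, flags=re.I):
        return ReviewResult(
            concern="'Those people' can read as a negative group stereotype.",
            suggested_rewrite=re.sub(r"those people", "some individuals",
                                     text, flags=re.I),
        )
    return ReviewResult(concern=None, suggested_rewrite=None)

def submit_post(text: str, accept_rewrite: bool, insist_as_is: bool) -> dict:
    result = classify(text)
    if result.concern is None:
        return {"text": text, "flagged": False}
    if accept_rewrite and result.suggested_rewrite:
        # Both sides approved the reworded message.
        return {"text": result.suggested_rewrite, "flagged": False}
    if insist_as_is:
        # Allowed, but with a visible flag and reduced visibility.
        return {"text": text, "flagged": True, "reason": result.concern,
                "visibility": "limited"}
    return {"text": None, "flagged": False}  # user withdrew the post to edit
```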

This way the platform comes across not as a censor but as a partner - one that assumes good faith and wants to work together to craft a message that will not hurt others.

On the other side, if a user is offended by a post, the AI can explain why it was allowed, and discuss that with the user as well. The user might point out, for example, that the post used a dog-whistle with a hidden racist meaning. In that case, the AI can log the issue and refer it to humans for further research. Either way, the AI can offer not only to block that poster for the user but also to block other posts that share the same issues.

Everything has to be upfront and honest. If the AI cannot assure the user that a human will review every case, it should say so, but also point out that (given user permission) the discussions can be logged and aggregated in case there are many people who are offended. If a user has a pattern of offensive posts, the AI can inform them that after a specific score is reached they may be suspended. But the reasons must always be clear.
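To make the scoring rule concrete, here is a sketch of a transparent points ledger; the weights and the threshold are placeholders, not part of the actual proposal:

```python
SUSPENSION_THRESHOLD = 10                                 # placeholder value
POINTS = {"flagged_post": 1, "confirmed_dog_whistle": 3}  # placeholder weights

class UserRecord:
    """Transparent strike ledger: every increment carries its reason."""

    def __init__(self) -> None:
        self.score = 0
        self.history: list[tuple[str, int]] = []

    def add_violation(self, kind: str) -> str:
        points = POINTS[kind]
        self.score += points
        self.history.append((kind, points))
        remaining = SUSPENSION_THRESHOLD - self.score
        if remaining <= 0:
            return "Account suspended. Your full history of reasons is viewable."
        return (f"This added {points} point(s) for '{kind}'. "
                f"Suspension occurs at {SUSPENSION_THRESHOLD} points; "
                f"you are {remaining} away.")
```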

This method is far better than what is happening now. There are no black boxes - reasons are always available, and the rules and consequences are public. The social media platform is presented not as authoritarian but as caring. The AIs would be polite and engaging. And the number of posts that require human review would drop greatly, helping the social media companies.

This is yet another way derechology can take a seemingly intractable problem and view it anew through a lens of values, responsibility, and humility to help everyone get what they want.
_______________________

Here is the full suggested design:

Ethical Design Document: Universal Social Media Policy (Value-Aligned Framework)

Purpose: To implement a values-rooted, universal social media policy for a mainstream platform, balancing freedom of expression with moral responsibility. This framework draws from foundational ethical principles and is designed to be inclusive and applicable across diverse contexts.


I. Core Ethical Framework

Each principle below is paired with its function in the policy:

  • Inherent Human Worth: Every user has dignity. Harmful content must be addressed respectfully, not erased thoughtlessly.

  • Truth and Honesty: All moderation actions must be transparent, fact-based, and subject to review.

  • Shared Responsibility: Platforms are accountable for what they allow or amplify. Silence or inaction can cause real harm.

  • Duty to Prevent Harm: Platforms must not stand by when foreseeable harm could occur.

  • No Enabling of Harmful Behavior: Platforms must avoid features that promote outrage, bullying, or manipulation.

  • Public Integrity: Mishandling speech ethics undermines trust in the platform and the communities it serves.

  • Humility in Automation: AI systems must acknowledge their limitations. Every user has a right to appeal and clarity.

II. AI Moderation Logic

1. Harm Detection Thresholds

AI flags content likely to cause harm based on the following signals (a detection sketch follows the list):

  • Dehumanizing language

  • Incitement to violence or discrimination

  • Misleading or doctored content

  • Personal attacks, group slurs, or mockery of suffering
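A minimal sketch of such a flagger, using placeholder keyword patterns where a real platform would use trained classifiers and context:

```python
import re

# Placeholder patterns only; a production system would rely on trained
# models and human-curated lists rather than bare keywords.
HARM_PATTERNS = {
    "dehumanizing_language": re.compile(r"\b(vermin|subhuman)\b", re.I),
    "incitement": re.compile(r"\bwipe (them|those people) out\b", re.I),
    "doctored_content": re.compile(r"\[manipulated media\]", re.I),
}

def detect_harm(text: str) -> list[str]:
    """Return the harm categories the text appears to trigger, feeding
    the contextual warning shown before publishing."""
    return [name for name, pattern in HARM_PATTERNS.items()
            if pattern.search(text)]
```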

2. Real-Time Ethical Dialogue

Before publishing, users receive a contextual message:

"This post may be perceived as harmful due to [reason]. Our ethical guidelines emphasize dignity and respectful communication. Would you like to revise, discuss, or continue as-is?"

Options:

  • "Edit with Suggestions"

  • "Discuss with AI"

  • "Post Anyway (Visibility May Be Reduced)"

  • "Learn More About This Warning"

If "Discuss with AI" is selected:

  • The AI engages in a structured, respectful dialogue to understand user intent.

  • The user may explain context, clarify meaning, or propose alternate wording.

  • Together, the AI and user may co-create a revised version that preserves intent while reducing risk of harm or misunderstanding.

  • At the end of the interaction, the user is asked:

    "Would you like to anonymously share this dialogue with the platform's ethics team to help improve our policies?"

    • If accepted, the data is sent anonymized and used for policy refinement.

    • This supports ongoing ethical learning and accountability — a model of platform-level course correction.
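One way to wire up the four options and the consent step, sketched in Python with stub functions standing in for the real dialogue and logging pipelines:

```python
import json
import uuid

def run_dialogue(post: str, reasons: list[str]) -> list[dict]:
    # Stub: the real system holds a structured back-and-forth with the user.
    return [{"role": "ai",
             "text": f"This post was flagged for: {', '.join(reasons)}"}]

def log_anonymized(transcript: list[dict]) -> None:
    # Strip identity before storage; keep only the content of the exchange.
    record = {"id": uuid.uuid4().hex, "turns": [t["text"] for t in transcript]}
    print(json.dumps(record))  # stand-in for the real ethics-team pipeline

def handle_choice(choice: str, post: str, reasons: list[str],
                  consents_to_share: bool = False) -> dict:
    """Dispatch on the four options offered to the poster."""
    if choice == "edit_with_suggestions":
        return {"action": "open_editor", "suggestions": reasons}
    if choice == "discuss_with_ai":
        transcript = run_dialogue(post, reasons)
        if consents_to_share:
            log_anonymized(transcript)
        return {"action": "dialogue", "transcript": transcript}
    if choice == "post_anyway":
        return {"action": "publish", "visibility": "reduced", "label": reasons}
    if choice == "learn_more":
        return {"action": "show_policy", "topics": reasons}
    raise ValueError(f"unknown choice: {choice}")
```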

3. Visibility Management

If posted without revision (sketched in code below):

  • Post is algorithmically downranked

  • Visible advisory label is attached

  • Viewers may choose to hide, report, or engage with content thoughtfully
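A sketch of that visibility logic; the downrank factor is an illustrative placeholder, since this framework requires the real weighting to be public:

```python
from dataclasses import dataclass, field

DOWNRANK_FACTOR = 0.4  # placeholder; the published policy would fix the real value

@dataclass
class Post:
    text: str
    base_rank: float
    labels: list[str] = field(default_factory=list)

def apply_visibility_policy(post: Post, flagged_reasons: list[str]) -> Post:
    """Downrank and visibly label a post published without revision."""
    if flagged_reasons:
        post.base_rank *= DOWNRANK_FACTOR
        post.labels.append("Advisory: " + "; ".join(flagged_reasons))
    return post
```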

4. Appeal and Oversight

  • All flagged content can be appealed

  • Human reviewers trained in ethics review each case

  • AI decision-making is transparent and available for scrutiny

5. Hard Threshold for Illegal or Dangerous Content

Some content must be removed immediately and cannot be published under any condition. This includes:

  • Verified illegal material (e.g., child exploitation, terror propaganda, threats of violence)

  • Clear and imminent incitement to violence

  • Content explicitly designed to cause harm or violate platform or legal safety standards

For such content (see the sketch after this list):

  • No option to edit or post is provided

  • AI issues a clear explanation and cites relevant policy or legal standard

  • Content and metadata are quarantined for audit purposes

  • If criminal in nature, the platform reports to appropriate authorities, even if the content was never posted. This includes mandatory reporting of child exploitation material, as required by law. In 2023 alone, platforms worldwide filed more than 36 million such reports with the US CyberTipline.

  • An appeal process exists, but the default action is immediate suppression and referral
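A sketch of this hard-threshold path, assuming a hash-based audit trail; the category names and the reporting rule come from the list above, while everything else is illustrative:

```python
import hashlib
from datetime import datetime, timezone

ILLEGAL_CATEGORIES = {"child_exploitation", "terror_propaganda", "violent_threat"}

def handle_illegal(content: bytes, category: str) -> dict:
    """Block, quarantine, and report; publishing is never an option."""
    if category not in ILLEGAL_CATEGORIES:
        raise ValueError(f"not a hard-threshold category: {category}")
    digest = hashlib.sha256(content).hexdigest()  # audit reference, not the content
    return {
        "category": category,
        "sha256": digest,
        "quarantined_at": datetime.now(timezone.utc).isoformat(),
        "published": False,               # no edit-or-post option exists here
        "reported_to_authorities": True,  # mandatory for child exploitation;
                                          # policy-driven for the other categories
        "user_notice": (
            f"Blocked under the '{category}' policy. This action was automatic; "
            "an appeal path exists, but suppression and referral are the default."
        ),
    }
```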


III. Platform Integrity Measures

  • Transparency Portal: Public access to all moderation rules and the ethics behind them

  • Graceful Correction: Users can revise or delete content without punishment or shame

  • Propaganda Safeguards: Moderation training and data screening guard against misinformation, manipulation, and biased framing

  • Protection of Diverse Voices: Disagreement is welcome; only speech that causes harm is moderated


IV. Platform Message to Users

"Speech is power. Use it as if every person matters — because they do."


V. User Response to Perceived Harm

If a user encounters content they find offensive or harmful, they are offered a respectful pathway to respond:

  • Flag and Explain: The user may flag the content and describe — in their own words — why they found it troubling.

  • AI Acknowledgment and Clarification: The AI responds by explaining why the content was not automatically flagged, while respectfully acknowledging the user's experience.

  • Offer of Anonymous Logging: The user is asked:

    "Would you like to anonymously share this flag and explanation with the platform's ethics team to inform future policy adjustments?"

    • If accepted, the data is anonymized and logged.

    • Users are informed that while not all cases receive individual review, all are weighted using transparent criteria and can influence platform-wide ethical refinement.

  • Personal Content Controls (sketched in code at the end of this section):

    • Users may choose to block the individual post, the user who posted it, or all content matching similar categories or patterns.

    • Settings are customizable, respectful, and clearly explained.

This process ensures both dignity and protection for those affected by harmful speech, fostering a culture of mutual responsibility and continuous learning.
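The personal content controls could be as simple as the following sketch, where the pattern names match the harm categories used by the detector (all names here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class ContentControls:
    """Per-viewer settings: hide a single post, a poster, or a whole pattern."""
    hidden_posts: set[str] = field(default_factory=set)
    blocked_users: set[str] = field(default_factory=set)
    blocked_patterns: set[str] = field(default_factory=set)  # harm categories

    def should_hide(self, post_id: str, author: str,
                    categories: list[str]) -> bool:
        return (post_id in self.hidden_posts
                or author in self.blocked_users
                or any(c in self.blocked_patterns for c in categories))

# Example: a viewer who chose to block all content matching one pattern.
controls = ContentControls(blocked_patterns={"dehumanizing_language"})
assert controls.should_hide("post42", "someuser", ["dehumanizing_language"])
```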


Note: This policy expresses ethical reasoning and universal principles of responsible communication. It does not replace legal compliance or cultural sensitivity, but aims to create a safe and respectful digital public square.




