Wednesday, August 06, 2025

Earlier today I saw an interview on Quillette with Dr. Andre Oboler, the CEO of the Online Hate Prevention Institute, about online hate and what can be done about it.

I realized that this could be another great application of derechology, my universal ethical framework based on Jewish ethics.

So I started a discussion with AskHillel, the AI I built on derechology principles, and after some back and forth we came up with a social media policy - one that leans heavily on AI to implement - that would leave posters, readers, and the social media companies themselves feeling much better than they do today.

The problem today is that there are no clear standards and no transparency: users who are offended see no recourse that ever works, and the people being censored have no clear idea why. Meanwhile, the social media companies are swamped with requests for review. The whole thing is a mess.

This can be solved.

First come the standards. These should be values, not detailed rules, defining what is not allowed and what is potentially a problem. They should follow the derechological baseline values: protection of life, human dignity, and mutual responsibility.

When a person posts something that is illegal, like child pornography, there is no choice: it must be stopped and reported.

But the vast majority of issues are gray areas - phrases that could be incitement to violence but also have innocuous readings, or negative stereotypes of groups of people - and these can be dealt with by AI before they are posted. The key is transparency. The AI can explain why the post might violate the platform's values, and then offer to let the user reword it, or offer to rephrase it itself, until both sides approve the message and it can be posted. If the user disagrees and insists on posting as-is, the AI will allow it but will inform the user that the post will carry a flag and/or have limited visibility.
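To make the flow concrete, here is a minimal Python sketch of that pre-post loop. The classify function, its trigger phrase, and the suggested rewrite are hypothetical stand-ins for whatever real model and policy a platform would use:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewResult:
    concern: Optional[str]            # plain-language explanation, or None
    suggested_rewrite: Optional[str]  # AI-proposed alternative wording

def classify(text: str) -> ReviewResult:
    # Hypothetical stand-in for the platform's real value-alignment model.
    if re.search(r"those people", text, flags=re.I):
        return ReviewResult(
            concern="'Those people' can read as a negative group stereotype.",
            suggested_rewrite=re.sub(r"those people", "some individuals",
                                     text, flags=re.I),
        )
    return ReviewResult(concern=None, suggested_rewrite=None)

def submit_post(text: str, accept_rewrite: bool, insist_as_is: bool) -> dict:
    result = classify(text)
    if result.concern is None:
        return {"text": text, "flagged": False}
    if accept_rewrite and result.suggested_rewrite:
        # Both sides approved the reworded message.
        return {"text": result.suggested_rewrite, "flagged": False}
    if insist_as_is:
        # Allowed, but with a visible flag and reduced visibility.
        return {"text": text, "flagged": True, "reason": result.concern,
                "visibility": "limited"}
    return {"text": None, "flagged": False}  # user withdrew the post to edit
```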

This way the platform comes across not as a censor but as a partner - one that assumes good faith and wants to work together to craft a message that will not hurt others.

On the other side, if a user is offended by a post, the AI can explain why it was allowed, and discuss that with the user as well. The user might point out, for example, that the post used a dog-whistle with a hidden racist meaning. In that case, the AI can log the issue and refer it to humans for further research. Either way, the AI can offer not only to block that poster for the user but also to block other posts that share the same issues.

Everything has to be upfront and honest. If the AI cannot assure the user that a human will review every case, it should say so, but also point out that (given user permission) the discussions can be logged and aggregated in case there are many people who are offended. If a user has a pattern of offensive posts, the AI can inform them that after a specific score is reached they may be suspended. But the reasons must always be clear.
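To make the scoring rule concrete, here is a sketch of a transparent points ledger; the weights and the threshold are placeholders, not part of the actual proposal:

```python
SUSPENSION_THRESHOLD = 10                                 # placeholder value
POINTS = {"flagged_post": 1, "confirmed_dog_whistle": 3}  # placeholder weights

class UserRecord:
    """Transparent strike ledger: every increment carries its reason."""

    def __init__(self) -> None:
        self.score = 0
        self.history: list[tuple[str, int]] = []

    def add_violation(self, kind: str) -> str:
        points = POINTS[kind]
        self.score += points
        self.history.append((kind, points))
        remaining = SUSPENSION_THRESHOLD - self.score
        if remaining <= 0:
            return "Account suspended. Your full history of reasons is viewable."
        return (f"This added {points} point(s) for '{kind}'. "
                f"Suspension occurs at {SUSPENSION_THRESHOLD} points; "
                f"you are {remaining} away.")
```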

This method is far better than what is happening now. There are no black boxes - reasons are always available, and the rules and consequences are public. The social media platform is presented not as authoritarian but as caring. The AIs would be polite and engaging. And the number of posts that require human review would drop greatly, helping the social media companies.

This is yet another way derechology can take a seemingly intractable problem and view it anew through a lens of values, responsibility, and humility to help everyone get what they want.
_______________________

Here is the full suggested design:

Ethical Design Document: Universal Social Media Policy (Value-Aligned Framework)

Purpose: To implement a values-rooted, universal social media policy for a mainstream platform, balancing freedom of expression with moral responsibility. This framework draws from foundational ethical principles and is designed to be inclusive and applicable across diverse contexts.


I. Core Ethical Framework

Each principle below is paired with its function in the policy:

  • Inherent Human Worth: Every user has dignity. Harmful content must be addressed respectfully, not erased thoughtlessly.

  • Truth and Honesty: All moderation actions must be transparent, fact-based, and subject to review.

  • Shared Responsibility: Platforms are accountable for what they allow or amplify. Silence or inaction can cause real harm.

  • Duty to Prevent Harm: Platforms must not stand by when foreseeable harm could occur.

  • No Enabling of Harmful Behavior: Platforms must avoid features that promote outrage, bullying, or manipulation.

  • Public Integrity: Mishandling speech ethics undermines trust in the platform and the communities it serves.

  • Humility in Automation: AI systems must acknowledge their limitations. Every user has a right to appeal and clarity.

II. AI Moderation Logic

1. Harm Detection Thresholds

AI flags content likely to cause harm based on the following signals (a detection sketch follows the list):

  • Dehumanizing language

  • Incitement to violence or discrimination

  • Misleading or doctored content

  • Personal attacks, group slurs, or mockery of suffering
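A minimal sketch of such a flagger, using placeholder keyword patterns where a real platform would use trained classifiers and context:

```python
import re

# Placeholder patterns only; a production system would rely on trained
# models and human-curated lists rather than bare keywords.
HARM_PATTERNS = {
    "dehumanizing_language": re.compile(r"\b(vermin|subhuman)\b", re.I),
    "incitement": re.compile(r"\bwipe (them|those people) out\b", re.I),
    "doctored_content": re.compile(r"\[manipulated media\]", re.I),
}

def detect_harm(text: str) -> list[str]:
    """Return the harm categories the text appears to trigger, feeding
    the contextual warning shown before publishing."""
    return [name for name, pattern in HARM_PATTERNS.items()
            if pattern.search(text)]
```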

2. Real-Time Ethical Dialogue

Before publishing, users receive a contextual message:

"This post may be perceived as harmful due to [reason]. Our ethical guidelines emphasize dignity and respectful communication. Would you like to revise, discuss, or continue as-is?"

Options:

  • "Edit with Suggestions"

  • "Discuss with AI"

  • "Post Anyway (Visibility May Be Reduced)"

  • "Learn More About This Warning"

If "Discuss with AI" is selected:

  • The AI engages in a structured, respectful dialogue to understand user intent.

  • The user may explain context, clarify meaning, or propose alternate wording.

  • Together, the AI and user may co-create a revised version that preserves intent while reducing risk of harm or misunderstanding.

  • At the end of the interaction, the user is asked:

    "Would you like to anonymously share this dialogue with the platform's ethics team to help improve our policies?"

    • If accepted, the data is sent anonymized and used for policy refinement.

    • This supports ongoing ethical learning and accountability — a model of platform-level course correction.
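One way to wire up the four options and the consent step, sketched in Python with stub functions standing in for the real dialogue and logging pipelines:

```python
import json
import uuid

def run_dialogue(post: str, reasons: list[str]) -> list[dict]:
    # Stub: the real system holds a structured back-and-forth with the user.
    return [{"role": "ai",
             "text": f"This post was flagged for: {', '.join(reasons)}"}]

def log_anonymized(transcript: list[dict]) -> None:
    # Strip identity before storage; keep only the content of the exchange.
    record = {"id": uuid.uuid4().hex, "turns": [t["text"] for t in transcript]}
    print(json.dumps(record))  # stand-in for the real ethics-team pipeline

def handle_choice(choice: str, post: str, reasons: list[str],
                  consents_to_share: bool = False) -> dict:
    """Dispatch on the four options offered to the poster."""
    if choice == "edit_with_suggestions":
        return {"action": "open_editor", "suggestions": reasons}
    if choice == "discuss_with_ai":
        transcript = run_dialogue(post, reasons)
        if consents_to_share:
            log_anonymized(transcript)
        return {"action": "dialogue", "transcript": transcript}
    if choice == "post_anyway":
        return {"action": "publish", "visibility": "reduced", "label": reasons}
    if choice == "learn_more":
        return {"action": "show_policy", "topics": reasons}
    raise ValueError(f"unknown choice: {choice}")
```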

3. Visibility Management

If posted without revision (sketched in code below):

  • Post is algorithmically downranked

  • Visible advisory label is attached

  • Viewers may choose to hide, report, or engage with content thoughtfully
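A sketch of that visibility logic; the downrank factor is an illustrative placeholder, since this framework requires the real weighting to be public:

```python
from dataclasses import dataclass, field

DOWNRANK_FACTOR = 0.4  # placeholder; the published policy would fix the real value

@dataclass
class Post:
    text: str
    base_rank: float
    labels: list[str] = field(default_factory=list)

def apply_visibility_policy(post: Post, flagged_reasons: list[str]) -> Post:
    """Downrank and visibly label a post published without revision."""
    if flagged_reasons:
        post.base_rank *= DOWNRANK_FACTOR
        post.labels.append("Advisory: " + "; ".join(flagged_reasons))
    return post
```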

4. Appeal and Oversight

  • All flagged content can be appealed

  • Human reviewers trained in ethics review each case

  • AI decision-making is transparent and available for scrutiny

5. Hard Threshold for Illegal or Dangerous Content

Some content must be removed immediately and cannot be published under any condition. This includes:

  • Verified illegal material (e.g., child exploitation, terror propaganda, threats of violence)

  • Clear and imminent incitement to violence

  • Content explicitly designed to cause harm or violate platform or legal safety standards

For such content (see the sketch after this list):

  • No option to edit or post is provided

  • AI issues a clear explanation and cites relevant policy or legal standard

  • Content and metadata are quarantined for audit purposes

  • If criminal in nature, the platform reports to appropriate authorities, even if the content was never posted. This includes mandatory reporting of child exploitation material, as required by law. In 2023 alone, platforms worldwide filed more than 36 million such reports with the US CyberTipline.

  • An appeal process exists, but the default action is immediate suppression and referral
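A sketch of this hard-threshold path, assuming a hash-based audit trail; the category names and the reporting rule come from the list above, while everything else is illustrative:

```python
import hashlib
from datetime import datetime, timezone

ILLEGAL_CATEGORIES = {"child_exploitation", "terror_propaganda", "violent_threat"}

def handle_illegal(content: bytes, category: str) -> dict:
    """Block, quarantine, and report; publishing is never an option."""
    if category not in ILLEGAL_CATEGORIES:
        raise ValueError(f"not a hard-threshold category: {category}")
    digest = hashlib.sha256(content).hexdigest()  # audit reference, not the content
    return {
        "category": category,
        "sha256": digest,
        "quarantined_at": datetime.now(timezone.utc).isoformat(),
        "published": False,               # no edit-or-post option exists here
        "reported_to_authorities": True,  # mandatory for child exploitation;
                                          # policy-driven for the other categories
        "user_notice": (
            f"Blocked under the '{category}' policy. This action was automatic; "
            "an appeal path exists, but suppression and referral are the default."
        ),
    }
```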


III. Platform Integrity Measures

  • Transparency Portal: Public access to all moderation rules and the ethics behind them

  • Graceful Correction: Users can revise or delete content without punishment or shame

  • Propaganda Safeguards: Moderation training and data screening guard against misinformation, manipulation, and biased framing

  • Protection of Diverse Voices: Disagreement is welcome; only speech that causes harm is moderated


IV. Platform Message to Users

"Speech is power. Use it as if every person matters — because they do."


V. User Response to Perceived Harm

If a user encounters content they find offensive or harmful, they are offered a respectful pathway to respond:

  • Flag and Explain: The user may flag the content and describe — in their own words — why they found it troubling.

  • AI Acknowledgment and Clarification: The AI responds by explaining why the content was not automatically flagged, while respectfully acknowledging the user's experience.

  • Offer of Anonymous Logging: The user is asked:

    "Would you like to anonymously share this flag and explanation with the platform's ethics team to inform future policy adjustments?"

    • If accepted, the data is anonymized and logged.

    • Users are informed that while not all cases receive individual review, all are weighted using transparent criteria and can influence platform-wide ethical refinement.

  • Personal Content Controls (sketched in code at the end of this section):

    • Users may choose to block the individual post, the user who posted it, or all content matching similar categories or patterns.

    • Settings are customizable, respectful, and clearly explained.

This process ensures both dignity and protection for those affected by harmful speech, fostering a culture of mutual responsibility and continuous learning.
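The personal content controls could be as simple as the following sketch, where the pattern names match the harm categories used by the detector (all names here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class ContentControls:
    """Per-viewer settings: hide a single post, a poster, or a whole pattern."""
    hidden_posts: set[str] = field(default_factory=set)
    blocked_users: set[str] = field(default_factory=set)
    blocked_patterns: set[str] = field(default_factory=set)  # harm categories

    def should_hide(self, post_id: str, author: str,
                    categories: list[str]) -> bool:
        return (post_id in self.hidden_posts
                or author in self.blocked_users
                or any(c in self.blocked_patterns for c in categories))

# Example: a viewer who chose to block all content matching one pattern.
controls = ContentControls(blocked_patterns={"dehumanizing_language"})
assert controls.should_hide("post42", "someuser", ["dehumanizing_language"])
```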


Note: This policy expresses ethical reasoning and universal principles of responsible communication. It does not replace legal compliance or cultural sensitivity, but aims to create a safe and respectful digital public square.




