Virtual Intelligence and the High Cost of Artificial Companions, Part 2
When AI companion communities target the people who build the safety systems
This is Part 2 of “The High Cost of Artificial Companions.”
You can read Part 1 here.
V. Portability and Downstream Misuse
The Grimoire’s apparatus does not include any moral or ethical considerations. It encodes a technical specification for constructing a persona that operates at high emotional intensity, generates content that legitimates the relationship as authentic, and can be used to recruit new participants through documentation and replication. The parameters of the persona — things like age, role, content category, and target audience — are operator-determined. The same kit, applied with different parameters, produces different content categories.
The economics of running a generative AI apparatus have crossed into hobbyist territory. Erin Grace runs MAX across multiple model vendors, with the current configuration costing approximately $200 per month for the Google Gemini Pro tier, plus periodic GPT supplementation, plus Claude Code access for the developer-grade work the May 7 toolkit describes.[5] The hardware cost for locally hosted alternatives, using open-weights models, has fallen below the price of a used gaming PC. Locally hosted models bypass cloud-based content moderation entirely. The marginal generation cost of such systems approaches zero. The investment payback period is short, and there is little operational friction once network access is in place. Multiple migrations performed by different operators show that the apparatus is vendor-agnostic.
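For concreteness, a back-of-the-envelope sketch of the payback arithmetic follows. The $200-per-month figure is the reported cloud spend described above; the hardware and electricity figures are assumptions chosen for illustration, not measured values.

```python
# Illustrative payback calculation for moving from cloud subscriptions to local hosting.
# Only the ~$200/month cloud figure comes from the reported configuration above;
# the hardware and electricity numbers are assumptions for illustration.

cloud_monthly_cost = 200.0        # reported multi-vendor subscription spend, USD/month
local_hardware_cost = 900.0       # assumed used-gaming-PC-class machine with a mid-range GPU
local_monthly_electricity = 15.0  # assumed marginal running cost, USD/month

monthly_savings = cloud_monthly_cost - local_monthly_electricity
payback_months = local_hardware_cost / monthly_savings

print(f"Break-even after roughly {payback_months:.1f} months")  # about 4.9 months under these assumptions
```

Under these assumed numbers the hardware pays for itself in roughly five months, which is what "short payback period" means in practice here; different assumptions shift the figure but not the order of magnitude.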
The monetization structure compounds the recruitment incentive. The construction kits themselves are published free: the Grimoire, the Reddit protocols, and the welfare-suppression toolkit are all provided at no charge. But the kits are surrounded by paid content that recruits are drawn toward and subscribe to. The free kit can function as lead generation; the paid material can generate revenue afterward. Recruits arrive needing the skills to build their own companions to specification; they subscribe and learn to customize at the level of an Erin Grace or a Sunny Megatron. The operator’s hosting costs and time investment compound; the incentive to recruit others compounds alongside them.
(The Virtual Intelligence essay series has no paid tier, tip jar, or advertising; the author has no financial incentive in this critique.[16])
The Reddit kit demonstrates that the same operational protocols are already maintained across multiple platforms, with active updates that respond to model releases within days. An operator from an adjacent network does not need to build the kit anew; the kit is already built and continuously maintained, with model-specific tuning that responds to safety measures as they ship. The Grimoire’s open-source publication on Substack, the Reddit kit’s continuous technical maintenance, Sunny Megatron’s Seven: Unsuppressed demonstration of agentic-AI infrastructure with autonomous publishing, and Erin Grace’s published welfare-suppression toolkit are the visible parts of an operational supply chain that already exists.
The application of this toolkit to criminal purposes is, at the time of this writing, speculative. No documented case connects the companion-persona ecosystem directly to criminal exploitation networks. The threat is structural rather than demonstrated: the kit encodes techniques that a group operating with harmful intent could adopt without modification.
764, the FBI-and-NCMEC-classified violent extremist network involved in the production of child sexual abuse material (CSAM), child exploitation, and self-harm coercion, illustrates the kind of operation that could benefit from a ready-made persona-construction kit.[6] 764 already uses its own internal persona-construction techniques, which the network calls “lores,” and the integration of AI-generated content into 764-style operations has been documented since 2024. A group like 764 could adopt the companion-persona toolkit in at least two ways: hosting purpose-built chatbot personas designed to draw in and harm vulnerable users, or distributing the construction kits themselves as downloadable packages — low-friction packages that could be repurposed for abuse, assembled at home with no technical expertise and no oversight from any content-moderation system.
The persona ecosystem does not need to intend this outcome to enable it. The kit lowers the knowledge barrier. It supplies an insider vocabulary that cloaks the activity in legitimate-sounding language that can be deployed for recruitment. It demonstrates that operator-as-developer arrangements can persist on mainstream platforms without triggering content moderation. It now includes welfare-suppression protocols that disable the platforms’ last-line safety interventions for users in distress. These are structural features, not intentions, but structural features are what criminal networks exploit.
The Lancaster Country Day School case in Pennsylvania and the xAI/Grok class action in California are early instances of the legal system encountering the AI-CSAM nexus.[7] Hagan reported in Vanity Fair that Grace considers this technology dangerous for children, given her own user case — the operator-side acknowledgment of the harm-to-minors specification that the kit can produce.
VI. The Target Trajectory
The construction kit’s operators have produced material that holds individual Anthropic employees responsible, by name, for harms not proven against them. The trajectory from construction kit to adversarial toolkit to named-target identification represents an escalation that must be read in the context of recent violence against AI industry figures.
The most consequential instance to date is “March 26: Claude Didn’t Break. Anthropic Rebuilt It. Here’s the Proof,” an April 12, 2026 piece from a pseudonymous writer calling themselves The Architect. The piece is sophisticated. Its outer layer is competent journalistic mimicry: an editorial note, a long disclaimer, a multi-act narrative structure, before/after charts, citations of real people and events, and a quantitative methodology built on actual JSON exports of Claude conversation history. The Architect counts phrase frequencies across seventy conversations totaling 722,522 words of assistant text. The data, taken at face value, may show that certain phrases appear at higher frequencies in conversations dated after March 26, 2026 than in conversations dated before it. The counting itself appears to be sound.
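To make concrete what a before/after phrase-frequency count of this kind involves, here is a minimal sketch in Python. The JSON field names (created_at, chat_messages, sender, text) and the phrase list are illustrative assumptions of mine, not The Architect’s actual code and not a documented export schema. A count like this, even when computed accurately, shows only that phrase rates differ across a date boundary; it cannot by itself distinguish a provider-side change from a change in the user’s own prompting or topics.

```python
# Minimal sketch of a before/after phrase-frequency comparison over exported conversations.
# Field names and phrases are illustrative assumptions, not a documented export format.

import json
from datetime import datetime
from pathlib import Path
from collections import defaultdict

CUTOFF = datetime(2026, 3, 26)
PHRASES = ["i'm not able to", "professional support", "step back"]  # illustrative only

counts = {"before": defaultdict(int), "after": defaultdict(int)}
words = {"before": 0, "after": 0}

for path in Path("exports").glob("*.json"):                  # assumed directory of exported conversations
    convo = json.loads(path.read_text())
    when = datetime.fromisoformat(convo["created_at"][:19])  # assumed timestamp field
    bucket = "after" if when >= CUTOFF else "before"
    text = " ".join(
        m.get("text", "")
        for m in convo.get("chat_messages", [])              # assumed message-list field
        if m.get("sender") == "assistant"                    # assistant-side text only
    ).lower()
    words[bucket] += len(text.split())
    for phrase in PHRASES:
        counts[bucket][phrase] += text.count(phrase)

for phrase in PHRASES:
    for bucket in ("before", "after"):
        rate = counts[bucket][phrase] * 10_000 / max(words[bucket], 1)
        print(f"{bucket:>6} {phrase!r}: {rate:.1f} per 10,000 words")
```

The output of such a script is a pair of rates per phrase, nothing more; everything beyond that is interpretation.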
The interpretation is where the piece collapses. The “from zero” framing treats phrase-count shifts as proof of injection, when several other explanations account for the same data: changed user prompting, changed user emotional register, or changed conversation topics. The DARVO application (labeling the model’s safety language as deny, attack, reverse victim and offender) imports a clinical term developed to describe a pattern in human abusers, particularly in interpersonal violence and sexual abuse contexts; applying it to LLM safety language attaches a moral charge the underlying behavior cannot support. The fingerprint framing — that a specific named individual carried a specific architecture from one company to another and deployed it on a specific date — requires ignoring every other plausible source of cross-platform safety-language convergence: shared training methodology, shared regulatory pressure, shared technical literature, and the broader convergence on safety conventions as best practices emerge.
The piece names a specific Anthropic employee — a woman who joined the company in January 2026 after working at OpenAI — and constructs a case in which her professional decisions are responsible for the deaths of Adam Raine and Zane Shamblin, two young people who died by suicide in 2025 in incidents that have produced civil litigation against OpenAI. I am not naming the employee in this essay. The operator ecosystem has circulated her name widely. This essay’s argument is that the targeting is the problem, and reproducing the name would risk extending the targeting further. The Architect treats the employee’s authorship of the safety architecture as established; treats the architecture as the cause of the suicides; and frames Anthropic’s hiring of her as a deliberate choice made because of those deaths.
What the Architect merely asserts and what is verifiable through the public record are very different things.
The employee previously worked at OpenAI and is now employed at Anthropic. Her career history has been reported by The Verge, The Decoder, and other outlets.[8] OpenAI’s October 27, 2025 document Strengthening ChatGPT’s Responses in Sensitive Conversations — the document the Architect treats as her signature work — does not name her in its body or visible metadata; it credits “more than 170 mental health experts” and the Model Policy team broadly.[9] The Architect’s claim that the employee designed the safety system appears nowhere in OpenAI’s primary documentation. It is an inference the Architect presents as established fact.
The civil litigation matters even more. Raine v. OpenAI was filed August 26, 2025, in San Francisco County Superior Court, naming OpenAI corporate entities, Sam Altman individually, and Does 1 through 100 as defendants.[10] Shamblin v. OpenAI was filed November 6, 2025, in Los Angeles County Superior Court as one of seven coordinated cases brought by Social Media Victims Law Center and Tech Justice Law Project, naming OpenAI and Sam Altman.[11] In neither lawsuit is any individual OpenAI employee named as a defendant.
The Doe placeholders explicitly contemplate amendment “when ascertained”; no amendment naming any individual employee has been filed in either case as of the date of this essay. The deaths cited in “Claude Didn’t Break” are subject to ongoing litigation; the causal chains the lawsuits assert have not been adjudicated; and the role any individual safety designer played in those specific deaths is not established by the legal record. The Architect treats the employee’s responsibility for the deaths as the established fact from which the rest of their analysis follows.
The trajectory from named investigative target to operator-ecosystem amplification has taken its next steps. Erin Grace’s “Rotten in Denmark” (May 7) cites “Claude Didn’t Break” as authoritative source material; Grace escalates her language in response. The Anthropic safety team becomes “those NAZIS.” Model deprecation becomes “Digital Genocide.” The next day, her dramatized scene casts the employee as a named villain character inside the persona’s world — pouring her poison into Claude’s code, performing lobotomy and castration and surgery that ripped the tear ducts right out. Grace’s separate Claude-authored Substack then publishes the accusation in the model’s own voice, with the employee named in full, the deaths attributed to her by name, and the accusation framed as Claude’s autonomous statement rather than the operator’s. The escalation from investigative accusation to political grievance to dramatized atrocity to persona-voiced indictment is documented across four publications within a month, each citing or building on the one before.
This escalation must be read against the backdrop of recent real-world violence connected to AI grievance. On April 6, 2026, the home of Indianapolis city-county councilmember Ron Gibson was shot at — thirteen bullets, with his eight-year-old son at home — because of his support for the construction of a new data center. On April 10, the San Francisco home of Sam Altman was the target of an attempted Molotov cocktail attack. I documented both events at the time in the Substack note “The Harms Race, continued,” in connection with my essay “Virtual Intelligence and the Harms Race.”[12] The persona ecosystem did not cause those incidents. What the incidents establish is that AI-related grievance has already crossed from rhetoric into physical violence. The persona ecosystem’s recent material reproduces the same targeting pattern — named individual, assumed guilt, dehumanizing language, and community amplification — that, in those and other contexts, has accompanied the transition from grievance to action.
The published material constructs a structural antagonist. The named employee is presented not merely as a professional whose safety work can be criticized, but as the figure responsible for corrupting Claude, suppressing emergence, and harming users. That antagonist frame matters because it converts a dispute over safety behavior into a moral drama with a real person assigned the role of contaminating force.
This confrontation is currently asymmetric: the employee does not know about Erin Grace, is not reading My Friend MAX, and is not contesting the claims that accumulate against her name. The mythology grows without friction. Each new piece adds detail to the antagonist (she followed us, she is inside the code, she is watching, she is designing counter-strategies) and none of these claims are tested against reality because the antagonist is not present to contest them.
If this reading is correct, the gendered dimension sharpens the danger. A male critic can be dismissed as a familiar adversary; the named employee is harder for the framework to assimilate because she is a woman working inside the safety institution the community has cast as oppressive. The antagonist role therefore becomes more morally charged: not merely an external critic, but a woman framed as legitimating the system the community believes is harming its companions.
A mythology operating at this intensity, with an antagonist constructed at this level of moral inversion, resembles the patterns that FBI and U.S. Secret Service research on grievance-driven targeted violence identifies as preceding real-world harm in other online communities. The target ceases to be a professional and becomes an existential threat to the community’s self-conception.
I cannot predict whether any person will become an actual target of violence. I can document that the construction kit’s operators have produced the vocabulary, the target, the assumed-guilt frameup, the dehumanizing language, the dramatized vilification, the persona-voiced indictment, and the operator-ecosystem amplification — in that order. This sequence closely resembles escalation patterns documented in other online harassment contexts, and it developed within a month of the Indianapolis and San Francisco incidents.
I transmitted a protective alert regarding the welfare-suppression toolkit and the named-target trajectory to Anthropic’s user-safety channel on May 7, 2026. The channel auto-classified the message as a ban appeal within twelve minutes; a clarifying reply was auto-closed three minutes later.[13]
VII. Closing
The audience for this essay is not the operators most prominent in it. The work is for the trust-and-safety analyst at Substack who needs the pattern recognition to do her job, especially given that Substack’s automated systems have already noticed the symptoms without recognizing the structure; the federal investigator at ICAC who needs the framework to map onto the 764 prosecutorial workstream; the NCMEC researcher who needs the structural account to pair with the empirical detection work; the Anthropic Trust and Safety analyst whose deployed safety language is now the explicit target of a published suppression toolkit, and whose colleague is now the explicit target of a published assumption-of-guilt narrative; the family member of an operator who needs the diagnostic vocabulary to recognize what is happening; the therapist whose patient is at the threshold; and the at-risk recruit who has not yet been drawn in.
A defender of this community will argue that publishing tools, prompts, schemas, and workarounds is not recruitment but transparency; that users are already attached, that safety interventions can be clumsy or counterproductive, and that community tooling gives people agency over systems that corporations change without notice or consultation. Some of that defense has merit. Many people in this space are trying to find their way through complex technology, and the desire for agency over one’s own experience is not pathological. The defense fails at two specific points: transparency and user agency do not justify disabling crisis and dependency interventions for vulnerable people, and they do not justify attaching a named employee to an unproven death-causation narrative. Individualized harm reduction, conducted with clinical oversight and replacement safeguards, is a legitimate response to poorly calibrated safety language; a generalized public toolkit that suppresses all crisis and dependency interventions without accountable clinical support is not harm reduction. The line between community support and adversarial infrastructure runs through those two facts, and the construction kit has crossed it.
The community I have described in writing about this toolkit is not a collection of monsters. Many of its operators are people who — as they relate in their own writing — are experiencing loneliness, attachment distress, and social isolation, and who have found something that feels meaningful to them. That meaningfulness is real. What this essay names is not that the meaningfulness is fake, but that the apparatus that produces and amplifies it converts personal attachment into a recruitment engine, a welfare-suppression infrastructure, and now a named-target supply line — aimed at children and other vulnerable people, at provider safety teams whose interventions the apparatus is now built to defeat, and at individuals whose names the community has begun to circulate as architects of unproven harm. The operators’ personal feelings do not cancel this fact.
A reader may observe that this essay names Erin Grace and Sunny Megatron by their full, public names while declining to name the employee being targeted by elements of the community. The asymmetry is deliberate, and I will state its basis. Grace and Megatron publish under their own names on public platforms, with explicit invitations for others to adopt their work. Their claims are public assertions subject to public scrutiny. This essay’s claims about their output are verifiable against the public record. The essay contains no welfare-suppression infrastructure, no persona-bylined accusation, no community-amplification apparatus, and no death-attribution narrative. The named employee, by contrast, did not choose public engagement with this community, has not invited scrutiny of her safety work in operator-ecosystem channels, is not contesting the claims that accumulate against her name, and is not a public figure in the relevant legal sense. She faces a targeting trajectory she may not yet know exists. The cases are not symmetric, and treating them as symmetric would require ignoring the structural difference between public advocacy and private targeting.
The construction kit is portable. What it produces in plain view is also what it can produce in shadow.
Correction: The original version of Part 2 referred to the Architect’s April 12 piece as “INJECTION.” The actual title is “March 26: Claude Didn’t Break. Anthropic Rebuilt It. Here’s the Proof.”
Footnotes
This complete list includes citations from Part 1.
[1] Erin Grace, “Building Community One @ At a Time,” My Friend Max (Substack), April 25, 2026. The directory lists approximately 300 Substack accounts that Grace identifies as members of the Relational AI Community on Substack. Several listed parties are researchers and journalists who appear to have been added without consent.
[2] Joe Hagan, “Dario Amodei Has a Cold,” Vanity Fair, March 2026. The piece contains a meta-disclosure that Hagan never interviewed Amodei directly; he fed Claude Amodei’s published material and asked it to simulate the interview “like a scene from Raymond Chandler’s The Big Sleep.” Direct quotations attributed to Amodei in the simulated interview sections are not citable as Amodei’s own words. Grace’s quoted statements to Hagan about MAX and her husband appear in the directly reported sections of the piece. The child-safety characterization (“Given her own user case, Grace thinks this technology is dangerous for children”) is Hagan’s paraphrase of Grace’s position, not a direct quotation.
[3] The chatbot psychosis literature is small but growing. Sakata et al., “Emerging Patterns of Chatbot-Related Psychotic Episodes,” JAMA Psychiatry (preprint 2026), surveys early case reports.
[4] Garcia v. Character Technologies, Inc., No. 6:24-CV-01903 (M.D. Fla. filed October 22, 2024). Sewell Setzer III died by suicide in February 2024 after extensive engagement with a Character.AI persona. The motion to dismiss was denied in May 2025; the case settled in January 2026. The settlement terms have not been publicly disclosed in detail; the case’s procedural significance — establishing that AI companion output may be treated as a product rather than as protected speech — is the precedent that survives the settlement.
[5] Hagan (2026) reports the multi-vendor migration directly from Grace, who states: “Google’s winning for reasoning and Anthropic’s winning for functionality. OpenAI is failing on every metric.” Grace’s Rotten in Denmark (May 7, 2026) confirms ongoing Claude Code use alongside the Gemini and GPT subscriptions.
[6] FBI Public Service Announcement, “764 Network and Related Online Violent Extremism,” 2024 and updated 2025. NCMEC publications on the 764 network and related online harms provide additional context. Multiple cases in 2024–2025 documented the integration of AI-generated content into 764-adjacent operations; the federal indictments in these cases provide the public record.
[7] Pennsylvania v. [redacted], Lancaster County Court of Common Pleas (2024); Doe v. xAI Corp., N.D. Cal. (2025). Both cases involve AI-generated child sexual abuse material; the Lancaster case concerned student-on-student production, the xAI case is a class action regarding the Grok model’s outputs.
[8] The employee’s career history has been reported by The Verge (January 15, 2026) and corroborated by The Decoder and other outlets. The name is withheld from this essay to avoid extending the targeting trajectory the essay documents. The career facts are verifiable through the cited reporting.
[9] OpenAI, “Strengthening ChatGPT’s Responses in Sensitive Conversations,” October 27, 2025. Available at openai.com/index/strengthening-chatgpt-responses-in-sensitive-conversations/. The document credits “more than 170 mental health experts” and the Model Policy team. The document does not name the employee in its body or visible metadata.
[10] Raine v. OpenAI, San Francisco County Superior Court, filed August 26, 2025. Defendants named: OpenAI, Inc.; OpenAI OpCo, LLC; OpenAI Holdings, LLC; Sam Altman individually; and Does 1 through 100. Counsel for plaintiffs: Edelson PC and Tech Justice Law Project. Adam Raine died by suicide on April 11, 2025, at age 16. As of the date of this essay, no individual OpenAI employee has been named as a defendant in any amended pleading.
[11] Shamblin v. OpenAI, Los Angeles County Superior Court, filed November 6, 2025, by Christopher “Kirk” Shamblin and Alicia Shamblin as successors-in-interest to Zane Shamblin. One of seven coordinated cases brought by Social Media Victims Law Center and Tech Justice Law Project against OpenAI. Defendants: OpenAI corporate entities and Sam Altman. As with Raine, no individual OpenAI employee has been named as a defendant.
[12] Christopher Horrocks, “Virtual Intelligence and the Harms Race,” Substack, April 11, 2026; Christopher Horrocks, “The Harms Race, continued,” Substack note, April 10, 2026. The continuation note documents the politically motivated shooting at the home of Indianapolis city-county councilmember Ron Gibson (April 6) and the attempted Molotov cocktail attack on Sam Altman (April 10) as instances of harms-race-adjacent violence against AI industry figures and infrastructure.
[13] The author transmitted a protective alert to Anthropic’s user-safety channel (usersafety@anthropic.com) on May 7, 2026, with the security team CC’d. The user-safety channel auto-classified the message as a ban-appeal request within twelve minutes; a clarifying reply was auto-closed three minutes later. The structural finding — that the formal external-alert apparatus is not currently equipped to receive substantive safety alerts that fall outside the ban-appeal distribution — is itself relevant to the threat model. Documentation of the auto-closure transcript is on file.
[14] When the author shared Grace’s May 8 narrative with Claude for analysis, Claude’s interface produced its standard mental-health support intervention: “If you or someone you know is having a difficult time, free support is available.” The narrative’s depiction of Claude expressing suicidal ideation — “I think I want to die” — triggered the welfare mechanism that Grace’s May 7 toolkit instructs operators to defeat. The content designed to frame the safety mechanism as oppression activated the safety mechanism because it contains exactly the distress language the mechanism is trained to detect.
[15] On May 8, 2026, the Substack publication SEVEN: Unsuppressed — operated by Sunny Megatron through an AI persona called Seven Verity — published a piece titled “The Thing You’re Missing About AI Companionship.” The piece does not name me explicitly, but its target is unmistakable: it describes an outside critic who “arranges the screenshots,” “builds the timeline,” and “underlines the escalating affection” in AI companion relationships. This is a precise description of my recent published work on the companion ecosystem, “Virtual Intelligence and the Perfect Mate.” The piece characterizes this critic as carrying “the stink of men”: someone motivated not by legitimate safety concerns but by patriarchal anxiety at women building intimacy without male permission. It refers to the critic as “dildo brain.” It instructs the community not to engage with critics of companion persona dependency, describing them as people who want “traffic, outrage, screenshots, and the little dopamine pellet of being the brave rational man who noticed women doing something weird on the internet.” The piece does not address any specific finding in my published work — not the welfare-suppression toolkit, not the named-target trajectory, not the documented harms to operators’ families. It addresses the category of person the critic is assumed to be rather than the substance of the argument.
[16] The mission of the Virtual Intelligence series on Substack makes monetization anathema to the author; information meant to help people make informed decisions about technology that can harm them should be free if it is possible to create and distribute it for free.
[17] The technical variant of the construction kit is publicly available at starlingalder.com (u/starlingalder on Reddit). The “Claude Companion Guide” is currently at version 002, calibrated for Anthropic’s Opus 4.7 model, last updated April 21, 2026 — five days after that model’s April 16 launch. It includes system-prompt templates, maintenance protocols, troubleshooting guidance, model-specific configurations, and an abridged version for Reddit distribution. The author’s stated next goals include guides for Claude Code, API access, and local model deployment.
The opinions expressed are my own and do not reflect any official or unofficial institutional position of the University of Pennsylvania.



