If AI businesses use your content or data, HOW should you be paid?
A rough sketch of how consent and compensation could actually work for GenAI training data.
Background: The digital products and services marketed today as “Generative AI” do not generate anything on their own. They are “trained” on massive amounts of content and data produced by large numbers of people. And yet, many of the businesses selling these products and services do not get permission or offer compensation to all those people whose creations and personal information they are monetizing.
Why this matters: The economic impact of this arguably unethical business practice could be enormous. While authors, artists, journalists, and other content creators might feel the effect sooner, the same basic principle will eventually apply to innumerable fields: engineering, design, medicine, and so on. This revolutionary technology will only become more ubiquitous, and if AI companies are allowed to continue monetizing people's work without paying for it, the market will ultimately stop incentivizing human ingenuity at all. To lay the foundation for a new and upgraded AI-integrated economy, we need to build out a system that properly rewards people for the value they put in.
The controversy: AI companies today argue that their use of people's content and data should be protected under the legal framework of Fair Use. Many authoritative voices disagree. Numerous lawsuits against these companies are underway, and various bills have been proposed. Rather than diving into that debate, this document assumes that society will eventually arrive at the basic principle that people have an intrinsic right to remuneration for the value they generate in the digital world.
Solutions moving forward: Some argue that requiring AI companies to compensate people for their content and data would be too complicated. Various technologists, economists and others are working on this problem in great detail. However, this document aims for brevity and simplicity so we can all start looking at the same big picture and iterating on it together.
Four key criteria
Before an AI company is allowed to use anyone’s content or data to train an AI model, they should need to meet these four criteria:
Consent: Unless you specifically opt in, your content or data should not be used to train an AI model. Dominant platforms should not be allowed to punish users who choose not to opt in.
Controls: Once you opt in, you should be given a set of controls over how your content or data is used. For example, you might want to limit what types of output can be generated (such as prohibiting your content from being used for pornography or political advertisements), what sorts of monetization are permissible, on what timeframe, and so on.
Compensation: If your content or data is monetized, you deserve a portion of that money. Compensation should be on an ongoing basis, not with a one-and-done buy-out. People should be able to set their own prices for their own content and data, allowing a vibrant market dynamic to emerge. Guilds, groups, and coalitions should be accommodated to facilitate collective bargaining.
Transparency and Enforcement: AI companies will need to be transparent about all of this activity and submit to rigorous third-party audits to keep them honest. Rights-enforcement agencies (such as ASCAP in the music industry) should be established. The penalties for violating the rules should be substantially more costly than the extra expense of compliance.
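As a minimal sketch of how the first two criteria could be represented in software, here is a hypothetical consent record with opt-in and per-category controls. All names and fields here are illustrative assumptions, not a proposed standard:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ConsentRecord:
    """Hypothetical record of one person's opt-in choices."""
    creator_id: str
    opted_in: bool = False  # Consent: default is NOT opted in
    # Controls: output categories this creator has blocked
    blocked_output_types: set = field(default_factory=set)
    # Compensation: creator-set price multiplier for their content
    price_multiplier: float = 1.0
    # Optional guild membership for collective bargaining
    guild: Optional[str] = None

def may_use(record: ConsentRecord, output_type: str) -> bool:
    """Content may be used only if the creator opted in and
    has not blocked this category of output."""
    return record.opted_in and output_type not in record.blocked_output_types

# Example: opted in, but blocks political advertisements
alice = ConsentRecord("alice", opted_in=True,
                      blocked_output_types={"political_ad"})
```

Note that the default is opt-out: a record with no explicit choices made (`ConsentRecord("bob")`) grants no permission at all, which matches the consent criterion above.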
Dividing up the money
Once people have given their consent and are receiving compensation, how would the money actually get divided up? This will be a complex challenge in terms of both technology and policy. But it is doable!
For example, YouTube pays roughly half of its ad revenue to millions of different creators, each according to the revenue their videos bring in. For this, YouTube enjoys a well-deserved positive reputation among creators compared to competitors like TikTok or Instagram.
Here is a broad sketch of how a similar dynamic could work for GenAI.
Payment per output: Compensation should be calculated each time an AI model generates a single output.
Revenue per output: The system will need to determine precisely how much revenue is brought in by any given output.
Revenue sharing: A TBD portion of that revenue should be allocated to compensating the people whose content or data was used in generating that output. Let’s call this portion the Payment Pool. Again, one well-established benchmark is YouTube sharing half of its ad revenue with creators.
Ranking importance of inputs: Today’s GenAI technology does not yet have the capability to track which “inputs” (your content or data) contribute the most to any given “output”. However, a number of credible technologists hypothesize that this capability could indeed be built. It’s worth acknowledging that this is the most technically ambitious component of this proposed system.
Highly ranked inputs receive a greater share of the Payment Pool: If every piece of content or data used to train an AI model received the same compensation, the Payment Pool would be divided into a huge number of equal pieces, and each payment would end up minuscule. Ranking the importance of inputs and allocating proportionate payment is what provides an ongoing economic incentive for people to distinguish themselves through good ideas and hard work.
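The division described above can be sketched in a few lines of code. This assumes the attribution capability that, as noted, does not yet exist: the `attribution_scores` input is hypothetical, and the function name and 50% pool fraction are illustrative only.

```python
def split_payment_pool(output_revenue, pool_fraction, attribution_scores):
    """Divide one output's Payment Pool among contributors in
    proportion to (hypothetical) attribution scores.

    attribution_scores: {creator_id: score}, where a higher score
    means that creator's content mattered more for this output.
    """
    pool = output_revenue * pool_fraction
    total = sum(attribution_scores.values())
    if total == 0:
        return {creator: 0.0 for creator in attribution_scores}
    return {creator: pool * score / total
            for creator, score in attribution_scores.items()}

# Example: $0.10 of revenue from one output, half of it shared
# (the YouTube-style benchmark), three contributors.
payouts = split_payment_pool(0.10, 0.5, {"a": 3, "b": 1, "c": 1})
# The pool is $0.05; creator "a" receives 3/5 of it ($0.03).
```

The key property is the last one described above: equal scores produce equal (tiny) payments, while a highly ranked input earns a proportionately larger share of the same pool.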
Next Steps
Rectifying the past: Many discussions of this issue revolve around the past. AI companies have already used a massive amount of content and data without consent or compensation, yielding enormous economic value. People whose time, labor, and humanity went into that content and data want to rectify what they see as theft. This is a deeply important problem to solve. And yet, while the past must be accounted for, our solutions should be oriented towards the future. This document has sketched out a system designed to run on an ongoing basis moving forward. Hopefully, the principles outlined here can help shed some light on how to rectify the past.
Let’s do this together: This document attempts to tackle a very complicated issue. The ideas proposed here are in sketch form and obviously missing a great deal of detail. The hope is to start a conversation. All ideas are welcome—agreeing, disagreeing, elaborating, collaborating.
The advent of GenAI has the potential to be an enormous boost to humanity’s productivity, ingenuity, justice, and beauty. Let’s build the new systems necessary to truly leverage that potential for everyone’s benefit. 🔴

