When participation becomes raw material, from OEDP's Fieldnotes on AI
Reflecting on participation as a site for extraction, Senior Fellow Shannon Dosemagen writes about how stewardship can no longer end at collection or sharing as data becomes training material for AI.
This post is part of a series titled “OEDP’s Fieldnotes on AI”, where we offer reflections about AI in environmental and participatory contexts. In our final series, this week’s reflections center on the question:
What does it mean to care for data and communities when participation itself can become a site for extraction?
We turn to refusal, stewardship, and obligation in the era of AI models. We question what “open stewardship” means when data becomes training material and models become intermediaries.
Open Environmental Data Project first started with acknowledging a gap. Community science and environmental data collection efforts were proliferating, but we still lacked strong translational roles, organizations, and practices that could bridge institutions and communities while maintaining the space of environmental data stewardship. Some of these roles certainly existed, but the work itself was often inconsistently recognized and resourced. Translation in environmental governance is not just about moving information between parties. It requires navigating unequal power relationships, protecting context, building trust, and helping people retain agency over how their knowledge is interpreted and used.
I’ve often referred to these organizations and people as “intermediaries,” whose work is focused on the human role of interpreting and facilitating data exchange between different parties. Before the AI conversation inundated the participatory landscape, developing these translational and stewarding roles was difficult to achieve, but relatively straightforward to imagine. You can see this tension in the work of OEDP’s data stewardship handbook and in many environmental justice and participatory science projects that have attempted to balance openness, accountability, and protection at the same time. Sometimes these roles are meant to help people make decisions for their own sake. Sometimes they are meant to support multiple parties in creating agreements. And sometimes they are meant to provide protections and autonomy to communities who want to share their data while trying to point to a bigger issue.
But a lot of the decisions being made around community-generated environmental data still largely exist at the levels of research, community engagement, or education. The data might support advocacy or public awareness campaigns, or it might shape a local policy discussion. But community data often remains connected to the context in which it was gathered and interpreted. The relationship between the people generating the knowledge and the claims being made from it is still relatively visible.
AI systems are beginning to change that relationship. The challenge is not that more data are being collected or shared, but that the distance between sharing data and governing how that data acts in the world is collapsing. Community environmental data is no longer only being used to support a particular claim or intervention. Increasingly, it becomes substrate for systems that generate new inferences, rankings, recommendations, and decisions, far beyond the original conditions and intentions of collection.
For translators and stewards, this creates a very different problem space. From the moment of deliberate sharing to the moment an AI system absorbs data into a model, there is a letting go that happens, which is not always intentional. Once data becomes part of a model, there is often no clear way to understand how it will later be interpreted, surfaced, or operationalized.
And the implications are not abstract. We can easily imagine a legislative assistant using AI tools to help weigh the risks and benefits of an environmental decision. Or a regulatory agency relying on AI-generated synthesis to prioritize interventions. Data that once functioned as situated evidence within a community may instead become generalized training material feeding systems that shape future policy recommendations.
It’s ironic, when you consider the question of legitimacy. Historically, community-generated knowledge has often been dismissed as anecdotal, emotional, or political—until it becomes validated by institutional actors. AI systems may place a veneer of technical authority over community-generated data that suddenly makes it more legible or persuasive to decision-makers. But what gets lost in that translation? What context disappears when lived experience becomes machine-readable inference? What does it mean when institutions cannot hear people directly, but can hear their experiences once translated into machine-readable form? What obligations remain to the communities who generated the knowledge in the first place?
This is part of why I found myself nodding along when Sylvie Delacroix described “data stewards” as one of the missing careers of the twenty-first century. To really crack the nut on stewardship in the age of AI, we need to think much more seriously about how people are trained to use data. And I’m not talking about communities creating data—I’m talking about AI developers, computer scientists, data analysts, procurement officers, consultants, and end users of AI systems in state and local government. Increasingly, these actors are becoming governors of environmental knowledge systems whether they recognize it or not.
While most technical systems are still designed as though mediation, negotiation, and contextual interpretation are inefficiencies that automation can remove, the more environmental knowledge becomes intermediated through AI systems, the more important human translation becomes, not less.
This is where intermediary roles and ombuds models become increasingly important. Colorado’s Environmental Justice Ombudsperson role, for example, exists in part because environmental governance is not only a technical problem. It is also a problem of accountability and attempts at conflict resolution. Communities need mechanisms that help them navigate institutions while also preserving autonomy and recourse. We need more roles that recognize stewardship not just as maintaining data infrastructure, but as maintaining relationships, obligations, and consent over time. And translators become all the more important when we start thinking about negotiation, exchange, and consent itself as infrastructure. That infrastructure includes refusal.
Part of the difficulty here is that many contemporary AI systems are not emerging from neutral conditions of knowledge production. Meredith Whittaker described AI as being “born out of surveillance,” pointing to the enormous infrastructures of extraction, tracking, behavioral monitoring, and data accumulation that underpin many large-scale AI systems. The concern is not just about privacy in the narrow individual sense, but about the way contemporary technical systems normalize continuous observation and treat human behavior, communication, and participation as material to be captured and operationalized.
That framing matters for environmental and participatory science communities because so much of this work has historically depended on trust, voluntary participation, and negotiated forms of openness. Communities often participate because they believe they are contributing to accountability around a shared environmental (and/or health) issue. But when participation itself becomes raw material for systems built on extractive logic, the meaning of consent starts to shift. The question is no longer only whether people agreed to share data, but whether they meaningfully understood the downstream systems their participation was feeding.
In participatory science and open source communities, participation is often framed as inherently good. Sharing is progress, and openness is framed as virtue. While I deeply believe in the importance of participation and collective knowledge production, stewardship in the age of AI may increasingly require us to create conditions where communities can also withhold participation, redirect it, or weigh its terms.
Carole McGranahan writes about refusal not as simple rejection, but as a political and generative act, one that creates the possibility for different relationships, obligations, and futures to emerge. Refusal, in this framing, is not withdrawal from engagement altogether. It is a way of asserting that participation cannot be assumed, and that communities retain the right to shape the terms under which they are known, represented, or incorporated into larger systems. Refusal becomes a way of maintaining autonomy and protecting the possibility of alternative political arrangements rather than accepting the structures already on offer.
That framing feels increasingly important in environmental and participatory science contexts. Refusal is not only obstruction. It can be a way of preserving autonomy, slowing processes, protecting context, and keeping alternative futures imaginable. It can also be a way of insisting that consent is ongoing rather than part of a single transaction, where communities retain some ability to shape not only whether their data is collected, but how it travels and mutates once it enters technical systems.
This is perhaps the hardest shift for many of us who came from traditions of open science, open data, and participatory technology. We may be entering an era where stewardship no longer means just helping information move more freely. It may also mean helping communities understand what sits on the other end of systems that increasingly use participation itself as raw material.
If AI systems are becoming intermediaries between people and environmental knowledge, then stewardship cannot end at the moment of collection or release. We won’t be able to think about consent as a box checked or a singular agreement. Governance will have to span full lifecycles to persist as data moves through institutions, models, interfaces, and decisions. The future of open stewardship may depend less on how much information we can circulate, and more on whether communities retain the ability to partake in the terms under which knowledge is transformed into mechanisms of action and levers of decision-making.
I have three graduations this summer. Two niblings are leaving high school, one is leaving college. (My money is also leaving my pocket.) But this period has also made me reflect a lot on a time when they were little, and a younger version of myself was entrusted to keep them whole and within eyeshot. I joke that it’s a miracle that we’re here, not because I/we didn’t take our job seriously, but because it seems kids do not come with a self-preservation mechanism built in. But we made it. The kids were stewarded. And now I’m the one in the passenger seat. I often remind them that the responsibility to get me home in one piece is on them now. They hold the keys.
Stewardship on its own is relatively straightforward. It’s the careful and responsible management of something entrusted to one’s care. In the case of data it typically means that you collect data, you share it, you make it discoverable and reusable. The open science movement pushed institutions toward transparency, toward FAIR principles. But things get complicated, largely because we’re humans. Careful management also means dealing with corporate entities that can easily discover, reuse, and monetize data they never collected. Responsible management also means ensuring that the data isn’t misused by bad actors. AI models ingest, they then predict and transform, then learn from individual interactions, and the owners of the model benefit. But what about the community whose knowledge fed into it? Arguably, without them, there would be no AI model.
However, when we talk about what developers and institutions owe the communities they rely on, we tend to reach for the word consent. Did people agree to have their data used? But consent assumes that people knew what they were agreeing to, that the terms were legible, that the power between the person clicking “I agree” and the institution holding the data was roughly equal. Yet, when you have to use their platform to submit all your documents for your new job, is there much of a choice?
David Kernohan sees this as a structural gap: “when you’re ill, you rely on doctors whose professional obligations require them to prioritise your wellbeing. When you need legal representation, lawyers are bound by duties to act in your best interests. But when your health data, location history, or browsing patterns are being negotiated for use in AI training, algorithmic decision-making, or commercial exploitation – who represents you?” We could say it’s the responsibility of data policies such as GDPR, but we all know HIPAA wouldn’t be as effective without the people responsible for ensuring that it’s enforced.
As the Citizens and Technology Lab put it, “those who create the tools of science exert a powerful influence on what can and can’t be known.” The questions researchers ask determine what gets measured. What gets measured shapes what is considered a problem. What is a problem determines who gets to contest it and how.
And that’s where refusal comes back into play. While we may not always have the power to do so, Ufuoma Ovienmhada writes that (when we can) there is an obligation to refuse to produce and present research that might endanger marginalized populations or entrench harmful narratives. Specifically, they describe refusal as “the act of withholding knowledge, withholding data, and withholding consent.” This may sound like the opposite of what research is supposed to do. Scientific knowledge is supposed to be disseminated or be FAIR. But that assumption is itself a choice, and it is not a neutral one. It assumes that the production and circulation of knowledge is always good, that more visibility is always better, that the communities whose lives are being studied always benefit from being studied. None of that is guaranteed. Refusal, understood this way, is not a failure of openness. It’s a protective act. It’s a recognition that some knowledge, in the wrong hands or at the wrong moment, does harm.
AI adds another layer to this. There is the data that helped create the model, but then there’s also the model that creates even more data (accurate or not). The questions the model can answer become the questions that count. And the model’s questions were set by whoever designed it, whoever funded it, whoever decided what the training data should look like and what the benchmark should measure. This is why we’ve found it important to make efforts to reduce harm to communities and, in particular, to develop data use and sharing agreements that are clear on who has the keys, when, and how that responsibility will manifest.




Appreciate the thoughtfulness with which you treat the topic. It feels like the realities of AI are rapidly changing not only data stewardship, but also the stewardship of informed consent. I see moments of hope in tools that inform people who consent to participate in an activity or study whether their words and responses will be used to train AI, and moments of cliff walking when the same respondents are informed about whether they can opt out (sometimes they can't). The right to be forgotten becomes really difficult to exercise as well.