OXA: An Open Research Container Format (for the Age of AI?)
The case for semantic packaging in scientific publishing
There’s a new scientific exchange format in the works. It’s called OXA — the Open Exchange Architecture — and it’s being developed by a coalition including Rowan Cockett from Curvenote, Tracy Teal at openRxiv, and others from the Continuous Science Foundation. I was invited to a meeting a few weeks back but couldn’t make it at the last minute. We caught up this week and walked through the proposal.
The basic idea is familiar: create a modern, open standard for packaging scientific documents — text, figures, data, code — so they can move between systems without friction. If you’ve worked in scholarly publishing, this sounds a lot like MECA.
The MECA Problem
MECA (Manuscript Exchange Common Approach) is the current NISO-blessed standard for moving manuscripts between systems. It came out of submission system vendors realizing that authors shouldn’t have to re-enter everything from scratch every time a paper gets rejected and resubmitted elsewhere. Good goal.
But MECA has an awkward dependency: it leans heavily on JATS XML.
JATS is the Journal Article Tag Suite — a comprehensive XML schema for representing published articles. It’s extremely thorough. Every major publisher uses it. PubMed Central has 8.5 million full-text articles in JATS. It’s the industry workhorse for archiving and interchange after publication.
Here’s the weird part: JATS is expensive to produce. Properly marked-up JATS typically requires professional typesetting. It’s what publishers create after they’ve accepted something, as part of the production process. So using it as the backbone of a pre-publication exchange format has always struck me as a bit oxymoronic. How many publishing operations produce JATS before something is published? Only the ones that can afford to, which in practice means large publishers using MECA for internal transfers between their own imprints. For everyone else, it’s aspirational at best.
MECA ends up being useful mainly for transfers within a publisher’s ecosystem, not the broader preprint-to-journal-to-archive lifecycle it was meant to enable.
Enter OXA
OXA is trying to be a modern MECA. The proposal describes it as “a specification for representing scientific documents and their components as structured JSON objects” designed for “exchange, interoperability, and long-term preservation.”
A few things stand out:
JSON, not XML. The format is JSON Schema–based and designed to be web-native. It draws inspiration from JATS but also from unified.js and Pandoc AST — the kind of data structures that modern authoring tools like MyST Markdown, Stencila, and Quarto already work with internally.
Preprint-first thinking. The people driving this come from preprint servers (openRxiv stewards bioRxiv and medRxiv) and dynamic publishing tools (Curvenote). They’re not starting from the traditional journal production mindset. They’re starting from researchers sharing work early, iterating, and wanting their figures, notebooks, and data to stay connected to the narrative.
Modular and composable. Documents and components can link across projects. Figures, data, methods can be cross-referenced and reused from distributed sources. This fits the reality of how computational science actually works — code in one repo, data in another, narrative tying it together.
Community-driven. It’s stewarded by the Continuous Science Foundation and explicitly designed to evolve through community input. Supporters include openRxiv, Curvenote, Stencila, Posit (Quarto), and Creative Commons.
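To make the “structured JSON objects” idea concrete, here’s a sketch of what a minimal OXA-style manifest might look like. To be clear: this is my own illustration, not the actual specification — every field name here is an assumption, extrapolated from the proposal’s description of containers, manifests, and cross-referenced components.

```json
{
  "oxa": "0.1",
  "kind": "article",
  "title": "Example Study",
  "objects": [
    { "id": "fig-1",  "kind": "figure",   "source": "figures/fig1.png" },
    { "id": "data-1", "kind": "dataset",  "source": "data/measurements.csv" },
    { "id": "code-1", "kind": "notebook", "source": "analysis/model.ipynb" }
  ],
  "links": [
    { "from": "code-1", "to": "data-1", "rel": "consumes" },
    { "from": "code-1", "to": "fig-1",  "rel": "produces" }
  ]
}
```

The point of the sketch is the shape, not the field names: components are addressable objects, and the relationships between them live in the manifest rather than being implied by directory layout.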
The Missing Piece: AI Readability
When we met to discuss the proposal, something struck me that the group hadn’t yet considered: what does this format look like in the age of AI?
OXA packages content in containers with manifests and objects. But those manifests are currently designed for human developers and existing tools to parse. What about the systems that will increasingly be reading these packages — AI agents, LLMs doing literature review, automated pipelines extracting and synthesizing knowledge?
I suggested they think about semantic annotation at the packaging level. Not just valid JSON with typed fields, but explicit semantic markers that tell an AI system what it’s looking at and what the relationships between objects mean. The manifest should be annotated so that a machine intelligence can understand the contents without having to infer structure from patterns.
This isn’t about replacing human-readable documentation. It’s about making the container self-describing in a way that’s useful to the growing number of AI systems that will interact with scientific content. When an LLM encounters an OXA package, it should be able to understand at a glance: this is a research article, here are its components, this figure relates to that dataset, this code produced these results.
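For illustration, one existing mechanism that does exactly this kind of explicit semantic marking is JSON-LD, which layers machine-readable types and relationships onto ordinary JSON via `@context` and `@type`. A hypothetical annotated manifest entry — again my sketch, not anything in the current OXA proposal — might look like:

```json
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "hasPart": [
    {
      "@type": "ImageObject",
      "@id": "fig-1",
      "description": "Dose-response curve for treatment vs. control",
      "isBasedOn": { "@id": "data-1" }
    },
    {
      "@type": "Dataset",
      "@id": "data-1",
      "description": "Raw measurements underlying Figure 1"
    }
  ]
}
```

Whether OXA adopts JSON-LD specifically or defines its own vocabulary matters less than the principle: an AI system reading this entry doesn’t have to guess that the figure was derived from the dataset — the manifest says so.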
They were receptive. They’ve invited me to help pull together a working group of folks who understand both AI systems and scientific publishing — to think through where else semantic annotations could add value and what form they should take. The goal would be to ensure OXA isn’t just a modernized MECA, but a container format that’s native to an era where machines are first-class readers of scientific literature.
Why This Matters
The preprint world is different from traditional publishing. Content originates on servers like bioRxiv. It’s dynamic — versions change, code updates, data gets revised. The document isn’t a static artifact; it’s a snapshot of ongoing work.
JATS was designed for describing static, finished documents. MECA inherited that assumption. OXA has the opportunity to be different: a format for living documents, computational content, and an ecosystem where AI systems are active participants in scientific discovery.
We’ll see where it goes. Standards efforts can take years and often fail. But the coalition here is strong — preprint servers, authoring tools, open infrastructure organizations — and the timing feels right. The gap between how scientists actually work and how their work gets packaged for exchange keeps growing. Something needs to fill it.
If that something can also be readable by the AI systems that are reshaping how we interact with knowledge... all the better.