Scientific research has traditionally undergone rigorous peer review by experts in the field before being published in journals and made available to read. Prior to the pandemic, preprints were a relatively new format of research article that enables researchers to share their results and data early, before journal peer review is complete.
At the start of the COVID-19 pandemic, it became clear that immediate access to scientific research and data on the new coronavirus was critical, so that treatments and vaccines could be developed quickly, safely and effectively. The European Bioinformatics Institute (EMBL-EBI) received grant funding from Wellcome, the Swiss National Science Foundation and the Medical Research Council to create a collection of full text COVID-19 preprints in machine-readable, structured XML format to enable deeper analysis.
As the Chief Scientist of WHO I welcome the huge increase in the use of pre-prints by researchers to rapidly share the emerging evidence from the many studies on Covid-19. However, these are published as .pdf documents and I recognise that the information they contain could be more rapidly searched and linkages made between the results and data they contain if they were converted to the standard publishing language XML. I therefore support this initiative by Europe PMC to take on this task.
Dr. Soumya Swaminathan, Chief Scientist, WHO; 4 May 2020
The challenge
Some life sciences preprints had restrictive licenses, which meant they couldn’t automatically be displayed on Europe PMC without the author’s approval. Even though some preprints had open licenses such as CC-BY, we decided that it would be courteous to ask all preprint authors to review the conversion of their preprint to XML/HTML and approve the display of the full text on Europe PMC. A significant challenge in this project was how to obtain author approval, as many authors had not heard of Europe PMC before. It was essential that our communications and approval workflows were clear, simple and instilled trust.
We also wanted to make it easy to find and view preprint articles on Europe PMC, alongside peer reviewed articles. Because preprints are not fully peer reviewed, it was important for readers to be able to clearly distinguish preprint articles from peer reviewed journal articles and make their own decisions about the robustness of the science.
What I did as Product Manager
I worked with the Team Leader and a Data Scientist to write a grant application and define the vision and strategy for the project. I then planned and led the project delivery. Some of the activities included:
- Planning and implementing data pipelines to ingest metadata and PDF files from preprint servers based on a set of rules and a consistent COVID-19 search query.
- Creating new author workflows in Europe PMC Plus, the existing manuscript submission system based on the open source framework PubSweet. The workflows sent emails to authors and invited them to review their converted preprint and approve it for display in Europe PMC. I created flow diagrams (see below) to ensure the team were clear on the workflow steps.
- Increasing our article processing capacity to convert PDFs to structured XML and HTML. I worked closely with our external supplier and internal Helpdesk team to scale the XML conversion and QA operations by 2000%.
- Designing changes to the search results interface and preprint page display to ensure readers were clear they were looking at an article that had not been peer-reviewed.
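To illustrate the ingest step in the first bullet, here is a minimal Python sketch of rule-based selection. It is not the project's actual pipeline code: the search terms, the `PreprintRecord` fields and the "must have a PDF" rule are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical search terms; the project's real COVID-19 query is not given here.
COVID_TERMS = ("covid-19", "sars-cov-2", "2019-ncov", "coronavirus")


@dataclass
class PreprintRecord:
    """Illustrative metadata record; field names are assumptions."""
    doi: str
    title: str
    abstract: str
    has_pdf: bool


def matches_covid_query(record: PreprintRecord) -> bool:
    """Return True if any search term appears in the title or abstract."""
    text = f"{record.title} {record.abstract}".lower()
    return any(term in text for term in COVID_TERMS)


def select_for_ingest(records: list[PreprintRecord]) -> list[PreprintRecord]:
    """Keep records that match the query and have a PDF available to convert."""
    return [r for r in records if r.has_pdf and matches_covid_query(r)]
```

Applying the same query and rules to every preprint server keeps the collection consistent, which is the point the bullet makes.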

After launch I worked closely with the Data Scientist to analyse live service data, for example the number of preprint approvals by license type.
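As a rough illustration of this kind of analysis, the sketch below counts approvals per license type in Python. The event format, a list of `(license, approved)` pairs, is an assumption for the example and not the service's real data model.

```python
from collections import Counter


def approvals_by_license(events: list[tuple[str, bool]]) -> Counter:
    """Count approved preprints per license type.

    Each event is a hypothetical (license, approved) pair.
    """
    return Counter(lic for lic, approved in events if approved)
```

Calling `approvals_by_license(events).most_common()` would list license types from most to least approved.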
I managed the project budget and created a forecasting spreadsheet so that we could predict how long the funding would last. I also produced reports for the funders.
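The core arithmetic behind such a forecast can be sketched in a few lines of Python. The cost model here (a fixed monthly cost plus a per-preprint conversion cost) and all parameter names are assumptions for illustration; the actual spreadsheet may have been structured differently.

```python
def months_of_funding(
    remaining_budget: float,
    monthly_fixed_cost: float,
    cost_per_preprint: float,
    preprints_per_month: float,
) -> float:
    """Estimate how many months the remaining budget lasts at the current rate."""
    monthly_spend = monthly_fixed_cost + cost_per_preprint * preprints_per_month
    if monthly_spend <= 0:
        raise ValueError("monthly spend must be positive")
    return remaining_budget / monthly_spend
```

With made-up figures of 120,000 remaining, 4,000 in fixed monthly costs and 300 preprints a month at 20 each, the monthly spend is 10,000 and the funding lasts 12 months. Re-running the estimate as preprint volumes change shows how quickly the runway shrinks or grows.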
What I did as a User Researcher
I worked with the UX Designer, Developer and Helpdesk to design email text and screens in the author workflow preprint pages and test them with users. As a team we gathered and responded to live service data and feedback from users via the Helpdesk and Twitter.

Outcomes
Over 77,500 full text COVID-19 preprints were indexed in Europe PMC, providing researchers and clinicians with access to the latest COVID-19 research and data. The collection has been used in systematic reviews and meta-analyses. As a team we defined best practice standards and shared these with the open science community.
Learnings
There was limited time for user research due to the urgency of the pandemic situation. With more time I would have done more testing with users of the author workflows and emails.
References
Levchenko M, Parkin M, McEntyre J, Harrison M. Enabling preprint discovery, evaluation, and analysis with Europe PMC. PLOS ONE. 2024;19(9):e0303005. DOI: 10.1371/journal.pone.0303005. PMID: 39325770; PMCID: PMC11426508.
Ferguson C, Araújo D, Faulk L, et al. Europe PMC in 2020. Nucleic Acids Research. 2021 Jan;49(D1):D1507-D1514. DOI: 10.1093/nar/gkaa994. PMID: 33180112; PMCID: PMC7778976.
Europe PMC Team. Over 15,300 full text COVID-19 now available in Europe PMC. 2021 Feb.