The Open Science Of Reproductive Biology: A New Open-Source Project For Sperm Analysis

Science does not exist, at least not by itself. It is a word that we use to refer to all of the knowledge we have obtained by applying a specific method that involves several steps. First, you need a question to which you do not have the answer. After background research into what is already known about that question, you construct a hypothesis: a guess at a possible answer. Next, you have to test whether your hypothesis is correct by designing an experiment, the best one possible, whose results will lead you to a yes/no answer.

If you obtain a “yes,” your hypothesis becomes an explanation for your question and your knowledge has advanced. However, if you obtain a “no,” you have to go back to your hypothesis, formulate a new one, and design and run a new experiment. This is what we call “the scientific method.” This method is our best tool for advancing knowledge, and it offers us not only a way to deal with the unknown, but also a way to reproduce the processes that build what we know, independent of the observer.

Nevertheless, in the real world, the problem of reproducibility arises during the experimentation step. When scientists repeat an experiment using different tools or products, the results sometimes change. A priori, it is difficult to know how variability between the machines used to measure the same phenomenon, or between the products involved in a complex chemical reaction, can affect the conclusions of a study.

Recently, a manifesto for reproducible science [1] was published, in which the authors propose a list of good practices to guarantee the reproducibility of scientific studies as far as possible. As part of what is known as Open Science, one of the key points of this manifesto is the encouragement to make all data and software publicly available, so that peers can test the results and conclusions of the corresponding studies. The problem is that in most studies, the source code of the software used to measure or analyze the data is private and inaccessible, which makes it difficult to compare similar studies and to understand why they led to different conclusions. A further problem is that the needs of the scientific community and the availability of commercial solutions for those needs are not always synchronized, with the former normally leading the latter. In other cases, scientists need a level of flexibility to make changes that private solutions cannot offer because of the opaque nature of such programs.

To solve these problems, a number of initiatives have been launched in the past few years to create open-source projects. Their aim is to provide a common framework for discussing and developing standard tools that guarantee the reproducibility of assessments, and also to develop new functionalities as fast as possible, within and for the scientific community. A few years ago, Woelfle and colleagues published an article about how open science accelerates the research process [2]. In it, they described the process of creating open-source products as an iterative cycle of five steps: (1) a problem or need is identified, (2) a preliminary solution to this problem is posted, (3) an open appeal is made to the wider community, (4) input is received from an unrestricted community, and (5) the cycle begins again.

They explained the difference between the traditional and open-source approaches to software development with the analogy of the “cathedral and the bazaar.” In the traditional (cathedral) model, adopted by many academic and industrial groups, the work is done by a closed team of highly qualified artisans, organized in a rigid, hierarchical structure in which just one or a few people are in charge of the team. In a bazaar-type project, there is a low barrier to entry, and the operation is seemingly chaotic or self-organizing. In this paradigm, leadership is fluid, if it exists at all. The system is effective at what it does, yet requires little investment to start up and relies on the traffic of inherently interested strangers.

Recently, in the field of reproductive biology, we started a new open-source project focused on the development of new software for sperm analysis. In terms of the cycle described by Woelfle et al., we have detected a need in the field and provided a preliminary solution. Now, it is time to build a community. In the spirit of open science, we have not only published the source code of the software, but also set up different communication channels to facilitate interaction between researchers. Although the software was published only recently, the forum is working better than expected, and we are now preparing a new version of the software based on the suggestions of other members. Still, we want to reach a bigger audience, so before making a call to the community, let me explain how the software works.

OpenCASA: A New Open-Source Computer-Assisted Sperm Analysis System

In the field of assisted reproductive techniques (ART), computer-assisted sperm analysis (CASA) systems have proved their utility and potential for analyzing sperm quality. The idea is to predict fertility rates by analyzing sperm parameters such as cell motility, morphometry, or membrane integrity. Briefly, these CASA systems work as follows: with the help of a microscope and a camera, and depending on the analysis, videos or images are recorded for different sperm samples. These recordings are the input for the program. The program then tries to identify the cells and to extract the specific information needed for the assessment.

For motility analysis, for example, the program takes a video as input, and the procedure consists of four steps: first, for each frame, the program detects all possible cells; second, the program reconstructs the trajectory of every cell, based on the location of each one frame by frame; third, the program applies some quality control to filter out trajectories that have not been detected properly; and fourth, the program processes each trajectory to obtain the motility parameters (such as velocity and linearity) and shows the results. In image analysis, the process is simpler. For morphometry analysis, for example, the program only needs to detect all cells in the image and to assess morphometry parameters such as area and perimeter.
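The last three motility steps (trajectory reconstruction, quality control, and parameter extraction) can be illustrated with a minimal Python sketch. This is not the OpenCASA implementation: the function names, the greedy nearest-neighbour linking, the minimum-length filter, and the thresholds are all assumptions made for illustration. The parameters computed are the conventional CASA quantities VCL (curvilinear velocity), VSL (straight-line velocity), and LIN (linearity, VSL/VCL). Cell detection is assumed to have already produced a list of (x, y) positions for each frame.

```python
import math


def link_tracks(frames, max_dist):
    """Greedy nearest-neighbour linking of per-frame detections (illustrative).

    frames: list with one entry per frame, each a list of (x, y) positions.
    max_dist: largest displacement allowed between consecutive frames.
    """
    finished = []
    active = [[p] for p in frames[0]] if frames else []
    for detections in frames[1:]:
        unused = list(detections)
        still_active = []
        for track in active:
            last = track[-1]
            # Closest unclaimed detection to the track's last position.
            best = min(unused, key=lambda p: math.dist(last, p), default=None)
            if best is not None and math.dist(last, best) <= max_dist:
                track.append(best)
                unused.remove(best)
                still_active.append(track)
            else:
                finished.append(track)  # track lost: close it
        # Detections that matched no track start new tracks.
        still_active.extend([p] for p in unused)
        active = still_active
    return finished + active


def filter_tracks(tracks, min_points):
    """Quality control (assumed rule): drop tracks with too few points."""
    return [t for t in tracks if len(t) >= min_points]


def motility_params(track, fps):
    """Return VCL, VSL and LIN for one track sampled at fps frames/second."""
    if len(track) < 2:
        raise ValueError("a track needs at least two points")
    duration = (len(track) - 1) / fps
    path = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    straight = math.dist(track[0], track[-1])
    vcl = path / duration       # speed along the actual path
    vsl = straight / duration   # speed along the first-to-last straight line
    lin = vsl / vcl if vcl > 0 else 0.0
    return {"VCL": vcl, "VSL": vsl, "LIN": lin}


if __name__ == "__main__":
    # Two hypothetical cells observed in three frames (arbitrary units).
    frames = [
        [(0, 0), (10, 10)],
        [(3, 4), (10, 11)],
        [(6, 0), (10, 12)],
    ]
    tracks = filter_tracks(link_tracks(frames, max_dist=6.0), min_points=3)
    for track in tracks:
        print(track, motility_params(track, fps=1.0))
```

Greedy linking is the simplest possible choice and can mis-assign cells whose paths cross; it is used here only to keep the sketch short. A real system would also convert pixels to micrometres using the microscope calibration before reporting velocities.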

Despite its apparent simplicity, some of these assessments strongly depend on the algorithms used to implement the analysis, and several studies have shown high variability in sperm quality parameters, depending on the commercial system used [3].

In recent years, some open-source alternatives have been proposed, but these programs are still well behind the commercial CASA systems in terms of ease of use and standardization, and they have usually not been designed to encourage scalability and continuity of software development. Typically, the source code is written in a single file and published through references to static web pages or links to a file hosting service, such as Dropbox. In this scenario, users can download the software, but they cannot update or improve these programs for the benefit of other users. In the worst cases, the link is broken shortly after publication.

Besides the known parameters, a new one has recently been hypothesized that could be strongly related to fertility: the capacity of sperm to respond to the mechanisms that guide them toward the egg. It has become clear that mammalian spermatozoa must be guided to reach the oocyte [4], and three different mechanisms have been proposed to date, at least in humans: thermotaxis, rheotaxis, and chemotaxis, which are responses to a temperature gradient, fluid flow, and a chemical gradient, respectively. Currently, the idea that the movement guiding the sperm to the oocyte could be due to a combination of several of these mechanisms cannot be ruled out. Recent studies provide experimental support for the importance of guidance in the fertilization process. Thus, sperm responsiveness to these guidance mechanisms could be a good indicator of seminal quality and could help predict the fertility of a given seminal sample. The problem is that there is no consensus on how to measure this responsiveness, either in the experimental setup or in the software used to analyze the results.

Because there was no open-source alternative for analyzing sperm responsiveness to guidance mechanisms, and in order to integrate several sperm quality parameters in the same tool, we set out to develop new open-source sperm analysis software. The aim of this project was to develop free software that can analyze several parameters related to seminal quality, and to provide communication channels for discussing further developments and possible standards. This software, which we have named OpenCASA [5], is a single program structured in four modules that analyze motility, viability, morphology, and the sperm response to guidance mechanisms. For the latter, we implemented a specific method, but the community is free to discuss and decide on new ones.

The software has been released on GitHub. This platform allows researchers not only to download the software but also to get involved in and contribute to further developments. Additionally, a test dataset has been uploaded to figshare, and a Google group has been created to allow the community to interact and discuss OpenCASA further. The group can be accessed via the forum or the mailing list. All of these links can be found in the original article [5].

Talking about the advantages of openness in their particular case of developing an off-patent drug, Woelfle et al. [2] said: “[T]he crucial message of the open project is this: the research was accelerated by being open. Experts identified themselves and spontaneously contributed based on what was being posted online. The research therefore inevitably proceeded faster than if we had attempted to contact people in our limited professional circle individually.”

Unlike other plugins developed for the study of certain cell quality parameters, the development of the OpenCASA software does not end here but remains open to the incorporation of new modules and new functionalities. For this purpose, we would like to appeal to the scientific community to collaborate and to use the communication channels mentioned above.

These findings are described in the article entitled OpenCASA: A new open-source and scalable tool for sperm quality analysis, recently published in the journal PLOS Computational Biology.


  1. Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Sert NP du, et al. A manifesto for reproducible science. Nat Hum Behav. 2017 Jan 10;1:0021.
  2. Woelfle M, Olliaro P, Todd MH. Open science is a research accelerator. Nat Chem. 2011 Oct;3(10):745–8.
  3. Boryshpolets S, Kowalski RK, Dietrich GJ, Dzyuba B, Ciereszko A. Different computer-assisted sperm analysis (CASA) systems highly influence sperm motility parameters. Theriogenology. 2013 Oct 15;80(7):758–65.
  4. Eisenbach M, Giojalas LC. Sperm guidance in mammals — an unpaved road to the egg. Nat Rev Mol Cell Biol. 2006 Apr;7(4):276–85.
  5. Alquézar-Baeta C, Gimeno-Martos S, Miguel-Jiménez S, Santolaria P, Yániz J, Palacín I, et al. OpenCASA: A new open-source and scalable tool for sperm quality analysis. PLOS Comput Biol. 2019 Jan 18;15(1):e1006691.