A Computer Vision Application For Galaxy Detection

Computer Vision is an interdisciplinary field that combines knowledge from disciplines such as Physics, Computer Science, and Electrical Engineering. Its main goal is to develop algorithms and systems capable of reproducing human vision skills. The fields most closely related to computer vision are image processing, image analysis, and machine vision.

Historically, the core applications of computer vision have been in the healthcare, automotive, and agriculture industries, mostly because of the large investments required to develop and deploy these systems. Over the last 8 years, the situation has changed dramatically: the barriers to entry have fallen, and open-source libraries have proliferated. Today, students and professionals from low-income countries have access to a large stack of computer vision resources and can develop high-impact applications on short timescales.

Figure 1: Disciplines related to Computer Vision. Credit: Roberto E. Gonzalez.

Observational astronomy is the branch of astronomy concerned with recording data about the observable Universe. Ground-based and space telescopes are used every night to observe planets and distant galaxies. Specialized telescope instruments collect raw data that is stored on remote servers and later processed by several image processing and analysis pipelines.

The usual tasks in processing astronomical images are removal of systematic effects, point source detection, and image enhancement. These tasks are implemented in applications such as IRAF [1] and libraries such as Astropy [2], and they are routinely used by astronomers and engineers.
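To make "point source detection" concrete, here is a minimal sketch of the generic approach in pure Python: estimate the sky background and noise with iterative sigma clipping, flag pixels more than a few sigma above the background, and group adjacent flagged pixels into sources with flux-weighted centroids. It is not the IRAF or Astropy implementation. The function names (`clipped_background`, `detect_sources`), the 3-sigma clipping and 5-sigma detection thresholds, and the use of nested lists instead of FITS arrays are all illustrative assumptions.

```python
import statistics


def clipped_background(values, nsigma=3.0, iters=5):
    """Estimate sky level and noise, iteratively rejecting outliers (sigma clipping)."""
    data = list(values)
    for _ in range(iters):
        med = statistics.median(data)
        std = statistics.pstdev(data)
        kept = [v for v in data if abs(v - med) <= nsigma * std]
        # Stop when nothing more is rejected (or too little would remain).
        if len(kept) == len(data) or len(kept) < 2:
            break
        data = kept
    return statistics.median(data), statistics.pstdev(data)


def detect_sources(image, nsigma=5.0, min_pixels=1):
    """Return flux-weighted centroids (row, col) of connected pixels above threshold."""
    background, noise = clipped_background(v for row in image for v in row)
    threshold = background + nsigma * noise
    nrows, ncols = len(image), len(image[0])
    seen = [[False] * ncols for _ in range(nrows)]
    sources = []
    for r in range(nrows):
        for c in range(ncols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood fill with 4-connectivity to collect one source's pixels.
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < nrows and 0 <= nx < ncols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(pixels) < min_pixels:
                continue
            # Weight each pixel by its background-subtracted flux.
            weights = [image[y][x] - background for y, x in pixels]
            total = sum(weights)
            cy = sum(w * y for w, (y, _) in zip(weights, pixels)) / total
            cx = sum(w * x for w, (_, x) in zip(weights, pixels)) / total
            sources.append((cy, cx))
    return sources
```

In practice the image would be read from a FITS file and processed with vectorized array code; the explicit loops here only keep the sketch dependency-free.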

Nowadays, most of the data acquisition and processing tasks are fully automated. The community has developed several data reduction pipelines and frameworks that are freely available and can be easily used by professionals working at telescopes, universities, and outside academia.

Figure 2: A map showing some of what the Sloan Digital Sky Survey has discovered over the last twenty years. Image Credit: V. Belokurov, M. R. Blanton, A. Bonaca, X. Fan, M. C. Geha, R. H. Lupton, the SDSS Collaboration (https://www.sdss.org)

The Sloan Digital Sky Survey (SDSS) [3] is the largest astronomical survey ever carried out, producing a catalog of about 500 million sources. The full dataset exceeds 100 TB and includes images, spectra, and catalogs covering one-third of the celestial sky. Data reduction and analysis were initially done with customized pipelines developed by astronomers, data scientists, and engineers from several universities and institutes in the USA, and this work was later extended by professionals from all over the world.

Although data reduction and preparation are mostly done with classical image processing methods, there is still plenty of room for improvement in data analysis and visualization. Computer vision looks like a promising way to facilitate the analysis of big data in Astronomy and to accelerate the discovery of structures and phenomena in the Universe. However, this is not an easy task: new methods and techniques coming from other fields (interdisciplinarity) are adopted slowly, usually several years behind the state of the art. Two factors may explain this. First, there are not many interdisciplinary scientists who bring knowledge from other fields. Second, knowledge tends to spread into other fields only after it becomes mature and well developed.

For example, computer vision techniques developed in Computer Science, such as those associated with machine learning and deep learning, arrive in Astrophysics about 4 to 5 years after they are developed. This can be seen by counting the Astrophysics publications related to computer vision, deep learning, or machine learning: there are fewer than 300, and concepts such as Deep Learning, Faster R-CNN, and SSD only began to appear in papers around 2017-2018.

In this context, the AstroCV [4] repository is an invitation to join efforts to reduce this delay in knowledge transfer from Computer Vision to Astrophysics, especially now, with the rapid growth of knowledge in Computer Vision, access to new development frameworks, and cheaper GPU computing power.

As part of the AstroCV initiative, we train a galaxy detection and identification model using a state-of-the-art single-shot detection neural network built with the Darknet framework, and we develop a new data augmentation procedure to make it robust against images coming from different filters and instruments. The training set is built from the Galaxy Zoo [5] database, with galaxies classified as elliptical, spiral, edge-on, or merging. Data augmentation is very important in any model training scenario: it helps improve results from small training sets and makes models more reliable under different conditions. In particular, astronomy images are taken in multiple filters and stored in FITS format with raw CCD data for each pixel, so the conversion from FITS to an RGB image is not unique: it depends on the telescope's camera, the band filters, the reduction scheme, and the method used to map photon counts onto a color scale.
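The non-uniqueness of the FITS-to-RGB conversion, and the idea of turning it into data augmentation, can be sketched as follows. Each band of raw counts is rescaled to 0-255 with one of several contrast stretches, and every stretch yields a separate training copy that keeps the same labeled boxes. The four stretches (linear, square-root, logarithmic, asinh), the per-band min/max scaling, and the `(class_id, x, y, w, h)` box layout are illustrative assumptions, not necessarily the conversion methods used in the article; reading the FITS files themselves (for example with Astropy) is omitted.

```python
import math


# Candidate contrast stretches: each maps a normalized value t in [0, 1] to [0, 1].
def linear(t):
    return t


def sqrt_stretch(t):
    return math.sqrt(t)


def log_stretch(t, a=1000.0):
    return math.log10(a * t + 1.0) / math.log10(a + 1.0)


def asinh_stretch(t, beta=0.1):
    return math.asinh(t / beta) / math.asinh(1.0 / beta)


STRETCHES = {"linear": linear, "sqrt": sqrt_stretch,
             "log": log_stretch, "asinh": asinh_stretch}


def band_to_8bit(band, stretch):
    """Scale one band of raw counts to 0-255 using its own min/max and a stretch."""
    flat = [v for row in band for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat band
    return [[round(255 * stretch((v - lo) / span)) for v in row] for row in band]


def augment(bands, boxes):
    """Return one (stretch_name, rgb, boxes) sample per stretch.

    bands: three 2-D lists of raw counts used as the R, G, B channels.
    boxes: hypothetical labels, e.g. (class_id, x, y, w, h); a colour
    conversion does not move the galaxies, so every copy keeps the same boxes.
    """
    samples = []
    for name, stretch in STRETCHES.items():
        rgb = tuple(band_to_8bit(band, stretch) for band in bands)
        samples.append((name, rgb, list(boxes)))
    return samples
```

For a pixel at one quarter of its band's range, the linear stretch gives 64 while the square-root stretch gives 128, so the same galaxy can look quite different to a detector depending on the conversion. Training on all renderings is meant to keep the model from depending on any single one.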

We produced a data augmentation scheme that applies several color conversion methods to the same objects. This yields an important improvement in detection for images coming from different telescopes and instruments, even though our training set came from the SDSS instrument only. In Figure 3, we show results for an SDSS image, where recall reaches 90%. For images taken with different color filters and telescopes, however, results are much poorer, and recall may drop to as low as 20%. With our data augmentation procedure, recall improves by up to a factor of 3. In Figure 4, we show results for an image from the Hubble Deep Field.
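Recall here is the fraction of the true galaxies in an image that the model detects. A minimal way to compute it is a greedy one-to-one matching, assuming a detection counts as correct when its intersection over union (IoU) with a not-yet-matched ground-truth box is at least 0.5; that threshold is a common convention, not a criterion stated in this article. Boxes are `(x_min, y_min, x_max, y_max)` tuples, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def recall(ground_truth, detections, threshold=0.5):
    """Fraction of ground-truth galaxies matched one-to-one by a detection.

    Greedy matching: each true box takes its best remaining detection, and the
    match counts only if IoU >= threshold (0.5 is an assumed convention).
    """
    if not ground_truth:
        return 0.0
    unmatched = list(detections)
    found = 0
    for gt in ground_truth:
        best = max(unmatched, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= threshold:
            found += 1
            unmatched.remove(best)  # a detection can match only one galaxy
    return found / len(ground_truth)
```

Greedy matching is simple but not always optimal when boxes overlap heavily, and recall alone ignores false positives, so a complete evaluation would also report precision.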

Figure 3: Galaxies found using our model in a typical SDSS image. Credit: Roberto Gonzalez

Figure 4: Galaxies found in a Hubble Deep Field image using data augmentation. Without data augmentation, only one third of the galaxies were found. Credit: Roberto Gonzalez

Roberto Gonzalez and Roberto Muñoz are former astronomers who moved to the Computer Vision industry at the Chilean company MetricArts [6], so knowledge transfer between Astrophysics, Computer Science, and industry has become a daily process for them. They believe that interdisciplinarity and collaboration between the technology industry and academia are fundamental to leading in the Computer Vision and AI fields. However, this requires a change of mindset in both traditional academia and traditional industry, where interdisciplinarity and knowledge transfer are given little value, especially in less developed countries.

These findings are described in the article "Galaxy detection and identification using deep learning and data augmentation", recently published in the journal Astronomy and Computing.


  1. IRAF (Image Reduction and Analysis Facility): http://iraf.noao.edu/
  2. Astropy: http://www.astropy.org/
  3. Sloan Digital Sky Survey (SDSS): https://www.sdss.org/
  4. AstroCV: https://github.com/astroCV
  5. Galaxy Zoo: https://www.galaxyzoo.org
  6. MetricArts: www.metricarts.com