Scientific Rationale
A new paradigm in astronomical research has been emerging: “Data Intensive Astronomy”, which utilizes large amounts of data combined with statistical data analyses.
The first research method in astronomy was observation with the naked eye. It is well known that the invention of the telescope changed the human view of our Universe (although that view was then largely limited to the solar system), and careful observations of planetary motion led to Kepler’s laws, which Newton later used in developing his mechanics. Newtonian mechanics then enabled astronomers to give a theoretical explanation of the motion of the planets. Thus astronomers obtained the second paradigm, theoretical astronomy. Astronomers succeeded in applying various laws of physics to explain phenomena in the Universe; e.g., nuclear fusion was found to be the energy source of stars. Theoretical astronomy has been paired with observational astronomy to better understand the physics underlying observed phenomena in the Universe. Although theoretical astronomy succeeded in providing good qualitative physical explanations, it was not easy to reach quantitative agreement with observations. Since the invention of high-performance computers, however, astronomers have gained a third research method, simulations, which give better agreement with observations. Simulation astronomy has developed rapidly along with the development of computer hardware (CPUs, GPUs, memory, storage systems, networks, and others) and simulation codes.
It has long been known that we need to conduct “statistical” analyses of, and comparisons among, various celestial objects to better understand astrophysical processes. However, the limited sensitivity and amount of data in the past prevented us from doing so.
The rapid development of computer hardware depends strongly on semiconductor technologies, which, in turn, have led to large, sensitive detectors that enable astronomers to survey large sky areas easily. There are several challenging projects in the world that will produce large amounts of data: ALMA, with a few petabytes of data products per year; LSST, with an expected 200 petabytes over ten years; and Pan-STARRS, which will produce several terabytes per “night”; together with VISTA, ELT, TMT, SKA, and others. These projects cover a wide range of scientific themes: cosmology, the large-scale structure of the Universe, formation of galaxies, star formation, variable stars, transient phenomena such as gamma-ray bursts, small bodies in the solar system, extrasolar planets, life in the Universe, dark matter and dark energy, and others.
Thus a new era of astronomical research utilizing large amounts of data will soon come, and astronomers need to be well prepared for it. Since the data production rate will be 100 to 1000 times higher than in the past, it will be crucial to combine advanced machine learning technologies with immediate access to existing, distributed, multi-wavelength databases. Such an approach is necessary to assess newly detected events rapidly and to construct event notices that will be autonomously distributed to robotic observatories for near-real-time follow-up. Advanced data analyses combined with statistics and data mining will be essential to derive general “rules” and/or “knowledge” about various phenomena in the Universe, as the data volumes will make human inspection and analysis of the data impossible. The most important and exciting astronomical discoveries of the coming decade will rely on research and development in data science disciplines (including data management, access, integration, mining, visualization, and analysis algorithms) that enable rapid information extraction, knowledge discovery, and scientific decision support for real-time operation of astronomical research facilities.
Significant scientific results are expected from data-intensive astronomical research in the very near future and beyond. We will therefore hold a special session during the GA in Beijing to understand and share the current and foreseen status of progress towards the fourth paradigm in astronomy: Data Intensive Astronomy.