Collecting and analysing data is central to the scientific method. To extract insights from data, researchers have long used statistical techniques. These statistical frameworks are important tools that help researchers extract as much information as possible from data that has often taken significant time and money to generate and collect. In the early 1900s, for example, the t-test gave researchers a new tool to test the strength of their hypotheses.

Today, an increasing volume of information is being collected, from a greater range of sources, and at greater speed than ever before. Image or video uploads to social media, GPS-enabled devices, and other online activities are generating stores of data, as people spend more of their work and leisure time online. This all contributes to the creation of an estimated 2.5 billion gigabytes of data per day. The availability of large data sets, coupled with increasing computing power and the development of new algorithmic techniques, has vastly increased the power of AI technologies.

AI is an umbrella term. It refers to a suite of technologies that can perform complex tasks when acting in conditions of uncertainty.

AI has become a key tool for researchers across domains to analyse large datasets, detecting previously unforeseen patterns or extracting unexpected insights. For example:

  • Understanding a protein’s shape is key to understanding the role it plays in the body. By predicting these shapes, scientists can identify proteins that play a role in diseases, improving diagnosis and helping develop new treatments. The process of determining protein structures is both technically difficult and labour-intensive. While advances in genetics in recent decades have provided rich datasets of DNA sequences, determining the shape of a protein from its corresponding genetic sequence – the protein-folding challenge – is a complex task. To help understand this process, researchers are developing machine learning approaches that can predict the three-dimensional structure of proteins from DNA sequences.
  • Research in astronomy generates large amounts of data, and a key challenge is to detect interesting features or signals amid the noise and to assign these to the correct category or phenomenon. AI tools have been used to discover new astronomical phenomena, for example: finding new pulsars from existing data sets; identifying the properties of stars; and correctly classifying galaxies.
  • In environmental sciences, AI can help understand the impact of climate change on cities and regions, analyse satellite data to track the movement of endangered species, and help produce more accurate forecasts of extreme weather events. In so doing, AI could help develop new solutions to tackle climate change.

Many of these applications in turn open up new questions for AI research. For example:

In data management:

  • How should researchers decide what data to keep and what to discard, when an experiment or observation produces too much data to store?
  • How can scientists search efficiently for rare or unusual events and objects in large and noisy data sets?

In designing AI methods:

  • How can AI methods produce results that researchers are able to interpret and understand?
  • How can research help create more advanced, and more accurate, methods of verifying machine learning systems to increase confidence in their deployment?

While bringing a range of benefits today, these technologies could also have a disruptive influence on the conduct of science in future.

In the near term, AI can be applied to existing data analysis processes to enhance pattern recognition and support more sophisticated data analysis. An emerging approach goes further, building into AI systems the scientific knowledge already known to influence the phenomena observed in a research discipline – the laws of physics, or molecular interactions in the process of protein folding, for example.

In future, AI tools could play a role in the definition and refinement of scientific models. An area of promise is the field of probabilistic programming (or model-based machine learning), in which scientific models can be expressed as computer programs, generating hypothetical data. This hypothetical data can be compared to experimental data, and the comparison used to update the model, which can then be used to suggest new experiments – running the process of scientific hypothesis refinement and experimental data collection in an AI system.
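
The compare-and-update loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy model (waiting times from an exponential process with an unknown rate), the uniform range of candidate rates, the tolerance, and all function names. It uses simple rejection sampling (approximate Bayesian computation) as a stand-in for what a probabilistic programming system would automate, and it does not cover the step of suggesting new experiments.

```python
import random
import statistics


def simulate(rate, n, rng):
    """Toy scientific model written as a program: generate n hypothetical
    waiting times from an exponential process with the given rate."""
    return [rng.expovariate(rate) for _ in range(n)]


def update_model(observed, n_draws=5000, tolerance=0.05, seed=0):
    """Keep the candidate rates whose hypothetical data resembles the
    observed data (rejection sampling on the sample mean)."""
    rng = random.Random(seed)
    observed_mean = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.1, 5.0)  # assumed range of plausible rates
        hypothetical = simulate(rate, len(observed), rng)
        # Compare hypothetical data with experimental data.
        if abs(statistics.fmean(hypothetical) - observed_mean) < tolerance:
            accepted.append(rate)
    return accepted


# Stand-in for experimental data: here it is itself simulated with a
# "true" rate of 2.0, which the update step does not get to see.
observed = simulate(2.0, 100, random.Random(42))

posterior = update_model(observed)
estimate = statistics.fmean(posterior)
print(f"kept {len(posterior)} of 5000 candidate rates; mean rate = {estimate:.2f}")
```

The accepted rates form an approximate updated view of the model parameter. In a fuller system, the spread of those rates could indicate where additional experimental data would be most informative.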

AI’s disruptive potential could, however, extend much further. AI has already produced outputs or actions that seem unconventional or even creative – in its games against Lee Sedol, for example, AlphaGo produced moves that at first seemed unintuitive to human experts, but which proved pivotal in shaping the outcome of a game. In the longer term, the analysis provided by AI systems could point to previously unforeseen relationships, or new models of the world that reframe disciplines. Such results could advance the frontiers of science, and revolutionise research in areas from human health to climate and sustainability.

 

A discussion paper (PDF) by the Society and The Alan Turing Institute explores some of these issues in further detail. It is available on our AI webpage.