Introduction

Defining synthetic biology

Synthetic biology (SynBio), also known as engineering biology, is an emerging multidisciplinary field with many definitions. It is generally accepted that SynBio involves the design or redesign of biological systems to develop useful and sustainable new products. This includes novel metabolic pathways, engineered enzymes, artificial genomes and much more. SynBio technologies have a wide array of applications and huge potential to change how engineering biology is used in our daily lives.

Topic modelling overview

This study updates our previous report and continues to apply advanced topic modelling, optimised by hyperparameter tuning, using transformer-based text embeddings and state-of-the-art natural language processing (NLP) techniques. To introduce the updated SynBio patent landscape, a datamap of the 60 identified topics is visualised in figure 1 (the 40 topics from the previous report are retained).

For simplicity, the datamap (figure 1) shows each patent document mapped to its primary cluster, with the 60 clusters colour coded. The higher-dimensional text embeddings have undergone dimensionality reduction to enable 2D plotting. The workflow uses artificial intelligence to augment the clustering procedure when analysing diverse and complex subject matter. A quality control stage enables the clustering to be verified and fine-tuned by manual data checks, which draw on diverse patent classification codes and keywords. The workflow is implemented in the bespoke analytics system developed by Inevus Advanced Analytics.
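
As a rough illustration of the dimensionality reduction step, the sketch below projects high-dimensional vectors onto two axes using principal component analysis (PCA) computed by power iteration. This is an assumption made for illustration only: the report does not name the reduction technique, and pipelines of this kind often use non-linear methods such as UMAP or t-SNE instead. The function names and the toy vectors are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centre(rows):
    """Subtract the per-dimension mean so the data is centred on the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def leading_direction(rows, iterations=200, seed=0):
    """Power iteration: approximate the unit direction of greatest variance."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in rows[0]]
    for _ in range(iterations):
        scores = [dot(row, v) for row in rows]
        # w = X^T (X v), accumulated column by column
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(len(v))]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left in the data
            break
        v = [x / norm for x in w]
    return v


def project_to_2d(embeddings):
    """Return one (x, y) pair per embedding, using the top two PCA axes."""
    x = centre(embeddings)
    pc1 = leading_direction(x, seed=0)
    first = [dot(row, pc1) for row in x]
    # Deflation: remove the first component before searching for the second.
    residual = [[xj - s * pj for xj, pj in zip(row, pc1)] for row, s in zip(x, first)]
    pc2 = leading_direction(residual, seed=1)
    second = [dot(row, pc2) for row in residual]
    return list(zip(first, second))


# Toy 3-dimensional "embeddings" (hypothetical values, not report data).
toy = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.1], [3.0, 3.0, 0.0]]
coords = project_to_2d(toy)
```

Each resulting pair can then be plotted as one point on a 2D datamap and coloured by its primary cluster.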

The topic modelling uses a custom smart summarisation method to ensure contextually relevant vector representations of patent text within a specific context window. State-of-the-art, transformer-based NLP is used for topic discovery. The methodology takes a hybrid approach to incorporate SynBio domain expertise: additional topics are created manually and topic assignments are quality controlled, to ensure high accuracy. The methodology was further adapted to support multi-topic assignment, recognising patent documents with multiple invention embodiments. A patent can therefore belong to more than one cluster, which supports accurate trend analysis and deeper insights.
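
One way multi-topic assignment could work is sketched below: a document vector is compared with a representative vector for each topic, the best match becomes the primary topic, and any other topic above a similarity threshold is added as a secondary topic. The cosine-similarity rule, the threshold value and the function names are assumptions for illustration; the report does not describe the actual assignment rule.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb) if na and nb else 0.0


def assign_topics(doc_vec, topic_vectors, threshold=0.7):
    """Return the primary topic first, then any other topic scoring >= threshold."""
    scores = {name: cosine(doc_vec, vec) for name, vec in topic_vectors.items()}
    primary = max(scores, key=scores.get)
    secondary = sorted(
        name for name, score in scores.items()
        if name != primary and score >= threshold
    )
    return [primary] + secondary


# Hypothetical 2-dimensional topic vectors; real embeddings have many more dimensions.
topics = {"gene therapy": [1.0, 0.0], "vaccines": [0.0, 1.0]}
mixed = assign_topics([1.0, 1.0], topics)    # sits between both topics
focused = assign_topics([1.0, 0.1], topics)  # clearly one topic
```

A document that sits between two topics receives both labels, while a focused document keeps only its primary topic. The primary label is the one a single-colour datamap would display.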

Defining synthetic biology

SynBio technologies can help solve many of the environmental and societal challenges of this century, delivering alternative biofuels, innovative biomaterials, microbes which degrade environmental pollutants, alternative proteins and engineered immunotherapies, in addition to many other breakthroughs reshaping the bioeconomy. Potter Clarkson’s patent attorneys and IP solicitors specialise in protecting and commercialising synthetic biology innovation, with deep domain expertise. A broader interpretation of synthetic biology or engineering biology encompasses everything from manipulation of the DNA or RNA of an organism right up to the broad definition of “using biology to do stuff”! Our report investigates synthetic biology from a patent perspective, working with this broad definition. Further details can be reviewed here: Potter Clarkson - Synthetic biology.

UK perspective

The UK is now ranked 7th as an assignee country based on publications since 2015, down from 5th in our earlier report. However, the UK has now been identified as the second fastest growing assignee country, with a compound annual growth rate of 25.7% during 2016-24. The UK is a top 5 ranked assignee country across a number of key SynBio topic areas. It exhibits therapeutic expertise in topics such as fusion proteins, gene therapy, immunotherapy, treatment of neurodegenerative related conditions, pharma compounds, vaccines, and virus and bacteriophage related innovation, amongst others. The UK is also a leading territory for SynBio-related AI & machine learning, molecular design and biostatistics, in areas such as drug discovery and functional genomics or proteomics.
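
For readers unfamiliar with the metric, a compound annual growth rate (CAGR) is conventionally computed as (end value / start value)^(1 / number of periods) - 1. The sketch below applies that standard formula. It assumes 2016-24 spans eight annual intervals, and the publication counts are hypothetical because the report does not give the underlying figures.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate over `periods` annual intervals."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Hypothetical counts: 100 publications in 2016 growing at 25.7% a year
# for eight intervals (2016 to 2024) would reach 100 * 1.257 ** 8.
start_count = 100.0
end_count = start_count * 1.257 ** 8
rate = cagr(start_count, end_count, periods=8)
```

At this rate, annual output roughly doubles every three years.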

Data notes and limitations

EP patent publications, organised by INPADOC families, act as a proxy for retrieving patent publications within the synthetic biology field and assessing innovation trends via advanced patent analytics. The analysis identifies key trends with a standardised methodology; it is not intended to be 100% exhaustive, given the complexity of the field. Previous attempts to map synthetic biology landscapes have proved very difficult. We have carefully curated a bespoke dataset using the leading commercial database, Questel Orbit Innovation. The following dataset notes and limitations apply:

  • The analysis focuses on European patent filings. Whilst we acknowledge that this will not capture applicants that file only in their home territories, such as the US and China, it does capture applicants that take a more global approach to their IP and tend to elect to file a European patent application. The dataset captures engineering biology patent applications and offers a good compromise against the alternative of compiling very large, detailed datasets for all major patent territories, which was deemed impracticable given the level of dataset curation required.

  • Coverage is subject to the standard 18-month publication delay, due to the publication routines and examination timeframes of patent offices. The dataset therefore represents a snapshot in time and covers EP patent applications published since 2015. To manage the dataset size, we worked with patent families since 2015 and isolated the applicable EPO family members. Only EPO patent applications with A1 and A2 kind codes are counted, avoiding duplicate counts from corrected specifications and similar publications.

  • The study aims to capture key technologies within the synthetic biology field, which broadly encompasses engineering biology, providing important examples of SynBio innovation. This approach also captures relevant broader biotechnology patents which form the background from which synthetic biology technologies have emerged, e.g. protein engineering, genetic engineering and biofuels.

  • A large component of the research methodology relies on patent families assigned to SynBio-appropriate IPC/CPC subgroup classification codes, identified by reviewing the patent portfolios of prominent synthetic biology applicants using advanced data mining techniques. This is supplemented with keyword searches to extend the scope of the analysis. Data cleaning balances the need for precision and recall, providing a robust basis for analysis. The coverage, machine translation and data quality of the Questel Orbit Innovation database made it an excellent resource for this project.
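
The selection rules in the notes above can be sketched as a simple filter: keep EP publications with kind code A1 or A2, published from 2015 onwards, that match either a classification code or a keyword. The record layout, the example CPC prefix, the keyword, the publication numbers and the choice to keep one publication per family are all assumptions for illustration; the report does not publish its search strategy or data schema.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    number: str      # publication number, e.g. "EP0000001" (hypothetical)
    kind: str        # kind code such as "A1", "A2" or "B1"
    year: int        # publication year
    family_id: str   # patent family identifier (hypothetical format)
    cpc: list        # CPC classification codes
    abstract: str    # text searched for keywords


def select_publications(pubs, cpc_prefixes, keywords, since=2015):
    """Apply the kind-code, date, classification and keyword rules in turn."""
    seen_families = set()
    selected = []
    for pub in sorted(pubs, key=lambda p: (p.year, p.number)):
        if not pub.number.startswith("EP"):
            continue  # European filings only
        if pub.kind not in {"A1", "A2"} or pub.year < since:
            continue  # published applications since the cut-off year
        on_code = any(code.startswith(pre) for code in pub.cpc for pre in cpc_prefixes)
        on_keyword = any(k.lower() in pub.abstract.lower() for k in keywords)
        if not (on_code or on_keyword):
            continue  # outside the SynBio search scope
        if pub.family_id in seen_families:
            continue  # assumption: keep the earliest publication per family
        seen_families.add(pub.family_id)
        selected.append(pub)
    return selected


# Illustrative records only; none of these are real publications.
sample = [
    Publication("EP0000001", "A1", 2016, "F1", ["C12N15/10"], "Engineered enzyme"),
    Publication("EP0000002", "A2", 2018, "F1", ["C12N15/10"], "Engineered enzyme"),
    Publication("EP0000003", "B1", 2019, "F2", ["C12N15/10"], "Granted patent"),
    Publication("EP0000006", "A1", 2020, "F5", ["A01B1/00"], "A synthetic biology circuit"),
]
kept = select_publications(sample, cpc_prefixes=["C12N15"], keywords=["synthetic biology"])
```

Combining classification codes with keywords widens recall, while the kind-code and family checks guard against counting the same invention twice.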