Introduction
Defining synthetic biology
Synthetic biology (SynBio), also known as engineering biology, is an emerging multidisciplinary field with many definitions. It is generally accepted that SynBio involves the design or redesign of biological systems to develop useful and sustainable new products. Examples include novel metabolic pathways, engineered enzymes, artificial genomes and much more. SynBio technologies have a wide array of applications and huge potential to revolutionise how engineering biology is used to augment our daily lives.
Topic modelling overview
This study applies advanced topic modelling, optimised by hyperparameter tuning, to deliver deep insights. It leverages transformer-based text embeddings and state-of-the-art natural language processing (NLP) techniques. To introduce the SynBio patent landscape, a datamap of the 40 identified topics is visualised in figure 1.1.
For simplicity, the datamap (figure 1.1) shows each patent document mapped to its primary cluster, with the 40 clusters colour coded. The higher-dimensional text embeddings have undergone dimensionality reduction to enable 2D plotting. The workflow uses artificial intelligence to augment the clustering procedure when analysing diverse and complex subject matter. A quality control stage enables the clustering to be verified and fine-tuned by manual data checks that use diverse patent classification codes and keywords. The workflow is implemented by the bespoke analytics system developed by Inevus Advanced Analytics.
The topic modelling uses a custom smart summarisation method that ensures contextually relevant vector representations of patent text within a specific context window. State-of-the-art NLP, powered by Transformers, is used for topic discovery. The methodology takes a hybrid approach to include SynBio domain expertise: additional topics are created manually and topic assignment is quality controlled, both of which support accuracy. The methodology was further adapted to support multi-topic assignment, so that a patent document with multiple invention embodiments can belong to more than one cluster. This delivers accurate trend analysis and deep insights.
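The report does not name the dimensionality reduction technique used for the datamap. As a minimal sketch of the general idea, the example below uses principal component analysis (PCA) as a stand-in, computed with power iteration in plain Python. The function names and the toy three-dimensional "embeddings" are hypothetical; real transformer embeddings have hundreds of dimensions, and the actual workflow may well use a different method.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean so every dimension has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centred."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_eigenvector(matrix, iterations=200, seed=0):
    """Power iteration: repeatedly multiply a start vector and renormalise."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in the matrix
            break
        v = [x / norm for x in w]
    return v


def project_to_2d(embeddings):
    """Project each embedding onto the two leading principal axes (PCA)."""
    centered = center(embeddings)
    cov = covariance(centered)
    d = len(cov)
    axes = []
    for _ in range(2):
        v = dominant_eigenvector(cov)
        eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                         for i in range(d))
        axes.append(v)
        # Deflation: remove the variance explained by the axis just found.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(row[k] * axis[k] for k in range(d)) for axis in axes]
            for row in centered]


# Toy data only: four invented 3-dimensional vectors.
toy_embeddings = [[0, 0, 0], [4, 0, 0], [0, 2, 0], [4, 2, 0]]
for point in project_to_2d(toy_embeddings):
    print(point)
```

Each input vector becomes an (x, y) pair that can be plotted and coloured by its primary cluster. PCA is linear, so it is only a rough illustration of how higher-dimensional embeddings can be flattened for a 2D datamap.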
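To illustrate how a single patent document might be assigned to more than one cluster, here is a minimal sketch that compares a document embedding with topic centroids using cosine similarity. The report does not describe its assignment rule, so the threshold, the centroid vectors, the topic names and the function names are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_topics(embedding, centroids, threshold=0.5):
    """Return (primary_topic, all_topics) for one document embedding.

    The primary topic is the most similar centroid. Any other topic whose
    similarity reaches the (hypothetical) threshold is also assigned.
    """
    scores = {topic: cosine_similarity(embedding, centroid)
              for topic, centroid in centroids.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    primary = ranked[0]
    topics = [primary] + [t for t in ranked[1:] if scores[t] >= threshold]
    return primary, topics


# Hypothetical centroids in a toy 3-dimensional embedding space.
centroids = {
    "gene_editing": [1.0, 0.0, 0.0],
    "biofuels": [0.0, 1.0, 0.0],
    "biomaterials": [0.0, 0.0, 1.0],
}

primary, topics = assign_topics([0.8, 0.6, 0.1], centroids)
print(primary, topics)
```

In this sketch the primary topic would drive the colour on a datamap, while the full list would feed the trend counts. A manual quality control step could then override either result.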
Defining synthetic biology
SynBio technologies can help solve many of the environmental and societal challenges of this century, delivering alternative biofuels, innovative biomaterials, microbes which degrade environmental pollutants, alternative proteins and engineered immunotherapies, in addition to many other breakthroughs reshaping the bioeconomy. Potter Clarkson’s patent attorneys and IP solicitors specialise in protecting and commercialising synthetic biology innovation, with deep domain expertise. Interpretations of synthetic biology or engineering biology range from manipulation of the DNA or RNA of an organism right up to the broad definition of “using biology to do stuff”! Our report investigates synthetic biology from a patent perspective, working with this broad definition. Further details can be found at Potter Clarkson - Synthetic biology.
UK perspective
Our analysis of European Patent Office (EPO) patents with UK residency revealed that the synthetic biology field is growing at a faster compound annual growth rate (CAGR) (10.2%) than UK-resident biotechnology (8.5%) and UK-resident patents generally, with no subject matter limit (2%), based on EPO filings published during 2014-23. Over the same period, the UK had a positive relative specialisation index (RSI) of 0.21 and ranked 5th, ahead of other European countries with similar demographics such as Germany (RSI of -0.25) and France (-0.03). This statistical evidence suggests the UK is one of the most specialised countries for patenting within the synthetic biology field, and it may improve its ranking further, driven by an increasing CAGR.
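For readers unfamiliar with the two metrics, the sketch below shows how they are commonly calculated. The CAGR formula is standard. The report does not state its RSI formula, so the version here, the base-10 logarithm of a country's share in the field divided by its share of all patents, is an assumption based on one widely used formulation. The function and parameter names are hypothetical, and the example counts are invented rather than taken from the dataset.

```python
import math


def cagr(first_value, last_value, periods):
    """Compound annual growth rate between two counts.

    `periods` is the number of yearly intervals, e.g. 9 for the ten
    publication years 2014 to 2023 (an assumption about the counting).
    """
    if first_value <= 0 or periods <= 0:
        raise ValueError("first_value and periods must be positive")
    return (last_value / first_value) ** (1 / periods) - 1


def rsi(country_field, country_total, world_field, world_total):
    """Relative specialisation index (assumed log10 formulation).

    Positive: the country patents more in the field than its overall
    share would predict. Negative: it patents less. Zero: in line.
    """
    country_share = country_field / world_field
    overall_share = country_total / world_total
    return math.log10(country_share / overall_share)


# Invented counts, for illustration only.
print(f"CAGR: {cagr(100, 240, 9):.1%}")
print(f"RSI:  {rsi(50, 5_000, 1_000, 160_000):.2f}")
```

Because the index is logarithmic under this assumption, an RSI of 0.21 would mean a field share roughly 1.6 times the country's overall share, and an RSI of -0.25 a share a little over half of it.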
Data notes and limitations
EP patent publications organised by INPADOC families act as a proxy to retrieve patent publications within the synthetic biology field and assess innovation trends via advanced patent analytics. The analysis identifies key trends with a standardised methodology; it is not intended to be 100% exhaustive, owing to the complexity of the subject matter. Previous attempts to map synthetic biology landscapes have proved highly difficult. We have carefully curated a bespoke dataset using the leading commercial database, Questel Orbit Innovation.
The following dataset notes and limitations apply:
- The analysis focuses on European patent filings. Whilst we acknowledge that this will not capture applicants that file only in their home territories, such as the US and China, it does capture applicants that take a more global approach to their IP and tend to elect to file a European patent application. The dataset captures engineering patent applications, delivering a good compromise against the alternative of compiling very large, detailed datasets for all major patent territories, which was deemed impracticable given the current levels of dataset curation.
- Coverage is subject to the standard 18-month publication delay, due to the publication routines and examination timeframes of patent offices. The dataset therefore represents a snapshot in time and covers EP patent applications published during 2004-2023, a 20-year publication period. Only EPO patent applications with A1 and A2 kind codes are included, avoiding duplicate counts from corrected specifications and similar publications.
- The study aims to capture key technologies within the synthetic biology field, which broadly encompasses engineering biology, providing important examples of SynBio innovation. This approach also captures relevant broader biotechnology patents which form the background from which synthetic biology technologies have emerged, e.g. protein engineering and genetic engineering, including biofuels.
- A large component of the research methodology relies on patent families assigned to SynBio-appropriate IPC/CPC subgroup classification codes, identified by reviewing the patent portfolios of prominent synthetic biology applicants using advanced data mining techniques. This is supplemented with keyword searches to extend the scope of the analysis. Data cleaning balances the need for precision and recall, providing a robust basis for analysis. The coverage, machine translation and data quality of the Questel Orbit Innovation database made it an excellent resource for this project.
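The dataset rules above can be sketched in a few lines. The example below keeps only A1 and A2 publications from 2004-2023, counts each family once, and then computes precision and recall against a manually checked sample. The record fields (`publication`, `kind`, `year`, `family_id`), the function names and the sample records are hypothetical and do not reflect the Questel Orbit export format. Counting one publication per family is also an assumption about how the family-level grouping is applied.

```python
def select_publications(records, first_year=2004, last_year=2023,
                        kinds=("A1", "A2")):
    """Keep the first in-scope publication of each patent family."""
    selected = []
    seen_families = set()
    for record in records:
        if record["kind"] not in kinds:
            continue  # e.g. A3 search reports, A8/A9 corrections, B1 grants
        if not first_year <= record["year"] <= last_year:
            continue  # outside the 20-year publication window
        if record["family_id"] in seen_families:
            continue  # family already counted
        seen_families.add(record["family_id"])
        selected.append(record)
    return selected


def precision_recall(retrieved_ids, relevant_ids):
    """Precision and recall of a retrieved set against a checked sample."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Invented records, for illustration only.
records = [
    {"publication": "EP-A", "kind": "A1", "year": 2010, "family_id": 1},
    {"publication": "EP-B", "kind": "A9", "year": 2011, "family_id": 1},
    {"publication": "EP-C", "kind": "A2", "year": 2003, "family_id": 2},
    {"publication": "EP-D", "kind": "A2", "year": 2020, "family_id": 3},
]
kept = select_publications(records)
print([r["publication"] for r in kept])
print(precision_recall(["EP-A", "EP-D"], ["EP-A", "EP-D", "EP-X"]))
```

Widening the keyword searches tends to raise recall at the cost of precision, while tightening the classification codes does the reverse. Measuring both on a checked sample is one way to judge whether data cleaning has struck a reasonable balance.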