Antibody Uses/Therapeutics – Subtopic Landscape
A synthetic biology perspective
The subset of SynBio – antibody uses/therapeutics related patents was further investigated to identify subtopics and assess trending areas.
The topic model leverages a hybrid approach based on the optimised extractive summary for each publication, combining topic discovery via fine-tuned transformer-based deep learning with ground truth cross-referencing via keywords and classification codes. The process enables a patent to belong to more than one topic, supporting accurate multi-classification trends and accounting for multiple invention embodiments. To avoid duplication here, please see the topic model page for further details of the methodology.
Subtopic landscape
The synthetic biology – antibody uses/therapeutics topic model is visualised in figure 11.9. The visual is based on the dimensionality reduction of vector embeddings, which maps each patent to a contextually relevant x and y coordinate; the categorical clusters are colour coded to support review. For simplicity, the visual assigns each patent to one key subtopic. However, the trend analysis enables a patent to belong to more than one subtopic, which is consistent with the topic model methodology used throughout this project.
Subtopic model – technology cluster totals
The hybrid topic model methodology identified 20 diverse topics which are ranked based on the total number of published applications in figure 11.10. A patent application can be counted more than once as it can belong to multiple topics.
In figure 11.10, the analysis enables multilabel classification for each patent application, to account for multiple invention embodiments. During the 20-year publication period 2004-2023, nearly 60% of the antibody dataset was classified within the top ranked subtopic (Biosequence related, 58.7%). This largest subtopic is a proxy for applications which have disclosed a sequence or sequence ID. Cancer therapeutics represents a major treatment area, with 47.2% of applications classified within the topic. In particular, monoclonal antibodies (ranked 12th) are a type of cancer treatment; antibody conjugates such as ADCs (ranked 9th) guide cytotoxic payloads to cancer cells; and bispecific antibodies (ranked 18th) can be synthetically engineered to target T cells to kill cancerous cells. An important category, fusion polypeptides (ranked 4th), possesses targeting capabilities through antibody derivatives as recombinant therapeutics.
The remaining subtopics reveal the applications of antibody-based technology for various genetic engineering applications such as targeting of gene therapy, diagnostics, treatment of other disease areas and emerging therapeutic areas such as immunotherapy and chimeric antigen receptors. The importance of fusion proteins in the last decade (2014-23) is indicated by the topic's increase in ranking from 4th to 3rd; the T-cell related topic (immunotherapy) is now ranked 5th instead of 7th, and single chain antibodies are inside the top 10 during 2014-23.
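The multilabel counting described above can be illustrated with a minimal sketch. The application identifiers and label assignments below are hypothetical toy data, not taken from the dataset; the point is only that an application contributes to every subtopic it carries, so subtopic shares can sum to more than 100%.

```python
from collections import Counter

# Hypothetical toy data: each application carries one or more subtopic labels.
applications = {
    "EP-A": {"Biosequence related", "Cancer therapeutics"},
    "EP-B": {"Cancer therapeutics", "Antibody conjugates"},
    "EP-C": {"Biosequence related"},
    "EP-D": {"Gene therapy", "Biosequence related", "Cancer therapeutics"},
}


def subtopic_totals(apps):
    """Count applications per subtopic; an application counts once for each of its labels."""
    counts = Counter()
    for labels in apps.values():
        counts.update(labels)
    return counts


def subtopic_shares(apps):
    """Share of all applications classified within each subtopic."""
    total = len(apps)
    return {topic: n / total for topic, n in subtopic_totals(apps).items()}


totals = subtopic_totals(applications)
shares = subtopic_shares(applications)

# Rank subtopics by total count (ties broken alphabetically for a stable order).
ranking = [topic for topic, _ in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))]

for topic in ranking:
    print(f"{topic}: {totals[topic]} applications ({shares[topic]:.1%})")
```

Because labels overlap, the shares describe coverage of the dataset by each subtopic rather than a partition of it.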
Subtopic publication trends
The antibody uses/therapeutics subtopic publication year trends are shown in figure 11.11. Publication trends discussed below are based on EP A1/A2 applications; identified patents can belong to more than one subtopic due to multiple invention embodiments.
In figure 11.11, there are a number of rapidly growing subtopic areas within the antibody uses/therapeutics landscape at the EPO. The largest topic identified (biosequence related) grew at an 11.9% compound annual growth rate (CAGR) during 2014-23, with 1586 publications during 2023. Cancer therapeutics grew at 15.7%, with 1475 publications in 2023. These growth rates are mid-range but impressive given the large size of the topics. Chimeric antigen receptors are the fastest growing topic identified (47.5%). Above the 20% threshold, single chain antibodies grew at 28.5%, T-cell related (immunotherapy) at 25.6% and the nanobody / single domain topic at 23.2% CAGR. Fusion polypeptides grew at 19.3% CAGR, impressive given the overall size of the topic (6,918 publications during 2014-23). There were 637 published applications related to gene therapy in 2023, and the topic grew at 15.9% CAGR during 2014-23, indicating the importance of antibody-based cellular targeting. Antibody conjugates such as ADCs grew at 14% CAGR. Treatment of joint disorders such as arthritis has been consistent since an earlier peak, growing at 3% CAGR during 2014-23.
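For readers less familiar with CAGR, the following is a minimal sketch of the standard calculation. The report does not state how many compounding periods were used for 2014-23; the sketch assumes nine annual steps (2014 to 2023), and the start and end counts are hypothetical values chosen only for illustration.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate: the constant yearly rate that takes
    start_value to end_value over the given number of periods."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


def project(start_value, rate, periods):
    """Value reached after compounding start_value at `rate` for `periods` steps."""
    return start_value * (1 + rate) ** periods


# Hypothetical counts (not from the report): 100 publications in the first
# year and 200 in the last year.
# Assumption: 2014 -> 2023 is treated as nine annual compounding periods.
rate = cagr(100, 200, 9)
print(f"CAGR: {rate:.1%}")

# Round trip: compounding the start value at this rate recovers the end value.
print(round(project(100, rate, 9), 6))
```

A doubling over nine periods corresponds to roughly 8% per year, which gives a sense of scale for the 20%+ growth rates quoted for the fastest growing subtopics.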
Subtopic top 20 assignees distributions (2014-23)
The patent portfolios of the top 20 assignees within the SynBio – antibody uses/therapeutics dataset are analysed in figure 11.12. The portfolios are restricted to publications during 2014-23, mapped to the antibody subtopics identified; the counts represent total EPO publications.
The heatmap in figure 11.12 reveals the distribution of the top 20 antibody uses/therapeutics assignees during 2014-23. Publications can be assigned to more than one subtopic, reflecting multiple invention embodiments. Within the top 20 assignees, NOVARTIS and HOFFMANN LA ROCHE have the largest portfolios within the cancer therapeutics subtopic. Beyond the UNIVERSITY OF PENNSYLVANIA portfolio, NOVARTIS (83 publications) and, in particular, JUNO THERAPEUTICS (81 publications), since acquired by BRISTOL MYERS SQUIBB, are the CAR-T specialists. REGENERON are the most prolific applicants for biosequence related patent applications. IMMATICS BIOTECHNOLOGIES have the largest portfolio for antibodies within gene therapy. AMGEN have a sizeable distribution of their portfolio within bispecific and monoclonal antibodies and are one of the larger assignees within the single chain antibodies topic.
The analysis does not account for publications prior to 2014, which may have contributed to companies developing market share, or for potential licensing and acquisitions (subsidiaries). Data cleaning was carried out to standardise and consolidate assignee names. The analysis is an informative guide, as some specific subtopics have strict content boundaries to enable differentiation, whilst others are broader to capture more generic areas.
Patent family territory analysis
The INPADOC patent families comprising the identified antibody uses/therapeutics related EPO patents were analysed to identify the top 30 territories where patents are filed. Analysing the publication countries alone is insufficient, as major countries such as France, the UK and Germany may not publish patents going through the European (EPO) route, especially when pending. To supplement the available data, a bespoke analysis was conducted, standardising the publication countries and including ‘protected countries’ to capture patent rights which are pending or granted based on legal status. There are caveats which include:
- The study methodology is focused on EPO patents and may not capture assignees/applicants that file only in home territories or don’t file in Europe via EPO filings.
- The protected country data may not be fully up to date, due to INPADOC data availability and where EPO patents are recent filings.
The standardisation procedure ensures a territory is only counted once per family. The territory analysis is visualised in figure 11.13; EPO and WO (PCT) patents have been included for reference purposes. Despite the caveats, the analysis provides useful indicators regarding territories where applicants are filing patents within the antibody uses/therapeutics field, based on 2014-23 publications for a relatively recent perspective.
In figure 11.13, approximately 91% (90.6%) of the patent families identified had at least one US national filing. Other key territories with at least one national filing include Japan (72.7%), China (70.1%), Canada (66.4%) and Australia (58.8%). Below the 50% threshold, key territories include Republic of Korea (47.7%), Brazil (35.4%) and India (35.4%).
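The once-per-family counting rule behind these percentages can be sketched as follows. The family identifiers and territory codes are hypothetical toy data; each list stands for the combined publication and protected countries of one family, which may contain repeats before standardisation.

```python
from collections import Counter

# Hypothetical toy data: territory codes per patent family, possibly with
# duplicates (for example, several publications in the same country).
families = {
    "FAM1": ["US", "US", "JP", "EP", "WO"],
    "FAM2": ["US", "CN", "CN", "EP"],
    "FAM3": ["JP", "EP", "WO", "CA"],
    "FAM4": ["US", "JP", "CN", "EP"],
}


def territory_shares(fams):
    """Share of families with at least one filing in each territory.
    Converting each list to a set ensures a territory counts once per family."""
    counts = Counter()
    for codes in fams.values():
        counts.update(set(codes))
    total = len(fams)
    return {territory: n / total for territory, n in counts.items()}


shares = territory_shares(families)
for territory, share in sorted(shares.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{territory}: {share:.1%}")
```

Since the dataset is built from EPO patents, EP appears in every family in this toy example, which is why EPO and WO counts are best read as reference points rather than findings.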
Subtopic keyword trends
Investigating keyword trends provides a different perspective beyond the antibody uses/therapeutics subtopic model. The smart summaries used during the topic model stage were data mined for the most contextually important keywords, leveraging transformer-based embeddings. Keywords and phrases most similar to the document were identified and manually audited for relevance to the SynBio project; the results are visualised in figure 11.14. The visualisation indicates how the cumulative publication counts have changed between the publication periods 2014-18 and 2019-23. The methodology aims to identify contextually relevant and reliable keywords as a source of ground truth, signify important keywords within the corpus and audit the topic model subtrend analysis already carried out.
In figure 11.14, the following key findings are observed and also support the trending areas identified by the subtopic modelling:
- Cancer is a major therapeutic area, increasing from 1470 publications during 2014-18 to 3470 publications during 2019-23. Tumor grew from 753 to 1728 publications between the two periods. Vaccines represent a major area of patenting, increasing from 1137 to 1680 publications. Coronavirus increased rapidly from 15 to 320 publications, which has contributed to the growth of the vaccine portfolio.
- Chimeric antigen receptor grew rapidly from 333 to 1464 publications between the two periods, reaching 4.4 times the 2014-18 publication count. Immunotherapy grew from 298 to 801 publications.
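The period-over-period multiples behind these findings can be reproduced with a short sketch. The counts are those quoted in the findings above; the sketch assumes the first figure in each pair refers to 2014-18 and the second to 2019-23.

```python
# Keyword publication counts quoted above.
# Assumption: each pair is (2014-18 count, 2019-23 count).
keyword_counts = {
    "cancer": (1470, 3470),
    "tumor": (753, 1728),
    "vaccines": (1137, 1680),
    "coronavirus": (15, 320),
    "chimeric antigen receptor": (333, 1464),
    "immunotherapy": (298, 801),
}


def growth_multiple(earlier, later):
    """Later-period count expressed as a multiple of the earlier-period count."""
    return later / earlier


multiples = {
    keyword: round(growth_multiple(earlier, later), 1)
    for keyword, (earlier, later) in keyword_counts.items()
}

# Keywords ordered from fastest to slowest growth between the two periods.
ranked = sorted(multiples, key=multiples.get, reverse=True)
for keyword in ranked:
    print(f"{keyword}: x{multiples[keyword]}")
```

Multiples computed from a very small base, such as coronavirus, are large in relative terms but should be read alongside the absolute counts.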
Subtopic keyword analysis
For a further perspective on contextually important keywords, a statistical procedure was applied to six subtopics selected from the corpus. The analysis contrasts how the usage or frequency of the keywords / phrases differs across the subtopics using a weighted log odds ratio. The aim is to identify which differences are meaningful by weighting the log odds ratio by a prior, as outlined in Monroe, Colaresi, and Quinn (2008). Here the prior is estimated from the data itself rather than taken as an uninformative prior, such as an uninformative Dirichlet prior. The procedure is therefore an empirical Bayes approach, with results shown in figure 11.15. A further motivation is to audit the subtopics for result relevance and transparency and to provide insights into content. As a side note, the transformer-based keyword analysis provides powerful methods to review subtopics and extends the analytical power beyond established procedures for evaluating a corpus, such as TF-IDF (term frequency-inverse document frequency).
In figure 11.15, the keywords outlined are most characteristic of each subtopic based on the weighted log odds score, which is labelled. A higher log odds score indicates that the keyword is more likely to be used within that specific subtopic than in the others. Some of the log odds scores are not very high, which is not surprising given the overlap encountered between the multiple subtopics identified within the specific topic landscape.
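To make the weighted log odds score more concrete, the following is a minimal sketch of one common formulation based on Monroe, Colaresi, and Quinn (2008). It is not the project's implementation. It assumes the prior for each keyword is its pooled count across all subtopics (one way of estimating the prior from the data), that each subtopic is compared against all other subtopics combined, and that at least two distinct keywords exist. The keyword counts are hypothetical toy values.

```python
import math
from collections import Counter


def weighted_log_odds(counts_by_topic):
    """Return {topic: {keyword: z-score}} using a log odds ratio weighted by a
    prior estimated from the pooled counts (empirical Bayes assumption)."""
    # Prior alpha_w: pooled count of each keyword across all subtopics.
    prior = Counter()
    for counts in counts_by_topic.values():
        prior.update(counts)
    alpha_0 = sum(prior.values())  # total prior mass
    grand_total = alpha_0          # total keyword count in the corpus

    scores = {}
    for topic, counts in counts_by_topic.items():
        n_i = sum(counts.values())   # keyword total in this subtopic
        n_j = grand_total - n_i      # keyword total in all other subtopics
        topic_scores = {}
        for word, alpha_w in prior.items():
            y_i = counts.get(word, 0)  # count in this subtopic
            y_j = alpha_w - y_i        # count in the other subtopics
            odds_i = (y_i + alpha_w) / (n_i + alpha_0 - y_i - alpha_w)
            odds_j = (y_j + alpha_w) / (n_j + alpha_0 - y_j - alpha_w)
            delta = math.log(odds_i) - math.log(odds_j)
            variance = 1 / (y_i + alpha_w) + 1 / (y_j + alpha_w)
            topic_scores[word] = delta / math.sqrt(variance)
        scores[topic] = topic_scores
    return scores


# Hypothetical toy counts for two subtopics (not taken from figure 11.15).
toy_counts = {
    "Gene therapy": Counter({"sirna": 8, "antisense": 6, "liposome": 1, "antibody": 10}),
    "Antibody conjugates": Counter({"liposome": 7, "nanoparticle": 6, "sirna": 1, "antibody": 10}),
}

scores = weighted_log_odds(toy_counts)
for topic, topic_scores in scores.items():
    top = sorted(topic_scores, key=topic_scores.get, reverse=True)[:2]
    print(topic, "->", top)
```

In this toy example a keyword shared equally by both subtopics, such as "antibody", scores close to zero, while keywords concentrated in one subtopic score positively there and negatively elsewhere. This mirrors why heavily overlapping subtopics yield modest scores.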
Some key findings observed are:
- Gene therapy – use of antisense, siRNA and iRNA molecules; conditions highlighted include neurodegenerative and CNS.
- Antibody conjugates – use of liposomes, synthetic nanocarriers and nanoparticles for drug delivery.
- Cancer therapeutics – antibody targeting of cytotoxic drugs, use of synthetic nanocarriers and liposomes for drug delivery, and specific chemotherapy treatments such as gemcitabine.
It is difficult to distil and characterise the coverage of the subtopics via restricted keywords and phrases. This is further complicated by the weighting, which is not always frequency led but reflects the terminology and context that is more characteristic of one subtopic in relation to others. It is fair to conclude that the subtopic model has successfully captured an extensive set of distinct subtrends; overlap exists, but the trends are accurate once audited. The keywords are relevant to real world applications and suggest the insights identified are a useful tool to examine the specific topic landscape.