PATopics: A framework to automate the extraction of information in pharmaceutical patent documents

Authors

  • Pablo Cecilio, Universidade Federal de São João del-Rei
  • Felipe Viegas, Universidade Federal de Minas Gerais
  • Juliana Rosa, Universidade do Porto
  • Leonardo Rocha, Universidade Federal de São João del-Rei

DOI:

https://doi.org/10.5753/reic.2023.3417

Keywords:

Pharmaceutical patents, Topic modeling, Natural language processing

Abstract

Pharmaceutical patents are documents containing many details about the invention's claims and explanations of its methodology and results. Managing them requires an exhaustive manual search. To mitigate this problem, we propose PATopics, a framework that extracts relevant information from the text of patents, builds relevant topics, correlates them with useful patent characteristics, and presents the information in a user-friendly web interface. We evaluated the framework using 4,832 pharmaceutical patents concerning 809 molecules patented by 478 companies. Our analysis considers the demands of three user profiles (researchers, chemists, and companies) and shows how practical and helpful PATopics is in the pharmaceutical scenario.
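The abstract does not say which term weighting or topic model PATopics uses. As a hedged illustration only, the sketch below assumes TF-IDF weighting followed by non-negative matrix factorization (NMF), a common topic-modeling combination, and "correlates" topics with a patent characteristic by averaging document-topic weights per company. It is not the authors' implementation: the toy documents, the company labels, and every function name (`tfidf_matrix`, `nmf`, `top_terms`, `topic_share_by`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def tfidf_matrix(docs):
    """Dense TF-IDF matrix (documents x vocabulary) from tokenized docs."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    X = []
    for d in docs:
        row = [0.0] * len(vocab)
        for t, c in Counter(d).items():
            # relative term frequency times smoothed IDF (always positive)
            row[index[t]] = (c / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1)
        X.append(row)
    return X, vocab

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Factor X ~ W @ H with multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]  # doc-topic
    H = [[rng.random() for _ in range(m)] for _ in range(k)]  # topic-term
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def frobenius_error(X, W, H):
    WH = matmul(W, H)
    return math.sqrt(sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr)))

def top_terms(H, vocab, n=3):
    """The n highest-weighted terms of each topic."""
    return [[vocab[j] for j in sorted(range(len(r)), key=lambda j: -r[j])[:n]] for r in H]

def topic_share_by(W, labels):
    """Average doc-topic weight per label (e.g. per company)."""
    sums, counts = defaultdict(lambda: [0.0] * len(W[0])), Counter()
    for row, label in zip(W, labels):
        counts[label] += 1
        sums[label] = [s + w for s, w in zip(sums[label], row)]
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

# Hypothetical toy "patents" and owners, for illustration only.
docs = [
    "tablet formulation oral dose release".split(),
    "oral tablet release coating formulation".split(),
    "crystalline salt polymorph synthesis process".split(),
    "synthesis process crystalline intermediate salt".split(),
]
companies = ["CompanyA", "CompanyA", "CompanyB", "CompanyB"]
X, vocab = tfidf_matrix(docs)
W, H = nmf(X, k=2)
print(top_terms(H, vocab))
print(topic_share_by(W, companies))
```

Pure Python keeps the sketch self-contained; at the scale of thousands of patents, library implementations such as scikit-learn's `TfidfVectorizer` and `NMF` would be the practical choice.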

Published

2023-08-05

How to Cite

Cecilio, P., Viegas, F., Rosa, J., & Rocha, L. (2023). PATopics: A framework to automate the extraction of information in pharmaceutical patent documents. Electronic Journal of Undergraduate Research on Computing, 21(2), 21–30. https://doi.org/10.5753/reic.2023.3417