PATopics: A framework to automate the extraction of information in pharmaceutical patent documents


  • Pablo Cecilio, Universidade Federal de São João del-Rei
  • Felipe Viegas, Universidade Federal de Minas Gerais
  • Juliana Rosa, Universidade do Porto
  • Leonardo Rocha, Universidade Federal de São João del-Rei



Pharmaceutical patents, Topic modeling, Natural language processing


Pharmaceutical patents comprise documents with many details about the invention's claims and explanations of its methodology and results. Managing them requires exhaustive manual search. To mitigate this problem, we propose PATopics, a framework that extracts relevant information from the patents' text, builds relevant topics, correlates them with useful patent characteristics, and presents the information in a friendly web interface. We evaluated the framework using 4,832 pharmaceutical patents concerning 809 molecules patented by 478 companies. We analyze the framework against the demands of three user profiles (researchers, chemists, and companies), showing how practical and helpful PATopics is in the pharmaceutical scenario.




