Centers for Disease Control and Prevention
National Office of Public Health Genomics
Office of Genomics and Disease Prevention

An open source infrastructure for managing knowledge and finding potential collaborators in a domain-specific subset of PubMed

Background
Identifying relevant research in an ever-growing body of published literature is becoming increasingly difficult. Establishing domain-specific knowledge bases may be a more effective and efficient way to manage and query information within specific biomedical fields. Adopting a controlled vocabulary is a critical step toward data integration and interoperability in any information system. We present an open source infrastructure for managing and mining data within a domain-specific knowledge base. As a practical application of the infrastructure, we developed two applications, Literature Finder and Investigator Browser, as well as a tool set that automates data curation for the Human Genome Epidemiology Published Literature Database (HuGE Pub Lit). The design of the infrastructure makes the system potentially extensible to other data sources.

Results
Information retrieval and usability tests demonstrated that the system had high rates of recall and precision, 90% and 93% respectively. The system was easy to learn, easy to use, reasonably fast, and effective.
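
For readers unfamiliar with the two measures reported above, the following is a minimal sketch of how recall and precision are conventionally computed from retrieval counts. The class name, method names, and the counts in `main` are hypothetical and chosen only for illustration; they are not taken from the study's test data or from the project's source code.

```java
// Illustrative sketch only: standard definitions of precision and recall.
// RetrievalMetrics and the counts below are hypothetical, not study data.
class RetrievalMetrics {

    /** Precision: fraction of retrieved records that are relevant. */
    static double precision(int truePositives, int falsePositives) {
        int retrieved = truePositives + falsePositives;
        return retrieved == 0 ? 0.0 : (double) truePositives / retrieved;
    }

    /** Recall: fraction of relevant records that were retrieved. */
    static double recall(int truePositives, int falseNegatives) {
        int relevant = truePositives + falseNegatives;
        return relevant == 0 ? 0.0 : (double) truePositives / relevant;
    }

    public static void main(String[] args) {
        // Hypothetical query result: 45 relevant records retrieved,
        // 5 irrelevant records retrieved, 15 relevant records missed.
        int tp = 45, fp = 5, fn = 15;
        System.out.printf("precision = %.2f%n", precision(tp, fp));
        System.out.printf("recall    = %.2f%n", recall(tp, fn));
    }
}
```

Both methods return 0.0 when the denominator is zero, which is one common convention; other conventions treat that case as undefined.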

Conclusions
The open source system infrastructure presented in this paper provides a novel approach to managing and querying information and knowledge from domain-specific PubMed data. Using the UMLS controlled vocabulary enhanced data integration, interoperability, and the extensibility of the system. In addition, because it uses an MVC-based design and Java, a platform-independent programming language, the system provides a potential infrastructure for any domain-specific knowledge base in the biomedical field.

The System Requirements
Operating systems: Windows and Linux/Unix
Database: MS SQL Server and MySQL
Programming language: Java
Software packages: J2EE 1.4, Hibernate 3.0 and Struts 1.2.9
License: GNU General Public License. This license allows the source code to be redistributed and/or modified under the terms of the GNU General Public License as published by the Free Software Foundation. The source code for the application is available at no charge.
Any restrictions to use by non-academics: None

Download and Installation
This open source project includes a web application and a corresponding set of standalone curation tools.

External Data Source Download

For questions and comments, please email HuGE@cdc.gov.

Links to non-governmental sites are provided for reference and do not necessarily represent the views of the Centers for Disease Control and Prevention.

 

Page last updated: December 11, 2007
Content Source: National Office of Public Health Genomics