Context: Advances in molecular pathology, systems biology, genomics, proteomics, clinical trials, and biomarker development have created a need for standardized, well-annotated biospecimens for research. The Mesothelioma Virtual Biorepository (MVB) brings together information from the epidemiologic, clinical, pathologic, and molecular domains to develop sets of common data elements that annotate the tissue specimens, providing vital information for end-users. The MVB project is supported by the Centers for Disease Control and Prevention (CDC) in association with the National Institute for Occupational Safety and Health (NIOSH).

Technology: The MVB Web-based tool is based upon the caTISSUE Clinical Annotation Engine (CAE), originally developed by the University of Pittsburgh as part of the National Cancer Institute's (NCI) cancer Biomedical Informatics Grid (caBIG) program. It provides a mechanism for entering, importing, and searching biospecimen annotations. In the latest released version of this software (version 1.3), clinical annotations are attached to a participant/patient, a tissue accession, a specimen (part), or a subspecimen (block). Together, these entities form a hierarchy, or backbone, that encapsulates all of the annotation data for a case. Annotations can be entered manually through the provided user interface or imported in an XML format. The application makes mesothelioma cases searchable via a user-friendly Web interface. The resulting system has been made publicly available on the Department of Biomedical Informatics (DBMI) Web site (http://www.mesotissue.org). Information models for the MVB are constructed as Unified Modeling Language (UML) class diagrams; UML is a notation for constructing, visualizing, and documenting the artifacts of software engineering. A UML class diagram depicts a collection of static model elements, such as physical or conceptual entities, and their relationships. Enterprise Architect (EA), developed by Sparx Systems, was used as the UML modeling tool for this project because of its low cost and high performance. High-level UML classes were joined by relations representing the logical relationships between them.

Design: The MVB architecture is based on 3 major components that work in succession to rationalize the process of data annotation. (1) Common data elements (CDEs): the information captured by the CDEs is gathered during the routine work of a medical center, so these data sets can be easily configured and maintained. The CDEs were built upon the College of American Pathologists (CAP) checklists, the Association of Directors of Anatomic and Surgical Pathology (ADASP) guidelines, and the North American Association of Central Cancer Registries (NAACCR) core elements. The resulting set of data elements provides sufficient and comprehensive information. (2) Data entry: the data entry tool is mobile, adjustable, and Web-based. Data are de-identified before being made available in the database. (3) Data query tool: the query tool uses a "point and click" system that lets researchers search the de-identified data while allowing only specific data to be copied and transferred. The level of query depends on the access level granted to the investigator.

Results: The database contains standardized sets of clinical (demographic and epidemiologic) and pathologic information, follow-up information, and genotype data. These are available to investigators through an easy-to-use query tool so that they can make maximal use of tissue samples.
The data disclosed are tightly regulated according to each user's authorization. This secure and easily navigated database gives end-users access to a vast source of information related to biospecimens.

Conclusions: The MVB acts as a central resource that allows researchers to access well-annotated, high-quality biospecimens. It protects patient privacy by disclosing only de-identified data, and biospecimens are accessible only to researchers who have institutional review board and scientific review committee approval. The integration of heterogeneous data sets, together with efficient statistical analysis, provides users with standardized biospecimens to support their research.
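
The annotation hierarchy, the de-identification step, and the access-dependent query described above can be illustrated with a minimal Python sketch. It is not the caTISSUE CAE implementation: the class names, the annotation fields, the list of identifying fields, and the two access levels are all hypothetical and were chosen only to show the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical identifying fields; the actual MVB de-identification rules
# are not given in the text.
IDENTIFYING_FIELDS: Set[str] = {"name", "medical_record_number", "date_of_birth"}

# Hypothetical access levels, each mapped to the annotation fields it may see.
ACCESS_LEVELS: Dict[str, Set[str]] = {
    "public": {"histologic_type"},
    "approved": {"histologic_type", "asbestos_exposure", "vital_status"},
}


@dataclass
class Subspecimen:  # "block" in the text
    block_id: str
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class Specimen:  # "part" in the text
    part_id: str
    annotations: Dict[str, str] = field(default_factory=dict)
    subspecimens: List[Subspecimen] = field(default_factory=list)


@dataclass
class Accession:
    accession_id: str
    annotations: Dict[str, str] = field(default_factory=dict)
    specimens: List[Specimen] = field(default_factory=list)


@dataclass
class Participant:  # root of the hierarchy for one case
    case_id: str
    annotations: Dict[str, str] = field(default_factory=dict)
    accessions: List[Accession] = field(default_factory=list)


def deidentify(annotations: Dict[str, str]) -> Dict[str, str]:
    """Drop identifying fields before the record is stored."""
    return {k: v for k, v in annotations.items() if k not in IDENTIFYING_FIELDS}


def count_blocks(participant: Participant) -> int:
    """Count the blocks reachable from a participant through the hierarchy."""
    return sum(
        len(specimen.subspecimens)
        for accession in participant.accessions
        for specimen in accession.specimens
    )


def query(cases: List[Participant], access_level: str, **criteria: str) -> List[Dict[str, str]]:
    """Return participant-level annotations visible at the given access level.

    Searching on a field the access level cannot see raises PermissionError,
    so hidden values cannot be inferred from which cases match.
    """
    visible = ACCESS_LEVELS[access_level]
    hidden = set(criteria) - visible
    if hidden:
        raise PermissionError(f"fields not searchable at this level: {sorted(hidden)}")
    return [
        {k: v for k, v in case.annotations.items() if k in visible}
        for case in cases
        if all(case.annotations.get(k) == v for k, v in criteria.items())
    ]
```

In this sketch a record would be passed through `deidentify` before a `Participant` is created, and a call such as `query(cases, "public", histologic_type="epithelioid")` would return only the fields that the hypothetical `public` level is allowed to see. For brevity the query inspects participant-level annotations only; a fuller model would also search accession, specimen, and block annotations.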