Description of Aspects of the EF-Hand Calcium-Binding Proteins Data Library

The EF-Hand Calcium-Binding Proteins Data Library (EF-Hand CaBP-DL) is a highly curated collection of sequence, structural, and functional information about the EF-Hand superfamily of calcium-binding proteins. It was conceived, designed, and implemented by Melanie Nelson, a former graduate student in Walter Chazin's lab.

Information Integrity

All information that is not obtained directly from another public database has been published in a peer-reviewed journal. Users of the data library can view the reference associated with a particular piece of information via the InfoCard. The InfoCards also hold information about who submitted the data to the EF-Hand CaBP-DL, and which library administrator checked the information and validated it for inclusion in the database. Every piece of information included in the data library has been validated by a library administrator.

Information Storage

Two types of information are stored in the data library: 1) information that is stored in a relational database and written to browsable HTML pages on a regular schedule, and 2) information that is stored solely in HTML pages.

The majority of the information is stored in the relational database, which is implemented in the PostgreSQL database management system. The database was designed using the relational paradigm, although normalization is occasionally relaxed to make reporting more convenient. The entity-relationship model for the database is available online. As can be seen in this model, the information in the database is organized around proteins.

Each mutant or isoform of a protein is considered a unique protein in the database and is assigned a unique identifier (the prot_id). This allows storage of the calcium-binding constants, for instance, of two isoforms of parvalbumin from the same species, and clearly indicates that the two sets of binding constants belong to two different chemical entities. It also allows any type of information about a mutant to be stored, eliminating the need to predict in advance which types of information about a mutant will be useful.

All of the isoforms and mutants of a given protein are associated by a common group identifier (the group_id), so that they can be grouped together for reporting purposes. For instance, the three human isoforms of caltractin each have a distinct prot_id, but share a single group_id with each other and with any other isoforms or mutants of caltractin, from any species, that are stored in the database. When the protein home page for caltractin is generated, information from all of these isoforms and mutants is included.
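The prot_id and group_id scheme described above can be illustrated with a small sketch. This is not the actual EF-Hand CaBP-DL schema: it uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs without a server, and the table name, the name and species columns, the isoform labels, and the helper functions are all hypothetical. Only the identifiers prot_id and group_id come from the description above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical protein table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protein (
            prot_id  INTEGER PRIMARY KEY,  -- one row per isoform or mutant
            group_id INTEGER NOT NULL,     -- shared by all forms of one protein
            name     TEXT NOT NULL,        -- hypothetical column
            species  TEXT NOT NULL         -- hypothetical column
        );
        """
    )
    # Illustrative rows only; the labels are placeholders, not library data.
    conn.executemany(
        "INSERT INTO protein (prot_id, group_id, name, species) VALUES (?, ?, ?, ?)",
        [
            (1, 1, "caltractin isoform 1", "human"),
            (2, 1, "caltractin isoform 2", "human"),
            (3, 1, "caltractin isoform 3", "human"),
            (4, 2, "parvalbumin", "placeholder species"),
        ],
    )
    conn.commit()
    return conn


def forms_in_group(conn, group_id):
    """Return (prot_id, name) for every isoform or mutant in one group."""
    return conn.execute(
        "SELECT prot_id, name FROM protein WHERE group_id = ? ORDER BY prot_id",
        (group_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for prot_id, name in forms_in_group(conn, 1):
        print(prot_id, name)
```

Under these assumptions, any other table (binding constants, sequences, structures) would refer to a protein by its prot_id, while report generation would select by group_id to collect every form of a protein at once.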

Information Access

All information that is stored in the relational database can be accessed both by searching the database and by browsing the web pages in the data library. The web pages that are backed by the database are regenerated by Perl scripts on a regular schedule. Maintaining the browsable interface is an important design feature of the EF-Hand CaBP-DL because it allows users to find information even if they do not have a clear idea of what they are looking for. However, the ability to search the underlying database directly is also important, because it allows users to relate different types of information about the various proteins to one another. We believe that as the amount of information in the data library grows, this ability will allow the identification of unexpected correlations among different properties of the proteins.
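The regeneration of browsable pages from database records can be sketched roughly as follows. The library's own scripts are written in Perl and are not reproduced here; this Python version, including the function name render_group_page, the dictionary keys, and the page layout, is an assumption made purely for illustration.

```python
from html import escape


def render_group_page(group_name, forms):
    """Build a minimal HTML page listing every form of one protein.

    `forms` is a list of dicts with the hypothetical keys
    'prot_id', 'name', and 'species'.
    """
    items = "\n".join(
        "  <li>{} ({}), prot_id {}</li>".format(
            escape(form["name"]), escape(form["species"]), form["prot_id"]
        )
        for form in forms
    )
    title = escape(group_name)
    return (
        "<html><head><title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n<ul>\n{1}\n</ul>\n</body></html>\n"
    ).format(title, items)


if __name__ == "__main__":
    # Placeholder rows standing in for the result of a database query.
    demo_forms = [
        {"prot_id": 1, "name": "caltractin isoform 1", "species": "human"},
        {"prot_id": 2, "name": "caltractin isoform 2", "species": "human"},
    ]
    print(render_group_page("caltractin", demo_forms))
```

A scheduled job could call a function like this once per group and write each result to a static file, which would keep the browsable pages consistent with the database between runs.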

The browsable interface for the EF-Hand CaBP-DL is divided into four main sections: general information (which includes functional information), sequence information, structural information, and analytical tools. There is also a section of links to other web resources and a picture gallery, as well as a collection of information with limited access (this information is mostly unpublished work from the Chazin lab, and can only be accessed from within the University and by some of our collaborators).

The search interface for the relational database allows three types of searches:

Data Feeding

The data library will only be as useful as the information it contains, so it is imperative that its information content grows and is kept current. This is too large a task for any one person or laboratory to undertake. We have therefore provided online forms for submitting new information to the data library, and we hope that the larger community will help us to maintain and expand this resource by submitting information about EF-hand CaBPs to the database. Each piece of submitted information is reviewed by a library administrator before it is deposited in the database. Data integrity is also ensured by requiring that each submission include the reference from which the information was obtained. Only data that have been published in a peer-reviewed article will be accepted into the database.
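The two acceptance rules stated above (a reference is required, and an administrator must review each submission before deposit) can be expressed as a short sketch. The Submission class, its field names, and the functions validate and can_deposit are hypothetical and do not describe the library's actual submission forms or review process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    prot_id: int                        # protein the information refers to
    description: str                    # the submitted piece of information
    reference: str                      # citation of the peer-reviewed article
    submitted_by: str
    validated_by: Optional[str] = None  # set once an administrator approves


def validate(submission, administrator):
    """Record administrator approval; reject submissions with no reference."""
    if not submission.reference.strip():
        raise ValueError("a peer-reviewed reference is required")
    submission.validated_by = administrator
    return submission


def can_deposit(submission):
    """A submission may be deposited only if it is referenced and validated."""
    return bool(submission.reference.strip()) and submission.validated_by is not None


if __name__ == "__main__":
    s = Submission(1, "placeholder data", "placeholder citation", "submitter")
    print(can_deposit(s))   # not yet reviewed
    validate(s, "administrator")
    print(can_deposit(s))   # reviewed and referenced
```

Keeping the submitter, the reference, and the validating administrator on the same record mirrors the kind of provenance that the InfoCards expose to users.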

We welcome any ideas, suggestions, critiques, or comments about the current status of the data library and our future plans for it.

