What does CoMet 2 do?
The CoMet 2 server provides a platform for comparative metagenomics based on the analysis of protein domain signatures. CoMet 2 implements a complete pipeline for taxonomic, functional and metabolic profiling of metagenomic sequences using NCBI taxonomy, Pfam domain counts and KEGG pathways. You can upload large multi-FASTA sequence files to be processed by the CoMet engine for ultra-fast protein domain detection. By means of statistical tools you are able to compare your metagenome of interest to precomputed profiles from a broad variety of publicly available data. Here, metagenomes with similar profiles are automatically identified and can then be used for a detailed comparison.
CoMet 2 user account management
The complete functionality of CoMet 2 can be accessed without any login data. In this case, simply select “Use CoMet 2 without login” on the home page, and all your data will be stored in a browser session. To let you access your analysis results later, CoMet 2 provides a unique URL pointing to the result page associated with your job. Additionally, CoMet 2 provides personalized access using an email address and a password. In this case, your job history, uploaded data and analysis results will be available through the “My CoMet” tab at the top of the CoMet 2 web site. To log in with an email address and a password, you first have to register an account.
Getting started
At the homepage, CoMet 2 provides a simple overview of possible analysis tasks. From here, you can directly access the data upload and data set description page for taxonomic and functional analysis of your metagenome, including a preconfigured example job using a small metagenomic data set. Furthermore, the comparison module can be accessed to choose a subset of user-supplied and pre-processed metagenomes for analysis of statistically significant differences. Finally, the results of the previously performed analyses and an example output are available.
Functional and taxonomic profiling of your metagenome
Data upload and data set description
For profiling analysis of your metagenome, the CoMet 2 web server accepts as input a DNA sequence file in multiple FASTA format. Note that the file may be compressed in the “ZIP” format for faster upload. Several text fields allow you to characterize your metagenomic data set for storage in your personal CoMet metagenome database. After choosing a file and describing your data set, press the ‘Upload metagenome’ button to start the profiling process. While your data is processed in the background, you will see a summary of your data set along with some statistics. Further tabs with profiling results will appear as soon as these are available (see also below).
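As a rough illustration of preparing an input file before upload, the following sketch checks that a text is in multiple FASTA format and packs it into a ZIP archive. The function names and the member name `metagenome.fasta` are hypothetical and chosen only for this example; CoMet 2 itself may apply different or stricter checks.

```python
import io
import zipfile


def count_fasta_records(text: str) -> int:
    """Count records in multi-FASTA text; raise ValueError on obvious format problems."""
    records = 0
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are tolerated in this sketch
        if line.startswith(">"):
            records += 1
        elif records == 0:
            raise ValueError(f"line {line_no}: sequence data before the first '>' header")
    if records == 0:
        raise ValueError("no FASTA records found")
    return records


def zip_fasta(text: str, member_name: str = "metagenome.fasta") -> bytes:
    """Return the bytes of a ZIP archive that holds the FASTA text as a single member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, text)
    return buffer.getvalue()


# Toy input with two DNA reads (made-up sequences).
example = ">read_1\nATGGCGTACGTTAGC\n>read_2\nGGCTTAACCGGTATG\n"
n_records = count_fasta_records(example)
archive_bytes = zip_fasta(example)
```

The resulting bytes can be written to a `.zip` file and selected in the upload form.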
Comparison of multiple metagenome profiles
The comparison of your metagenome profile with more than a thousand precomputed profiles in the CoMet database enables fast identification of similar samples and retrieval of information about the corresponding metagenome projects. For further comparative analysis, you can select a subset of interesting metagenomes, which may include previously processed samples of your own, and obtain an in-depth statistical comparison of protein domain counts. Gene set enrichment analysis according to Gene Ontology (GO) terms can then be used to highlight possible relationships between significantly differing protein families.
Identification of similar metagenomes ("neighbors")
CoMet 2 allows you to identify metagenomes in the CoMet database that are similar to an uploaded metagenome in terms of the Pfam profile. For this purpose, a dimensionality-reduced (2d) map of reference metagenomes is calculated via unsupervised kernel regression (UKR). The “Distances to reference metagenomes” tab within CoMet 2 shows this map, with the reference metagenomes colored according to their habitats. Here, the ten nearest neighbors in the Pfam profile space are highlighted with special marker symbols. A list with the names of the associated metagenomes and further information about them is shown below the map. From here, a comparative analysis job can be configured (see below).
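The neighbor search can be pictured with a small sketch. It assumes that each profile is a mapping from Pfam accession to hit count, that counts are converted to relative frequencies, and that Euclidean distance is used; the distance measure actually used by CoMet 2 is not stated here, and the UKR map itself is not reproduced. All names and sample data are hypothetical.

```python
import math


def relative_frequencies(counts: dict) -> dict:
    """Convert raw Pfam hit counts into relative frequencies."""
    total = sum(counts.values())
    return {family: n / total for family, n in counts.items()} if total else {}


def profile_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two frequency profiles (an assumed metric)."""
    families = set(a) | set(b)
    return math.sqrt(sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in families))


def nearest_neighbors(query: dict, references: dict, k: int = 10) -> list:
    """Return up to k (name, distance) pairs, closest reference first."""
    q = relative_frequencies(query)
    scored = sorted(
        (profile_distance(q, relative_frequencies(counts)), name)
        for name, counts in references.items()
    )
    return [(name, dist) for dist, name in scored[:k]]


# Toy profiles with made-up counts for two Pfam families.
query_profile = {"PF00005": 30, "PF00072": 10}
reference_profiles = {
    "soil_A": {"PF00005": 3, "PF00072": 1},
    "marine_B": {"PF00005": 1, "PF00072": 3},
    "gut_C": {"PF00005": 1, "PF00072": 1},
}
neighbors = nearest_neighbors(query_profile, reference_profiles)
```

With only three references the list has three entries; with a full reference database the default `k=10` would return the ten nearest neighbors.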
Comparative analysis job configuration
Results
CoMet 2 provides a broad variety of easily interpretable graphical output and tabular data for download and offline analysis. In brief, the output comprises downloadable plain text files with Pfam domain assignments and Pfam/GO term frequency information for all samples. Furthermore, the CoMet result page provides figures and links for the results of the statistical analyses described above.
Metagenome profiling results
Domain hits
Here you can find some basic statistics about the files you uploaded. The first column denotes the file ID that is used by CoMet to reference the files in some figures. The second column contains the original file name, followed by the number of sequences in the original file, the number of sequences with Pfam domain hits and the total number of domain hits in the file.
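To make the columns concrete, here is a minimal sketch of how the three numbers could be derived from a list of sequence IDs and a list of (sequence ID, Pfam accession) assignments. The data layout and the function name are assumptions for illustration and do not describe the actual CoMet file format.

```python
def domain_hit_summary(sequence_ids: list, assignments: list) -> dict:
    """Summarize hits: number of sequences, sequences with hits, total domain hits."""
    known = set(sequence_ids)
    with_hits = {seq_id for seq_id, _family in assignments if seq_id in known}
    return {
        "sequences": len(sequence_ids),
        "sequences_with_hits": len(with_hits),
        "domain_hits": len(assignments),
    }


# Toy data: three reads, one of them without any Pfam hit.
reads = ["r1", "r2", "r3"]
hits = [("r1", "PF00005"), ("r1", "PF00072"), ("r3", "PF00005")]
summary = domain_hit_summary(reads, hits)
```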
Top 10 detected Pfam families
On the result page associated with a particular input file you can find a list of the ten Pfam domain families with the most hits in that file, ranked by their number of hits.
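Such a ranking can be reproduced offline from the downloaded assignments. The sketch below assumes the assignments have already been reduced to a plain list of Pfam accessions, one entry per hit; the sample values are made up.

```python
from collections import Counter

# One entry per detected domain hit (toy data).
hit_families = ["PF00005", "PF00072", "PF00005", "PF00001", "PF00005", "PF00072"]

# most_common(10) returns up to ten (family, count) pairs, highest count first.
top_ten = Counter(hit_families).most_common(10)
```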
Downloads
The download section provides plain text files with the Pfam domain assignments detected for each input file and the corresponding Pfam/GO term frequencies. Furthermore, you can find all figures from the result page in EPS format here.
Comparative analysis results
Distribution of significant Pfam families
CoMet performs a statistical analysis across all selected samples to determine the number of significant Pfam domain hits in each file. The result is displayed in matrix form (text and figure). The diagonal elements of the matrix figure represent the total numbers of file-specific significant domain hits; for example, the number of significant domain families in the first file can be read from the upper left corner of the matrix.
Furthermore, for each pair of files CoMet calculates the number of significantly different domain families by one-sided statistical tests. The number of significantly different domain families for a particular pair can be read from the elements in the lower and upper triangular parts of the matrix that are associated with the two files. The matrix in text form contains links that, when clicked, open a new browser tab with the list of Pfam families and associated p-values. In the matrix figure below it, the background color of a matrix element represents its associated number on a scale from blue (low) to green (medium) to red (high).
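The off-diagonal part of such a matrix can be sketched as a simple count of p-values below a threshold. Several points here are assumptions made for illustration only: the layout of the p-value dictionary, the reading that entry (row, column) holds the test of the row file against the column file, and the significance level of 0.05. The diagonal is left empty because its exact definition is not modeled in this sketch.

```python
def significance_matrix(files: list, pvalues: dict, alpha: float = 0.05) -> list:
    """Count, per ordered file pair, the Pfam families with p-value below alpha.

    pvalues maps (file_a, file_b) to {pfam_family: p_value} from a one-sided test
    of file_a against file_b (hypothetical layout).
    """
    n = len(files)
    matrix = [[None] * n for _ in range(n)]  # diagonal stays None in this sketch
    for i, a in enumerate(files):
        for j, b in enumerate(files):
            if i != j:
                family_pvalues = pvalues.get((a, b), {})
                matrix[i][j] = sum(1 for p in family_pvalues.values() if p < alpha)
    return matrix


# Toy p-values for two files and three Pfam families.
toy_pvalues = {
    ("F1", "F2"): {"PF00005": 0.01, "PF00072": 0.20},
    ("F2", "F1"): {"PF00005": 0.99, "PF00072": 0.03, "PF00001": 0.001},
}
pair_counts = significance_matrix(["F1", "F2"], toy_pvalues)
```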
Most variable Pfam families
After the statistical analysis, CoMet 2 determines for each Pfam domain family the variation across the samples in terms of the distribution of p-values. The bar chart shows the relative frequency of each Pfam domain family in all uploaded files, with the domain families sorted by variation.
GO term enrichment
The (Gene) Set Enrichment Analysis provides a framework for the identification of significantly enriched GO terms based on the associated Pfam families and their p-values in the pairwise comparison. For each GO term, a p-value and the corresponding familywise error rate (FWER) are calculated utilizing a rank-sum test. In this test, the p-values of Pfam families associated with a particular GO term are compared to the p-values of Pfam families that are not associated with this term.
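The idea of the rank-sum test can be illustrated with a small, self-contained sketch. It uses the normal approximation of the Wilcoxon rank-sum statistic without a tie correction, tests one-sidedly whether the p-values inside the GO term tend to be smaller, and uses a Bonferroni adjustment as a stand-in for the FWER computation. These choices are assumptions for illustration; the exact test variant and FWER procedure used by CoMet 2 are not specified here.

```python
import math


def average_ranks(values: list) -> list:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rank_sum_pvalue(in_term: list, not_in_term: list) -> float:
    """One-sided p-value that values in in_term tend to be smaller (normal approximation)."""
    n1, n2 = len(in_term), len(not_in_term)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    ranks = average_ranks(list(in_term) + list(not_in_term))
    rank_sum = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z


def bonferroni(pvalues: list) -> list:
    """Bonferroni-adjusted p-values (an assumed, conservative FWER control)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Toy p-values of Pfam families inside and outside one GO term.
inside = [0.001, 0.002, 0.003]
outside = [0.2, 0.4, 0.5, 0.6, 0.8, 0.9]
p_enriched = rank_sum_pvalue(inside, outside)
```

A small value of `p_enriched` indicates that families annotated with the term have systematically smaller p-values than the remaining families.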
Clustering dendrogram
If more than two files are selected, CoMet 2 can hierarchically cluster the input data using the UPGMA algorithm. The resulting dendrogram reflects the order in which the files are grouped according to their pairwise distance in terms of the number of significantly different Pfam domain families. If groups are sufficiently unrelated, they are displayed using different colors for their associated dendrogram branches.
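For readers unfamiliar with UPGMA, the following compact sketch shows the merge order it produces. It recomputes the average pairwise distance between cluster members at every step, which is simple but not efficient, and it does not draw or color a dendrogram. The distance values are made up; in CoMet 2 they would be the numbers of significantly different Pfam families.

```python
from itertools import combinations


def cluster_distance(a: tuple, b: tuple, dist: dict) -> float:
    """UPGMA linkage: mean of all pairwise distances between members of a and b."""
    return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))


def upgma(labels: list, dist: dict) -> list:
    """Return the merges as (cluster_a, cluster_b, distance), in merge order."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], dist),
        )
        merges.append((a, b, cluster_distance(a, b, dist)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy pairwise distances between three files.
toy_dist = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 10,
    frozenset(("B", "C")): 12,
}
merge_order = upgma(["A", "B", "C"], toy_dist)
```

Here files A and B are joined first at distance 2, and C joins the pair at the average distance (10 + 12) / 2 = 11.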
Multi-dimensional scaling
For data sets of three or more files, CoMet 2 also performs a multi-dimensional scaling (MDS) of the data. Again, the basis of the computation is the set of pairwise distances between the input files in terms of the number of differing Pfam families. On the CoMet 2 server, the principal coordinates are calculated using a metric MDS approach. In the MDS plot, the different input files are symbolized using various colors and symbols according to the legend on the right-hand side of the plot.
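One common way to obtain principal coordinates is classical (Torgerson) metric MDS: square the distances, double-center them, and scale the leading eigenvectors by the square roots of their eigenvalues. The sketch below follows that recipe with plain power iteration and deflation so that it needs no external libraries. Whether CoMet 2 uses exactly this variant is an assumption, and the sketch presumes that the leading eigenvalues are non-negative, which holds for Euclidean distances.

```python
import math


def double_center(dist: list) -> list:
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix D."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]


def top_eigenpair(matrix: list, iterations: int = 500) -> tuple:
    """Approximate the dominant eigenvalue and eigenvector by power iteration."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * p for v, p in zip(vec, product)), vec


def classical_mds(dist: list, dims: int = 2) -> list:
    """Return one coordinate row per item, with dims columns."""
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        value, vec = top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][d] = scale * vec[i]
        # Deflation: remove the component just found before the next axis.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


# Toy distances for three files that lie on a line at positions 0, 1 and 3.
toy_distances = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
points = classical_mds(toy_distances)
```

For this toy input the distances between the returned points reproduce the input distances, since the three items can be embedded exactly.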
For comments or questions related to CoMet please send an e-mail to comet2@gobics.de. We also appreciate feature requests and bug reports :)