Tutorial: Co-expression Clustering
This brief tutorial uses a cancer microarray expression dataset to demonstrate AutoSOME gene co-expression clustering.
1) If you have not already done so,
2) Download and save the example cancer line expression dataset (primary dataset: Alizadeh et al. (2000) Nature, 403:503).
3) Select the
button to launch a file browser. AutoSOME accepts three input formats for microarray datasets. This example uses a simple table (gene identifiers as rows, cell lines as columns, and expression values in the cells). AutoSOME can also parse two widely used microarray file formats: PCL and Series Matrix. The former is the standard input for the Cluster software tool (Eisen et al. (1998) PNAS 95:14863), while the latter is the format provided for microarray datasets housed at the Gene Expression Omnibus (GEO) at NCBI. Please see
for more details.
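To make the simple table layout concrete, here is a minimal Python sketch that parses such a file outside of AutoSOME. It assumes a tab-delimited file with one header row of cell line names and one row per gene; the delimiter, the header row, the function name `read_expression_table`, and the sample values are illustrative assumptions, not a description of AutoSOME's own parser.

```python
import csv
import io

def read_expression_table(handle, delimiter="\t"):
    """Parse a simple table: a header row of sample names, then one row per gene.

    Assumed layout (hypothetical): first column = gene identifier,
    remaining columns = expression values, one column per cell line.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]  # first header cell labels the identifier column
    genes, matrix = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        genes.append(row[0])
        # Empty cells are kept as None so missing values stay visible
        matrix.append([float(v) if v.strip() else None for v in row[1:]])
    return genes, samples, matrix

# Tiny made-up table used only to exercise the parser
example = "ID\tline_A\tline_B\nGENE1\t1.5\t-0.5\nGENE2\t\t2.0\n"
genes, samples, matrix = read_expression_table(io.StringIO(example))
print(genes, samples, matrix)
```

Checking a file this way before loading it can help catch ragged rows or non-numeric cells early.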
4) After loading the dataset, expand
. Although we apply several normalization procedures in this example to obtain maximal smoothing of the dataset (see Figure below), some or all of these procedures may be inappropriate for other datasets. See
for tips on normalization.
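The text here does not name the individual normalization procedures. As an illustration of what row-wise normalization of expression data can look like, the sketch below shows two common choices: median centering and z-score scaling (zero mean, unit variance) of each gene row. Both the choice of procedures and the function names are assumptions for illustration and are not taken from AutoSOME.

```python
import statistics

def median_center(row):
    """Subtract the row median so the gene's values are centered on zero."""
    med = statistics.median(row)
    return [v - med for v in row]

def z_score(row):
    """Rescale a row to zero mean and unit (population) variance."""
    mean = statistics.fmean(row)
    sd = statistics.pstdev(row)
    if sd == 0:
        # A constant row has no variance to rescale; return it centered
        return [0.0 for _ in row]
    return [(v - mean) / sd for v in row]

# Made-up expression values for a single gene across four cell lines
row = [1.0, 2.0, 3.0, 6.0]
centered = median_center(row)
scaled = z_score(row)
print(centered)
print(scaled)
```

Whether either step is appropriate depends on the dataset; for example, data that are already log ratios against a reference may not need centering.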
5) Run AutoSOME with all remaining parameters set to their default values (reduce the number of CPUs if your system becomes unacceptably bogged down; see
for additional help).
6) Partial heat map of cluster 1 (clusters are ordered from top to bottom by decreasing size).
7) To adjust contrast in the heat map, go to View>settings>image settings. For example, scroll the contrast bar to 0.3 (press Update if the screen doesn't automatically refresh). Alternatively, precise contrast adjustments can be made by selecting "Manually adjust range for contrast" and changing the maximum and minimum values (followed by pressing Update).
8) To search for a particular gene (or identifier) of interest, select Search>Find. This will launch a search window. Find the gene of interest in the list or type it in, then press Submit. The cluster (or singleton) containing your gene will be highlighted, and your gene will be colored yellow in the heat map display (in the current implementation, you may need to scroll down to find it).
9) A useful tool for identifying the most prominent clusters in the dataset is the
filter located in the bottom-right corner of the output window. Cluster confidence is calculated over all AutoSOME ensemble iterations and represents the affinity of each data point for its assigned cluster (100 = maximum affinity). Filtering by confidence therefore provides a rapid way to increase the signal-to-noise ratio. For example, enter Confidence=50 and press
. The resulting clusters of co-regulated genes are shown below as a heat map.
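The effect of the confidence filter can be sketched in a few lines of Python. The example assumes each gene comes with a cluster label and a confidence score between 0 and 100, and that genes scoring at or above the threshold are kept; the data, the function name `filter_by_confidence`, and the inclusive comparison are assumptions for illustration, not AutoSOME's actual implementation or output format.

```python
def filter_by_confidence(assignments, threshold=50.0):
    """Group genes by cluster, keeping only those at or above the threshold.

    assignments: iterable of (gene, cluster_id, confidence) tuples,
    with confidence on a 0-100 scale (100 = maximum affinity).
    """
    kept = {}
    for gene, cluster_id, confidence in assignments:
        if confidence >= threshold:
            kept.setdefault(cluster_id, []).append(gene)
    return kept

# Made-up assignments: (gene identifier, cluster number, confidence)
assignments = [
    ("GENE1", 1, 92.0),
    ("GENE2", 1, 35.0),
    ("GENE3", 2, 50.0),
    ("GENE4", 3, 12.5),
]
filtered = filter_by_confidence(assignments, threshold=50.0)
print(filtered)
```

Note that a cluster disappears entirely when none of its members reach the threshold, which is how a confidence cutoff can reduce both cluster sizes and the number of clusters shown.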
10) For missing value treatment, cluster editing tools, and many additional important features, please see the