Prepare data

Before drawing a Circos plot, you need to prepare or import the data to plot. Circhart has five data types: karyotype (band) data, plot data, link data, loci data, and text data. Each data type has its own set of columns, which are generally separated by whitespace.

Karyotype Data

The karyotype data defines the chromosomes and cytogenetic bands. It has seven columns: type, parent, name, label, start, end, color. The name column is a unique ID for each chromosome or band. The name matters because every other data type uses it to refer to a chromosome.

The description of each column

Column    Description
type      chr (for karyotype) or band (for band data)
parent    - (for karyotype) or chromosome name (for band data)
name      unique chromosome or band ID
label     chromosome or band label
start     start position
end       end position
color     color name

The karyotype data example:

chr - hs1 NC_060925.1 0 248387328 chr1
chr - hs2 NC_060926.1 0 242696752 chr2
chr - hs3 NC_060927.1 0 201105948 chr3
...
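As a rough illustration of the seven-column layout, the sketch below splits one karyotype or band line into named fields. The names `KaryotypeRow` and `parse_karyotype_line` are hypothetical helpers for this example and are not part of Circhart; the field `kind` stands for the type column.

```python
from typing import NamedTuple


class KaryotypeRow(NamedTuple):
    """One row of karyotype or band data (hypothetical helper type)."""
    kind: str    # the "type" column: "chr" or "band"
    parent: str  # "-" for chromosomes, chromosome name for bands
    name: str    # unique ID
    label: str
    start: int
    end: int
    color: str


def parse_karyotype_line(line: str) -> KaryotypeRow:
    """Split one whitespace-separated line into its seven columns."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    kind, parent, name, label, start, end, color = fields
    if kind not in ("chr", "band"):
        raise ValueError(f"unknown type {kind!r}")
    return KaryotypeRow(kind, parent, name, label, int(start), int(end), color)


row = parse_karyotype_line("chr - hs1 NC_060925.1 0 248387328 chr1")
print(row.name, row.label, row.end - row.start)
```

A check like this can catch a missing column before the file is imported.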

Note

Karyotype data is essential for creating circos plots. You must prepare or import karyotype data before drawing circos plots.

Import Karyotype Data

If you already have karyotype data, you can import it directly into Circhart. Go to File menu -> Import Data -> Import Karyotype Data, then select the file to import.

Note

Imported or prepared karyotype data is assigned the data type karyotype.

Prepare Karyotype Data

If you don’t have karyotype data, you can generate it from an imported genome.

  1. Go to Tools menu -> Prepare Data -> Prepare Karyotype Data to open the karyotype data preparation dialog:

    _images/prepare_kdata.png

    Karyotype data preparation dialog

  2. Enter a name for the generated karyotype data.

  3. Select a genome, then select the chromosomes to include.

    Note

    Genome files often contain many unplaced sequences that you do not want to plot. You can select only the complete chromosomes, or only the chromosomes you need.

  4. Enter a unique chromosome name prefix, e.g. hs for human or mm for mouse; for a single genome you can simply use chr. Circhart uses this prefix to generate a new name for each chromosome, e.g. hs1, hs2, hs3.

  5. Click the OK button to generate karyotype data from the selected chromosomes.
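The naming scheme in step 4 can be sketched as follows. This is only an illustration of how a prefix turns into names such as hs1, hs2, hs3. The function `make_karyotype` is hypothetical, and two details are assumptions: names are numbered in selection order, and the color column is `chr` plus that number, as in the example file above.

```python
def make_karyotype(chromosomes, prefix):
    """Build karyotype lines from (label, length) pairs.

    Assumptions for this sketch: names are prefix + running number in the
    order given, and the color column is "chr" + the same number.
    """
    lines = []
    for index, (label, length) in enumerate(chromosomes, start=1):
        name = f"{prefix}{index}"
        lines.append(f"chr - {name} {label} 0 {length} chr{index}")
    return lines


selected = [
    ("NC_060925.1", 248387328),
    ("NC_060926.1", 242696752),
    ("NC_060927.1", 201105948),
]
for line in make_karyotype(selected, "hs"):
    print(line)
```

With the prefix hs, the generated lines match the karyotype example shown earlier.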

View Karyotype Data

Click a karyotype data item in the Data List to view it.

_images/kdata.png

View of karyotype data

Edit Karyotype Data

Circhart allows you to edit the name and color columns. Double-click a cell to change its value.

The karyotype colors can also be changed in the following ways:

  • Go to Edit menu -> Karyotype Color -> Set to Default to change colors to default colors.

  • Go to Edit menu -> Karyotype Color -> Set to Random to change colors to random colors.

  • Go to Edit menu -> Karyotype Color -> Set to Single to change all colors to a single color.

Band Data

Band data has the same format as karyotype data. Band data is generally stored in the karyotype file, but Circhart also allows you to import or prepare it separately.

The band data example:

band hs1 p36.33 p36.33 0 1735965 gneg
band hs1 p36.32 p36.32 1735965 4816989 gpos25
band hs1 p36.31 p36.31 4816989 6629068 gneg
band hs1 p36.23 p36.23 6629068 8634052 gpos25
band hs1 p36.22 p36.22 8634052 12044143 gneg
...
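Because each band row names its chromosome in the parent column, a quick consistency check before import can be useful. The sketch below lists band parents that do not appear as chromosome names in the karyotype lines. `unknown_band_parents` is a hypothetical helper, not a Circhart function, and the `hs9` row is made up for the demonstration.

```python
def unknown_band_parents(karyotype_lines, band_lines):
    """Return sorted parent names used by bands but missing from the karyotype."""
    chrom_names = set()
    for line in karyotype_lines:
        fields = line.split()
        # Chromosome rows have "chr" in the type column and the ID in column 3.
        if len(fields) == 7 and fields[0] == "chr":
            chrom_names.add(fields[2])

    missing = set()
    for line in band_lines:
        fields = line.split()
        # Band rows have "band" in the type column and the parent in column 2.
        if len(fields) == 7 and fields[0] == "band" and fields[1] not in chrom_names:
            missing.add(fields[1])
    return sorted(missing)


karyotype = ["chr - hs1 NC_060925.1 0 248387328 chr1"]
bands = [
    "band hs1 p36.33 p36.33 0 1735965 gneg",
    "band hs9 p24.3 p24.3 0 2200000 gneg",  # made-up row with an unknown parent
]
print(unknown_band_parents(karyotype, bands))
```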

Import Band Data

If you already have band data, you can import it directly into Circhart. Go to File menu -> Import Data -> Import Band Data, then select the file to import.

Note

Imported or prepared band data is assigned the data type banddata.

Prepare Band Data

If you don’t have band data, you can prepare it. Before preparing band data, you need to obtain and import the genome cytobands.

  1. Go to Tools menu -> Prepare Data -> Prepare Band Data to open the band data preparation dialog:

    _images/prepare_band.png

    Band data preparation dialog

  2. Enter a name for the generated band data.

  3. Select a karyotype data.

  4. Select the imported genome bands.

  5. Click the OK button to generate band data.

Plot Data

The plot data has four required columns (chrom, start, end, value) and one optional column (options). It is used to plot line, scatter, histogram and heatmap tracks.

The description of columns in plot data

Column    Description
chrom     chromosome name
start     start position
end       end position
value     integer or decimal
options   plot options, usually empty

Plot data example:

hs1 1000 2000 0.546
hs1 2000 3000 0.423
hs2 4000 6000 0.379
...
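A minimal sketch of reading this layout is shown below. It accepts four or five whitespace-separated columns and converts the positions and the value to numbers. `parse_plot_line` is a hypothetical helper for illustration, not part of Circhart.

```python
def parse_plot_line(line):
    """Parse one plot data line into (chrom, start, end, value, options)."""
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 columns, got {len(fields)}: {line!r}")
    chrom, start, end, value = fields[:4]
    # The fifth column is optional and usually empty.
    options = fields[4] if len(fields) == 5 else ""
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"start is greater than end: {line!r}")
    return chrom, start, end, float(value), options


for text in ["hs1 1000 2000 0.546", "hs1 2000 3000 0.423", "hs2 4000 6000 0.379"]:
    print(parse_plot_line(text))
```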

Import Plot Data

If you already have plot data, you can import it directly into Circhart. Go to File menu -> Import Data -> Import Plot Data, then select the file to import.

Note

Imported or prepared plot data is assigned the data type plotdata.

Prepare Plot Data

If you don’t have plot data, you can prepare it. Circhart can prepare different kinds of plot data from different data sources. It can calculate distribution data using either a tumbling window (fixed-size windows without overlap) or a sliding window (fixed-size windows with overlap).
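The difference between the two window modes can be sketched in a few lines. In this illustration a tumbling window is simply the case where the step equals the window size. The function `windows` is hypothetical, and truncating the last window at the chromosome end is an assumption, not documented Circhart behavior.

```python
def windows(length, window_size, step_size=None):
    """Yield (start, end) windows over a sequence of the given length.

    step_size=None gives tumbling windows (step == window size);
    a smaller step gives overlapping sliding windows.
    """
    if step_size is None:
        step_size = window_size
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window and step sizes must be positive")
    if step_size > window_size:
        raise ValueError("step size must not exceed window size")
    start = 0
    while start < length:
        # Assumption: the final window is cut off at the sequence end.
        yield start, min(start + window_size, length)
        start += step_size


print(list(windows(10, 4)))     # tumbling: no overlap
print(list(windows(10, 4, 2)))  # sliding: each window overlaps the next by 2
```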

Prepare GC Content Plot Data

The GC content preparator calculates GC content within windows.

  1. If you have not imported a genome yet, go to File menu -> Import Data -> Import Genome File to import one.

  2. Go to Tools menu -> Prepare Data -> Prepare GC Content Data to open the GC content preparation dialog.

    _images/prepare_gc.png

    GC content preparation dialog with tumbling window

  3. Enter a name for the generated GC content data.

  4. Select an imported genome.

  5. Select a karyotype data.

  6. Select the tumbling window or the sliding window.

    _images/prepare_gc2.png

    GC content preparation dialog with sliding window

    Note

    If you select the sliding window, you must also set the step size. The step size should be smaller than the window size.

  7. Click the OK button to generate GC content data.
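For readers who want to see what such a calculation looks like, here is a minimal sketch that produces plot data lines from a sequence string. It is not Circhart's implementation. `gc_content_rows` is a hypothetical function, and computing GC content as (G + C) / (A + C + G + T), which ignores N and other ambiguous bases, is an assumption of this example.

```python
def gc_content_rows(chrom, sequence, window_size, step_size=None):
    """Return plot data lines "chrom start end value" with GC content per window."""
    step = window_size if step_size is None else step_size  # None = tumbling
    seq = sequence.upper()
    rows = []
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window_size]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        # Assumption: windows with no A/C/G/T bases get a value of 0.
        value = gc / acgt if acgt else 0.0
        rows.append(f"{chrom} {start} {start + len(chunk)} {value:.3f}")
    return rows


for row in gc_content_rows("hs1", "GGCCAATTGCAT", 4):
    print(row)
```

Passing a `step_size` smaller than `window_size` gives the sliding-window variant.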

Prepare GC Skew Plot Data

The GC skew preparator calculates GC skew within windows.

  1. If you have not imported a genome yet, go to File menu -> Import Data -> Import Genome File to import one.

  2. Go to Tools menu -> Prepare Data -> Prepare GC Skew Data to open the GC skew preparation dialog.

    _images/prepare_gcskew.png

    GC skew preparation dialog

  3. Enter a name for the generated GC skew data.

  4. Select an imported genome.

  5. Select a karyotype data.

  6. Select the tumbling window or the sliding window.

  7. Click the OK button to generate GC skew data.
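GC skew is commonly defined as (G - C) / (G + C) for the bases in a window. The sketch below applies that definition per window. It is an illustration only: `gc_skew_rows` is a hypothetical function, and the text above does not state which exact formula or edge-case handling Circhart uses.

```python
def gc_skew_rows(sequence, window_size, step_size=None):
    """Return (start, end, skew) tuples using skew = (G - C) / (G + C)."""
    step = window_size if step_size is None else step_size  # None = tumbling
    seq = sequence.upper()
    rows = []
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window_size]
        g, c = chunk.count("G"), chunk.count("C")
        # Assumption: a window without G or C gets a skew of 0.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        rows.append((start, start + len(chunk), skew))
    return rows


print(gc_skew_rows("GGGCAATTCCCG", 4))
```

Positive values mean more G than C in the window, negative values the opposite.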

Prepare Density Plot Data

The density preparator counts, within windows, the number of features from a genome annotation file (gtf/gff), the number of variations from a vcf file, or the number of regions from a bed file.

  1. Go to Tools menu -> Prepare Data -> Prepare Density Data to open the density preparation dialog.

    _images/prepare_density.png

    Density data preparation dialog

  2. Enter a name for the generated plot data.

  3. Select a karyotype data.

  4. Select the source data type that matches your imported data.

  5. Select the source data.

  6. If Genome annotation (gtf or gff) is selected, also select a feature.

  7. Click the OK button to generate density data.
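The idea behind a density track can be sketched as counting intervals per window. In this illustration, `density_rows` is a hypothetical function, the windows are tumbling, and each feature is counted once in the window that contains its start position. That counting rule is an assumption and may differ from what Circhart does. The feature coordinates in the demonstration are made up.

```python
from collections import Counter


def density_rows(features, chrom_lengths, window_size):
    """Count features per tumbling window.

    features: iterable of (chrom, start, end) tuples
    chrom_lengths: dict mapping chromosome name to its length
    Returns (chrom, window_start, window_end, count) tuples.
    """
    counts = Counter()
    for chrom, start, _end in features:
        if chrom in chrom_lengths:
            # Assumption: a feature belongs to the window holding its start.
            counts[(chrom, start // window_size)] += 1

    rows = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window_size):
            end = min(start + window_size, length)
            rows.append((chrom, start, end, counts[(chrom, start // window_size)]))
    return rows


features = [("hs1", 100, 200), ("hs1", 150, 300), ("hs1", 1200, 1300)]
for row in density_rows(features, {"hs1": 2000}, 1000):
    print(row)
```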

Text Data

The text data has the same columns as the plot data. The only difference is that the value column contains text instead of numbers. The text data is used to plot text tracks.

Text data example:

hs1 144134 146717 SEPTIN14P14
hs1 148562 152332 CICP3
hs1 372945 388041 NOC2L
...

Import Text Data

If you already have text data, you can import it directly into Circhart. Go to File menu -> Import Data -> Import Text Data, then select the file to import.

Note

Imported or prepared text data is assigned the data type textdata.

Prepare Text Data

Circhart allows you to extract features as text data from a genome annotation file (gtf or gff).

  1. If you have not imported annotation data yet, go to File menu -> Import Genome Annotation and select a gtf/gff annotation file to import.

  2. Go to Tools menu -> Prepare Data -> Prepare Text Data to open the text data preparation dialog.

    _images/prepare_text.png

    Text data preparation dialog

  3. Enter a name for the generated text data.

  4. Select a karyotype data.

  5. Select a feature.

  6. Select the attribute to use as the text value.

  7. Optionally, check “Only extract records whose attribute value in below list” and enter attribute values (one per line) to extract only the matching features.

  8. Click the OK button to generate text data.
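The steps above can be sketched for GTF input as follows. This is an illustration under stated assumptions, not Circhart's code: `extract_text_rows` and `name_map` are hypothetical, the attribute column is parsed as GTF-style `key "value";` pairs, and coordinates are copied from the annotation unchanged. The GTF line in the demonstration is made up.

```python
import re

# GTF attributes look like: gene_id "X"; gene_name "Y";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def extract_text_rows(gtf_lines, name_map, feature, attribute, allowed=None):
    """Return text data lines "chrom start end value" for matching features.

    name_map maps annotation sequence names to karyotype names (assumption).
    allowed, if given, restricts output to those attribute values.
    """
    rows = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature or cols[0] not in name_map:
            continue
        value = dict(ATTR_RE.findall(cols[8])).get(attribute)
        if value is None or (allowed is not None and value not in allowed):
            continue
        rows.append(f"{name_map[cols[0]]} {cols[3]} {cols[4]} {value}")
    return rows


# Made-up GTF line for illustration only.
gtf = ['NC_060925.1\tsrc\tgene\t372945\t388041\t.\t+\t.\tgene_id "NOC2L"; gene_biotype "protein_coding";']
print(extract_text_rows(gtf, {"NC_060925.1": "hs1"}, "gene", "gene_id"))
```

The `allowed` argument plays the role of the optional value list in step 7.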

Loci Data

The loci data has three required columns (chrom, start, end) and one optional column (options). Each row defines an interval on a chromosome. The loci data is used to plot tile, connector and highlight tracks.

Loci data example:

hs1 144134 146717
hs1 148562 152332
hs1 372945 388041
...

Import Loci Data

If you already have loci data, you can import it directly into Circhart. Go to File menu -> Import Data -> Import Loci Data, then select the file to import.

Note

Imported or prepared loci data is assigned the data type locidata.