A Python Package for Analysis of Copy Number Variation

baseqCNV is a toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with Whole Genome Sequencing (WGS) data for both bulk and single cell experiments.

The copy number is inferred from the read counts per genomic region. The regions are predefined so that low-complexity parts of the genome are excluded or discounted. For each sample, the following samples should be provided.

Pipeline Steps

The whole pipeline can be divided into five steps. The main workload, including alignment, bin counting and normalization, can be performed on a local server. The resulting file (“bincounts_norm.txt”) can then be uploaded to the web server for genome segmentation and visualization.


Read alignment using Bowtie2 (run on the local server);

2:Bin Counting

Count the uniquely mapped reads in each bin (run on the local server);


Normalize the bin counts by GC content (run on the local server);


Circular Binary Segmentation (CBS) partitions the genome into segments of constant total copy number by joining neighbouring bins with similar values (run on the web server). It is based on the R package DNAcopy.


Visualization: generate the CNV distribution along the whole genome (run on the web server).
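
The bin counting and GC normalization steps above can be illustrated with a minimal sketch. This is not baseqCNV's own implementation: the function names `count_reads_per_bin` and `normalize_by_gc`, the representation of bins as a sorted list of edges on one chromosome, and the normalization rule (dividing each bin's count by the median count of bins with a similar GC fraction) are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from statistics import median


def count_reads_per_bin(bin_edges, read_positions):
    """Count reads per bin on one chromosome.

    bin_edges is a sorted list of n+1 coordinates; bin i covers
    [bin_edges[i], bin_edges[i + 1]). Reads outside all bins are ignored.
    (Hypothetical layout; the real dynamic_bin file format may differ.)
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for pos in read_positions:
        idx = bisect_right(bin_edges, pos) - 1
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def normalize_by_gc(counts, gc_fractions, n_strata=20):
    """Divide each count by the median count of bins in the same GC stratum.

    This is one simple way to correct for GC bias, assumed here for
    illustration only.
    """
    strata = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    by_stratum = {}
    for stratum, count in zip(strata, counts):
        by_stratum.setdefault(stratum, []).append(count)
    medians = {s: median(values) for s, values in by_stratum.items()}
    return [
        count / medians[s] if medians[s] > 0 else 0.0
        for s, count in zip(strata, counts)
    ]


# Toy example: four bins of width 50 and eight read start positions.
edges = [0, 50, 100, 150, 200]
reads = [5, 10, 60, 70, 80, 120, 199, 250]
counts = count_reads_per_bin(edges, reads)
normalized = normalize_by_gc(counts, [0.41, 0.43, 0.62, 0.63], n_strata=10)
print(counts)
print(normalized)
```

After this kind of normalization, bins at the typical copy number sit near 1.0, so gains and losses show up as values above or below that level.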


First, Python 3 is required (version >=3.6).


  • Bowtie2: For alignment of raw sequencing reads;
  • Samtools: For transforming and manipulating the BAM/SAM files produced by the aligner (version >=1.9);


  • bowtie2_index: The path to the bowtie2 indexed genome references;
  • dynamic_bin: Genome bins of ~50 kb; duplicated or low-complexity regions are excluded;

Download the dynamic_bin files:

Homo sapiens (hg19):

Mus musculus (mm10):


The paths of all dependencies should be written to a config file (named config.ini, for example):

bowtie2 = /path/to/bowtie2
samtools = /path/to/samtools

bowtie2_index = /path/to/bowtie2_index/hg19
dynamic_bin = /path/to/hg19.dynabin.txt


The bowtie2_index path is the prefix of a set of files. For example, if it is set as “/path/to/bowtie2_index/hg19”, there should be files like “hg19.fa/hg19.1.bt2/hg19.2.bt2/…” under the folder: “/path/to/bowtie2_index”.
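
Before running the pipeline, it can help to check that the configured prefix really points at a Bowtie2 index. The sketch below is not part of baseqCNV; `read_config` and `missing_index_files` are hypothetical helper names, and the parser simply assumes the plain `key = value` layout shown above. The suffix list is the standard set of files written by `bowtie2-build` for a small index (large indexes use `.bt2l` instead).

```python
from pathlib import Path

# Files that bowtie2-build writes for a small index with a given prefix.
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def read_config(path):
    """Parse simple 'key = value' lines; skip blanks, comments and section headers."""
    settings = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_index_files(prefix):
    """Return the expected index files for this prefix that do not exist."""
    return [prefix + suffix for suffix in BT2_SUFFIXES if not Path(prefix + suffix).exists()]
```

For example, `missing_index_files(read_config("config.ini")["bowtie2_index"])` returns an empty list when all six index files are present next to the prefix.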


To install baseqCNV, simply use pip:

pip install baseqCNV

Usage at local server

The pipeline includes three steps on the local server.

#One fastq file is needed; for paired-end data, the read 1 file is sufficient.
#The path of the sequencing file should be specified after "-1".
#The path of the configuration file should be specified after "-c".
#The genome name or version should be specified after "-g".
baseqCNV align -1 Tn5_S1.fq.gz -c config.ini -g hg19

#The aligned bam file should be specified after "-i".
#The path of the configuration file should be specified after "-c".
baseqCNV bincount -g hg19 -i ./baseqCNV.bowtie2.sort.bam -o bincounts.txt -c config.ini

#Normalize (the resulting file can be uploaded to the web server for visualization)
baseqCNV normalize -g hg19 -i ./bincounts.txt -o bincounts_norm.txt -c config.ini
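
The three commands above can also be chained from a small Python script. The following is a sketch only: `build_commands` and `run_pipeline` are hypothetical helper names, the flags are copied from the commands shown above, and the default BAM path assumes that the align step writes `./baseqCNV.bowtie2.sort.bam`, as the bincount example suggests.

```python
import subprocess


def build_commands(fastq, config, genome,
                   bam="./baseqCNV.bowtie2.sort.bam",
                   counts="bincounts.txt",
                   normalized="bincounts_norm.txt"):
    """Return the align, bincount and normalize commands as argument lists."""
    return [
        ["baseqCNV", "align", "-1", fastq, "-c", config, "-g", genome],
        ["baseqCNV", "bincount", "-g", genome, "-i", bam, "-o", counts, "-c", config],
        ["baseqCNV", "normalize", "-g", genome, "-i", counts, "-o", normalized, "-c", config],
    ]


def run_pipeline(fastq, config, genome):
    """Run the three local steps in order, stopping at the first failure."""
    for command in build_commands(fastq, config, genome):
        # check=True raises CalledProcessError if a step exits with an error.
        subprocess.run(command, check=True)
```

Calling `run_pipeline("Tn5_S1.fq.gz", "config.ini", "hg19")` would then run the same three steps as the commands above.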

Web-based Visualization

The normalized bincount file can be uploaded to our webserver for CBS and visualization.

Here is an example of a normalized bincount file; you can try it.