Bioinformatics Projects
All scripts that I developed are by contract proprietary to my employer, so I cannot share any of the code here. My apologies.
Synteny and Collinearity Search Tool
The script is called MC-Scan and was developed by my friend and colleague Haibao Tang. I played only a very small part in its development, but it was the first real bioinformatics project I worked on and the one that led me into the field. The script is based on Collinear-Scan, which scans large and complex plant genomes for collinear segments. The brilliance of this package is its ability to scan for one-to-many segment relationships between two genomes, which makes it extremely powerful for detecting ancient paralogous sequences and catching paleopolyploidy. In fact, it independently caught the ancient gamma genome triplication event in Arabidopsis before the grape genome was published. The previous sentence may not make much sense to many, but if you are in the field, you know it's a big deal.
Species-specific gene identification pipeline
This is a collection of interconnected Python scripts. The purpose of the pipeline is to identify the genomic regions that best distinguish two groups of species by scanning through all segments of their genomes. Segments or genes that are conserved within each group of organisms but differ appreciably between the two groups are deemed desirable. The pipeline scores each segment on how useful it is for telling the groups apart.
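The actual pipeline is proprietary, so the following is only a minimal sketch of the general idea, under the assumption that each segment is available as pre-aligned, equal-length sequences for every species. The function names (`identity`, `segment_score`) and the scoring rule (mean within-group identity minus mean between-group identity) are illustrative choices, not the pipeline's real method.

```python
from itertools import combinations, product
from statistics import mean


def identity(a, b):
    """Fraction of matching positions between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def segment_score(group_a, group_b):
    """Hypothetical usefulness score for one aligned segment.

    High when sequences are conserved inside each group but differ
    between the two groups. Each group needs at least two sequences.
    """
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two sequences")
    within = [
        identity(x, y)
        for group in (group_a, group_b)
        for x, y in combinations(group, 2)
    ]
    between = [identity(x, y) for x, y in product(group_a, group_b)]
    return mean(within) - mean(between)


# Toy data: one segment, two species per group (made up for illustration).
group_a = ["ACGTACGT", "ACGTACGT"]
group_b = ["TCGAACGA", "TCGAACGA"]
score = segment_score(group_a, group_b)
```

A score near zero means the segment does not separate the groups; a larger score means the segment is conserved within each group and divergent between them. A real pipeline would also have to deal with alignment gaps, missing segments and unequal group sizes.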
Sequencing result identification pipeline
This is a web tool developed using PHP, Python and MySQL. The basic functionality is similar to NCBI web BLAST, but with many additional features: user-provided database annotation files, tracking and archiving of input sequences, automatic trimming and conversion of ab1 files, reading of sample information from a tab-delimited file, and automatic archiving of results into the database.
Multiplexed Oligonucleotide Thermo-quality Filtering System
This is essentially a system for designing primers and probes. Primer design tools have been around for a long time, and many of them, such as Primer3, are robust and successful. However, none of them quite fits the specific requirements of our assays. Our team at Nanosphere developed an in-house script package that adapts to our different design requirements. It takes specificity, inclusivity, thermodynamics and multiplexing compatibility into consideration in considerable detail, which has ensured a high success rate in our oligo designs.
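Since the in-house package cannot be shared, here is a minimal sketch of what a thermodynamic filtering step might look like. It assumes the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligos; a production system would more likely use a nearest-neighbor model. The function names and the default thresholds are illustrative assumptions, not the values we used.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) by the Wallace rule."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("oligo must be non-empty and contain only A, C, G, T")
    return 2.0 * at + 4.0 * gc


def gc_fraction(oligo):
    """Fraction of G and C bases in the oligo."""
    seq = oligo.upper()
    if not seq:
        raise ValueError("oligo must be non-empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_oligos(candidates, tm_min=50.0, tm_max=60.0, gc_min=0.4, gc_max=0.6):
    """Keep candidates whose Tm and GC content fall inside the given windows.

    The default thresholds are placeholders chosen for illustration.
    """
    return [
        oligo
        for oligo in candidates
        if tm_min <= wallace_tm(oligo) <= tm_max
        and gc_min <= gc_fraction(oligo) <= gc_max
    ]


# Made-up candidates: one balanced, one AT-rich, one GC-rich.
candidates = [
    "ACGTACGTACGTACGTAC",
    "ATATATATATATATATAT",
    "GCGCGCGCGCGCGCGCGC",
]
kept = filter_oligos(candidates)
```

Keeping all oligos in a narrow Tm window is one simple way to approach multiplexing compatibility, because every primer and probe in a panel then behaves similarly at a shared reaction temperature. Specificity and inclusivity checks would need sequence searches against target and non-target genomes, which this sketch does not attempt.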
NanoDB: Nanosphere Biological Data Warehouse
This is a large, ongoing project aimed at creating a database-driven platform that meets all of the company's design and analysis needs. It includes a database schema similar to that of GenBank, a web portal for lab access and data archiving, and a collection of Perl and Python scripts that draw data from the database, run analyses and archive the results.