For this example, I am going to assume that you have a text file open in the browser. In particular, I'm going to assume the text file consists of a listing of two gene sequences in FASTA format, which is actually very simple: a FASTA record consists of a header line that starts with the > (greater-than) character, followed by one or more lines of sequence data. The data are A, G, T, and C bases if it's a nucleotide sequence, or amino acid letters (K, P, R, M, I, V, etc.) if it's a protein sequence. For example:
>M. leprae MBLr_2224... <-- this is the header
ATGGCGGTGCTGGATGTC... <-- this is the data
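To capture the page's text in a variable, enter the following in the console (or, in Firefox, the Scratchpad):
text = document.body.textContent;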
Execute this code with Control-L in Firefox, or by simply pressing Enter in Chrome. (If Firefox fills your Scratchpad with text, bracketed by /* and */, that's good; it means the code worked. Delete it and proceed.)
Suppose you have several FASTA records in a row and you want to have an array of gene data. The easy thing to do is "split" the records at each header, discarding the header:
r = />[^\n]+\n/g;
genes = text.split( r );
NOTE: To enter more than one line of code in the Chrome console, you have to hold the Shift key down before hitting Enter. Otherwise, Enter executes the code.
The first line defines a regular expression (r) for the pattern: "greater-than symbol followed by one or more non-newline characters, followed by a newline." (Inside square brackets, a leading caret ^ negates the set, so [^\n] means "any character except a newline.") When this code executes, genes will be an array of sequence data, but because the text begins with a header, split() leaves an empty string as the first item (item zero) in the array, so get rid of it with:
genes.shift();
Now the genes array will contain only gene data. The data for gene No. 1 will be in genes[0], the data for gene No. 2 will be in genes[1], etc.
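Here is a minimal, self-contained sketch of the split-and-shift step, using a made-up two-record string (sampleText) in place of the page text. The names sampleText, headerPattern, records, and cleanGenes are my own, not part of the example above. The extra replace() step is an assumption on my part: FASTA data usually spans several lines, so the line breaks are removed to make index i always refer to base i.

```javascript
// Hypothetical two-record FASTA text, standing in for document.body.textContent.
const sampleText =
  ">gene A\nATGGCG\nGTGCTG\n" +
  ">gene B\nATGGCC\nGTGTTG\n";

// Same pattern as r above: a header line, including its newline.
const headerPattern = />[^\n]+\n/g;

// The text begins with a header, so split() yields an empty first item.
const records = sampleText.split(headerPattern);
records.shift();

// Strip line breaks (and any other whitespace) from each sequence.
const cleanGenes = records.map(seq => seq.replace(/\s+/g, ""));

console.log(cleanGenes); // [ 'ATGGCGGTGCTG', 'ATGGCCGTGTTG' ]
```

If your file uses one line per sequence, the replace() step only removes the trailing newline, which is harmless.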
Incidentally, if you want an array of headers, just do:
headers = text.match( r );
No need to do headers.shift(). With the g flag, match() returns an array containing only the matched header lines (each still including its leading > and trailing newline).
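If you would rather have bare names than raw header lines, something like the following should work. This is only a sketch on a made-up string; sampleFasta, rawHeaders, and names are hypothetical variable names.

```javascript
// Hypothetical FASTA text with two short records.
const sampleFasta = ">gene A\nATGGCG\n>gene B\nATGGCC\n";

// Each match is a whole header line: ">gene A\n", ">gene B\n".
const rawHeaders = sampleFasta.match(/>[^\n]+\n/g);

// Drop the leading ">" and trim the trailing newline.
const names = rawHeaders.map(h => h.slice(1).trim());

console.log(names); // [ 'gene A', 'gene B' ]
```

Note that match() returns null rather than an empty array when nothing matches, so check for that if the page might not contain FASTA data.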
If your genes are aligned, you can compare them, base by base, in a loop. (If your genes are not aligned, create an alignment using an online ClustalW alignment tool or the popular MEGA6 program.) The following loop compares the first 300 bases in two genes and tallies the differences according to whether each difference occurred in codon base one, base two, or base three:
snp=[0,0,0]; // array to hold base 1,2,3 results
gene1 = genes[0]; // first gene
gene2 = genes[1]; // second gene
for (var i=0; i < 300; i++)
snp[ i % 3 ] += gene1[i] != gene2[i];
You can display the results in the console simply by adding (on its own line) snp; or console.log(snp). Or if you want to see it in a dialog, execute alert(snp).
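The same tally can be wrapped in a small function so it is easy to rerun on other pairs of genes. This is a sketch, not a replacement for the loop above: tallyByCodonPosition and its parameter names are hypothetical, and it assumes both sequences are aligned, start in the same reading frame, and contain no line breaks. Unlike the loop above, it stops early if either sequence is shorter than the requested length.

```javascript
// Hypothetical helper: count mismatches at codon positions 1, 2, and 3.
function tallyByCodonPosition(seqA, seqB, length) {
  const counts = [0, 0, 0];
  // Never read past the end of the shorter sequence.
  const limit = Math.min(length, seqA.length, seqB.length);
  for (let i = 0; i < limit; i++) {
    if (seqA[i] !== seqB[i]) {
      counts[i % 3] += 1; // i % 3 is 0, 1, or 2: the position within the codon
    }
  }
  return counts;
}

// Two made-up aligned sequences that differ at indexes 5 and 8.
const demo = tallyByCodonPosition("ATGGCGGTG", "ATGGCCGTA", 300);
console.log(demo); // [ 0, 0, 2 ]
```

Both differences in the demo fall on the third base of a codon, so only the last counter is incremented.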
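The final line of the loop, snp[ i % 3 ] += gene1[i] != gene2[i], deserves explanation. Results are placed in the snp (single nucleotide polymorphism) array according to whether the "hit" occurred in base 1, base 2, or base 3 of a codon. The i % 3 construct (i modulo 3) means divide i by 3 and throw away the "answer" but keep the remainder. (So for example, 5 % 3 equals 2, 6 % 3 equals zero, 7 % 3 equals 1, etc.) As i increments, i % 3 simply takes on values of 0, 1, 2, 0, 1, 2, etc. The comparison gene1[i] != gene2[i] evaluates to true or false, which JavaScript treats as 1 or 0 when adding.
One last note: the g at the end of the regular expression r is the global flag. Without it, match() stops at the first match. For example,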
"abcdefabcdef".match( /abc/ )
will only match the first occurrence of abc, whereas
"abcdefabcdef".match( /abc/g )
will match both occurrences of abc and give you an array of matches.
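To see the difference for yourself, you can run both calls and inspect what comes back. The variable names sample, first, and all are just for illustration.

```javascript
const sample = "abcdefabcdef";

// Without g: a single match object with extra details such as index.
const first = sample.match(/abc/);
console.log(first[0], first.index); // abc 0

// With g: a plain array of every matched substring.
const all = sample.match(/abc/g);
console.log(all); // [ 'abc', 'abc' ]

// With or without g, no match at all gives null, not an empty array.
console.log(sample.match(/xyz/g)); // null
```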