Wednesday, April 6, 2016

Dropbox crash on Ubuntu 15.10 and what to do about it

My Dropbox installation recently stopped working on my Ubuntu laptop. Starting it from the icon did nothing, and starting it from the command line produced the following cryptic output:

bjorn@bjorn-ThinkPad-T450s:~$ dropbox start
Starting Dropbox...Traceback (most recent call last):
  File "/usr/bin/dropbox", line 1535, in <module>
    ret = main(sys.argv)
  File "/usr/bin/dropbox", line 1524, in main
    result = commands[argv[i]](argv[i+1:])
  File "/usr/bin/dropbox", line 1395, in start
    if not start_dropbox():
  File "/usr/bin/dropbox", line 732, in start_dropbox
    stderr=sys.stderr, stdout=f, close_fds=True)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 8] Exec format error

Reinstalling dropbox had no effect, but after some searching I found this post at askubuntu.

They recommended removing the .dropbox-dist folder in the home directory.
I did that, and then issued the following command:

bjorn@bjorn-ThinkPad-T450s:~$ dropbox start -i
Starting Dropbox...Done!

A dialog then appears, informing you that the Dropbox daemon is being downloaded. After that, Dropbox seems to work normally.
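The folder removal step can also be scripted. This is only a minimal sketch in Python: the function name remove_dropbox_dist and its home argument are my own illustrative choices, and the only thing taken from the fix above is the .dropbox-dist folder name in the home directory.

```python
import shutil
from pathlib import Path


def remove_dropbox_dist(home=None):
    """Delete the .dropbox-dist folder under `home` (hypothetical helper).

    Returns True if the folder existed and was removed, False otherwise.
    """
    home_dir = Path(home) if home is not None else Path.home()
    target = home_dir / ".dropbox-dist"
    if target.is_dir():
        shutil.rmtree(target)  # removes the folder and everything in it
        return True
    return False


if __name__ == "__main__":
    if remove_dropbox_dist():
        print("Removed .dropbox-dist, now run: dropbox start -i")
    else:
        print("No .dropbox-dist folder found")
```

After running it, "dropbox start -i" downloads a fresh copy of the daemon as described above.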






Tuesday, February 2, 2016

Spices in Swedish, English and Portuguese

I have made a table of spices in Swedish, English and Portuguese. It may be helpful if you want to cook Swedish food in Portugal. The links go to Wikipedia in the respective language.

Swedish         English                  Portuguese
Kryddpeppar     Allspice                 Pimenta dioica, Pimenta-da-jamaica
Kryddnejlika    Clove                    Cravo-da-índia
Kanel           Cinnamon                 Canela
Kardemumma      Green or True Cardamom   Cardamomo-verdadeiro
Ingefära        Ginger                   Gengibre
Sirap           Treacle                  Melaço
Pomerans        Bitter orange            Laranja-azeda
Dill            Dill                     Endro, Aneto
Spiskummin      Cumin                    Cominho
Basilika        Basil                    Manjericão-de-folha-larga
Oregano         Oregano                  Orégano
Mejram          Marjoram                 Manjerona
Saffran         Saffron                  Açafrão
Gurkmeja        Turmeric                 Açafrão-da-terra, Açafrão-da-índia
Paprikapulver   Paprika                  Paprica / Colorau / Pimentão-doce
Grönmynta       Spearmint                Hortelã-verde
Pepparmynta     Peppermint               Hortelã-pimenta

N.B. Kummin (caraway) and spiskummin (cumin) are not the same thing.

Dill (Endro) is often confused in Portugal with funcho, which is fennel (fänkål in Swedish). Sometimes the price tag even says Endro although the product is fennel.

Many people in Portugal cannot tell saffron from turmeric, probably because saffron is not used in traditional cooking.

I have never found real treacle (Melaço) in Portugal. It is needed, for example, for gingerbread dough. There is, however, a kind of treacle-like goo called "Caramelo" that works just as well (see the picture below). It is available in every department store.



Caramelo!



  


Monday, February 1, 2016

Checksums for circular biological sequences

The content of this post is obsolete; go to www.seguid.org for updated information.

 

The SEGUID checksum

Data pertaining to biological sequences such as DNA, RNA and protein are often stored and transferred in electronic form.

This information is often stored simultaneously in several locations while people are working with it. This is a potential source of error, since it is easy to introduce small mistakes that are hard to spot.

Biological sequences are essentially information, so cryptographic checksums can be used to verify the integrity of sequences just like any other type of information.

Cryptographic checksums have been implemented for protein sequences to provide a stable identifier that only depends on the primary sequence. 

A checksum called the SEquence Globally Unique IDentifier (SEGUID) was suggested by Babnigg and Giometti (2006). SEGUID is the Secure Hash Algorithm 1 (SHA-1) checksum calculated on the biological sequence in uppercase and displayed using the Base64 encoding.

The SEGUID identifier has been used to create translation tables between different databases holding the same sequences but typically using different id numbers or ways to identify the sequence.

At 27 characters, the checksum is relatively short. The SEGUID for the DNA sequence Gattaca is:

tp2jzeCM2e3W4yxtrrx09CMKa/8
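The definition above can be sketched in a few lines of Python using only the standard library. This is a sketch of the definition as I have described it, not the reference implementation; the function name seguid is my own choice here, and I assume the sequence contains only ASCII characters and that the trailing "=" padding of the Base64 output is dropped, which is what gives 27 characters.

```python
import base64
import hashlib


def seguid(sequence):
    """SHA-1 of the uppercased sequence, Base64 encoded, padding removed."""
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    # A 20 byte SHA-1 digest gives 28 Base64 characters, the last being "="
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    print(seguid("Gattaca"))
```

Because the sequence is uppercased first, "Gattaca", "gattaca" and "GATTACA" all give the same checksum.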

The SHA-1 algorithm has been broken, meaning that so-called "hash collisions" can be constructed given enough time and resources. A hash collision means that two different pieces of information give the same checksum.

No accidental hash collision has been reported for the SHA-1 algorithm, and widely used software such as Git still uses SHA-1. Alternatives such as the more secure SHA-3 produce considerably longer checksums and are for this reason less readable.

The url-safe uSEGUID checksum

Unfortunately the original SEGUID checksum used the original Base64 encoding. This encoding contains the "+" and "/" characters. For instance the DNA sequence CAGG gives the SEGUID:

uZdvA+J+luF/IK4TAj+GBTMz688

The forward slash "/" and the "+" prevent the use of the checksum in URLs or as part of a filename on most operating systems. There is an alternative Base64 encoding called Base64url that replaces "/" and "+" with "_" and "-", solving this problem. The uSEGUID (short for "url safe SEGUID") is defined as the SEGUID checksum but with Base64url encoding. The uSEGUID for CAGG is:

              uZdvA-J-luF_IK4TAj-GBTMz688

It is worth noting that the uSEGUID and SEGUID checksums can be constructed from each other by two character substitutions.
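The relationship between the two checksums can be sketched as follows. Again this is only an illustration of the definition, with function names of my own choosing (useguid, seguid_to_useguid); it assumes ASCII sequences and that the "=" padding is stripped.

```python
import base64
import hashlib


def useguid(sequence):
    """SHA-1 of the uppercased sequence in Base64url encoding, padding removed."""
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def seguid_to_useguid(checksum):
    """Convert a SEGUID string to a uSEGUID by two character substitutions."""
    return checksum.replace("+", "-").replace("/", "_")


def useguid_to_seguid(checksum):
    """The reverse conversion, from uSEGUID back to SEGUID."""
    return checksum.replace("-", "+").replace("_", "/")


if __name__ == "__main__":
    print(seguid_to_useguid("uZdvA+J+luF/IK4TAj+GBTMz688"))
```

The conversion functions work on the checksum text alone, so they can be applied to stored SEGUID values without access to the original sequences.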

The cSEGUID checksum for circular sequences

The same storage and transmission problems that apply to protein sequences also apply to circular DNA sequences, such as plasmids. However, the uSEGUID checksum is not directly useful for circular DNA sequences, since there are up to 2n distinct but equivalent representations of a double stranded circular sequence of length n.

For example, if we consider the circular double stranded 6 bp DNA sequence AGCCTA, the twelve sequences below are equal representations:

AGCCTA    TAGGCT
GCCTAA    AGGCTT
CCTAAG    GGCTTA
CTAAGC    GCTTAG
TAAGCC    CTTAGG
AAGCCT    TTAGGC

In my own line of work, I often have to evaluate plasmids constructed in silico by students. A unique checksum for the correct sequence would make it easier to do this.

For this reason I developed the cSEGUID checksum as a general solution to this problem. The cSEGUID is defined as the uSEGUID checksum of the lexicographically smallest among all rotations of the sequence and of its reverse complement. The smallest rotation of AGCCTA, AAGCCT, is marked in blue above.

The cSEGUID for the DNA sequence AGCCTA is:

            OQ1RGvO0Y6C-zYuUxVjE84O-yvI

This is also the uSEGUID of AAGCCT which is the smallest rotation of the sequence.
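A simple way to compute this is sketched below. It is not the pydna implementation: the function names are my own, only the unambiguous bases A, C, G and T are handled, and the smallest rotation is found by generating all n rotations, which is fine for short examples but slow for long plasmids (faster smallest-rotation algorithms exist).

```python
import base64
import hashlib

# Complement table for unambiguous DNA only (an assumption of this sketch)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def useguid(sequence):
    """SHA-1 of the uppercased sequence in Base64url encoding, padding removed."""
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def reverse_complement(sequence):
    """Reverse complement of an uppercase or lowercase ACGT sequence."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def smallest_rotation(sequence):
    """Lexicographically smallest rotation, by brute force over all rotations."""
    return min(sequence[i:] + sequence[:i] for i in range(len(sequence)))


def cseguid(sequence):
    """uSEGUID of the smallest rotation of the sequence or its reverse complement."""
    seq = sequence.upper()
    candidate = min(smallest_rotation(seq),
                    smallest_rotation(reverse_complement(seq)))
    return useguid(candidate)


if __name__ == "__main__":
    print(smallest_rotation("AGCCTA"), cseguid("AGCCTA"))
```

All twelve representations in the table above pass through the same smallest rotation, so they all give the same checksum.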

The cSEGUID is guaranteed to be as unique as the uSEGUID since a circular string that is not a concatenation of two substrings is guaranteed to have only one smallest rotation.

The lSEGUID checksum for linear double stranded DNA sequences

For completeness there should be a checksum definition for linear double stranded DNA molecules. These can take several shapes as they come with either blunt or staggered ends:
                                                                                                                                    
Molecule                 Repr #1          Repr #2

blunt dsDNA:             GATT             AATC
                         CTAA             TTAG

5' overhang:            aGATT            aAATC             
                         CTAAa            TTAGa

3' overhang:             GATTa            AATCa
                        aCTAA            aTTAG

5' and 3' overhang:     aGATTa            AATC
                         CTAA            aTTAGa

3' and 5' overhang:      GATT            aAATCa
                        aCTAAa            TTAG

The table above describes five different double stranded DNA molecules. The two columns contain equivalent representations of the molecules. A checksum for linear DNA sequences should give different values for each of the molecules, but should be the same for each representation. I have defined a checksum called lSEGUID that fulfils these criteria.

The lSEGUID checksum for a blunt DNA sequence is defined as the uSEGUID checksum of the lexicographically smaller of the upper (Watson) and lower (Crick) strands, both read 5' to 3'. For instance, the lSEGUID for the molecule below is the uSEGUID for AATC, since this is lexicographically smaller than GATT.

Molecule                 Repr #1          Repr #2

blunt dsDNA:             GATT             AATC
                         CTAA             TTAG

For DNA sequences that are not blunt, the algorithm starts by selecting the lexicographically smallest representation. The smallest representation of the staggered molecule below (Repr #2) has been marked in blue.

Molecule                 Repr #1          Repr #2

5' overhang:             aGATT            aAATC             
                          CTAAa            TTAGa

Starting from the smallest representation, the lSEGUID checksum is defined as the uSEGUID checksum of a string concatenation called "repr" below with the following definition:

repr = chr(32)*upper_overhang+watson+chr(10)+chr(32)*lower_overhang+crick

The string above can easily be printed in any computer language to produce the representation. chr(32) is the ASCII space character " " and chr(10) is the ASCII line feed "\n". The upper_overhang and lower_overhang are integers giving the number of spaces needed to produce the correct stagger.

For the molecule below, upper_overhang is zero and lower_overhang has a value of one.

                   aAATC             
                    TTAGa

For the molecule above, the repr string becomes:

                    "aAATC\n TTAGa"

The uSEGUID of "aAATC\n TTAGa" is:  

            zjuf6OAJQNP1nSAUtAnSOHi5BOA
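The construction of the string can be sketched in Python as below. This is a sketch under several assumptions and not the pydna code: the function names are my own, the choice of the smallest representation of a staggered molecule is left to the caller, and the uSEGUID helper uppercases its input as in the SEGUID definition, which may or may not match how the value above was calculated.

```python
import base64
import hashlib


def useguid(text):
    """SHA-1 of the uppercased text in Base64url encoding, padding removed."""
    digest = hashlib.sha1(text.upper().encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def staggered_repr(watson, crick, upper_overhang, lower_overhang):
    """Two line text drawing of a dsDNA molecule.

    `crick` is given as it is drawn under `watson`; the overhangs are the
    number of spaces (chr(32)) in front of each strand.
    """
    return (" " * upper_overhang + watson + "\n"
            + " " * lower_overhang + crick)


def lseguid_blunt(watson, crick_5_to_3):
    """Blunt case: uSEGUID of the smaller strand, both strands read 5' to 3'."""
    return useguid(min(watson.upper(), crick_5_to_3.upper()))


def lseguid_staggered(watson, crick, upper_overhang, lower_overhang):
    """Staggered case, starting from an already chosen smallest representation."""
    return useguid(staggered_repr(watson, crick, upper_overhang, lower_overhang))


if __name__ == "__main__":
    print(repr(staggered_repr("aAATC", "TTAGa", 0, 1)))
    print(lseguid_staggered("aAATC", "TTAGa", 0, 1))
```

For the blunt molecule GATT/CTAA, lseguid_blunt("GATT", "AATC") is the uSEGUID of AATC regardless of which strand is passed first.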


Implementations of uSEGUID, cSEGUID and lSEGUID are available from the pydna Python package in the pydna.utils module.

For uSEGUID, cSEGUID and lSEGUID of a blunt DNA molecule, there is also a standalone seguid calculator software and an online app.








Thursday, October 29, 2015

Connecting Dropbox to Blackboard

The Blackboard LMS lets you upload files for the students, such as slides or laboratory protocols. Unfortunately, the Blackboard user interface is not very user-friendly. In fact, I agree with the assessment of this teacher. The amount of pointing and clicking needed to upload files feels very 1995.

For this reason I have started using Dropbox for distributing files to my students. This blog post is meant to show how. I will assume that you already have Dropbox installed on your computer, otherwise head over here and get yourself an account (they are free up to some level of storage). There are other similar services to Dropbox that would work as well, but I have not used them.

In this example I am teaching a course called "Biologia Molecular Aplicada" or BMA for short.

I first create a folder called BMA15 in my public Dropbox. The public Dropbox is called "Public" and is located inside the main Dropbox folder. New installations may not have this folder activated by default, but this link shows how it is activated.

After creating a Dropbox folder called "BMA15" in the Public folder, click on the Share box:



You will see the dialog below:


It is important that the permission is set to "Anyone with the link can see it"; if not, change the permissions. I have edited out some of the URL in this image, and there is an important reason for this which I will discuss below.

Copy this link to the folder and head over to Blackboard.

Create a new content item of the type "web link" ("Link de Web"). 

Paste the link in the URL window and name the link for example "Course Files". I also put some text in the description field. Click "send" or "Enviar" when you are done.


The link visible to the students will look like this:

Clicking on the link will take them to the Dropbox site which shows the folder structure of the BMA15 folder. In the BMA15 folder, I have created one folder for each lesson named by date, but this is optional.


These folders have downloadable files that are reflected by the Dropbox folder on my computer. I have three files in the 2015-10-01 folder:



These files are also present as local files on my computer:



Now I can simply add folders and files in the BMA15 folder and these will be almost instantly visible to the students.

One of the advantages of this workflow is that changes to files will be uploaded to Dropbox without any extra user input. Great for correcting errors in the material.

Now, remember the URL I edited out above. I did this because the folder is accessible for anyone who has the link, and I would like to restrict access to my students. In theory, Google may index the content in the folder, but if there are no links pointing to the folder, Google will not find it. The only link to the folder should be in the Blackboard content area, which is only accessible to students. 

You should not create public web links to the content, or the files may show up in Google's search results. I usually remove the course folder from the Public folder after the course is over.















Tuesday, October 27, 2015

How to embed a google calendar in Blackboard

We use Blackboard for organizing teaching material at our university. Blackboard has a built-in calendar, but it is quite difficult to use.

I use Google Calendar, both professionally and privately. I will show how to embed a Google calendar in Blackboard.

Start by creating a new calendar for the course in your Google account:


Click on the "Customize the color, size, and other options" link in the middle of the page.
On the next page, select useful options such as:

  • Default view "week"
  • Week starts on Monday
When you are done, click on the "Update HTML" button in the upper right corner of the page.
Then copy all text in the window like so: 



Go to Blackboard and create a new content item ("conteúdo" in Portuguese). Name this "calendar" or something similar. Click on the "Show HTML code" button ("exibição de código HTML").


Paste the text from the Google calendar in the HTML window.



Then click "Atualizar". You will see the screen below afterwards:

Click on "Enviar" and you should be done!


The calendar is dynamic and will be updated automatically whenever the Google calendar is updated.




Monday, April 13, 2015

How to create a new mime-type for IPython notebook files in Ubuntu

Installing IPython adds a very useful format for scientific calculations called the IPython notebook. These are text files containing JSON (JavaScript Object Notation) with the .ipynb file extension. It makes sense to open these with the IPython notebook, but under Ubuntu they are opened by the default text editor (Gedit).

I would like to be able to open them by double clicking in the file manager and also to have a nice icon for the files.

To achieve this I needed to create a new mime-type for .ipynb files.

Creating a new mime type 

There are many ways to create mime-types in Ubuntu, but this is what I did. I used this blog post by termueske as a guide.

I created a text file called ipynb.xml on my desktop with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
<mime-type type="application/x-ipynb+json">
<comment>IPython Notebook</comment>
<glob pattern="*.ipynb"/>
</mime-type>

</mime-info>
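Before installing the file it can be worth checking that the XML is well-formed and says what you think it says. The following is a small sketch using Python's standard library; the function name globs_by_mime_type is my own, and the only things taken from the file above are the namespace, the mime-type and the glob pattern.

```python
import xml.etree.ElementTree as ET

# Namespace used by the shared-mime-info file shown above
NS = {"mime": "http://www.freedesktop.org/standards/shared-mime-info"}

IPYNB_XML = """<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
<mime-type type="application/x-ipynb+json">
<comment>IPython Notebook</comment>
<glob pattern="*.ipynb"/>
</mime-type>
</mime-info>
"""


def globs_by_mime_type(xml_text):
    """Return {mime type: [glob patterns]} for a shared-mime-info document."""
    root = ET.fromstring(xml_text.encode("utf-8"))  # raises ParseError if malformed
    result = {}
    for mime_type in root.findall("mime:mime-type", NS):
        patterns = [g.get("pattern") for g in mime_type.findall("mime:glob", NS)]
        result[mime_type.get("type")] = patterns
    return result


if __name__ == "__main__":
    print(globs_by_mime_type(IPYNB_XML))
```

A typo such as a missing closing tag shows up here as a parse error instead of as a silently ignored mime-type later.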

I then opened a terminal on the desktop and typed the following command:

bjorn@bjorn-UL30A:~/Desktop$ xdg-mime install ipynb.xml --novendor

If there is no response, everything is ok. Then I updated the mime database:

bjorn@bjorn-UL30A:~/Desktop$ update-mime-database /home/bjorn/.local/share/mime/

My username is bjorn and my home folder is /home/bjorn/ so the path above has to be changed for other user names.

That should be it. Check properties of a .ipynb file to see the new mime type.

Obtaining a nice icon

I downloaded "ipython-3.1.0.zip" from https://pypi.python.org/pypi/ipython/3.1.0

Inside the zip file (/ipython-3.1.0/docs/resources/) there is a nice SVG icon called "ipynb_icon_512x512.svg".


Adding an icon to the new mime type

I renamed the icon file above to:

application-x-ipynb+json.svg

This is the mime-type in the ipynb.xml file above, but with the slash substituted by a minus sign (dash).

Now put this file where your file manager can find it. I tried with:

/home/bjorn/.local/share/icons/hicolor/scalable/mimetypes



I had to create the sub folders /scalable/mimetypes/ 
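The renaming and copying steps can be sketched in Python as below. The function names icon_filename and install_icon and the data_home argument are my own illustrative choices; the naming rule (slash replaced by a dash) and the hicolor/scalable/mimetypes location are the ones described above.

```python
import shutil
from pathlib import Path


def icon_filename(mime_type, extension=".svg"):
    """Icon file name for a mime type: the slash becomes a dash."""
    return mime_type.replace("/", "-") + extension


def install_icon(source_svg, mime_type, data_home):
    """Copy an SVG icon into <data_home>/icons/hicolor/scalable/mimetypes.

    `data_home` would be /home/bjorn/.local/share in my case. Missing
    sub folders are created on the way.
    """
    target_dir = Path(data_home) / "icons" / "hicolor" / "scalable" / "mimetypes"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / icon_filename(mime_type)
    shutil.copyfile(source_svg, target)
    return target


if __name__ == "__main__":
    print(icon_filename("application/x-ipynb+json"))
```

Calling install_icon("ipynb_icon_512x512.svg", "application/x-ipynb+json", Path.home() / ".local" / "share") would reproduce what I did by hand.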

Then I simply quit my file manager from the terminal:

bjorn@bjorn-UL30A:~/Desktop$ nemo -qq

I use Nemo, but if you use Nautilus, type instead 
nautilus -qq

Then restart Nemo using the "Files" icon in the unity launcher.

This worked well for me. This is a notebook on my desktop:



Right clicking on this file gives:


Why?

Why did I choose /home/bjorn/.local/share/icons/hicolor/scalable/mimetypes
as the location for the icon?

Trial and error!

They say here that the "hicolor" icon theme is a fallback theme for icons. I looked in /home/bjorn/.local/share/icons/ and the only folder there was one called hicolor. I wanted to put the icon in my home folder (see below).

Icon themes have an index.theme file that determines where they look for the actual icon files. This file is located in /usr/share/icons/hicolor 



There is an entry in this file called "Directories=" which lists all icon lookup directories. This is a very long comma separated list on my computer.

As my icon is scalable (SVG), I looked for and found the entry "scalable/mimetypes", so it would make sense to use this location.
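Instead of reading the long list by eye, the lookup can be done with Python's configparser, since index.theme is an INI-style file. This is a sketch with a shortened, made-up sample of the file; the function name theme_directories is my own, and I assume the section is called "Icon Theme" and the key "Directories", as in the freedesktop icon theme format.

```python
import configparser

# Shortened, hypothetical excerpt of /usr/share/icons/hicolor/index.theme
SAMPLE_INDEX_THEME = """[Icon Theme]
Name=Hicolor
Comment=Fallback icon theme
Directories=16x16/apps,48x48/mimetypes,scalable/apps,scalable/mimetypes

[scalable/mimetypes]
Size=48
Context=MimeTypes
Type=Scalable
"""


def theme_directories(index_theme_text):
    """Return the list of icon lookup directories from index.theme text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(index_theme_text)
    raw = parser["Icon Theme"]["Directories"]
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


if __name__ == "__main__":
    print("scalable/mimetypes" in theme_directories(SAMPLE_INDEX_THEME))
```

To check the real file, read /usr/share/icons/hicolor/index.theme as text and pass its content to theme_directories.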

I didn't want to put the icon in /usr/share/icons/hicolor/scalable/mimetypes since I would lose the icon when I upgrade or reinstall Ubuntu.

Apparently /home/bjorn/.local/share/icons/hicolor/scalable/mimetypes works just as well.

Hope this helps!











Monday, January 12, 2015

Inserting plain text references in Paperpile

I used to use Zotero for reference management, but now I use Paperpile since it is a bit more stable. One peculiarity of Paperpile is that it only provides a plug-in for Google Docs, which is an advantage in my opinion, since it is easier to collaborate in Google Docs.

I often have to insert references from a formatted paper in a document format such as MS Word, where the original dynamic references inserted by a reference manager were lost. Doing this manually is very tedious.

There is a workaround for this that worked very well when I used it:

and paste the reference list in the window of the Crossref site, which returns nicely formatted references.

on the output and uploaded the resulting BibTeX file to Paperpile.

Very easy!