Variational Autoencoder (VAE) explored with PyTorch
A variational autoencoder (VAE) is a type of generative model that learns to encode data into a latent space and then decode it back to the original data. It consists of an encoder that maps input data to a latent representation and a decoder that reconstructs the original data from the latent representation. The VAE is trained using a combination of reconstruction loss and a regularization term that encourages the latent space to follow a specific distribution, typically a Gaussian distribution. This allows the VAE to generate new data samples by sampling from the latent space. ...
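The two loss terms described above can be written out in a few lines. The following is a minimal sketch in plain Python rather than PyTorch, so it runs without dependencies. It assumes a diagonal Gaussian encoder, a standard normal prior, and squared error as the reconstruction loss. The function names and the `beta` weight are illustrative assumptions, not taken from the post.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element by element."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term (assumed here to be a sum of squared errors)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction loss plus the weighted regularization term."""
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


# Toy usage: the "encoder" output (mu, log_var) and the reconstruction
# are made-up numbers, only meant to show how the pieces fit together.
rng = random.Random(0)
x = [0.5, -1.0]
mu, log_var = [0.3, -0.2], [-1.0, -0.5]
z = reparameterize(mu, log_var, rng)  # latent sample a decoder would consume
x_hat = [0.4, -0.8]                   # placeholder for the decoder output
print(vae_loss(x, x_hat, mu, log_var))
```

The regularization term is zero exactly when the encoder outputs the prior itself (`mu = 0`, `log_var = 0`), which is what pulls the latent space toward a Gaussian that can later be sampled from.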
Trying Helixer
Helixer is a eukaryotic gene prediction tool published in 2025. I became interested in this area during my PhD, when I made extensive use of GeneMark-ES and Augustus. Both Augustus and GeneMark-ES work well, but they are a bit dated. In addition, GeneMark-ES requires a custom license and a key. In contrast, Helixer is licensed under GPL-3.0 and is available on GitHub. Today I tried running Helixer.
Installation
The authors provide both a Docker image and a manual installation path. Docker would be the easiest route, but since Helixer can use CUDA and I have an AMD GPU, I will try the manual route first. ...
Pixi: A Conda Replacement
I have been hearing about pixi for a while now but did not have the chance to try it out until today. When you head to their documentation, you will find that the install instructions consist of a bash script piped directly into the terminal. This is not ideal, though admittedly conda does the same thing. There is also an option to compile the binary from source. Since I have cargo installed, that seemed like the more principled approach. ...
Snow Data in Switzerland
I recently moved to Switzerland and started skiing again after many years. Curious about historical snow levels, I went looking for public data. Finding out how much snow there is right now is very easy: many webpages provide this information for today. It is much harder to answer the question: how much snow was there yesterday? Or last week? Or a year ago? ...
Conditional Probability: The Tuesday Boy Problem
Today I came across this conditional probability example and as always it was a nice brain teaser that is worth exploring. In these situations I like to use Python to help me understand. The problem was stated as follows: A family has two children. At least one of them is a boy born on a Tuesday. What is the probability that the other child is a girl? And somehow the answer is 51.85% ...
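The 51.85% can be checked by brute-force enumeration. The sketch below assumes that each child is equally likely to be a boy or a girl, that all seven weekdays are equally likely, and that the two children are independent. The variable names are illustrative and not taken from the post.

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
DAYS = range(7)   # 0 = Monday, 1 = Tuesday, ..., 6 = Sunday
TUESDAY = 1

# Every child is one of 14 equally likely (sex, weekday) combinations,
# so there are 14 * 14 = 196 equally likely ordered two-child families.
children = list(product(SEXES, DAYS))
families = list(product(children, repeat=2))


def is_tuesday_boy(child):
    return child == ("boy", TUESDAY)


# Condition: at least one child is a boy born on a Tuesday.
matching = [f for f in families if any(is_tuesday_boy(c) for c in f)]

# Among those families, count the ones in which the other child is a girl.
with_girl = [f for f in matching if any(sex == "girl" for sex, _ in f)]

probability = Fraction(len(with_girl), len(matching))
print(len(matching), len(with_girl), probability)
```

Of the 196 families, 27 contain a Tuesday boy, and 14 of those also contain a girl, which gives 14/27, or about 51.85%. The family with two Tuesday boys is counted only once, which is why the count is 27 rather than 28 and the answer is not exactly 50%.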
Immich - ML supported tagging plugin
For the last decade or so, I used Seafile, Owncloud and then Nextcloud to self-host my data on a small homeserver. This has worked wonderfully, and I have nothing but respect for the community that built these wonderful and powerful tools. But one thing that never worked as smoothly as I wanted it to was the photo upload from my smartphone to Nextcloud. The upload works, and it rarely fails, but it’s never instant. So it’s not as seamless as taking a picture, turning on the PC, and there it is. It takes anywhere between 30 seconds and many minutes to sync. ...
MHC and Viruses - Molecular Mimicry
I saw this article, “Molecular mimicry as a mechanism of viral immune evasion and autoimmunity”, and was immediately interested in reproducing Figure 1b. In it, the authors investigate peptide similarity between viruses and the human proteome. They suggest that certain viruses might have adjusted their peptide usage to match peptides found in the human proteome, so that they can evade MHC recognition. This is a biologically cool mechanism, and the method they used is rather simple. So I thought: let’s try to reproduce it. ...
Face Detection with Python
In the past I have explored what I can do with image embeddings and used them to train a very usable set of classifiers that sort random photos and nature photos out of my camera roll. If you want to read about that, you can find the blog post here: openpaul.github.io/posts/2025-04-06-image-sorting and a short intro to embeddings here: openpaul.github.io/posts/2024-09-28-image-embeddings/ Recently I became interested in detecting faces and identifying people in my photos locally. Apps such as Immich support that, and if I just wanted to detect faces and sort my pictures, it would be my go-to app. But I want to play around and understand what is going on. ...
Python et al. - Getting to a scientific plot on a new machine
From time to time you and I are lucky enough to start fresh. A new MacBook, a new Linux laptop or maybe a new server? And of course, we need to get it up and running quickly to create lovely plots. With Anaconda, Conda, Mamba, uv, Python, virtualenvs and more, it can get confusing quickly. While all of this will change over time, today I want to disentangle the status quo as of summer 2025 and maybe create a bit of order in this chaos. ...
Nextflow and nf-core
When analysing data, especially complicated genomics data, one quickly learns to appreciate the benefits of well-written workflows. In the past, I have developed my own bash, Snakemake and Nextflow pipelines. But since then, some people from the bioinformatics community have put in enormous effort to create general, standardized pipelines that anyone can use. For Snakemake this effort is called workflow catalogue and for Nextflow it is called nf-core. Last week I was browsing this catalogue and came across a workflow whose structure is very familiar to me: the MAG workflow. ...