Scienco.org
Nov 13
Watexy-test (with edit and working < and >)
icon1 Mikkel Meyer Andersen | icon4 November 13, 2009 at 01:45 (UTC) | icon3 13 Comments »
icon3 , , , ,

I'm a new version of Watexy with the possibility to edit equations and fixes the bug with < and > not working.

The testing-robot is:
watexy-test@appspot.com

It only works in wave.google.com and not in wavesandbox.com for some reason.

I hope you'll take the time to give it a try and some feedback to improve it.

watexy-test

Nov 11

First of all, thanks for the feed-back in [1].

I've now fixed three things:
1) It's now possible to put more than one equation in a blip, and they all get nicely/properly displayed, thanks to Michael, post 18 in [1] for a possible solution
2) Instead of using http://www.forkosh.dreamhost.com/mathtex.cgi , I've now installed the program on one of my own servers (thanks a lot for both the software and the service)
3) Multi-line is supported, such that the equation can be written over multiple lines and still be matched correctly

Please do not hesitate to comment this version as well!

By the way, I'm sorry for the delay with this new version. My life as an exchange student in Australia has been a bit hectic with all the travelling besides the studies.

[1]: http://www.scienco.org/2009/watexy-latex-robot-for-google-wave/

The new code is as follows:

# Python reference:
# http://wave-robot-python-client.googlecode.com/svn/trunk/pydocs/index.html
 
# Shortcut to the important OpBasedDocument
# http://wave-robot-python-client.googlecode.com/svn/trunk/pydocs/waveapi.ops.OpBasedDocument-class.html
 
__author__ = 'mikl@mikl.dk (Mikkel Meyer Andersen)'
 
import re
 
from waveapi import events
from waveapi import model
from waveapi import robot
from waveapi import document
 
def OnRobotAdded(properties, context):
  """Invoked when the robot has been added."""
  root_wavelet = context.GetRootWavelet()
  root_wavelet.CreateBlip().GetDocument().SetText("Hi. My name is Watexy and I'm here to help you presenting Latex in waves. Just put the latex between  and , e.g. 2+2=5.")
 
def reversed_iterator(iter):
    return reversed(list(iter))
 
def OnBlipSubmitted(properties, context):
  """Invoked when a blip has been added."""
  blip = context.GetBlipById(properties['blipId']) 
  blip_text_view = blip.GetDocument()
 
  matches = re.finditer('\$\$(.+?)\$\$', blip_text_view.GetText(), re.DOTALL)
 
  """
  Reverse list such that the last items will be changed first, such that
  the positions for the first items doesn't change
  """
  matches = reversed_iterator(matches)
 
  for m in matches:
    """
    The +/- 2 is because of the length of the $$'s. 
    If not removed, the loop will run infintely! 
    """
    blip_text_view.DeleteRange(document.Range(m.start(1)-2, m.end(1)+2))
    image = document.Image('http://meyer.fm/cgi-bin/mathtex.cgi?' + m.group(1), caption=m.group(1))
    blip_text_view.InsertElement(m.start(1)-2, image)
 
if __name__ == '__main__':
  myRobot = robot.Robot('watexy',
      image_url='http://watexy.appspot.com/assets/icon.png',
      version='12',
      profile_url='http://watexy.appspot.com/')
  myRobot.RegisterHandler(events.WAVELET_SELF_ADDED, OnRobotAdded)
  myRobot.RegisterHandler(events.BLIP_SUBMITTED, OnBlipSubmitted)
  myRobot.Run()
Aug 28
qqmultinorm.R version 1.1 - more intelligent plot size
icon1 Mikkel Meyer Andersen | icon4 August 28, 2009 at 07:20 (UTC) | icon3 No Comments »
icon3 ,

I've updated the qqmultinorm.R script a bit so that it's now capable of picking a more optimal plot size (less unused space).

The idea refers to deciding the sides (dimensions) of a rectangle if the areal (number of plots) is known i.e., optimising the dimensions of a rectangle given the areal. We want the perimeter as small as possible (for better viewing), preferably wider than longer if square isn't possible. A given number n is factorised in to numbers within a given error. For example if the allowed error is 2, then 7 gets factorised in (1,7), (2, 4), (4, 2), and (7, 1) (here redundant factorisation is included). Although 2*4 = 8, but abs(7-8) = 1 <= 2, so it's acceptable. Actually the default error is 10% of the input areal.

The way the proper factorisations is chosen, is by ordering the pairs by ascending the difference of the components i.e, how much the width and height differs. And then ordered ascending by height so that the plot gets wide screen-like instead of poster-like.

The new script is a follows (only find.optimal.mfrow.size(...) and a bit logic in qqmultinorm(...) added):

# File name: qqmultinorm.R
# Version: 1.1
# Last updated: 2009-08-28
#
# This R-code is made by:
# Mikkel Meyer Andersen, Denmark
# mikl [funny-a] math [.] aau [.] dk or 
# mikl [funny-a] mikl [.] dk
#
# Licence: GPLv2
#
# Feel free to use it, but if you do I'll like to hear about it (just for fun).
# If you make corrections, please submit them back so others can enjoy them as well.
 
qqchisq <- function(y, main, df=2, continuity.correction = 0.5)
{
  n <- length(y)
  y <- sort(y)
  c <- numeric(n)
 
  for (i in 1:n)
    c[i] <- qchisq((i-continuity.correction)/n, df=df)
 
  plot(c, y, xlab="Theoretical Quantiles", ylab="Sample Quantiles", main=main)
  lines(c(c[1], c[n]), c(c[1], c[n]), type="l")
}
 
dec2bin <- function(x)
{
  if (!is.vector(x) || length(x) != 1 || x < 0)
    stop("x must be a non-negative integer")
 
  N <- length(x)
  ndigits <- floor(log2(x)) + 1
  bin <- numeric(ndigits)
 
  for (i in (ndigits-1):0)
  {
    tmp <- 2^i
 
    if (x %/% tmp >= 1)
    {
      bin[i+1] <- 1
      x <- x - tmp
    }
  }
 
  return(rev(bin))
}
 
# Returns the power set without the empty set
power.set <- function(v)
{
  n <- length(v)  
  N <- 2^n - 1
  ps <- vector("list", N)
 
  for (i in 1:(N-1))
  {
    Nbin <- dec2bin(i)
    Nbin <- c(numeric(n-length(Nbin)), Nbin)
    Nbin <- rev(Nbin)
    ps[[i]] <- v[which(Nbin == 1)]
  }
 
  ps[[N]] <- v
 
  return(ps)
}
 
# Input: n
#  - here the number of plots
# Returns c(h, w)
#  - the optimal choice of rows and cols to use in in mfrow
find.optimal.mfrow.size <- function(n)
{  
  w0 <- ceiling(sqrt(n))
 
  # The area can at max contain of 10% unused space
  max.error <- round(n*0.1)
 
  n.minus.error <- n - max.error
  n.plus.error <- n + max.error
 
  # If n is a square, fine!
  if (w0^2 == n)
    return(c(w0, w0))
 
  # Col 1 and 2: w and h
  # Col 3: The difference between w and h: this should be as small as possible
  candidates <- matrix(ncol=3)
 
  for (w in w0:1)  
  {    
    h <- ceiling(n / w)
    n0 <- w*h
 
    # Because of ceiling we know that n0 >= n
    if (n0 <= n.plus.error)
      candidates <- rbind(candidates, c(h, w, abs(w-h)))
  }
 
  # First row is NA
  candidates <- candidates[-1,]
 
  # Uups, something went wront - well, don't panic
  if (nrow(candidates) == 0)
    return(c(w0, w0))
 
  # First order by abs(w-h) and then by h to get a widescreen-look 
  # instead of a poster-look
  candidates <- candidates[order(candidates[,3], candidates[,1]), ]
 
  return(candidates[1, c(1,2)])
}
 
# dataset: variables in columns and observations as rows
# subset.min.size, subset.max.size: inclusive limits
# filename: if specified, the plot are saved as a png file with this filename
qqmultinorm <- function(dataset, subset.min.size = 1, subset.max.size = 4, filename = NULL, use.optimale.size = F)
{
  p <- ncol(dataset)
  n <- nrow(dataset)
 
  if (subset.min.size < 1) stop("subset.min.size < 1")
  if (subset.min.size > p) stop("subset.min.size > p")
  if (subset.max.size < 1) stop("subset.max.size < 1")
  if (subset.max.size > p) stop("subset.max.size > p")
  if (subset.min.size > subset.max.size) stop("subset.min.size > subset.max.size")
 
  if (is.null(colnames(dataset)))
    colnames(dataset) <- 1:p
 
  # We have p variables. If all is to be checked against each other,
  # then we have a power-set with 2^p subsets (including the empty set)
 
  # Here we get subset containing indexes of the variables to include
  # Note that power.set doesn't include the empty set.
  subsets <- power.set(1:p)
  subsets.len <- length(subsets)
 
  # To get the plots with the fewest variables first, we do a litte trick:
  # While we find out which plots to include, we build a list
  # where each element of a list is the index of the subset,
  # and the index of the element is the size of the subset.
  # (The +1 is because the limits are includesive!)
  s.included <- vector("list", subset.max.size - subset.min.size + 1)
 
  plots <- 0
 
  for (i in 1:subsets.len)
  {
    s <- subsets[[i]]
    s.len <- length(s)
 
    if (s.len >= subset.min.size && s.len <= subset.max.size)
    {
      plots <- plots + 1
      s.included[[s.len - subset.min.size + 1]] <- c(s.included[[s.len - subset.min.size + 1]], i)
    }
  }
 
  # Now it's possible to build the subset index vector; 
  # we disregard the size of each subset; no more need to know it.
  s.indexes <- c()
 
  for (s in s.included)
  {
    s.indexes <- c(s.indexes, s)
  }
 
  # We want the best view  
  plot.per.row <- ceiling(plots^(1/2))
  plot.per.column <- ceiling(plots^(1/2))
 
  if (use.optimale.size)
  {
    mfrow.parameters <- find.optimal.mfrow.size(plots)
    plot.per.row <- mfrow.parameters[1]
    plot.per.column <- mfrow.parameters[2]
  }
 
  plot.width <- 300
  plot.height <- 200
 
  if (!is.null(filename))
    png(file=paste(filename, ".png", sep=""), bg="white", width = plot.per.row * plot.width, height = plot.per.column * plot.height)
 
  par(mfrow = c(plot.per.row, plot.per.column))  
 
  # There's no need to calculate a whole lot several times:
  ybar <- as.vector(colMeans(dataset))
 
  S <- as.matrix(var(dataset))
 
  current <- 1  
  for (i in s.indexes)
  {
    s <- subsets[[i]]
    s.len <- length(s)
 
    if (s.len > subset.max.size)
      next
 
    cat("Processing subset no.", current, "out of", plots, "\n")
 
    # Container for our values
    squared.dist <- numeric(n)
 
    # qr.solve(A) = A^(-1)  
    Sinv <- qr.solve(S[s,s])
 
    # Then calculate the squared distance for each datapoint
    for (i in 1:n)
    {
      c <- dataset[i,s] - ybar[s]
      squared.dist[i] <- t(c) %*% Sinv %*% c
    }
 
    # Finding the order statistic
    squared.dist <- sort(squared.dist)
 
    qqchisq(squared.dist, paste(colnames(dataset)[s], collapse=", "), s.len)
 
    current <- current + 1
  }
 
  if (!is.null(filename))
    dev.off()
}
 
# Example:
A <- matrix(rnorm(2000, mean=3, sd=2), ncol=8)
qqmultinorm(A, 2, 2, "multinorm-optimal.size", use.optimale.size = T)
qqmultinorm(A, 2, 2, "multinorm", use.optimale.size = F)
Aug 27
qqmultinorm.R - evaluating the normality of a sample
icon1 Mikkel Meyer Andersen | icon4 August 27, 2009 at 11:44 (UTC) | icon3 No Comments »
icon3 ,

I wrote a R-script that can be used to evaluate for multivariate normality of a sample. It's a kind of generalisation to the qqnorm, but this one just uses sums of the squared statistical distance which is then chi squared distributed with degrees of freedom equalling the number of squared statistical distance summed.

The useful thing about this script, is that it's able to plot all possible combinations of the variables. It's possible to specify the minimum number of variables to compare and the maximum number of variables to compare. All possible combinations are found by calculating the power set (using binary representation through the decimal to binary conversion, dec2bin, because if a set has k elements, then its power set has 2^k elements, and the binary representation of the numbers from 1 to 2^k are then used to pick out the subsets in the power set).

Please notice that it's possible to specify a filename so that the plots are written to a png file. This is by far the easiest thing - and only possibility - if there's more than a few plots.

I haven't wrote a lot of documentation besides this, but feel free to ask if in doubt of anything!

# File name: qqmultinorm.R
#
# This R-code is made by:
# Mikkel Meyer Andersen, Denmark
# mikl [funny-a] math [.] aau [.] dk or 
# mikl [funny-a] mikl [.] dk
#
# Licence: GPLv2
#
# Feel free to use it, but if you do I'll like to hear about it (just for fun).
# If you make corrections, please submit them back so others can enjoy them as well.
 
qqchisq <- function(y, main, df=2, continuity.correction = 0.5)
{
  n <- length(y)
  y <- sort(y)
  c <- numeric(n)
 
  for (i in 1:n)
    c[i] <- qchisq((i-continuity.correction)/n, df=df)
 
  plot(c, y, xlab="Theoretical Quantiles", ylab="Sample Quantiles", main=main)
  lines(c(c[1], c[n]), c(c[1], c[n]), type="l")
}
 
dec2bin <- function(x)
{
  if (!is.vector(x) || length(x) != 1 || x < 0)
    stop("x must be a non-negative integer")
 
  N <- length(x)
  ndigits <- floor(log2(x)) + 1
  bin <- numeric(ndigits)
 
  for (i in (ndigits-1):0)
  {
    tmp <- 2^i
 
    if (x %/% tmp >= 1)
    {
      bin[i+1] <- 1
      x <- x - tmp
    }
  }
 
  return(rev(bin))
}
 
# Returns the power set without the empty set
power.set <- function(v)
{
  n <- length(v)  
  N <- 2^n - 1
  ps <- vector("list", N)
 
  for (i in 1:(N-1))
  {
    Nbin <- dec2bin(i)
    Nbin <- c(numeric(n-length(Nbin)), Nbin)
    Nbin <- rev(Nbin)
    ps[[i]] <- v[which(Nbin == 1)]
  }
 
  ps[[N]] <- v
 
  return(ps)
}
 
# dataset: variables in columns and observations as rows
# subset.min.size, subset.max.size: inclusive limits
# filename: if specified, the plot are saved as a png file with this filename
qqmultinorm <- function(dataset, subset.min.size = 1, subset.max.size = 4, filename = NULL)
{
  p <- ncol(dataset)
  n <- nrow(dataset)
 
  if (subset.min.size < 1) stop("subset.min.size < 1")
  if (subset.min.size > p) stop("subset.min.size > p")
  if (subset.max.size < 1) stop("subset.max.size < 1")
  if (subset.max.size > p) stop("subset.max.size > p")
  if (subset.min.size > subset.max.size) stop("subset.min.size > subset.max.size")
 
  if (is.null(colnames(dataset)))
    colnames(dataset) <- 1:p
 
  # We have p variables. If all is to be checked against each other,
  # then we have a power-set with 2^p subsets (including the empty set)
 
  # Here we get subset containing indexes of the variables to include
  # Note that power.set doesn't include the empty set.
  subsets <- power.set(1:p)
  subsets.len <- length(subsets)
 
  # To get the plots with the fewest variables first, we do a litte trick:
  # While we find out which plots to include, we build a list
  # where each element of a list is the index of the subset,
  # and the index of the element is the size of the subset.
  # (The +1 is because the limits are includesive!)
  s.included <- vector("list", subset.max.size - subset.min.size + 1)
 
  plots <- 0
 
  for (i in 1:subsets.len)
  {
    s <- subsets[[i]]
    s.len <- length(s)
 
    if (s.len >= subset.min.size && s.len <= subset.max.size)
    {
      plots <- plots + 1
      s.included[[s.len - subset.min.size + 1]] <- c(s.included[[s.len - subset.min.size + 1]], i)
    }
  }
 
  # Now it's possible to build the subset index vector; 
  # we disregard the size of each subset; no more need to know it.
  s.indexes <- c()
 
  for (s in s.included)
  {
    s.indexes <- c(s.indexes, s)
  }
 
  # We want a squared view
  plot.per.row <- ceiling(plots^(1/2))
  plot.per.column <- ceiling(plots^(1/2))
  plot.width <- 200
  plot.height <- 200
 
  if (!is.null(filename))
    png(file=paste(filename, ".png", sep=""), bg="white", width = plot.per.row * plot.width, height = plot.per.column * plot.height)
 
  par(mfrow = c(plot.per.row, plot.per.column))  
 
  # There's no need to calculate a whole lot several times:
  ybar <- as.vector(colMeans(dataset))
 
  S <- as.matrix(var(dataset))
 
  current <- 1  
  for (i in s.indexes)
  {
    s <- subsets[[i]]
    s.len <- length(s)
 
    if (s.len > subset.max.size)
      next
 
    cat("Processing subset no.", current, "out of", plots, "\n")
 
    # Container for our values
    squared.dist <- numeric(n)
 
    # qr.solve(A) = A^(-1)  
    Sinv <- qr.solve(S[s,s])
 
    # Then calculate the squared distance for each datapoint
    for (i in 1:n)
    {
      c <- dataset[i,s] - ybar[s]
      squared.dist[i] <- t(c) %*% Sinv %*% c
    }
 
    # Finding the order statistic
    squared.dist <- sort(squared.dist)
 
    qqchisq(squared.dist, paste(colnames(dataset)[s], collapse=", "), s.len)
 
    current <- current + 1
  }
 
  if (!is.null(filename))
    dev.off()
}
 
# Example:
A <- matrix(rnorm(2000, mean=3, sd=2), ncol=8)
qqmultinorm(A, 1, 3, "multinorm-1-3")
#qqmultinorm(A, 4, 4, "multinorm-4")
#qqmultinorm(A, 5, 5, "multinorm-5")
#qqmultinorm(A, 6, 8, "multinorm-6-8")
multinorm-1-3

multinorm-1-3

Aug 3
Splitting up a huge file on one line with sed
icon1 Mikkel Meyer Andersen | icon4 August 3, 2009 at 23:59 (UTC) | icon3 No Comments »
icon3 , ,

Sometimes when I export MySQL to files, I end up having one huge query on one line. It can be quite annoying! I'd rather want it on several lines; in that way it easier to just copy out a fragment of the query. And it also seems that some editors handle several lines better than one huge. Well, in this case my line looked like this:

INSERT INTO TABLE VALUES (...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...),(...);

With this sed-command, every entry got its own line:

sed s/\),\(/\),\\n\(/g export.sql > one-per-line.sql
Jul 30
Watexy - Latex robot for Google Wave
icon1 Mikkel Meyer Andersen | icon4 July 30, 2009 at 00:06 (UTC) | icon3 39 Comments »
icon3 , , , ,

I finally got access to a Google Wave sandbox account, making it possible to start experimenting. The first thing I made was a Latex robot. By adding the robot, it's possible to write Latex in Waves. Afterwards, simply by putting Latex between two $'s:

<span style=\lim_{n \to \infty} \frac{1}{n} = 0$$

Watexy then changes the Wave and inserts the Latex as an image instead of the text.

You can try it yourself by adding watexy [this-funny-curly-a] appspot.com to your Wave-conversations. You can download the source here.

There's still a quite big bug: It's only the first Latex-image that gets substituted correctly. The following gets inserted some positions wrong, and the deletion of the text isn't correct neither. But the images itself are correct, only the substitution isn't working properly. But I'm working on that!

The more technical part: the robot is written in Python, hence using the Google Wave Python API. It's the first time I've tried that (thought it was a good possibility to learn it). The robot basically works by getting triggered when a new blip is submitted. Then it searches for Latex-code in that and substitutes it with pictures. Although the code is quite simple, it shows some very basic concepts on how to alter/change the contents of Waves on-the-fly (deleting text and inserting images, but it could be other things as well) with robots. The Latex-source-to-image is done by the http://www.forkosh.dreamhost.com/source_mathtex.html#webservice service.

The core Python comprising the robot is this:

# Python reference:
# http://wave-robot-python-client.googlecode.com/svn/trunk/pydocs/index.html
 
# Shortcut to the important OpBasedDocument
# http://wave-robot-python-client.googlecode.com/svn/trunk/pydocs/waveapi.ops.OpBasedDocument-class.html
 
""" 
Known bugs:
- From (and including) the second Latex-fragment, the positioning gets wrong
  > Probably the document needs some kind of updatering
  > Even though it's a bit strange since the re matches all the Latex correctly
"""  
 
__author__ = 'mikl@mikl.dk (Mikkel Meyer Andersen)'
 
import re
 
from waveapi import events
from waveapi import model
from waveapi import robot
from waveapi import document
 
def OnRobotAdded(properties, context):
  """Invoked when the robot has been added."""
  root_wavelet = context.GetRootWavelet()
  root_wavelet.CreateBlip().GetDocument().SetText("Hi. My name is Watexy and I'm here to help you presenting Latex in waves. Just put the latex between " /> and , e.g. 2+2=5. This robot uses the http://www.forkosh.dreamhost.com/source_mathtex.html#webservice service.)
 
def OnBlipSubmitted(properties, context):
  """Invoked when a blip has been added."""
  blip = context.GetBlipById(properties['blipId']) 
  blip_text_view = blip.GetDocument()
 
  latex_regex = re.compile('\$\$(.+?)\$\$')
  m = latex_regex.search(blip_text_view.GetText())
 
  """ 
  Only replace one Latex-fragment at a time, because replaceing one fragment
  changes the text positions in the rest. That's why re.finditer isn't used.  
  """
  while m != None:
    """
    The +/- 2 is because of the length of the " />'s. 
    If not removed, the loop will run infintely! 
    """
    blip_text_view.DeleteRange(document.Range(m.start(1)-2, m.end(1)+2))
    image = document.Image('http://www.forkosh.dreamhost.com/mathtex.cgi?' + m.group(1), caption=m.group(1))
    blip_text_view.InsertElement(m.start(1)-2, image)    
    m = latex_regex.search(blip_text_view.GetText())
 
if __name__ == '__main__':
  myRobot = robot.Robot('watexy', 
      image_url='http://watexy.appspot.com/assets/icon.png',
      version='9',
      profile_url='http://watexy.appspot.com/')  
  myRobot.RegisterHandler(events.WAVELET_SELF_ADDED, OnRobotAdded)
  myRobot.RegisterHandler(events.BLIP_SUBMITTED, OnBlipSubmitted)
  myRobot.Run()
Oct 27

Although R supports a lot of different probability distributions, it does not support the scaled inverse chi-square distribution - well, not directly. But it can of course be implemented.

Read the rest of this entry »

Aug 19
Import VCF-file to Nokia 6300 from Ubuntu through bluetooth
icon1 Mikkel Meyer Andersen | icon4 August 19, 2008 at 13:47 (UTC) | icon3 2 Comments »
icon3 , , ,

Today I got a new Nokia 6300 - my old Sony Ericsson W810i had hard times charging. Of course I wanted to copy my phonebook onto my new phone, but it wasn't as easy as I hoped for - but after a while I found an alternative and easy way.

Read the rest of this entry »

Aug 8
FreeBSD 7.0 on Xen 3.2
icon1 Mikkel Meyer Andersen | icon4 August 8, 2008 at 16:05 (UTC) | icon3 No Comments »
icon3 , ,

After I got Xen installed on Ubuntu Server 8.04, I wanted to install FreeBSD 7.0, too. But this was in no regards as easy as installing Ubuntu Server 8.04 as a domU!

Read the rest of this entry »

Aug 4
Ubuntu Server 8.04 as domU
icon1 Mikkel Meyer Andersen | icon4 August 4, 2008 at 10:23 (UTC) | icon3 5 Comments »
icon3 , , ,

If you have followed the guide in "Xen on Ubuntu Server 8.04 (Hardy Heron) with complex disk setup" or have an environment similar to that, please read on - if not please read on, too :-) . Since my server is hosted at Hetzner, this guide will be based on that. I want to use network bridging.

Read the rest of this entry »

« Previous Entries Next Entries »