Using Selenium with PhantomJS

If you don’t want to play around with things like CasperJS to do your simple browser automation, it turns out that PhantomJS has WebDriver support, and in turn, Selenium supports it.

To use it via the Python bindings, you’ll need to:

1) Install PhantomJS somewhere
2) Install Selenium from PyPI (for example, with pip)
3) Run:

from selenium import webdriver

driver = webdriver.PhantomJS(phantom_path)

(phantom_path is the path to the phantomjs binary).

And that’s it! Selenium will spawn a PhantomJS process that it communicates with over WebDriver, and you can use the Selenium API to do your automation. You therefore don’t need Selenium Server either.
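As a rough sketch of what a first script might look like (assuming a Selenium release that still ships the PhantomJS driver; make_driver and fetch_title are just illustrative names of mine, not part of Selenium):

```python
def make_driver(phantom_path):
    """Start PhantomJS via Selenium. phantom_path is the path to the phantomjs binary."""
    # Imported here so fetch_title() below can be used with any driver-like object.
    from selenium import webdriver
    return webdriver.PhantomJS(executable_path=phantom_path)


def fetch_title(driver, url):
    """Load a page and return its title, always shutting the browser down afterwards."""
    try:
        driver.get(url)
        return driver.title
    finally:
        # quit() ends the WebDriver session and the PhantomJS process behind it.
        driver.quit()


# Example usage (the binary path is an assumption, adjust it for your install):
#   print(fetch_title(make_driver("/usr/local/bin/phantomjs"), "http://example.com/"))
```

Wrapping the work in try/finally matters more than it looks: without the quit() call, a failed script can leave a stray PhantomJS process running.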

(CasperJS seems to be faster at driving PhantomJS than Selenium, based on anecdotal evidence of a single Google search, but it’s always nice to have options).

Posted on April 18, 2014 at 8:28 pm by Carlos Corbacho
In: Python, Web

Running Clojure on Slackware

To run Clojure on Slackware you’ll want a few things:

  1. JDK (use the Slackbuild from /extra to build the latest version yourself)
  2. Leiningen (available from SlackBuilds.org)
  3. Create a command line wrapper to execute Clojure scripts (some examples for learning Clojure expect you to set this up, but don’t really tell you how to).

Assuming you’ve installed Leiningen (in my case, 1.6.1.1), I’ve created ~/bin/clj with the following to do this:

#!/bin/sh
java -cp ~/.m2/repository/org/clojure/clojure/1.2.1/clojure-1.2.1.jar clojure.main "$@"

Posted on January 4, 2014 at 1:59 pm by Carlos Corbacho
In: Clojure, Linux, Slackware

Varnish and Ajax

If you have endpoints on your site that serve up both HTML and JSON depending on the request type, not the URL, then you need to tell Varnish to add this as an extra hash so that it doesn’t return JSON to clients expecting HTML, and vice versa. Oddly, I’ve not actually seen any examples of how to do this, so I came up with:

sub vcl_hash {
    if (req.http.X-Requested-With == "XMLHttpRequest") {
        hash_data(req.http.X-Requested-With);
    }
}

(Django uses X-Requested-With for is_ajax(), so it is consistent with that – if you were wondering).
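To see why the extra hash matters, here is a toy model in Python of a cache key built from the URL and host, plus the Ajax marker when present. It is only an illustration of the idea; cache_key is a made-up helper and this is not how Varnish computes its hashes internally.

```python
import hashlib


def cache_key(url, host, headers):
    """Toy cache key: URL and host, plus X-Requested-With for Ajax requests."""
    digest = hashlib.sha256()
    digest.update(url.encode("utf-8"))
    digest.update(host.encode("utf-8"))
    # Mirrors the vcl_hash snippet: only Ajax requests add the extra data.
    if headers.get("X-Requested-With") == "XMLHttpRequest":
        digest.update(headers["X-Requested-With"].encode("utf-8"))
    return digest.hexdigest()


ajax_key = cache_key("/cars", "example.com", {"X-Requested-With": "XMLHttpRequest"})
html_key = cache_key("/cars", "example.com", {})

# Same URL, different keys, so the JSON and HTML responses are cached separately.
print(ajax_key != html_key)
```

Without the extra data both requests would share one key, and whichever response was cached first would be served to everyone.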

Posted on August 9, 2013 at 11:03 pm by Carlos Corbacho
In: Apache, Django, Varnish, Web

Linux video acceleration

With the advent of open source UVD support for my HD5750 in Linux, I’ve been trying to understand what is needed for accelerating video under Linux, and how all the components work together. So I drew a diagram to try and make sense of it all:

[Diagram: Linux video acceleration]

Suffice it to say, it’s not pretty. The key thing is that there are currently two main modern acceleration APIs under Linux – Video Acceleration API (VA-API) and Video Decode and Presentation API for Unix (VDPAU). They don’t support all video formats, just some. Not all video drivers support both (open source Intel and closed source AMD support VA-API; closed source nVidia and open source AMD support VDPAU), and the support for each of these varies wildly from one piece of software to the next – for example, VLC uses VA-API, while the closed source Flash player uses VDPAU.

If you’re using a driver that only supports VDPAU, you have a bit more luck, because VA-API currently exposes just a subset of the VDPAU functionality (though this will probably change in the upcoming VA-API release). A wrapper driver has therefore been written that can convert from VA-API to VDPAU, so VA-API software can still be accelerated on VDPAU-only drivers. As far as I understand though, the reverse is not true of VA-API drivers and VDPAU software.

Posted on June 24, 2013 at 9:49 pm by Carlos Corbacho
In: Linux

Psycopg2 and large result sets

Psycopg2 has a bit of a gotcha when it comes to fetching result sets that can catch out the unsuspecting developer. By default, it does not actually use database cursors, but simply emulates them. In practical terms, this means that the entire result set of your query is fetched by the client into memory.

It is documented these days, but buried quite far down if you’re not looking for it:

http://initd.org/psycopg/docs/usage.html#server-side-cursors

Practically speaking though, what does this actually mean? From a DB API point of view, there is no difference memory wise between:

.fetchone()

and

.fetchall()

The entire result set has already been fetched into memory; all you are doing is controlling how much of that you read into Python at one time.

By and large, it’s not actually a bad thing, as long as you don’t execute queries that return huge result sets. As you generally don’t need to do that, the key thing to be aware of in your client code is to write your code in such a way that you do as much filtering as possible at the SQL layer so you can return as small a result set as possible.

As an example, consider something like this:

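def first_blue_car(connection):
    # Psycopg2 connections have no execute(); queries go through a cursor
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM cars")
    for row in cursor:
        if row[1] == "blue":
            return row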

Using an ORM such as the Django one, the equivalent would look something like:

for car in Car.objects.all():
    if car.colour == "blue":
        return car

In the above, we’re trying to find the first car that is blue. (It’s rather contrived that we’re calling .all(), but you could also imagine some other filter that returns a large number of car records). Now, let’s say that our ‘Car’ table has 20,000 cars in it. In both cases, it naively appears that we’re only reading in one record at a time, but this is not quite the case.

As soon as we executed the query, Psycopg2 loaded the entire result set, which in our case is the entire table, into memory. In the Django example, the only saving grace is that we are lazily creating the Car objects from the row, but that’s it – the entire result set is still in memory!

Whilst you could use named cursors (even with things like Django there are various ways to force PostgreSQL to use them), it’s generally not necessary. Simply try to do as much of your filtering as possible in SQL to keep the size of the result set small, rather than filtering in your Python code.
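As a sketch of both approaches (assuming the colour lives in a column called colour, which the original query doesn’t actually tell us, and with function and cursor names that are purely illustrative):

```python
def first_blue_car(connection):
    """Filter in SQL so at most one row is ever sent to the client."""
    cursor = connection.cursor()
    try:
        # "colour" is an assumed column name; LIMIT 1 keeps the result set tiny.
        cursor.execute("SELECT * FROM cars WHERE colour = %s LIMIT 1", ("blue",))
        return cursor.fetchone()
    finally:
        cursor.close()


def iter_all_cars(connection, batch_size=2000):
    """Stream the whole table through a named (server-side) cursor."""
    # Passing a name is what makes Psycopg2 use a real PostgreSQL cursor.
    cursor = connection.cursor(name="all_cars")
    # itersize controls how many rows are fetched per round trip while iterating.
    cursor.itersize = batch_size
    try:
        cursor.execute("SELECT * FROM cars")
        for row in cursor:
            yield row
    finally:
        cursor.close()
```

The first function is the one you want most of the time. The second is for the rare case where you really do need to walk a huge result set without holding all of it in client memory.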

Posted on April 15, 2013 at 11:02 pm by Carlos Corbacho
In: Linux, Python

Linux Containers (LXC), libvirt and Slackware

I’ve spent the last few days getting very frustrated with trying to make a Linux Container (LXC) run via libvirt on Slackware – various weird and wonderful error messages about being unable to mount cgroups (LXC depends on cgroups, alongside kernel namespaces, to provide the resource control and isolation).

The short answer is that by default, Slackware mounts a cgroup type filesystem onto /sys/fs/cgroup, which causes every cgroup to be mounted into that directory. Libvirt does not like this for LXC – it expects each different cgroup type to be mounted in a separate directory. I’ve therefore put together the following init script that remounts the cgroups into the format that Libvirt expects (which I’ve based off the contents of fstab for another machine running Ubuntu):

#!/bin/sh
#
# /etc/rc.d/rc.cgroup: Cgroup mounting script
#
# Remount cgroups under a tmpfs directory in /sys. By default,
# Slackware mounts /sys/fs/cgroup - however, this does not work
# with using libvirt for Linux Containers, because it expects each
# cgroup to have its own directory. So, let's do that.

# Unmount the existing /sys/fs/cgroup
umount /sys/fs/cgroup

# Create a tmpfs structure to hold all the new mounts
mount -t tmpfs -o mode=755,noatime tmpfs /sys/fs/cgroup

for cgroup in cpu cpuset cpuacct memory devices freezer blkio perf_event; do
    mkdir /sys/fs/cgroup/$cgroup
    mount -t cgroup -o $cgroup,noatime cgroup /sys/fs/cgroup/$cgroup
done

To use this, add it to rc.local before rc.libvirt is called (as libvirt needs to use the cgroups).
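To check that the remount did what was intended, one option is to look at /proc/mounts and confirm that each controller now has its own directory. A small sketch in Python (cgroup_mounts is a hypothetical helper of mine, not part of libvirt or LXC):

```python
def cgroup_mounts(mounts_text):
    """Map each cgroup mount point to its list of mount options."""
    mounts = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup":
            mounts[fields[1]] = fields[3].split(",")
    return mounts


# Example usage on a live system:
#   with open("/proc/mounts") as f:
#       for path, options in sorted(cgroup_mounts(f.read()).items()):
#           print(path, options)
```

After running the init script you would expect one entry per controller, such as /sys/fs/cgroup/cpu and /sys/fs/cgroup/memory, rather than a single cgroup mount on /sys/fs/cgroup itself.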

Posted on October 18, 2012 at 10:09 pm by Carlos Corbacho
In: Linux, Slackware

Slackware, Amarok and Transcoding

As I’ve recently been importing CDs in as FLAC, I want something that can transcode this for me and manage my collection when working with my iPod. Amarok has now finally re-gained transcoding support as of 2.4.

However, trying this in Slackware 14.0 RC3, it doesn’t work – trying to copy FLAC tracks to my iPod failed with an error message telling me the format is unsupported. A little bit of digging, and it turns out that to get transcoding support, you need to have ffmpeg installed when you build Amarok.

So to get it working, you’ll need to install ffmpeg and the relevant dependencies (SlackBuilds.org has this), then rebuild Amarok with ffmpeg installed. Your new, shiny Amarok will then offer transcoding as an option the next time you want to copy FLAC to your iPod.

Posted on August 27, 2012 at 11:09 am by Carlos Corbacho
In: KDE, Linux, Slackware

USB 3G Modem on Slackware – DNS

A follow up from last year’s post – clearly having not played around with the 3G card since last year, it’s only today I realised that DNS wasn’t actually working. PPPD was correctly requesting the nameservers from the remote peer, but by default, puts them into /etc/ppp/resolv.conf. This isn’t terribly helpful.

PPPD will try to call out to /etc/ppp/ip-{up,down} when bringing a connection up or down. By creating these and making them executable, we can get them to set up DNS for us.

I’ve therefore created them as follows:

/etc/ppp/ip-up:

#!/bin/sh

# Change DNS resolvers
if [ -f /etc/resolv.bak ]; then
  echo "/etc/resolv.bak exists!"
else
  cp /etc/resolv.conf /etc/resolv.bak
  rm /etc/resolv.conf
  ln -sf /etc/ppp/resolv.conf /etc/resolv.conf
fi

/etc/ppp/ip-down:

#!/bin/sh

# Change DNS resolvers back.
if [ -f /etc/resolv.bak ]; then
  rm /etc/resolv.conf
  mv /etc/resolv.bak /etc/resolv.conf
else
  echo "/etc/resolv.bak missing!"
fi

With this, running wvdial now also takes care of DNS properly and I can actually resolve domain names (you can also extend these scripts if you want things like VPN, etc, but in my case, this is more than sufficient).

Posted on August 24, 2012 at 11:02 pm by Carlos Corbacho
In: Linux, Slackware

USB 3G Modem for Vodafone on Slackware

In spite of having to spend too much of my work time getting various 3G dongles working, I’ve never had to do it at home.

For Vodafone, it’s actually quite simple:

1) Install wvstreams
2) Install wvdial
3) Add the following to your /etc/wvdial.conf

[Dialer vodafone]
Init3 = AT+CGDCONT=1,"IP","internet"
Phone = *99#
Password = pass
Username = user

Vodafone don’t actually use a username and password, so you just need something to keep wvdial happy. In my case, I’m then creating a /dev/modem symlink that points to /dev/ttyUSB0, but I could just as easily add a ‘Modem = /dev/ttyUSB0’ line here instead.

Connecting is then just a matter of running as root (or using sudo):

wvdial vodafone

Posted on September 4, 2011 at 2:33 pm by Carlos Corbacho
In: Linux, Slackware

Python 2 and Unicode

I’ve been meaning to write this up for a while, so let’s see how we go.

One of the problems I have with Python 2 currently is that Unicode support is a bit of a hit and miss game. The problem is that Unicode was a bolt on extra in Python 2.x – in the brave new world of Python 3, they’ve actually fixed this up properly, with Unicode strings being the default, and a new type, bytes, to represent byte strings. Unless you’re doing I/O, you really do want Unicode (hint – what do you expect len("£") to be?). To give an example:

a = "foo"

In Python 2, this will create a str byte string. If you want Unicode objects, you either need an explicit unicode literal:

a = u"foo"

Or, if you’re using Python 2.6 or later:

from __future__ import unicode_literals

a = "foo"

Which is all nice, except libraries are the falling down point, as per usual. The situation on 2.x is basically a mess – not entirely surprising, given the origins of the unicode type in Python 2.

Here’s a small selection of the unicode support in Python 2.x libraries:

csv – str only for input and output
ElementTree (and faithfully reproduced in LXML) – Depends. Accepts Unicode, but on return, it tries to coerce everything to ASCII encoded str objects, and if it fails, returns the original internal unicode object.
PyGTK – Accepts unicode, always returns UTF-8 encoded str
PyQt – either QString, or if you switch on the v2 API, unicode objects.
Django – Returns unicode objects
Psycopg2 – Returns str in the client encoding, unless you specify you want unicode objects (globally or per connection – Django sets this globally)

As for other libraries, I can’t really speak, but I would guess the situation is not much improved there either.

So what are the solutions?

1) Close your eyes, put your fingers in your ears and pretend no-one uses anything but ASCII.

2) Try to only use libraries that actually have proper Unicode support. Oh, and don’t forget to declare every string literal as unicode while you’re there (or use

from __future__ import unicode_literals

)

3) Use Python 3 (though quite a few libraries still haven’t been ported to it yet, so perhaps not better than 2).
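To make the earlier len("£") hint concrete, here is a small sketch that behaves the same on Python 2.6+ and Python 3. The to_text helper is just an illustrative name of mine for the usual "decode at the edges" approach, and UTF-8 is an assumed default encoding:

```python
text = u"\u00a3"             # the pound sign as one Unicode character
data = text.encode("utf-8")  # the same character as a UTF-8 byte string

print(len(text))  # 1: one character
print(len(data))  # 2: two bytes (0xC2 0xA3)


def to_text(value, encoding="utf-8"):
    """Decode byte strings coming from a library; pass Unicode through untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Whatever a library hands back, the rest of the code only ever sees Unicode.
print(to_text(data) == text)
print(to_text(text) == text)
```

On Python 2, bytes is simply another name for str, so the same helper covers libraries that return UTF-8 encoded str objects.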

Posted on October 28, 2010 at 9:31 pm by Carlos Corbacho
In: Python