
Posted by Wicher | Topic: Code | January 6th, 2012
I wanted to know whether I could use some tricks to make more efficient use of the 4GB Compact Flash storage I have in my Alix (running Gentoo Linux). I would like to keep a local portage tree on it; at the moment it mounts the tree over NFS due to space concerns.
Why no harddisk? Because that goes against the idea of having a small, reliable, always-on, energy-efficient home server.
Techies like pictures of hardware with the cover off, so here’s my Alix:
[Photo: the Alix board with its cover off]

Such a setup can lead to unfortunate situations, such as needing a package to restore connectivity to the network on which the fileserver resides that contains the package that I need to restore connectivity to the network on which… see where I’m going? Nowhere! ;-)
The portage tree fulfills the role of a package database for us Gentoo ricers. I need it locally.
Diego, one of the Gentoo devs, wrote a blog post about space inefficiency incurred through the use of many small files in the portage tree. He puts the Portage tree on different filesystems to arrive at an accurate and detailed picture of incurred overhead.
I wanted to quickly find out which filesystem block size would best suit different parts of my filesystem (such as /etc/, /usr/src/linux/, /usr/portage/, /var/db/pkg/), and how much space I could save. Testing every block size against every part of my FS was not very appealing, so I decided to simply calculate the file slack (the unused space at the end of each file’s last block) and wrote this simple Python script:
slacktastic.py
```python
#!/usr/bin/env python3
"""
slacktastic.py - calculate filesystem file slack for different block sizes.

Invoke me thusly:

    find /path/to/tree -xdev -type f -printf "%s\\n" | slacktastic.py 4096

for a calculation using the contents below /path/to/tree with a block size
of 4096 bytes.
"""
import sys

try:
    blksz = int(sys.argv[1])
    if blksz <= 0:
        raise ValueError('block size must be positive')
except (IndexError, ValueError):
    print(__doc__, file=sys.stderr)
    print('I need a positive blocksize as the first argument.\n', file=sys.stderr)
    sys.exit(1)

# One file size (in bytes) per line on stdin; split() also copes with empty input.
sizes = [int(strsize) for strsize in sys.stdin.read().split()]

# Blocks needed per file: ceiling division in integer arithmetic.
sumblks = sum(-(-sz // blksz) for sz in sizes)
# Slack is the unused tail of each file's last block. A file whose size is an
# exact multiple of the block size (including an empty file) has no slack.
sumslack = sum(-sz % blksz for sz in sizes)
sumsizes = sum(sizes)

print('{:n} total slack'.format(sumslack))
print('{:n} bytes in files'.format(sumsizes))
print('{0:n} total blocks of {1} bytes'.format(sumblks, blksz))
if sumblks:
    print('{:.2%} inefficiency'.format(sumslack / (sumblks * blksz)))
```
This doesn’t take into account things such as tail packing (à la ReiserFS), compression (Btrfs), or directory slack. Just file slack.
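If your find lacks GNU’s -printf (BSD find does), the list of sizes can also be gathered in Python. A minimal sketch, assuming you want the same set of files as find -xdev -type f: regular files only, symlinks not followed, and no descending into other mounted filesystems. The name file_sizes is made up for illustration.

```python
import os
import stat


def file_sizes(top):
    """Yield the size of every regular file below top, roughly like
    GNU find top -xdev -type f -printf "%s\\n"."""
    top_dev = os.lstat(top).st_dev
    for dirpath, dirnames, filenames in os.walk(top):
        # -xdev: prune subdirectories that live on another filesystem.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == top_dev]
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # -type f: regular files only; lstat() means symlinks are skipped.
            if stat.S_ISREG(st.st_mode):
                yield st.st_size
```

For example, sum(file_sizes('/var/db/pkg')) gives the "bytes in files" figure without going through find.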
On my laptop’s filesystem, with a 4096-byte block size, this leads to the following observation:
```
$ find /var/db/pkg -xdev -type f -printf "%s\n" | slacktastic.py 4096
171942364 total slack
92655140 bytes in files
64506 total blocks of 4096 bytes
65.08% inefficiency
```
My /var/db/pkg, Gentoo’s ‘database’ of installed packages (containing their build environment and all kinds of stuff you wouldn’t need on a binary distro), contains 65% air! That’s 170 megs of waste, which I don’t want on my Alix’s 4GB CF card. With a 1024-byte block size (Ext4’s minimum) the situation is better, but it’s still over 40 megs of hot air.
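Since the point was to pick a block size, it helps to run the same arithmetic for several candidates over one list of sizes. A sketch, with slack_report as a hypothetical helper: blocks per file are ceil(size / blocksize), slack per file is (-size) mod blocksize (zero for exact multiples), and like the script it ignores tail packing, compression, and directory slack.

```python
def slack_report(sizes, blocksizes=(1024, 2048, 4096)):
    """Return {blocksize: (blocks, slack_bytes, inefficiency)} for the
    given file sizes, counting only file slack."""
    sizes = list(sizes)
    report = {}
    for blksz in blocksizes:
        blocks = sum(-(-sz // blksz) for sz in sizes)  # ceil(sz / blksz)
        slack = sum(-sz % blksz for sz in sizes)       # unused tail of last block
        allocated = blocks * blksz
        report[blksz] = (blocks, slack, slack / allocated if allocated else 0.0)
    return report


# Toy input: an empty file, a 1-byte file, a 1 KiB file and a 5000-byte file.
for blksz, (blocks, slack, ineff) in sorted(slack_report([0, 1, 1024, 5000]).items()):
    print('{} byte blocks: {} blocks, {} bytes slack, {:.2%} inefficiency'.format(
        blksz, blocks, slack, ineff))
```

Fed with real sizes from find (or any other source), the row with the lowest inefficiency is the candidate block size for that part of the tree.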
I ended up choosing btrfs with compression. It uses a 4096-byte leaf/node size, but it does tail packing (good for small files) and compression (good for my /var/log). My script is useless for estimating space usage on such a filesystem, so I ran some actual tests, and it turns out I can fit my /usr/src/linux, /var/db/pkg, /var/log and /usr/portage on a 1GB btrfs filesystem. They didn’t fit on an Ext4 FS with 1024-byte blocks.
Tags: block size, English, filesystem overhead, python, slack, slack space

Posted by Wicher | Topic: Howto | February 9th, 2011
When SQLite is not enough
SQLite is great for ad-hoc SQL DB-ing, but it’s not so great when multiple processes need to write: a write locks the complete database.
It just so happens that, for a certain project, I need the ad-hocishness of SQLite *and* multiprocess write concurrency.
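To see that write lock in action, here is a small demo using Python’s built-in sqlite3 module. One connection keeps a write transaction open, and a second connection that tries to write at the same time is refused with "database is locked". Setting timeout=0 makes it fail immediately instead of retrying. The table t and the temporary file exist only for this demo.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.db')

# isolation_level=None: we issue BEGIN/COMMIT ourselves.
# timeout=0: don't wait for a lock, fail right away.
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
other = sqlite3.connect(path, timeout=0, isolation_level=None)

writer.execute('CREATE TABLE t (x INTEGER)')

writer.execute('BEGIN IMMEDIATE')          # take the database-wide write lock
writer.execute('INSERT INTO t VALUES (1)')

try:
    other.execute('INSERT INTO t VALUES (2)')
    blocked = False
except sqlite3.OperationalError as exc:    # "database is locked"
    blocked = True
    print('second writer refused:', exc)

writer.execute('COMMIT')                   # release the lock
other.execute('INSERT INTO t VALUES (2)')  # now it goes through
count = other.execute('SELECT COUNT(*) FROM t').fetchone()[0]
print('rows:', count)
```

With a non-zero timeout the second writer would simply wait instead, which is fine for one or two processes but serializes all writers.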
It appears that PostgreSQL can easily be made to run in “hassle-free” mode. Meaning: no need to ask your mum or your sysadmin anything, no need to write any config files.
Minimalism
Well, almost. I put in some extra options to disable the TCP socket, and I set some permissions on the unix socket.
For these examples, I assume bash or zsh, a unixish system, and PostgreSQL 9.0.
create a database cluster (PostgreSQL’s term for the data directory that will hold your databases):
initdb ~/mypostgres
You now have a directory ‘mypostgres’ in your homedir. It contains some config files with defaults. We don’t care.
start a server:
postgres -D ~/mypostgres/ -k ~/mypostgres/ --listen_addresses='' \
--unix_socket_permissions=0660 --unix_socket_group=$(id -g)
Let’s go over these options:
-D ~/mypostgres/: Use ~/mypostgres as the data directory (that’s where initdb created the cluster).
-k ~/mypostgres/: Use ~/mypostgres as the directory in which the server creates the unix socket that clients connect to. Look closely and you’ll discover it (its name starts with a dot for no good reason). Put it somewhere else (/tmp springs to mind) if you want other users to connect to your DB; more on that later.
These are not strictly necessary:
--listen_addresses='': Don’t listen on any TCP sockets. We don’t really authenticate users, so if you expose a TCP socket (the default, on the localhost interface), any local user can do anything to your databases just by running something like psql -h 127.0.0.1 -d postgres -U yourusername
--unix_socket_permissions=0660: By default, the socket’s permissions are 0777, so o=rwx too. If you put the socket in a public place such as /tmp, then with our authentication-free mode of operation you probably do not want any and all users to be able to access your DB.
--unix_socket_group=$(id -g): This is completely superfluous, but it’s here so you don’t shoot yourself in the foot when blindly copy-pasting teh codez. $(id -g) is your primary group ID, and the socket gets that group ownership anyway if you leave this option out (hence the superfluousness). So when would you use it? When you want to let other users access your DB: you’d set the gid to an appropriate group that you and the other users belong to (and you’d put the socket in a public place, as should be clear by now).
some goodies:
--silent-mode=true: Daemon mode. To quit the server, get the PID from the first line of ~/mypostgres/postmaster.pid and send it a SIGTERM.
-F: Turns off fsync. Wicked fast writes mode. Also wickedly trashed DB mode if the machine crashes or loses power.
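The "get the PID and send it a SIGTERM" step is easy to script. A minimal sketch, assuming the server was started with -D ~/mypostgres; postmaster_pid and stop_server are hypothetical helper names. The stock alternative is pg_ctl -D ~/mypostgres stop.

```python
import os
import signal


def postmaster_pid(datadir):
    """Return the server PID, which is the first line of postmaster.pid."""
    path = os.path.join(os.path.expanduser(datadir), 'postmaster.pid')
    with open(path) as f:
        return int(f.readline())


def stop_server(datadir):
    """Send SIGTERM, which asks PostgreSQL for a "smart" shutdown:
    it stops once all clients have disconnected."""
    os.kill(postmaster_pid(datadir), signal.SIGTERM)
```

Usage: stop_server('~/mypostgres').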
Create and connect
Now that the server is up and running, create a database:
createdb -h ~/mypostgres mydb
and connect to it:
psql -h ~/mypostgres -d mydb
By default, psql will try to log you in to the DB using your current unix username as the database user name. Convenient, because initdb created a superuser with exactly that name.
Should you need to specify the user as whom to connect, you can do so with -U.
Wrapper?
Just like with SQLite, you control access to the DB by setting unix permissions: here we set them on the socket, whereas with SQLite you set them on the DB file itself.
So, how about a wrapper Python module that makes all this just as easy as
```python
import sqlite3
conn = sqlite3.connect("create_me_if_I_don't_exist.db")
```
Yes, one day. Fork off a db server process, terminate it using the atexit decorator. It just might be that simple.
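Here is roughly what the server half of such a wrapper might look like. It is a sketch, not an existing module. start_server and postgres_command are hypothetical names. It assumes initdb and postgres (with the 9.0-era options above) are on your PATH, and it treats the appearance of the socket file (.s.PGSQL.<port>, default port 5432) as "the server is up". For the exit hook it sends SIGINT (fast shutdown) rather than SIGTERM, because a smart shutdown would wait forever on connections this same process might still hold.

```python
import atexit
import os
import signal
import subprocess
import time


def postgres_command(datadir, port=5432):
    """Build the 'hassle-free' postgres command line as an argv list."""
    return ['postgres', '-D', datadir, '-k', datadir, '-p', str(port),
            '--listen_addresses=',                          # no TCP sockets
            '--unix_socket_permissions=0660',               # owner and group only
            '--unix_socket_group={}'.format(os.getegid())]  # like $(id -g)


def start_server(datadir, port=5432, timeout=30.0):
    """Create the cluster if needed, start postgres, and stop it at exit.
    Returns the Popen object once the unix socket has appeared."""
    datadir = os.path.abspath(os.path.expanduser(datadir))
    if not os.path.isdir(datadir):
        subprocess.check_call(['initdb', datadir])
    proc = subprocess.Popen(postgres_command(datadir, port))

    @atexit.register
    def _shutdown():
        if proc.poll() is None:
            proc.send_signal(signal.SIGINT)  # fast shutdown: disconnects clients
            proc.wait()

    # Treat the socket file showing up as "the server accepts connections".
    sock = os.path.join(datadir, '.s.PGSQL.{}'.format(port))
    deadline = time.monotonic() + timeout
    while not os.path.exists(sock):
        if proc.poll() is not None:
            raise RuntimeError('postgres exited with status %d' % proc.returncode)
        if time.monotonic() > deadline:
            raise RuntimeError('postgres did not create %s in time' % sock)
        time.sleep(0.1)
    return proc
```

After start_server('~/mypostgres'), any libpq-based client can connect by passing the data directory as the host. A host starting with a slash is taken as a socket directory, just as with psql -h ~/mypostgres.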
Tags: English, postgreSQL, python, sqlite

Posted by Wicher | Topic: Code | December 9th, 2009
Just finished up a 0.1 version of a LIRC (Linux Infrared Remote Control) plugin for the Exaile media player. Now you can use your remote with Exaile efficiently. The plugin is in the public repository and is called Lircaile.
I haven’t touched Python much as of yet, but I’m pleased with it: it appears to be a consistent language. Well, here’s my 0.1 effort. I desperately wanted to have some fun with introspection, but I have the feeling the nested exception logic is a bit… unusual.
Update #20110222: Updated the inline code preview below to 0.3.0.
```python
# A LIRC plugin for Exaile. Depends on pylirc from http://sourceforge.net/projects/pylirc/
# Copyright (C) 2009-2011 Wicher Minnaard, http://smorgasbord.gavagai.nl / wicher@gavagai.eu
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
import logging
import select
import threading

import pylirc

LIRCAILE = None


def enable(exaile):
    _enable(None, exaile, None)


def _enable(eventname, exaile, nothing):
    global LIRCAILE
    LIRCAILE = Lircaile(exaile)


def disable(exaile):
    pylirc.exit()


class Lircaile(object):
    def __init__(self, exaile):
        self.exaile = exaile
        self.logger = logging.getLogger(__name__)
        # Register with lircd as 'lircaile'; returns the LIRC socket fd.
        sock_fd = pylirc.init('lircaile')
        # Wait for remote-control events in a background thread.
        waitlirc = threading.Thread(target=self.wait_lircevent, args=(sock_fd,),
                                    name='Thread-lircaile-waitlirc')
        waitlirc.daemon = True  # don't keep Exaile alive on exit
        waitlirc.start()

    def wait_lircevent(self, sock_fd):
        """Pops all queued signals off of the LIRC queue and hands them to
        handleCode() for further processing."""
        while True:
            # Block until LIRC has something for us.
            select.select([sock_fd], [], [])
            # nextcode() gives a list of config strings, or None when the
            # queue is empty; handle every string, not just a single one.
            for code in pylirc.nextcode() or []:
                words = code.split()
                if words:
                    self.handleCode(*words)

    def handleCode(self, command, *arg):
        """Takes LIRC signals and uses introspection to try to find appropriate
        exaile functions to call based on the name of the signal. """
        player = self.exaile.player
        if command == 'chvol':
            player.set_volume(player.get_volume() + float(arg[0]))
        elif command == 'seek':
            # get_position() is in nanoseconds, seek() takes seconds; dividing
            # by a float avoids integer division under Python 2.
            player.seek(player.get_position() / 1e9 + float(arg[0]))
        else:
            func = None
            # Look for a matching playlist function
            try:
                func = getattr(self.exaile.queue, command)
            except AttributeError:
                # No? Then look for a matching player function
                try:
                    func = getattr(player, command)
                except AttributeError:
                    # No? Then we're out of options
                    self.logger.warning('No function to handle the "%s" LIRC event', command)
            if callable(func):
                func()
```
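For comparison, the nested try/except lookup in handleCode can be flattened by giving getattr a default and trying each candidate object in turn. This is a sketch: find_handler is a hypothetical helper, and the two dummy classes merely stand in for Exaile’s queue and player objects. One small behaviour change: a non-callable attribute on the queue no longer stops the search, it just moves on to the player.

```python
import logging

logger = logging.getLogger(__name__)


def find_handler(command, *targets):
    """Return the first callable attribute named `command` on the targets,
    searched in order, or None if there is none."""
    for target in targets:
        func = getattr(target, command, None)
        if callable(func):
            return func
    logger.warning('No function to handle the "%s" LIRC event', command)
    return None


# Dummy stand-ins for exaile.queue and exaile.player, for illustration only.
class FakeQueue(object):
    def next(self):
        return 'queue.next'


class FakePlayer(object):
    def toggle_pause(self):
        return 'player.toggle_pause'


queue, player = FakeQueue(), FakePlayer()
print(find_handler('next', queue, player)())
print(find_handler('toggle_pause', queue, player)())
print(find_handler('no_such_command', queue, player))
```

In the plugin, the else branch would then shrink to func = find_handler(command, self.exaile.queue, self.exaile.player) followed by the same callable check.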
Tags: English, exaile, lirc, lircaile, python, remote