Tag Archives: Shell

Meta-programming in Shell

Wikipedia defines meta-programming as:

programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyze or transform other programs, and even modify itself while running

Uncle Wiki

I had to write a “framework” at work where a shell program would run other shell programs “dynamically”. Let’s dig in!

As I mentioned in my earlier post Two Colons Equals Modules, you can “emulate” modules and functions in Shell (at least in FreeBSD’s /bin/sh) by using ::, so it would be module::function

Here we will do the same, however we will do hook::module.

The goal is to have a Shell program that would take a pid as an argument and do something with that PID, say print a group of information, maybe use DTrace to trace it, etc.

Let’s start by writing our main program.

#!/bin/sh
set -m

usage()
{
  echo "${0##*/} pid"
}

# print usage if argc < 1
[ "${#}" -lt "1" ] && usage && exit 1

# load scripts
load_scripts()
{
  for ctl in ./*.ctl.sh;
  do
    . "${ctl}"
  done
}

# stop the runner by killing the PIDs
runner_stop()
{
  IFS=":"
  for pid in $1;
  do
    kill $pid
  done
  exit
}

# Stop the runner if user sends an input
# This is useful if the runner is executed via a controller
wait_input()
{
  read command
  runner_stop ${PIDS}
}

# a.k.a. main()
runner_start()
{
  # make sure the process exists
  _pid="$1"
  ps -p "${_pid}" 1>/dev/null
  [ $? != 0 ] && exit 2

  # initiate scripts
  load_scripts

  # change IFS to :
  # loop over $SCRIPTS and execute the add hook
  IFS=":"
  for ctl in ${SCRIPTS};
  do
    add::${ctl} "${_pid}"
  done

  # now that we know the commands, loop over them too!
  # inside the loop set IFS to "," to set args
  for cmd in ${COMMAND};
  do
    IFS=","
    set -- "${cmd}"
    run::$1 $2
  done

  # Add trap for signals
  trap "runner_stop ${PIDS}" EXIT SIGINT SIGPIPE SIGHUP 0
  # reset IFS
  unset IFS
  wait_input
}

RUNNERDIR=$(dirname "$0")
(cd $RUNNERDIR && runner_start "$1")

Let’s digest a bit of that. First, we check if the number of arguments provided is less than 1

[ "${#}" -lt "1" ] && usage && exit 1

then we call usage and we exit with return code 1

The load_scripts function will load a bunch of scripts (from the same directory) as long as the scripts are suffixed .ctl.sh

Here’s an example script, say fds.ctl.sh, which will print File Descriptors used by the process, we will use procstat internally.

#!/bin/sh

add::fds()
{
  COMMAND="fds,$1:$COMMAND"
}

run::fds()
{
  procstat --libxo=xml -w 5 -f "$1" &
  PIDS="$!:$PIDS"
}

export SCRIPTS="fds:$SCRIPTS"

Here’s where meta-programming comes into use (I think), we have a variable named $SCRIPTS, which is modified to add the script name into it, $PATH-style, and two functions, add::fds and run::fds. As you have guessed add:: and run:: are the hook names.

I’ll add another script, it will use procstat as well, but this time we will print the resource usage

#!/bin/sh

add::resource()
{
  COMMAND="resource,$1:$COMMAND"
}

run::resource()
{
  procstat --libxo=xml -w 5 -r "$1" &
  PIDS="$!:$PIDS"
}

export SCRIPTS="resource:$SCRIPTS"

The same applies here, one variable, $SCRIPTS and two functions, add::resource and run::resource.

Which means, after loading our scripts all four functions will be loaded into our program and the environment variable $SCRIPTS will have the value resource:fds:

Good? Okay let’s continue.

Since we used : to separate the name of the scripts we must set IFS to :, and we start looping over $SCRIPTS. Now we just run add::${ctl}, which would be add::fds and add::resource. We also pass the ${_pid} variable, if we need to

These two functions would do more meta-programming by setting the $COMMAND variable to script_name,arguments:$COMMAND, again PATH-style.

Which means that the $COMMAND variable has the value fds,89913:resource,89913:

The next bit is a bit tricky, since we’ve set $COMMAND to prog0,arg1:prog1,arg1,arg2: (well, not really arg2, but we could’ve) then we need to

  1. Use “,” as IFS
  2. Tell sh to set the positional parameters, so prog0 becomes $1 and arg1 becomes $2, etc.

and now we execute run::$1 $2, which would be run::fds 89913 then run::resource 89913.

I think I can make this better by running run::$@, where $@ is basically all the parameters, but will test that later

– antranigv at 6am reading the code that he wrote drunk

In the end, we add some signal trapping, we reset IFS and we just wait for an input.

Okay, so we now have a piece of software that reads other programs and modifies itself while running. We have a meta-program!

Let’s give it a run.

# ./runner.sh 89913
<procstat version="1"><files><89913><procstat version="1"><rusage><89913><process_id>89913</process_id><command>miniflux</command><user time>01:37:54.339245</user time><system time>00:19:43.630210</system time><maximum RSS>61236</maximum RSS><integral shared memory>5917491656</integral shared memory><integral unshared data>1310633336</integral unshared data><integral unshared stack>114278656</integral unshared stack><process_id>89913</process_id><command>miniflux</command><files><fd>text</fd><fd_type>vnode</fd_type><vode_type>regular</vode_type><fd_flags>read</fd_flags><ref_count>-</ref_count><offset>-</offset><protocol>-</protocol><path>/usr/local/jails/rss/usr/local/bin/miniflux</path><page reclaims>16939</page reclaims><page faults>7</page faults><swaps>0</swaps><block reads>5</block reads><block writes>1</block writes><messages sent>12603917</messages sent></files><files><fd>cwd</fd><fd_type>vnode</fd_type><vode_type>directory</vode_type><fd_flags>read</fd_flags><ref_count>-</ref_count><offset>-</offset><protocol>-</protocol><path>/usr/local/jails/rss/root</path><messages received>14057863</messages received><signals received>807163</signals received><voluntary context switches>79530890</voluntary context switches><involuntary context switches>5489854</involuntary context switches></files><files><fd>root</fd><fd_type>vnode</fd_type><vode_type>directory</vode_type><fd_flags>read</fd_flags><ref_count>-</ref_count><offset>-</offset><protocol>-</protocol><path>/usr/local/jails/rss</path></89913></rusage></procstat></files><files><fd>jail</fd><fd_type>vnode</fd_type><vode_type>directory</vode_type><fd_flags>read</fd_flags><ref_count>-</ref_count><offset>-</offset><protocol>-</protocol><path>/usr/local/jails/rss</path></files><files><fd>0</fd><fd_type>vnode</fd_type><vode_type>character</vode_type><fd_flags>read</fd_flags><fd_flags>write</fd_flags><ref_count>4</ref_count><offset>0</offset><protocol>-</protocol><path>/usr/local/jails/rss/dev/null</path></files><files><fd>1</fd><fd_type>pipe</fd_type><fd_flags>read</fd_flags><fd_flags>write</fd_flags><ref_count>2</ref_count><offset>0</offset><protocol>-</protocol><path>-</path></files><files><fd>2</fd><fd_type>pipe</fd_type><fd_flags>read</fd_flags><fd_flags>write</fd_flags><ref_count>2</ref_count><offset>0</offset><protocol>-</protocol><path>-</path></files><files><fd>3</fd><fd_type>kqueue</fd_type><fd_flags>read</fd_flags><fd_flags>write</fd_flags><ref_count>2</ref_count><offset>0</offset><protocol>-</protocol><path>-</path></files><files><fd>4</fd><fd_type>pipe</fd_type><fd_flags>read</fd_flags><fd_flags>write</fd_flags><fd_flags>nonblocking</fd_flags><ref_count>2</ref_count><offset>0</offset><protocol>-</protocol><path>-</path></files><files><fd>5</fd><fd_type>pipe</fd_type><fd_flags>read</fd_flags><fd_flags>write</fd_flags><fd_flags>nonblocking</fd_flags><ref_count>1</ref_count><offset>0</offset><protocol>-</protocol><path>-</path></files><files><fd>6</fd><fd_type>socket</fd_type><fd_flags>read</fd_flags><fd_flags>write</fd_flags><fd_flags>nonblocking</fd_flags><ref_count>3</ref_count><offset>0</offset><protocol>TCP</protocol><sendq>0</sendq><recvq>0</recvq><path>192.168.10.5:63835 192.168.10.3:5432</path></files><files><fd>7</fd><fd_type>socket</fd_type><fd_flags>read</fd_flags><fd_flags>write</fd_flags><fd_flags>nonblocking</fd_flags><ref_count>3</ref_count><offset>0</offset><protocol>TCP</protocol><sendq>0</sendq><recvq>0</recvq><path>::.8080 ::.0</path></files></89913></files></procstat>

Why XML? Because libxos JSON output is not “real” JSON when procstat‘s running in repeat mode, but that’s a story for another day.

All code examples can be found as a GitHub Gist.

That’s all folks…

Two Colons Equals Modules

Days ago I tweeted a shell function which is part of jailio’s code base. Jailio is a project I’ve been working on for the last 6 months. As the name implies, it’s a container management software for FreeBSD Jails.

It has two unique things compared to other Jail management software. First of all, it has no dependencies, it’s written purely in Shell. You can say the same about BastilleBSD, however, Jailio’s second unique thing is that it uses base tools only and requires the base system only. For example, you need to have bastille_enable in BastilleBSD, it also uses its own config files, etc. In Jailio, you need to have jail_enable, because technically Jailio automates jail.conf files. It also uses my patch to automate the jail.confs in /etc/jail.conf.d.

Anyway, back to our topic about Colons and Modules.

I like modules, I got introduced to them when I started programming in school. In Syria, we learn programming at 7th grade but in our school we started a year early, so 6th grade. We always start with block diagrams and then Turbo Pascal!

Yes, 16-bit Turbo Pascal was my first programming language and it had the concept of modules which we called Units.

And then you have languages like C or Shell which don’t have modules. If you use modules you KNOW that it’s hard not to use modules after that.

While reading the source code of vm-bhyve I learned that you can use two colons (::) as part of the function name, which can give you an amazing new superpower to take over the world write cleaner code.

For me this was a life-changer. I write a LOT of Shell code. I ship them to production too. No, you don’t need to write everything in a fancy new language and run it on kubernetes, you can always use simple languages like Shell and run them in a FreeBSD Jail. Or in my case, write in Shell to automate FreeBSD Jails.

Here’s an example code with “modules” in Shell. Note, this works in FreeBSD’s shell, I have not tested other Shells yet.

main.sh

#!/bin/sh

. ./mod1.sh

mod1::func1

mod1.sh

#!/bin/sh

mod1::func1(){
  printf "Here I am, rock you like a hurricane\n"
}
antranigv@pingvinashen:~ % ./main.sh 
Here I am, Rock you like a hurricane

As you can see it all relies on the concept that the function name itself has two colons in its name.

Here’s the code from jailio that I tweeted.

jail::get_next_id(){
  expr $(
    ( grep -s '$id' /etc/jail.conf.d/* || echo '$id = "0";' ) |
    awk -F '[="]' '{print $3}' |
    sort -h |
    tail -1
  ) + 1
}

After tweeting the code above Annatar replied that this should NOT work elsewhere and that’s how I got introduced to The Heirloom Project which provides traditional implementations of the original Unix tools from the original Unix source code.

Hopefully, I will see more people using “modules” in Shell scripts. Hopefully this trick works in other Shell implementations like Bash and zsh.

That’s all folks.