Just yesterday, President Obama signed an executive order that requires government agencies to publish their data in “open, machine readable formats”:
the default state of new and modernized Government information resources shall be open and machine readable.
I have a hard time imagining better uses of the President’s dictator-like power than this.
Personally, I don’t believe the President (or any individual for that matter) should have the ability to make Laws without first submitting them to a review process and subsequently a vote. Executive Orders are problematic because they bypass Congress, it is a flaw in an otherwise reasonably balanced system:
However, the consequences of this particular executive order are in our favor, so this is a good thing, despite the fact that it came about because of a bad mechanism. Forcing the Bureaus to open up their data for public consumption enables individuals and groups outside the government to do things with that data that most of the bureaucrats could never have imagined.
All of this is great, provided the data are accurate, it is entirely possible that data could be ‘fudged’, ‘massaged’ or just plain made up. So in addition to the newly hackable government data, there should be a more active skepticism about the accuracy of that data. For example, if the Department of Homeland Security is reporting that there are cyber attacks coming from China, that data should be cross-checked with that of ISPs to ensure that there is a legitimate threat before any laws are passed or executive orders signed.
I think this is a good thing that came about for the wrong reasons, but the consequences are more important than the intentions, because the consequences really happen, intentions are just in the mind.
(Check out that link above, fmota wrote about how they discovered a
fixed point in the base64 encoding function, it’s very interesting)
Ruby’s Fixnum class has an instance method called hash. It is the
hash function
used by the Hash class to locate the value.
One thing to note that is interesting,
1
42.class==42.hash.class# true
The integer literal 42 is an instance of Ruby’s Fixnum class,
which is exactly the type that is returned by Fixnum#hash. So, if we
let N be the set of all Fixnum values, and h be the hash function,
then the function
\[h: N \to N \]
Does h have a fixed point? Let’s find out, the generic way to find a
fixed point is to apply the function over and over and see if any of
the iterates are the same:
So the integer 4237831107247477683 is a fixed point of
Fixnum#hash, that means that in the implementation of Hash, the
value 4237831107247477683 would have itself as a key.
There are more examples (play with the code yourself!), and I would
like to look deeper into why this hash function has a fixed point.
I am currently working my way the Structure and Interpretation of
Computer Programs
and I’ve skipped past exercise 1.14, and come back to it after a bit
of thinking, here’s the problem, and then the exercise.
The Problem
How many ways are there to make change of a given amount a with the
following kinds of coins?
pennies
nickels
dimes
quarters
half-dollars
There is a recursive solution to this combinatorial problem, which can
readily be made into executable code in Scheme, this kind of solution
is very standard in enumerative combinatorics:
The number of ways to change amount a using n kinds of coins
equals:
the number of ways to change amount a using all but the first
kind of coin, plus
the number of ways to change amount a - d using all n kinds of
coins, where d is the denomination of the first kind of coin
Note that those two items are mutually exclusive and exhaustive
conditions, so the result can be calculated by simply adding the two
values.
In scheme, the above list could be transliterated as:
12
(+ (cca(- n1))(cc(- ad)n))
Where (cc a n) computes the number of ways of changing amount a with n
kinds of coins.
The full code for the count-change procedure can be found
here.
The Exercise
With the count-change procedure at hand, Exercise 1.14 is to “draw
the tree illustrating the process generated by the count-change
procedure in making change for 11 cents.”
The Solution
The count-change procedure uses the (cc a n) procedure where
n = 5, and the cc procedure naturally gives rise to a binary
tree that locally looks like this:
I prefer to make the computer go through all the steps and produce an
image for me, so I took a break on 1.14 and thought about it for a
while.
To represent the tree, I used the graph-description language
DOT
To generate the tree, I started by adding a print statement around the
recursion steps, the problem with that is that there can be distinct
nodes that happen to have the same argument values, that is, the node
in the tree may be labeled (cc a n), but there may also be multiple
nodes with the same a and n values. To avoid this, each node must
be given a unique id, and then be displayed with the (cc a n) label.
One way to label a binary tree’s nodes is to make the id be a map of
the location of the node in the tree. For example, if a node of the
tree has id x, then the root’s children will be xl and xr,
respectively, where l stands for ‘left’ and r stands for ‘right’.
If the root’s id is s, then a typical node would be labeled
something like sllrrl. Starting at the root, you can find the node
by going left two times, right two times, and then left.
Here is the full source of the tree-generating code cc-graph:
(define (cc-graphamountkinds-of-coins)(define display-node(lambda (labelamountkinds-of-coins)(begin(display " ")(display label)(display " [label=\"")(display `(cc,amount,kinds-of-coins))(display "\"];")(newline))))(define display-edge(lambda (ab)(begin(display " ")(display a)(display " -> ")(display b)(display ";")(newline))))(define base-case(lambda (amountkinds-of-coins)(or (< amount0)(= kinds-of-coins0)(= amount0))))(define left(lambda (label)(string-append label"l")))(define right(lambda (label)(string-append label"r"))); label is the unique label of the function invocation(define (recurselabelamountkinds-of-coins)(if (base-caseamountkinds-of-coins)(display-nodelabelamountkinds-of-coins)(begin(display-nodelabelamountkinds-of-coins)(display-edgelabel(leftlabel))(display-edgelabel(rightlabel))(recurse(leftlabel)amount(- kinds-of-coins1))(recurse(rightlabel)(- amount(first-denominationkinds-of-coins))kinds-of-coins))))(begin(display "digraph {")(newline)(recurse"s"amountkinds-of-coins)(newline)(display "}")))
Finally, the output of running (cc-graph 11 5), then piping the
results into GraphViz gives the desired tree:
I love this way of visualizing recursion, you can see how the problem
is reduced into simpler sub-problems, and that there is a distinct
‘shape’ to the computation.
There are more than 100 edges in that tree, I would not have wanted to
do that by hand, all for a measley value of four.
The final value of (cc 11 5) is 4, that is, there are 4 ways of
making change for 11 cents. Unfortunately, this solution doesn’t say
what exact combinations of coins, only that there are four.
Just thinking about it, you can make 11 cents with
11 pennies
6 pennies, 1 nickel
1 penny, 2 nickels
1 penny, 1 dime
I would like to generalize cc-graph so that I can get a
visualization of any recursive function in Scheme, this will take more
knowledge of the language and it’s introspective features, stay tuned!
There is a quote, usually attributed to Mark Twain that goes something
like:
“There are three kinds lies.
Lies, Damned Lies, and Statistics.”
My interpretation of this is that statistics are supposed to be the
worst kind of lie, or that the worst kinds of lies use
statistics.
The thing that bothers me most about this quote (and the innumerable
minor variations of it that get repeated) is that the word
‘statistics’ comes at the end.
Why does that matter? Notice that the list is presented as a sequence,
an increasing sequence of damned-ness, and the presence of the word
‘statistics’ at the end is supposed to imply that it is ‘more damned’
than damned lies.
This interpretation bothers me because the implied damned-ness is
based on the initial correlation, and that correlation is only based
on two data points. The quote depends on a misunderstanding of
statistics. Anyone who has studied a little bit of statistics will
know not to trust an inference based on a correlation in a data set
of only two points!
About a week ago I wrote about how to hook up a light switch to a raspberry pi. Having a computer-controlled light switch is nice, but the novelty wears off pretty quickly. The next question that arises usually is how can I make this useful?
At work, our continuous integration server, which runs Jenkins, lets us know when one of the team members has broken the build. To make sure that we get the memo promptly so we can commence with the public shaming, we use tools that change color to indicate the current test status.
The problem with our current way of doing things is that there is no sound, and it requires that someone be at their computer. To remedy this situation, we wired up physical lights to a raspberry pi running a Debian GNU/Linux variant, and wrote this script to toggle the lights.
On means Passing (all jobs passed)
Off means Failing (at least one job failed)
The data flows from Jenkins to the Raspberry-Pi as follows:
gpio stands for (General Purpose Input/Output), the utility is part of the wiringPi package
NOTE: You need to set the following environment variables:
And our goal is to unmarshal it into a Go data structure. The article goes into more detail, and two solutions were proposed. A commenter came up with a third solution, and another commenter dustin proposed using his library called jsonpointer, which operates on the raw byte array of the json string, instead of unmarshalling first and then traversing the data structure.
I used Dustin’s library, and to great avail, the only gotcha was that json strings were returned with the double quotes in them and some trailing spaces, but I made a little function that returned a slice of the original bytes:
123
functrimJsonBytes(toTrim[]byte)[]byte{// implementation found in solution.go}
The powerswitch tail looks like an extension cord with some holes in it to wire it into your own circuit. Connect the powerswitch to the raspberry pi as in the image below (on is connected to pin 23):
Then, the following python program will allow you to type ./switch on or ./switch off from the command line as root.
123456789101112131415161718
#!/usr/bin/env python# Turn switch on/off# by tlehmanimportRPi.GPIOasioimportsysio.setmode(io.BCM)pin=23io.setwarnings(False)io.setup(pin,io.OUT)# set pin 23 as outputiflen(sys.argv)>1:state=sys.argv[1]# get command line argument ('on' or 'off')io.output(pin,state=='on')# if state is on, let pin 23 connect the circuit, otherwise break the circuitelse:print("Usage: ")print(" ./switch (on|off)")
To run this, carefully plug in a lamp (or other appliance that uses a standard 120V U.S. outlet), then
123
sudo su
./switch on
./switch off
Here is a video of the light being switched off and then back on, not very exciting, but it works:
This on it’s own is not very useful or amusing, but this can easily be tied together with any API or command line utility. For example, I plan to connect this to our continuous integration server at work so that every time the tests fail, the switch turns some lights off, this could be achieved with a cron job, or perhaps a hook on Jenkins that sends a signal to the raspberry pi, there are so many possbilities.
Using the following properties of the XOR function:
Associativity
\[(a \oplus b) \oplus c = a \oplus (b \oplus c) \]
Commutativity
\[a \oplus b = b \oplus a \]
Identity
\[a \oplus 0 = a \]
Self-Inverse
\[a \oplus a = 0 \]
As a bit of trivia, note that all n-bit integers form an Abelian Group under XOR. The proof of which can be found by using the obvious isomorphism of n-bit integers with {0,1}n under addition modulo 2. Note that addition modulo 2 is equivalent to bitwise XOR.
So, using the C programming language, we can use the convenient ^= operator as a way to swap the values of a and b without using an intermediate variable.
123
a^=b;// (a ^ b)b^=a;// b ^ (a ^ b) which is the original aa^=b;// (a ^ b) ^ b which is the original b
Here is a full working program that implements this operation using a C macro:
123456789101112131415161718
#include <stdio.h>#define show(a,b) printf("a = %d, b = %d\n", a, b);#define swap(a,b) \ a^=b; \ b^=a; \ a^=b;intmain(intargc,char*argv[]){inta=3,b=5;show(a,b);swap(a,b);show(a,b);return0;}
About a month ago I wrote about a command line utility I made that calculates word and character frequencies. It was packaged as a ruby gem and it interacted well with the Unix pipeline interface.
Then, about 2 or 3 weeks later, I come across this post on Twitter:
Show how many times each line in a sorted file is repeated: uniq -c
And I realized that I could construct a one-liner that does what my gem did. Probably faster too. I know about uniq and sort, and I’ve used awk a little bit, but am not really familiar with most of it’s capabilities.
The two features I implemented in ruby were (1) counting word frequencies and (2) counting character frequencies. I defaulted everything to lower case and stripped out non-alphanumeric characters.
Using @UnixToolTip’s suggestion of uniq -c, I came up with this alternative:
1
for word in $(cat filename); do echo$word; done | sed 's/[^a-zA-Z0-9]//g' | tr '[A-Z]''[a-z]' | sort | uniq -c | sort -nr | head
This just outputs the file, splits everything up by whitespace, strips out anything that isn’t alphanumeric, then lowercases, sorts, and counts the number of repetitions using uniq -c. The result of that is then sorted numerically, to get the most frequent items at the top of the output, and then displays just the top 10 lines using head. There are some small numerical differences between this and my gem, and that is most likely because I split by word boundary in ruby, but split by whitespace on the bash one-liner.
For the problem I was trying to solve, I could have saved some time by digging through the manpages instead of writing another gem. I did enjoy working with the Rubygems packaging system, but I am starting to think that was overkill.
NOTE: For the character count feature, all I have to do is output one character per line, then I can insert that into the pipeline to get the desired output:
1
(CONTENTS OF FILENAME, 1 CHARACTER PER LINE) | sed 's/[^a-zA-Z0-9]//g' | tr '[A-Z]''[a-z]' | sort | uniq -c | sort -nr | head
I’m not sure how to do this at the moment, I think awk can do it pretty simply, I’ll read the manpages, but for now I have to get to work.
OpenCV is a mature Computer Vision library written in C++. There are python bindings available that make working with the library very convenient.
To extract the frames from an mp4 file using the Python OpenCV bindings, you first need python, and the pip package manager for python.
Also, you will need a system package manager such as Portage for Gentoo Linux or Homebrew for Mac OS X. You will need to compile OpenCV with your specific python version, for this example I am using python 2.7, you can find out which version you have by running python --version. This example will use Mac OS X 10.8.
The easiest way to extract frames is to use the read() method on the vidcap object. It returns a tuple where the first element is a success flag and the second is the image array. To extract the image array:
12
success,image=vidcap.read()# image is an array of array of [R,G,B] values
To convert the whole video to frames and save each one:
1234567
count=0;whilesuccess:success,image=vidcap.read()cv2.imwrite("frame%d.jpg"%count,image)# save frame as JPEG fileifcv2.waitKey(10)==27:# exit if Escape is hitbreakcount+=1
The above loop runs until success is false or the user hits the Escape key. Each iteration saves the current frame as a JPEG.
This is one way to extract a sequence of frames from a movie. Another way would be to change the body of the above while loop to do something else.