I was part of a fun little excursion into the world of fintech (née “financial technology”) this weekend. At the FinTech hackathon in New York, Ben Gundersen, an extemely talented guy and a wizard with JS, and I wrote a simple visualization tool we call “GlobalFX”. The idea was to create an interesting and somewhat useful way to view foreign exchange rates over time.
First, a quick shout out to some awesome companies who helped us out. Some wonderful people (thanks, Francesca and Matt!) at 10Gen got us in touch with the MongoHQ folks who, at 11PM EST on a Saturday, spun up a 20GB MongoDB instance for us. Right before that, Tammer at Quandle basically gave us unlimited API calls. We also used Oanda (who have incredibly fast data serving ability), and intend to integrate them even more fully with the application in the coming days.
The Problem & Solution
We came into the event knowing that we wanted to make a novel, useful visualization. In retrospect, that’s a rather tall order. I think we ended up doing that, but I’ll leave that judgement to you, gentle reader.
The problem? Visualization the value of currencies relative to each other, over time. ForEx traders have very few comparative visual analysis tools, making EDA rather difficult. As someone who is interested in data as a medium for driving discovery, EDA is very important to me.
We chose to make a dynamic global chloropleth map of the value of currencies relative to a selected country, over time. That’s quite a mouthful, so here’s a screenshot to show you what that might look like.
Here, China is selected. To the north, where it is a very vibrant red, you can see that Russia is doing particularly poorly. This indicates that Russia’s ruble has declined in value relative to the China renminbi from the start date of the data (here set to 2012-01-1) to the current date of the animation (2012-03-30). Conversely, Iran’s (bright green) currency has done quite well.
Oh yeah. That’s right: it’s an animation. The values change before your eyes, as the globe rotates slowly (or you drag it, zoom in on it, or keep it in place by hovering over it). You can check it out here, though I don’t know how much longer our Mongo instances will be running (and thus, how much longer this will work).
The Technology
Our backend is an extraordinarly simple Python application using Flask to serve our content and run an api for returning data from our MongoHQ DB (their RESTful interface wasn’t working for us, but that was probably a result of our brains failing/it being 6AM when we tried to use it). The rest of our data came directly from Oanda, and we just pulled that in with AJAX, as it was in the right format for us right away. The much more interesting frontend is a JavaScript application using the usual suspects (JQuery, Underscore, and Bootstrap), and the magical, wonderful d3.js to map our GeoJSON and make visualizing the data the way we do possible in fewer than 24 hours.
The data processing came in the form of transforming a ton of CSVs from Quandl into a few GB of documents we could quickly serve from Mongo. We used pandas to make a few throw-away scripts handling the ETL of our data. It wasn’t the most interesting thing, but it made our app useable and fast. I was a bit jealous of the magic happening on Ben’s monitor, though…
Overall, we had a lot of fun. We didn’t win the whole thing, but we got a nice prize from Oanda, and met a lot of great people both working on projects and from the sponsor companies. Thanks to all the sponsors and organizers who put on this great event, especially Nick for making it happen, and Novus for sponsoring a lot of it and being such great people in general. We all had a great time: I know I’ll be there next year.
Though I know I’m late to the party, I’ve been enjoying poking around Chas Emerick’s newest project, Clojure Atlas. It’s particularly pleasant to use it in conjunction with the Github page for Clojure, move to the relevant .clj and browse around it with the Atlas in accompaniment.
The Atlas is essentially a live graph connecting functions, macros, “Classes”, reader syntax and just about everything else you can think of. A single click brings up the source code, as well as the documentation. My one complaint is that the graph takes too long to stop moving, making it difficult to quickly navigate: according to his Twitter (and my memory), this is something he’s working on. My one wish for the product: integration with ClojureDocs, a great resource for burgeoning Clojurers and a way to make Clojure Atlas more accesible to those who are newer to the language, as well as make it more of an encyclopedia.
It costs $40 if you buy the Atlas for Clojure 1.2 and 1.3 (coming soon), or just $25 if you’d like to buy the 1.2 version. It is definitely worth a look.
It is a truth universally acknowledged, that a programmer using Clojure will want to perform IO. Let me help you out (put).
I’ll go over some of the basics of IO, focusing on what you can use Clojure to do directly. I’ll move on after the basic introduction, to some of the more interesting and generally useful classes that Java offers, giving a little context for each.
In
Reading files in is generally one of the first things I want to do when playing with a new language, so I’ll start there. Before I get started though, I should mentioned that in Clojure, strings are always encoded using UTF-16. Generally this saves time and worry, but it’s something to keep in mind should you run into problems on the encoding front.
slurp
Clojure comes with a handy little function called slurp that takes in a string representing a filename (or, really, pretty much anything; a File, a stream, a byte array, URL, etc) and returns a string containing the contents of your file. It’s pretty handy if you just need to get some information from a file that’s relatively small, and you’ll be parsing it yourself.
(slurp "/Users/ihodes/Desktop/afile.txt")=>"A little bit\nof information here."
A nice thing about slurp is that you can easily build up file paths with str. For example, say you want to output to a file based on information you find at runtime:
But slurp is pretty basic, and once your files get large enough, totally impractical. Nonetheless, it’s a handy function to know about.
As a useful and comical aside, the function spit is the counterpart to slurp, except that instead of reading input, spit does output. More on this in a future article, though.
line-seq
One of my favorite IO functions has got to be line-seq; line-seq takes a reader object (which must implement BufferedReader) and returns a lazy sequence of the lines of the text the reader supplies. This is handy when you’re dealing with files (if this offends you, let’s be Unixy here for now and say that everything is a file) that are too big to merely slurp, but that are \newline delimited (or CR/LF delimited, if you’re of the Windows persuasion).
(use'[clojure.java.io'(reader)])(take 2(line-seq (reader"bobby.txt")))=>("Bobby was a good boy,""and didn't complain too much")
Notice how we take 2 from the sequence we get from using line-seq. We can take as much or as little as we need; we won’t be reading much (Clojure will read a bit more than you tell it to in order to get more IO performance, but let’s not worry about that) more than we specify. We can do anything we want with the resulting seq; that’s the beauty of line-seq and the ubiquitous sequence abstraction.
Back in the day, Clojurists had to sink a little lower than the clojure.java.io namespace to use line-seq; two Java classes were needed. One of these Java classes is the most wondrous and amazing thing just below the surface of the more elegant and beautiful Clojure code; BufferedReader. Here’s how we used to do it;
(import '(java.ioFileReaderBufferedReader))(take 2(line-seq (BufferedReader.(FileReader."bobby.txt")))=>("Bobby was a good boy,""and didn't complain too much")
This might give you a better sense of what’s going on when you use reader, though in reality reader is far more complicated than just that: you can trust it to handle a variety of “readable things” and return to you a BufferedReader if possible.
FileReader will return a Reader on a file, and BufferedReader takes and buffers a Reader, as you might have extrapolated from the name. Readers are basically just objects upon which a few methods (like read, skip and close) may be enacted and expected to return reasonable results. line-seq essentially reads up until a line-delimiter and returns the read chunk as an element in the sequence it is generating.
While on the subject of files, I should probably mentioned the file function, from clojure.java.io. file takes in an arbitrary number of string arguments, and pieces them together into a file hierarchy, returning a File instance. This can come in handy.
Rivers? inputStreams? Brooks?
Streams are an especially useful class of readers. Oftentimes you’re reading in text; that’s what Readers do. But often you need to read in a stream of bytes; that’s where you need to use clojure.java.io’s input-stream.
As you can see, instead of getting characters from this file (like we get when we use a reader), we’re getting integer byte values. This can be useful when reading, for example, a media file.
In general, strings are always UTF-16, which are 16-bit pieces of data, whereas byte-streams are 8-bit pieces of data. It bears repeating that the stream operators should be used when you’re not dealing with strings: they are not trivially interchangeable, as they might be in other languages where strings are syntactic sugar for byte arrays.
RandomAccessFile
Finally, let me introduce to you a spectacularly useful Java class. RandomAccessFile is a class which allows you to quickly jump around in a large file, and read bytes from it.
Note the second argument of the constructor, ”r”; this indicates that we’re opening the file just for reading. Now that we have f, we can use it to navigate and read the file:
(.readf)=>105(.lengthf)=>2015;; this is the number of bytes this file is in length(.skipBytesf20)(.getFilePointerh)=>21;; the position we're at in the file(.readf)=>89
As you can see, you can jump around (quickly!) through a file, and read from the parts you want, and skip the parts you do not want. The key methods/functions here (among many others that can also be useful; be sure to check the documentation) are read, length, skipBytes, seek and getFilePointer.
Closing
Every file that is opened should be closed, and what we’ve been doing is a little unsafe. In order to close an open reader/file, we should use the close method on it; in the above example, when you’re done with f, simply execute (.close f) to tell the file system that you’re done with the file. Alternatively, and more idiomatically, you can open your files with the handy with-open binder:
When you’re done with f, Clojure will close it, and you won’t have to worry one iota about it.
Digging Deeper
Should slurp and line-seq not be enough for your reading needs (and chances are that, should you code enough in Clojure, they won’t always been), you might want to explore clojure.java.io some more, as well as some of the Java classes (namely, those stemming from Reader and BufferedReader, as well as InputStream and BufferedInputStream) mentioned above. See my previous article on using Java if you’re unfamiliar with using Java.
Next up is an introduction to the “outs” of Clojure and Java. Stay tuned!
I owe a big thank you to [Phil Hagelberg](http://technomancy.us/) for reading over this essay and offering advice. If you don't already, you should be using his [Leiningen](http://github.com/technomancy/leiningen) for both dependency management and a stress-free development environment.
A few weeks ago a provoking question was posted on StackOverflow asking if, for example, Common-Lisp’s numberp could be written using only the primitives that John McCarthy presented in his now famous essay introducing LISP.
My answer was yes, but it wasn’t the kind of yes you might expect. In his essay, McCarthy did not even mention numbers. There are no primitives to deal with them; only primitives to construct and deconstruct lists (i.e car, cdr, cons), as well as a few to enable the creation and naming of functions (i.e. lambda, label). But there are no number primitives: no direct interaction with the ALU, no way to directly access and manipulate integers and floats. So we have figure something else out.
It’s All Lists From Here
So how can you represent numbers?
I’m going to eschew using the troublesome cons cell in my explanations. Instead, we’ll use the more common list notation that’s both easier to explain and easier to write. Of course, I’ll be using the cons function (how could I not?) and you should at least have an idea of how that works.
It turns out that there are a number of ways to represent the natural numbers. Paul Graham postulates one way of representing them in his essay The Roots of Lisp. I’ll state here, slightly tidied up, his suggestion:
Let the empty list be zero.
Let 1 = (cons 0 ())
Let 2 = (cons 1 ())
etc.
In this way we can represent all the natural numbers. It’d be trivial to write a function number? that would test to see if if its argument had the form of a number. There are a number (hah!) of things we could do with this representation, but I think we can do better.
Edit: Phil Hagelberg reminds me of another, very important, representation of the natural numbers I’d be remiss to leave out: the Church Encoding. Church Encoding is way of representing the natural numbers using only lambda calculus: there are no lists involved. Instead, a number n is representing by a higher order lambda abstraction (a function), and returns the result of that function being composed with itself n times. There’s actually a fair amount that can be done with this representation, but it’s outside the scope of this essay. For the curious, the afore-linked Wikipedia article does a fine job of covering the basics.
The More Lists, the Merrier
But really, that’s not enough lists, is it? (((((((((()))))))))) is not nearly an unwieldy enough definition for 10. We need more lists.
Let’s take a step back and look at set theory; a branch of math and CS that’s informed a lot of underpinning of CS and programming in general. Set theory has already provided us with a handy way of defining numbers based on lists. Well not lists, sets, but sets are basically lists. In fact, contrary to popular/programming belief, sets can hold the same thing multiple times, but in set theory that means nothing. We’ll say the same thing about lists: “when you say my list a contains b and b, I just hear that b is in a”.
So how do we represent numbers using only sets? Well, we start with the empty set {} being zero. But, and here’s where things get weird, we now define a function S called the “successor function”. The successor of n is the union of n and the set of n, where union is a function of n and m that returns the set which contains all the elements in n and all the elements in m. S(n) = {n and all the elements in n}. To make this clearer, let’s look at a few examples.
At the very least, this is an interesting definition of the natural numbers. The predecessors of each number are contained within it. And the set of all the natural numbers has a form similar to an individual natural number. Neat.
Let’s see how we can represent this with lists. Our goal is a function, S, that returns the successor of a given list.
(define S(lambda (n)(unionn(cons n()))))
Perfect, though we don’t exactly have a union function yet:
Now we do. You can test it out in your own REPL, or online with TryScheme. We’re going to move on now.
So What?
Thats a valid question. At this point, it seems as though we’ve merely complicated Mr. Graham’s elegant and simple proposition. It turns out that we’ve actually made things easier, in the long run.
Before we even get to addition and the like, let’s look at one example of where we’ve made things easier and more elegant. Let’s look at an ordering on the natural numbers; <. With this definition, we don’t need anything else to define it: n < m if (and only if) n is in m. That’s a straight forward definition, and one we can check rather quickly with a Lisp function you might have on hand anyway (The Little Schemer calls it member?).
But first we need a better eq?. The current eq? operates only on atoms; we need an equal? that works on our list representation of numbers. To keep things simple, we’re going to assume our lists are ordered. That will be the case if we build our numbers with just S.
This works because all our natural numbers are either the empty list or a list of natural numbers, thanks to our nicely defined S function.
It turns out that this successor function is also nice for defining and working with ideas like ordinals and cardinals, and just generally makes out theorems easier to prove and reason about. But, without writing a book on it, suffice it to say there’s a certain elegance about it. I’ll leave it at that.
In Addition…
But back to our problem: numbers on a simple Lisp machine. We’ve now got a suitable representation of the natural numbers (though I’ve neglected to introduce the axioms required to fully justify this; let’s just use common sense and carry on), and we have a way of comparing two numbers.
We also have a way of adding one to a number in S. How about adding two natural numbers together?
That is a nice recursive definition, but we have a mysterious new P function now, as well. P stands for “predecessor”, and it gives us the natural number than m is a successor of.
Before defining P, let me present the set theoretical definition of addition:
A(n, 0) = n
A(n, S(m)) = S(A(n, m))
This is also recursive, but it has the unfair advantage of what the Clojurist in me calls the “de-structuring” of its operands. A can check to see if its second operand is a successor, and if it is will take the successor of the result of another A’s value.
With Lisp, we can check to see is m is zero, and if it isn’t, we know that it is a successor, we just don’t know what it is a successor of. That is where P comes in.
And out of nowhere comes a higher-order function! While set theory may have some expressive advantages over Lisp, we’ve got some special sauce, too. In this case (R union () n) is equivalent to the set theoretical “union over a set” that the Zermelo-Fraenkel Union Axiom gives provides. Most people call R “reduce”.
You might notice that we get duplicates in our lists, but as I said before, we don’t pay those any attention. To keep things clean and working well, we’ll implicitly apply a helper function, distinct, to the result of every operation. distinct is left as an exercise to the reader; but this is a very important exercise: some of our function will not work if distinct isn’t applied to the result of each function.
With a proper P defined, A now works flawlessly. Now let’s define M, multiplication and B, subtraction:
It gets easier and easier as we build off of our previous functions, and combine them to yield more complex, more useful abstractions. Finally, let’s write our number predicate;
And there you have it; we’ve got the basics of our natural numbers, represented using McCarthy’s basic LISP.
But, Wait! There’s More
Ah, we’ve left off the integers. And floats!
Fortunately, this isn’t a big deal. With our newly defined natural numbers, we can represent integers as a tuple of natural numbers, where the first natural number in the tuple represents the “negative” portion of the integer, and the second natural number represents the “positive” portion of the integer.
In this manner we can represent all the integers. Of course, we’ll have to create new functions for adding, subtracting and the like, but that becomes a matter of building new, higher level abstractions with our existing functions.
For example, we can define adding two integers like this:
Something important to keep in mind, however, is that different tuples can represent the same integer. For instance, the integer (2, 5) is the same as the integer (0, 3); they’re both what we would have called “3” back in the natural numbers section. In fact, it makes sense to call both of them “3” here. But the second example here is preferable, I’d say. This is something that is dealt with in set theory by equivalence relations; we can do the same thing in Lisp. Below I’ll define a function which returns the “simplest” version of a given integer; one where either the negative portion is zero, the positive portion is zero, or they are both zero.
Subtraction, multiplication and other operations on integers begin to fall into place with these definitions. Things don’t get more complicated as you move further away from your base representation (unless you try to formally prove all of my assertions): if anything, they get easier. This is a consequence of appropriate abstraction.
So far, we have arbitrarily large integers available to us (limited in practice only by memory and processing power). But we’re still missing a type of number most programmers expect to have available to them: the float. Well, we can do that too.
First, let’s make sure we know what a float can and cannot represent. Floats can’t represent all real numbers: that is to say, every float is representable by a ratio of two numbers. In fact, traditional floats can’t even represent all rationals: for instance, 1/3 is 0.333… repeating forever. We would run out of bits to represent it on a real computer. But we can represent all numbers a float can, and more, with ratios. And, as I’m sure has now become clear, we definitely have ratios in the form of a tuple of integers, the first representing the numerator and the second the denominator. These are the rational numbers.
Again, an equivalence relation on the rationals is called for, as are new operations. This time, though, they are left as an exercise for the reader.
Final Words
Though a lot more can be said about the intersection of set theory and Lisp, I’ll stop while this post is still of Internet-length. If there’s enough interest (mostly on my part), perhaps this will be extended into a series of essay on the subject, mirroring the progression of a elementary set theory book.
This essay is a product of my investigation into the possibility of a truly programmable and reprogrammable Lisp environment, one in which such a definition of the natural, integer and rational numbers would not only be feasible and efficient, but idiomatic. I’m a long way from explaining how such a system would work, but I thought I should record at least some of the byproducts of my thought process so that someone might obtain some value from my mental wanderings.
I should note that while I used the operators and and or, neither were in McCarthy’s original LISP. Consider (and a b) a stand-in for (cond (a b) (else #f)) and (or a b) for (cond (a #t) (else b)), and all is well.
If there are errors in my explanations or code samples, please let me know. I certainly simplified much of the content so that it could fit in about 2000 words, but it should be correct.
The Golden Ratio is a number that’s been fascinating artists, mathematicians and philosophers for millennia. The fascination started with Mr. Phidias , circa 400 BCE, the sculptor involved in making the statues of the Parthenon. He considered the ratio a rule by which to craft the perfect human form, and it’s after him that we know the Golden Ratio as φ, the Greek letter phi.
A few mathematicians and philosophers later, Fibonacci discovered that the ratio, from term to term, of his eponymous sequence approaches φ. Pretty cool, that.
Used in architecture, painting, drawing, and many works of adulatory prose, φ has garnered a lot of attention over the ages. But there’s a darker side to φ
Calculating the Golden Ratio
As already mentioned, the Golden Ratio can be calculated using the Fibonacci sequence. In order to do that, though, we need the Fibonacci sequence. I’ve decided to make a lazy version of the sequence, so we can think of it as an infinite sequence. I split my fibo into two parts: fibo-from, which gives you the entire sequence after the point you give it, and fibo, which gives you the entire sequence.
If we map double (the function) across the ratios we get in return, we can cleary see that we’re getting closer and closer to the Golden Ratio. Of course, we’ll never actually reach it. On the subject of ancient Greece, you could say φ tantalizes us.
Another way of calculating our fickle ratio is with a simple continued fraction. I’ve generalized the simple-continued-fraction function from my previous post on finding e, but the idea is the same:
(defn generalized-continued-fraction"Returns the value of the GCF of the sequences of numerators and denominators. 'denomseq must have one more element than 'numseq."[numseqdenomseq](let [sn(reverse numseq)sd(reverse denomseq)](loop [hn(first sn)sn(rest sn)hd(first sd)sd(rest sd)](cond (empty?sd)hd:else(recur(first sn)(rest sn)(+ (first sd)(/ hnhd))(rest sd))))))(defn simple-continued-fraction"Returns the final value of the SCF of 'sequence."[sequence](generalized-continued-fraction(take (dec (count sequence))(repeat 1))sequence))
The sequence that represents the Golden Ratio is rather simple and beautiful, and also surprising if you haven’t seen it before. It is 1, 1, 1, 1, . . . and so forth. Therefore using simple-continued-fraction to find it is trivial:
But in this seemingly beautiful number, a dark secret hides. But first we need to understand what makes a number evil. It’s a simple idea, really: if the sum of the first n digits of the fractal part of a decimal number is 666, the number is evil.
But in order to get precise results, we need to get a precise representation of the Ratio. Here, we get to harness the power of Java! In this case, we’re going to use the BigDecimal class.
Now we need to see about getting the digits from the fractal portion of the number. To save time, I’m going to cheat a little, and transform the result into a string and obtain the digits from that. Assume that I’ve stored the result of the previous calculation in the var gold.
(map #(Integer.(str %))(drop 2(str gold)))=>(61803398874. . . );; and so on
Now let’s see how many numbers we have to take to get the number of the beast. Assume that our sequence of fractal digits found is stored in the var goldseq.
And there you have it, the sum of the first 146 fractal digits of φ is 666. Verified: φ is evil. Of course, if it had not been evil, or it had taken the sum of more than 200 fractal digits to equal 666, this wouldn’t have worked. It would have been easy to test though: continue testing the sum of the first n digits of the number until the sum exceeds 666. (Which could fail if we continue to get zeroes. Per the halting problem, we might never know if a number is evil, depending on its fractal component. Unless we could prove something else about it. But that’s another story…)
But it’s not just the fact that φ reminds us of one of Hades’ cruelest trials (the trial of Tantalus), or that the first 143 digits sum to 666. No, there is yet more evil to be found in φ. If you take the ratios of the sections of a pentagram, you’ll again end up with the Golden Ratio. Evil everywhere!
In all seriousness, φ is a lot of fun to play with, and that’s about it. If you were serious about calculating φ properly, you’d find the positive root the quadratic equation which defines it. That equation is:
…and solving for the positive root yields the Ratio.
Regardless, I find sequences and continued fractions a lot more interesting and conducive to play in Lisp with.
Let me leave you with this piece of advice: beware the evil phi!