Wednesday, August 16, 2017

Tuesday, September 6, 2016

PDX Viz

Is this thing still on? Just a tiny post to point to my most recent side project, PDX Viz. It's a way to map and visualize cycling, pedestrian, and transit connectivity around Portland.

Friday, January 16, 2015

cljx-sampling: A Clojure(script) library for sampling and random numbers

For a current hobby project I need the ability to generate seeded random numbers and/or sample items from collections in either the JVM or the browser. The PPRNG lib offers seedable random numbers, but it uses different generators depending on the environment. Given a seed, I want to generate the same sequence of numbers regardless of where the code is running.

So I've open-sourced a little library, cljx-sampling, that uses a seedable 32-bit Xorshift random number generator for consistent results in both Clojure and ClojureScript. I also reused some of my code from bigml/sampling and combined it with the Xorshift RNG to allow for convenient (and still consistent) in-memory samples over collections. Maybe you'll find it useful?
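To give a feel for why a Xorshift generator is easy to reproduce across platforms, here is a minimal Python sketch. It is an illustration only, not cljx-sampling's code: the 13/17/5 shift triple is Marsaglia's classic 32-bit variant and is an assumption here, and the names `xorshift32`, `random_floats`, and `sample` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def xorshift32(state):
    """Advance a 32-bit xorshift state by one step (assumed 13/17/5 shifts)."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state & MASK32


def random_floats(seed, n):
    """Return n floats in (0, 1) that depend only on the seed."""
    state = (seed & MASK32) or 1  # zero is a fixed point, so swap it for 1
    out = []
    for _ in range(n):
        state = xorshift32(state)
        out.append(state / 2**32)
    return out


def sample(items, k, seed):
    """Pick k items without replacement using a partial Fisher-Yates shuffle."""
    pool = list(items)
    k = min(k, len(pool))
    state = (seed & MASK32) or 1
    for i in range(k):
        state = xorshift32(state)
        # Modulo introduces a slight bias; acceptable for a sketch.
        j = i + state % (len(pool) - i)
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]
```

Because the generator is nothing but integer shifts and XORs on a 32-bit value, the same seed gives the same sequence anywhere those operations behave identically, which is the property the library is after.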

https://github.com/ashenfad/cljx-sampling

Wednesday, September 3, 2014

Sketching/hashing Algorithms in Clojure

Just a short note that I (and BigML) have open-sourced a library of hashing- and sketching-based stream summarizers for Clojure.

Specifically, the library includes techniques that take streams of items and return summaries that can be queried for set membership (bloom filters), set similarity (min-hashes), item occurrence counts (count-min sketches), and the number of distinct items (with my favorite, the magical HyperLogLog).
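As a rough illustration of the simplest of these, set membership, here is a small Bloom filter sketched in Python. It is not the library's implementation or API: the `BloomFilter` class, its default sizes, and the choice of salted SHA-256 as the hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int doubles as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

Every item that was added reports as possibly present. An item that was never added usually reports as absent, but can collide with bits set by other items, and that false-positive rate is what the bit and hash counts trade off against memory.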

This library was largely an educational exercise for me, as I wanted to better understand the world of streaming summaries for categorical data. It's written in almost pure Clojure and backed by plain Clojure data structures. So it's (hopefully!) easy to use and easy to serialize. All the summaries are merge-friendly, making them a nice fit for distributed settings. The big caveat is that I didn't spend much effort optimizing for speed. Those who need to squeeze the most out of every CPU cycle may need to look elsewhere.
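To show what merge-friendly means in practice, here is a hedged Python sketch of a count-min sketch with a merge step. Again this is an illustration and not the library's code: the `CountMinSketch` class, its default dimensions, and the salted SHA-256 hashing are assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate item counts; estimates never fall below the true count."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # One salted hash per row picks the counter for this item.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][self._column(r, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess.
        return min(self.rows[r][self._column(r, item)] for r in range(self.depth))

    def merge(self, other):
        """Return a new sketch equal to one built from both streams."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share width and depth")
        merged = CountMinSketch(self.width, self.depth)
        merged.rows = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.rows, other.rows)
        ]
        return merged
```

Merging is just element-wise addition of the counters, so each worker can summarize its own shard of a stream and the results can be combined afterwards, giving the same sketch as a single pass over all the data.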