Internet Workshop

Your guide to the rapidly evolving landscape of software engineering and web technology – since 1995.

JAVA_HOME on linux (at least for compiling ruby-hdfs)

I’m posting this because the links I found for setting JAVA_HOME seemed to state, erroneously, that it should be set to the full path of the java executable. I’m not sure whether there is a case where you would want to do that, but if you want to compile the ruby gem ruby-hdfs, then JAVA_HOME should be set to the directory above where the java binary is located, i.e.

export JAVA_HOME=/usr/lib/jvm/java-6-sun-

where the java executable is located at /usr/lib/jvm/java-6-sun-

Then the install is straightforward:

$ gem install ruby-hdfs
Building native extensions. This could take a while...
Successfully installed ruby-hdfs-0.1.0
1 gem installed
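To make the distinction concrete, here is a minimal Ruby sketch that derives a JAVA_HOME value from the path to the java binary. It assumes the usual JDK layout, where the binary lives at bin/java under the JDK root; the helper name java_home_for and the /opt/example-jdk path are made up for illustration.

```ruby
# Hypothetical helper: derive JAVA_HOME from the path to the java binary,
# assuming the usual JDK layout <JAVA_HOME>/bin/java.
def java_home_for(java_binary)
  bin_dir = File.dirname(java_binary) # .../bin
  unless File.basename(java_binary) == "java" && File.basename(bin_dir) == "bin"
    raise ArgumentError, "expected a path ending in bin/java, got #{java_binary}"
  end
  File.dirname(bin_dir) # the JDK root, not the executable itself
end

# Illustrative path only, not a real install location.
puts java_home_for("/opt/example-jdk/bin/java") # prints /opt/example-jdk
```

If the java found on your PATH is a symlink, it would probably need resolving first (for example with File.realpath) before a helper like this gives a useful answer.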

hadoop fs is space-sensitive

HDFS, the Hadoop Distributed File System, is useful for big data. However, hadoop fs is not quite there as a shell replacement. Today I kept getting the message

cp: When copying multiple files, destination should be a directory.

when trying to copy multiple files to a directory using

hadoop fs -cp /path/to/files/*  /path/to/destination/directory

Finally figured out that the problem was I had two spaces between the file list and the directory path, which made hadoop not see the directory path in the command. Aaahh.
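One way to stop depending on exact spacing is to build the command as an argument list instead of a single string. The sketch below is only an illustration of that idea, not a diagnosis of why the extra space tripped things up; to_argv is a made-up helper name, and nothing here actually runs hadoop.

```ruby
require "shellwords"

# Split a command line into an argument vector. Shellwords.split treats any
# run of whitespace as a single separator and does not expand globs.
def to_argv(command_line)
  Shellwords.split(command_line)
end

# Same command as above, double space included.
argv = to_argv("hadoop fs -cp /path/to/files/*  /path/to/destination/directory")
p argv

# Passing the vector to system(*argv) would run the command without a shell,
# so each argument arrives exactly as listed (the glob is left for hadoop).
```

With the arguments held in an array, the destination directory is always the last element, however many spaces were typed.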

ruby non-intuitive multi-dimensional array assignment

All I want to do is work with an array of arrays…

ruby-1.9.2-p290 :012 > a = Array.new(8,[]) # here lies the problem...
=> [[], [], [], [], [], [], [], []]
ruby-1.9.2-p290 :013 > a[1].push("a")
=> ["a"]
ruby-1.9.2-p290 :014 > a
=> [["a"], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"]]

Trying again…

ruby-1.9.2-p290 :019 > a = Array.new(8, Array.new) # This doesn't solve it
=> [[], [], [], [], [], [], [], []]
ruby-1.9.2-p290 :020 > a[1][0] = 'a'
=> "a"
ruby-1.9.2-p290 :021 > a
=> [["a"], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"]]

Argh! Perl makes this so easy…

OK, the problem was that the first two ways of initializing the array were just creating 8 references to the SAME array.

Now do it the right way:

ruby-1.9.2-p290 :031 > a = Array.new(8) { Array.new } # NOW we have an array of different arrays
=> [[], [], [], [], [], [], [], []]
ruby-1.9.2-p290 :032 > a[1].push('a')
=> ["a"]
ruby-1.9.2-p290 :033 > a
=> [[], ["a"], [], [], [], [], [], []]

Ahh…but I miss an interpreter that always tries to ‘Do The Right Thing’

And, I wish the two versions didn’t look so identical when inspected…
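Since inspect can’t tell the two apart, comparing object ids can. A small sketch, assuming the standard Array.new(size, default) and Array.new(size) { block } forms; distinct_rows? is a made-up helper name.

```ruby
# True when every row is its own object rather than a shared reference.
def distinct_rows?(rows)
  rows.map(&:object_id).uniq.size == rows.size
end

shared   = Array.new(3, [])     # one array object, referenced three times
separate = Array.new(3) { [] }  # the block runs once per slot

p shared == separate            # true: both inspect as [[], [], []]
p distinct_rows?(shared)        # false
p distinct_rows?(separate)      # true
```

The default-value form evaluates its argument once and reuses it for every slot, while the block form is called once per slot, which is why only the second gives independent rows.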

WordPress debug notes

Note: I’m not a WordPress expert, just returning to it after several years without having touched PHP – and looking for the best way to quickly understand the flow of a WordPress site using BuddyPress and a few other plugins. Raw notes here, will be annotated as I progress…

Clojure makes the JVM a friendly place…

Yes! Someone gets it – “It has always been an unfortunate characteristic of using classes for application domain information that it resulted in information being hidden behind class-specific micro-languages, e.g. even the seemingly harmless employee.getName() is a custom interface to data. Putting information in such classes is a problem, much like having every book being written in a different language would be a problem. You can no longer take a generic approach to information processing. This results in an explosion of needless specificity, and a dearth of reuse.”
–Rich Hickey,

Data is just data. Please, coders – free the data from all those bureaucratic OO controls on it, and just expose the rules if any, let us obey them thoughtfully our own way. It has got to be better than all these little bureaucratic fiefdoms exerting paranoiac control over their bits of data.

OK, so it sounds good…now to finish the Clojure koans, having realized that the ___ construct is just where you put your answers and not a new triple-underscore special variable. But I’m also hacking on a larger Clojure app while learning the basic syntax…it really does seem promising…

need flat, fast namespacing + tags

Tags are OK, but they are missing a namespace; maybe there is an implicit one from the blogger, but for global discovery what about

gvelez:chickens (means my chickens, not chickens in general)

quick to write without requiring much forethought or looking up namespaces, but more precise and less prone to overlap than flat tags
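A rough sketch of how such tags might be read, assuming a single colon separates the namespace from the tag and that a tag without a colon simply has no namespace. NamespacedTag and parse_tag are names invented here, not part of any existing tagging tool.

```ruby
NamespacedTag = Struct.new(:namespace, :name)

# Split "namespace:name" on the first colon; a tag with no colon has no namespace.
def parse_tag(raw)
  namespace, name = raw.strip.split(":", 2)
  name.nil? ? NamespacedTag.new(nil, namespace) : NamespacedTag.new(namespace, name)
end

tag = parse_tag("gvelez:chickens")
puts "#{tag.namespace} / #{tag.name}" # prints gvelez / chickens
```

Because plain tags still parse (with a nil namespace), old flat tags and namespaced ones could live side by side.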

fwix beats for ease of use

Looking for local? (i.e., an API to get local data specific to a subject.) Tried, the API looked promising and their site had good data, but after a week of 403 errors using their example code snippets and no response from the forums or support, I was about to give up for a while. Then I noticed the Weather Underground folks using They have an extremely simple API, decent quality data (at least if you filter for News or Places), and although it’s not as complete as I’d like, it does have more than just Twitter posts. For ease of use they get 5 stars. We’ll see how they do on updating, relevance and completeness…

Relations API enhancement

Just posted a patch: Expanded predicates for Relations API that gets one step closer to being able to combine free tagging with RDF. Basically the idea is simple: use the existing free tagging capabilities of Drupal to let users build up a smart domain-specific vocabulary, and use those words as predicates in RDF statements. I’m kind of surprised there didn’t seem to already be an easy way to do this.

The patch above does the step of including the vocabulary terms as predicates for Relations API. I still need to do another patch, probably to taxonomy_xml, to automatically expose them as RDF, and then expose something that makes it easy to output them in nice clean RDFa with the nodes. Once that is done, it seems pretty powerful to me. I like being able to write something and in a structured way say it was inspired by a book, that it implements a philosophy, or that it is useful for some specific goal.

This is partly done at now; the relations are visible to the user along with each node. They aren’t output in RDFa yet, though, that still needs to be done along with the automatic vocabulary exposure as RDF.

Anyway, I wanted to post what I have so far, as it’s in production on a live site and might be useful for someone else in its current form.

expand on the rel= link attribute

It seems sort of amazing that, with all the man-hours of development put into the semantic web, I’m still not aware of a widely used standard for tagging a link with a meaning. Page tagging has gotten us pretty far, but as I understand it, what Tim BL’s original app did was assign a meaning to each link.

We do have the rel= attribute, which we’re allowed to expand upon according to the spec:

Since a source doc, target doc, and link tag comprise a triple, why not leverage this more by encouraging the use of vocabularies in the rel= attribute? It would seem to me to be the simplest possible solution and could be supported by many different tools.
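To illustrate the source/link/target idea, here is a minimal sketch that turns a page’s links into triples. It assumes the links have already been extracted into hashes with :href and an optional :rel; the Triple struct, link_triples, the example.org URLs and the rel words “inspired-by” and “implements” are all made up for the example.

```ruby
Triple = Struct.new(:subject, :predicate, :object)

# rel holds a space-separated list of tokens, so one link can yield several triples.
def link_triples(source_url, links)
  links.flat_map do |link|
    link.fetch(:rel, "").split.map do |rel|
      Triple.new(source_url, rel, link.fetch(:href))
    end
  end
end

# Hypothetical links found on one page.
links = [
  { href: "http://example.org/book", rel: "inspired-by" },
  { href: "http://example.org/next", rel: "next implements" },
  { href: "http://example.org/plain" } # no rel, so no triple
]

link_triples("http://example.org/post", links).each do |t|
  puts "#{t.subject} --#{t.predicate}--> #{t.object}"
end
```

Nothing beyond the link itself is needed to build each statement, which is the appeal of leaning on rel= in the first place.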

rel= is indeed used within the RDFa spec, but I’d argue that by itself a link containing a rel= attribute is sufficient to define a triple. I’d also like to see a commonly used CURIE of


to mean ‘anyword’ in the English language. Let’s also leverage the existing structure of language – Google does!

Vocabulary reuse not well solved yet…but RDFCCK may just be a step

Some links relevant to vocab reuse/microformats. In particular, RDF vocabs should be simple to import into all the modern CMSs, but we’re not there yet.

Evoc drupal module for importing RDF vocabs and exposing properties in Drupal
RDFCCK drupal module by the same author, Stephane “scor” Corlosquet
article by Marco Neumann on importing semantics into Drupal

Import SKOS into Drupal taxonomy but then do you need to use the RDFa module to have your Drupal content part of the linked data initiative?
Follow-up (If I post as a comment, not visible from front page – need to configure WP to do follow-up threads better):

Reading Scor’s paper, sounds like work is heading in the right direction. The autocompletion sounds promising

“To this end, our module adds a new tab “Manage RDF mappings” to the content type administration panel of CCK for managing such mappings cf. Fig. 2. An autocomplete list of suggested terms is shown, based on the keyword entered by the user.

The terms are coming from two different sources, which are detailed below.

External vocabulary importer service

The module RDF external vocabulary importer…”
