Welcome to Internet WorkShop

We provide consulting and custom software development services, as well as licensing of the Webglimpse search engine. We also collaborate with partners on ad-supported citizen journalism sites. Please visit our Services page for more details.

everyone wants to know you

Late night coding thoughts… as a prerequisite to any ultimate organizer, we need an open source way to represent a packet of public and private info about yourself, with the private part encrypted. Then you could upload this packet to YouTube, Facebook, LinkedIn, and all the Gb sites that want to know you, if you want them to. Allow peer-to-peer updates, or have software that knows who you gave the packet to so it can push updates when you change it, or designate one site as the master and update it there first. Only sites you trust with your private key could access or update your private data, or use it only locally. This would speed the development of organizer/personalization apps by decreasing the cost of updating profiles, and organizer apps would focus more on the private data.
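Just to make the idea concrete, here is a very rough sketch of what such a packet might look like; the field names and the placeholder "encrypted" blob are made up for illustration, not part of any real format:

use strict;
use warnings;
use JSON::PP;   # core module, used here only to show the shape of the packet

# hypothetical structure: a public section anyone may read, plus an
# encrypted private section that only trusted key-holders could decrypt
my $packet = {
    version => 1,
    public  => {
        name  => 'Example Person',
        links => [ 'https://example.org/me' ],
    },
    private_encrypted => '<ciphertext would go here>',
    master_site       => 'https://example.org',   # update here first
};

print JSON::PP->new->pretty->encode($packet);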

clue that you have left out a dash:

Enter host password for user 'ser':

result of -user someuser:somepass where it should have been --user someuser:somepass

Note to self: must name a user 'ser' with password 'ass' so I can write

curl -user -pass [url]

or

mysql -user -pass [db]

CGI.pm does not preserve Content-type of multipart non-file data

Ok, that sounds a bit complex, but basically if you have data such as

POST https://someurl
Content-Length: 503
Content-Type: multipart/form-data; boundary=xYzZY

--xYzZY
Content-Disposition: form-data; name="someFile"; filename="somefile"
Content-Type: text/plain

lines in
somefile

--xYzZY
Content-Disposition: form-data; name="otherFile"; filename="otherfile"
Content-Type: text/plain

lines in
otherfile

--xYzZY
Content-Disposition: form-data; name="RegularData"
Content-Type: text/xml

<RegularData>
<Parameter1>Data1</Parameter1>
<Parameter2>value number two</Parameter2>
</RegularData>
--xYzZY--

and you use CGI.pm to process it, there is NOWHERE in the query object returned by CGI->new that stores the fact that RegularData has Content-type: text/xml. You can see this in the CGI.pm code here:

if ( ( !defined($filename) || $filename eq '' ) && !$multipart ) {
    my($value) = $buffer->readBody;
    $value .= $TAINTED;
    push(@{$self->{param}{$param}},$value);
    next;
}

The only place that knew about the Content-type: text/xml was in %header, a local variable that goes out of scope when we go to the next parameter.

Not a huge deal, but sometimes it matters... you could patch CGI.pm, use some other method of parsing the multipart data, or guess the format by inspection... probably I'll be lazy and inspect the data.

This is for CGI.pm versions 3.51 and 3.60.
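For the record, the lazy inspect-the-data approach might look something like this; the parameter name matches the example above, and the XML check is just a guess by inspection, since CGI.pm no longer has the part's Content-type to tell you:

use strict;
use warnings;
use CGI;

my $q    = CGI->new;
my $data = $q->param('RegularData');

# CGI.pm has already dropped this part's Content-type, so guess:
# if the value starts with a tag, treat it as XML, otherwise as plain text
my $looks_like_xml = defined $data && $data =~ /^\s*</;

print $q->header('text/plain');
print $looks_like_xml ? "RegularData looks like XML\n"
                      : "RegularData looks like plain text\n";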

search by article length/features

Google knows stuff about pages, but it's not exposed for users to search on. Trying to read something in depth on a non-techie topic (parenting) turns up a lot of shallow, SEO-aimed pages. Being able to search for longer, authored articles would probably help, or restricting results to print magazines, or to .edu only. Books are still much, much higher quality than the highest-ranked web pages for most non-technical topics.

p not div

quick SEO tip: Facebook seems to recognize <p> tags as the preferred place to quote paragraphs from, rather than taking earlier content enclosed in <div>s. Perhaps other sites also prefer <p> as an indicator of real text.

vote for Strawberry Perl

At least when requiring XML::LibXML in a Windows environment, it was much easier to get running with Strawberry Perl than with ActiveState, mainly because Strawberry Perl includes libxml2 and libxslt with the install. I also like using cpan better than ppm. The ppm archives for Perl 5.14 do not seem complete, and ActiveState will not give you an earlier Perl in the community edition.

So, while I appreciate the contributions of both maintainers, the most painless path is the one to take…and in this case that was Strawberry Perl 5.14 with a cpan install of some other required modules (XML::LibXML was already installed).
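For what it's worth, here is a one-minute sanity check that the install actually works (throwaway XML, nothing specific to my project):

use strict;
use warnings;
use XML::LibXML;

# parse a trivial document and pull a value back out, just to confirm
# the libxml2 bindings are wired up correctly
my $doc = XML::LibXML->load_xml( string => '<root><item>hello</item></root>' );
print $doc->findvalue('//item'), "\n";   # prints "hello"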

Test::Unit::Assert(qr/pattern/,…) seems to fail on multiline strings

Recently had to change a number of lines from

$self->assert(qr/some_pattern/, $string, $message)

to

$self->assert($string =~ qr/some_pattern/m, $message)

Not sure why; the $string was a multi-line XML fragment, and the pattern was a phrase containing / and . chars. May need to do some more testing to isolate the reason, but it works with the change. The /m switch should not be necessary to match against a multiline string (it only changes how ^ and $ behave), but the assertion failed without it for my particular test cases.
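In case it helps anyone hitting the same thing, here is a stripped-down sketch of the form that works for me; the test data and pattern are stand-ins for my real XML:

package MultilineMatchTest;
use strict;
use warnings;
use base 'Test::Unit::TestCase';

sub test_multiline_match {
    my $self = shift;
    my $xml  = "<response>\n  <status>ok</status>\n</response>\n";
    # do the regexp match ourselves and hand assert() a plain boolean
    $self->assert( $xml =~ qr{<status>ok</status>}m,
                   'expected <status>ok</status> in the response' );
}

1;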

Ctrl-Alt-Del for VirtualBox on MacOS

Ok, not a showstopper, but… I was running Windows in VirtualBox on MacOS and had to press Ctrl-Alt-Del to get started. There is an Alt key on my MacBook Pro, but it's fn-Option, and that didn't work. The answer turned out to be in the VirtualBox VM's Machine menu: Machine -> Insert Ctrl-Alt-Del. There might be other ways as well, but that was enough to get it going…

JAVA_HOME on linux (at least for compiling ruby-hdfs)

Posting this because the links I found for setting JAVA_HOME seemed to erroneously state that it should be set to the full path of the java executable. I'm not sure if there is a case where you would want to do that, but if you want to compile the ruby gem ruby-hdfs, then JAVA_HOME should be set to the JVM install directory (the one containing the bin/ directory that holds the java binary), i.e.

export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.24/

where the java executable is located at /usr/lib/jvm/java-6-sun-1.6.0.24/bin/java. Then the install is straightforward

$ gem install ruby-hdfs
Building native extensions. This could take a while...
Successfully installed ruby-hdfs-0.1.0
1 gem installed

hadoop fs is space-sensitive

HDFS, the Hadoop Distributed File System, is useful for big data. However, hadoop fs is not quite there as a shell replacement. Today I kept getting the message

cp: When copying multiple files, destination should be a directory.

when trying to copy multiple files to a directory using

hadoop fs -cp /path/to/files/*  /path/to/destination/directory

Finally figured out that the problem was I had two spaces between the file list and the directory path, which made hadoop not see the directory path in the command. Aaahh.