HAvroBase: a searchable, evolvable entity store on top of HBase and Solr
Building out a social consumer internet product that could change quickly and evolve over time puts special requirements on the underlying data store. You need to be prepared for scale without investing too much too early. Your business may need to pivot in different directions, so data models can’t be set in stone. And you need to be able to search that data to enable many of the features users expect from an online social product. I had the additional requirement that all the entities in the system, regardless of where they are stored, be accessed the same way and be described by the same data description language for consistency and maintainability. I’ve created what I think to be a novel solution to these requirements in the form of HAvroBase.
Continue reading
Static-typing is a powerful metadata database, exploit it!
Today someone decided to pretend they knew something about how a modern statically typed language developer works. Perhaps they are big emacs fans or something, because they felt that static types leading to autocomplete in an IDE was somehow a feature in the language FOR THE IDE. The IDE gets no benefit; in fact, editors like emacs and TextMate are much simpler. But I can tell you that the developer gets a tremendous benefit by having this metadata database not only present in the code and easily queryable but heavily exploited by her tools. What is amazing about this metadata database is that it is actually populated by that same IDE. There are very few times that I even type a Type (90% of those times are when declaring a new Type); the rest of the time the IDE is very patiently maintaining and leveraging that database for me. Even better, when someone else is looking at the source code, or even using the library, they instantly get 80% of the documentation. All that is left are the semantics of the calls. Perhaps we need to have some sort of meetup where we look over each other’s shoulders and actually understand how someone on the other side really works. Right now the dynamic-typing language people assume that the programmer is maintaining by hand the huge metadata database present in the static-typing language developer’s code, and the static-typers assume that the dynamic-typing language developer is memorizing all kinds of arcane method call names, argument shapes and the shapes of all the libraries they are using. Do you really memorize all that trivia?
Building the ideal web application template engine
A month and a half ago I put out a call for what I was calling the ‘ideal web application template engine’ along with a list of requirements that I thought would be present in such a system. Since then I looked at a bunch of them and decided that I like the simple markup that mustache defined, but none of the implementations were up to doing what I wanted. This led me to embark on building a new engine for that markup on my chosen platform, Java, which I creatively called mustache.java. Though it claims to be ‘logic-less’ I would say that it has some amount of logic. It will loop over a set of objects and it will check booleans, but I think it is about as close to logic-less as you would want to be in a template language. So, let’s look at each of the requirements and how I ended up implementing them:
Continue reading
Android Dalvik VM performance is a threat to the iPhone
One of the peculiarities of Apple is that they have set themselves down a path where every Apple developer needs to learn Objective-C (and C/C++) to build applications for their platform. The biggest difference between Objective-C and Java is dynamic dispatch. At runtime Objective-C can send arbitrary messages to objects and they may or may not respond to them. This has the nice property that you can write code that is very dynamic and loosely bound, but it also means that method calls in Objective-C are very slow, and the more code you write in Objective-C instead of in C/C++ the slower your codebase becomes. Up until Android 2.2 (Froyo) the JVM (really a Dalvik JVM for licensing reasons) on the Android platform was playing with one hand tied behind its back. Unlike desktop/server Java, the JVM was still an interpreter, like the original JVM back in the Java 1.0 days. It was a very efficient interpreter, but an interpreter nonetheless, and it was not creating native code from the Dalvik bytecodes that it uses. As of Android 2.2 they have added a JIT, a just-in-time compiler, to the stack that translates the Dalvik bytecode into much more efficient machine code, much like a C/C++ compiler. You can see the results of this in the benchmarks of Froyo, which show a 2-5x improvement. As they add more and more of the JIT and GC features that have appeared in HotSpot, JRockit, etc., you will likely see even more improvements over time, without having to change or recompile the 3rd party developed software.
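To make the dynamic dispatch point concrete, here is a rough analogy in Python rather than Objective-C. It is only a sketch of the idea of sending a message by name at runtime and finding out whether the receiver responds; the `Button` class and the `responds_to` and `send` helpers are hypothetical names made up for this illustration, and nothing here reflects how the Objective-C runtime is actually implemented.

```python
class Button:
    """A hypothetical receiver that understands exactly one message."""

    def click(self) -> str:
        return "clicked"


def responds_to(receiver: object, message: str) -> bool:
    # The lookup happens by name at runtime, not at compile time.
    return callable(getattr(receiver, message, None))


def send(receiver: object, message: str, *args):
    # Deliver the message if the receiver understands it, otherwise
    # return None (a simplification chosen for this sketch).
    if not responds_to(receiver, message):
        return None
    return getattr(receiver, message)(*args)


button = Button()
print(send(button, "click"))  # the receiver responds
print(send(button, "drag"))   # the receiver does not respond
```

Every call through `send` pays for a lookup by name before any work is done, which is the kind of per-call cost a statically bound call avoids.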
Off-loading real work to YQL instead of using your server
Just saw a cool blog post about summing up all your subscribers on RSS, Facebook and Twitter. The only issue is that it requires PHP to work. I decided to quickly rewrite it as a YQL Execute table instead. So rather than write that code in PHP, here is the code for my table. Pretty simple. You pass it a URL for Feedburner, an id for Facebook and an id for Twitter and it generates some simple HTML for you to display as a widget.
The HTML page that he uses didn’t have to be changed much; you can find the same demo that he uses here. This was really quick, so right now there is no way to leave out one of the three sources without it freaking out. This is more a tech demo than a real solution.
Maven artifacts need to be more discoverable
The laundry list of repositories that are filling the POM in Maven projects has to go. The ideal of having a central store of all artifacts is clearly dead and we have to move on. My proposal is twofold. First, a repository to end all repositories: one that doesn’t actually store the files but simply redirects to a known good location for the group id / artifact id / version combination. Second, we create an ad hoc standard for artifact discovery based on the group id. For example, if the group id is com.sampullara.cli-parser then you should be able to find a repository that stores all artifacts at http://cli-parser.sampullara.com/repository/. Perhaps we could even put some sort of discovery file at the root to find them. This would allow anyone to distribute Maven artifacts to developers without having to publish to any central location. It would also drastically reduce the amount of repository cruft that is creeping into so many of the POM files I have seen recently. Further, it would make it really easy for github, googlecode, apache, etc. to help their developers automatically publish with just the right naming conventions in place.
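The group id convention above can be sketched in a few lines of Python. This assumes the rule is simply "reverse the dot-separated segments of the group id to get a host name and append /repository/", which is my reading of the single example given. The function names, the artifact id and the version below are hypothetical; only the path layout under the repository root follows the standard Maven repository layout.

```python
def repository_url_for_group(group_id: str) -> str:
    """Map a group id to a repository URL by reversing its segments (assumed rule)."""
    segments = group_id.split(".")
    if len(segments) < 2 or not all(segments):
        raise ValueError(f"not a usable group id: {group_id!r}")
    host = ".".join(reversed(segments))
    return f"http://{host}/repository/"


def artifact_url(group_id: str, artifact_id: str, version: str) -> str:
    """Locate a jar under the discovered repository using the standard Maven layout."""
    base = repository_url_for_group(group_id)
    group_path = group_id.replace(".", "/")
    return f"{base}{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


print(repository_url_for_group("com.sampullara.cli-parser"))
# Hypothetical artifact id and version, for illustration only.
print(artifact_url("com.sampullara.cli-parser", "cli-parser", "1.0"))
```

A build tool following this convention would need no repository entries in the POM at all, since the location falls out of the coordinates it already has.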
Come to think of it, maybe the whole thing is a little suspect. Maybe we should have been using URLs the entire time. How much value are we getting out of the layer of indirection? This is intended to be a discussion…
How Facebook could open the Open Graph APIs
I was pondering how Facebook could really answer its critics in a profound way that would up-level the social fabric of the internet while also satisfying those who want ultimate control over their data. The basic premise of the solution would be to allow any node in the Open Graph to be redirected to a 3rd party developer’s server. From that point on, Facebook would treat that node just as if it were a developer that currently gets access to Facebook users.
Continue reading
The ideal web application templating system
When a designer and a developer work together to create an HTML web application one of the biggest issues is how you translate their vision into code in a way that allows for iteration and flexibility when those designs change. Another issue is that while you are testing a design you would like it to be filled with actual data but having it hooked up to a live server at all times is very painful. So far I haven’t found a system that really gives me even these attributes and as it turns out there are more requirements that I would like that are also mostly unfulfilled. Here is a list of requirements I would like to see met by a templating system:
- Works well with HTML5/CSS3 progressive enhancement
- Allows mock data within the template that is replaced at runtime
- Client-side version that leverages the mock data for shift-reload debugging
- Composable components, not monolithic pages
- Very little or no business logic in the templates
- Concurrent evaluation possible
Right now I am looking at mustache.js and its various server-side implementations as a possible solution to this. It has many of those qualities but I would likely need to make some of my own modifications for things like concurrent evaluation.
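To give a feel for the markup style, here is a tiny Python sketch of mustache-like rendering. It is not mustache.js or any of its server-side implementations, just a minimal illustration that handles `{{name}}` variables and `{{#section}}...{{/section}}` blocks, where a section loops over a list or is shown only when its value is truthy. The `render` function and the sample data are hypothetical, and the sketch ignores partials, comments, inverted sections and nested sections that share a name.

```python
import html
import re

# {{#name}} ... {{/name}} blocks and plain {{name}} variables.
SECTION = re.compile(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, context: dict) -> str:
    """Render a small subset of mustache-style markup against a dict."""

    def render_section(match: re.Match) -> str:
        value = context.get(match.group(1))
        body = match.group(2)
        if isinstance(value, list):
            # Loop: render the body once per item, with the item's keys in scope.
            return "".join(render(body, {**context, **item}) for item in value)
        # Boolean check: keep the body only when the value is truthy.
        return render(body, context) if value else ""

    expanded = SECTION.sub(render_section, template)
    return VARIABLE.sub(
        lambda m: html.escape(str(context.get(m.group(1), ""))), expanded
    )


template = "<ul>{{#items}}<li>{{name}}{{#sale}} (on sale){{/sale}}</li>{{/items}}</ul>"
# Mock data standing in for what a live server would supply at runtime.
mock = {"items": [{"name": "Tea", "sale": True}, {"name": "Fish & chips", "sale": False}]}
print(render(template, mock))
```

Because the template only names the data it wants, the same markup can be filled from mock data while designing and from real data at runtime.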
Any suggestions?
Making Open Data Tables even easier: YQL Storage Editor
One of the big benefits of YQL is that you can make your own Open Data Tables and use them. However, one of the disadvantages is that you either have to host your tables on a publicly available URL or you have to painstakingly use the yql.storage table to manage your environments and table definitions. There has been an open feature request for the YQL console for a while now to integrate the ability to edit yql.storage content but it isn’t the highest priority thing for the team right now. Since I am using YQL more and more lately I decided to build it myself, leveraging Bespin, Mozilla’s web based editor.
Continue reading
Coders At Work Interview
Peter Seibel wrote a great book called Coders at Work which contains interviews of various programmers throughout computer science history. Yury Lifshits has put up a site called Interview 2010 that lets you both define interviews and respond to them. Peter graciously agreed to put up a subset of his questions for the interviews that he did and I have my answers below. After that, you will find a form to enter your own answers to this set of questions.
Sam Pullara
Long-time computer programmer
Peter Seibel: How did you learn to program?
Sam Pullara: Back when I was around 10 I asked for a computer. My mother looked at the available choices and decided that the Atari 400 was just a game machine and that with the VIC-20 I might actually learn something. She was right. I started out programming in BASIC and then later 6502 machine code on the 64 and 128. In high school I finally learned a ‘real’ computer programming language, C, by reading a 700-page book in a single night. One day I will take a computer class.