Code is Part of the Machine

I want to share a realization that crystallized recently during discussions about software complexity and comparisons between technology stacks.

Computers have become very complex. They have multiple processors with multiple cores each, all communicating with each other so that you can wait idly for seconds while that spreadsheet software opens. Or while the kitty “GIFs” (likely rebranded MP4s) load.

Processors can also have microcode, and on top of them sits an operating system, on top of which is our stack, on top of which is our code.

Computers, with all their hardware and software, are machines that imitate other machines, often better than those that inspired them. For instance, we can think of almost any text editor in use today as an impossibly powerful typewriter.

The code we write, and the code we build upon, are part of the machine we are putting together. This is one of the aspects that must be taken into account when building things for the long run: all the code present on your production machines is part of the machine.

Complex machines are harder to reason about, keep running, and upgrade than simpler ones.

There are other things to consider, of course, but given the simplistic comparisons out there of how complex different programming languages are, I think we need to separate the concerns and think about each aspect individually.

Even though this

print("Hello, World")

is easier to read than this

#include <stdio.h>

int main(int argc, char* argv[]){
    printf("Hello, World\n");
    return 0;
}

all else being equal, the latter is by far the simpler machine.
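To make the hidden part of the Python machine a little more tangible, here is a small sketch that asks the running CPython interpreter how many modules it has already loaded, and how many are compiled into the interpreter itself, by the time a one-line script runs. The counts vary across Python versions and platforms, and module count is only a rough proxy for "amount of code in the machine".

```python
import sys


def loaded_module_count() -> int:
    """How many modules the interpreter currently has loaded."""
    return len(sys.modules)


def builtin_module_count() -> int:
    """How many modules are compiled directly into this interpreter binary."""
    return len(sys.builtin_module_names)


if __name__ == "__main__":
    print("Hello, World")
    # Even this one-liner runs on top of machinery that was set up
    # before our code got its turn.
    print(f"Modules already loaded: {loaded_module_count()}")
    print(f"Modules compiled into the interpreter: {builtin_module_count()}")
```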

I’m not certain whether I’ll get to develop this idea as much as I want to in the future, or write about other aspects that need to be considered when you’re building long term systems, so here are some related thoughts:

  • All the code in the production computer, no matter where it comes from, is part of the machine you’re building.
  • All production computers that interact in order to make the system you’re building work are part of the same machine.
  • Yes, this includes the client’s computer, and the client’s phone, with all the apps that are running in the device.
  • Routers and switches are computers.
  • Now that I reflect upon it, I think this is something that Microsoft has known for ages, and part of the reason stuff running on MS systems keeps working for relative aeons through upgrades. Some of the posts on Raymond Chen’s blog, The Old New Thing, have convinced me of that.
  • Ease of writing code doesn’t imply ease of maintaining it, and neither implies ease of running it.
  • The most valuable thing to come out of DevOps is the very sane notion that the development team should be involved in making sure the system keeps chugging along.
  • Considering the way early digital computers worked and were programmed, thinking about modern computers as “machines that rewire themselves continuously on the fly” becomes very attractive.
  • The total number of possible machines your production computer can become is part of the complexity of your setup, and is a corollary of the idea that code is part of the machine.
  • Whether your dependencies get updated, whether the interfaces you depend on die, and whether there are vulnerabilities in your code are separate issues.

If we’re going to reason about all this, it’s important not to muddle our thought process, and to address exactly as much (or as little) of the environment of building and running applications as we really mean to.

There is more where that came from, but I don’t know that I’ll have the tools to write it properly. So there we go, I hope this tickles your mind and gives you a new way to think about software systems.

Individuals and Team Performance

Recently, Tim Ottinger posed a question on Twitter about changes in a team’s membership and their impact on the team’s effectiveness.

This made me reminisce about some things. I’ve rewritten this post a few times, unsure of how much detail to go into, or whether to publish this at all, but I think at least a few remarks bear mentioning.

I have seen people who accelerate every team they join, and people who put the brakes on every team they join (yes, there is an in-between, but the extremes prove a point). I’ve seen teams swapped whole and the results become “permanently”* faster, and I’ve also seen them become slower.

I think the direction and magnitude of change comes down to a few different factors: how the individuals interact with the craft of software development, how they interact with other team members, and how the organization interacts with the software projects it owns.

People who respect software development as a craft, who would feel embarrassed about making avoidable mistakes, and who like efficiency in general tend to accelerate software development when they’re included in a project.

People who make their teammates feel comfortable, who are willing to learn, teach, and grow together, and who don’t feel ashamed of having to learn new things also tend to accelerate software development when they’re included in a project.

Organizations with sane rules, especially about developing internally transferable skill sets in collaborators, knowledge sharing, and architectural uniformity, tend to suffer less from team changes.

The opposites are opposites, and there’s a lot of nuance that doesn’t fit in this margin, and I can’t claim to comprehend everything completely.

Managers tend to have a big impact on their teams, which should be obvious, but feels worth mentioning. A supervisor who doesn’t know their team, and who doesn’t teach, motivate, cultivate, and prune it, makes the company (and the projects the team is responsible for) more fragile.

Teams are bound to change. Requirements change, people move on, people move in. Having every change in the team be a possible cause of a “permanent” slowdown in development is a mind-boggling, yet alarmingly commonly accepted, proposition.

Software development should happen in a way that makes a change in personnel more like going through a bumpy section of the road and less like slashing a vehicle’s tires.

I have sometimes felt that some companies try to treat their employees as interchangeable. This is a liability. Everyone is different, and software should be written in such a way that unique people can bring their unique strengths to the table. This implies, as in any kind of team, that taking care of the team’s composition is crucial. And this goes not only for skill level and skill type, but for attitudes and tendencies. Compare with team composition in your favorite team sport.

*Permanent is in quotes because … well, obviously I’m not looking into eternity. I mean over a period of time when the team has settled into their “day to day performance level” plateau.

Retrospective: Live Code Streaming 2

Today we built on the previous session’s progress (see what it was here).

I had my Twitter app credentials stored somewhere they wouldn’t be streamed, and wrote a bit of code to read and use them without showing them on screen. It’s important that those data don’t make it into a code repo! Afterwards, I used the tweepy API to create a stream following a “track”, which is just what we need to get all tweets with a particular hashtag.
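Here is a minimal sketch of keeping credentials out of the code (and off the screen): values come from a JSON file kept outside the repository, environment variables can override them, and only a masked view is ever printed. This is not the code from the stream. The file location, key names, and environment variable names are assumptions for illustration; the four keys mirror the usual OAuth 1.0a credentials a Twitter app gets.

```python
import json
import os
from pathlib import Path
from typing import Dict, List, Mapping

# Assumed location, outside any repository; adjust to taste.
CREDENTIALS_PATH = Path.home() / ".config" / "twitter_app" / "credentials.json"
# Assumed key names (the usual OAuth 1.0a quartet).
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")


def merge_credentials(file_values: Mapping[str, str], environ: Mapping[str, str]) -> Dict[str, str]:
    """Start from the file's values; an env var such as CONSUMER_KEY wins if set."""
    creds = dict(file_values)
    for key in REQUIRED_KEYS:
        value = environ.get(key.upper())
        if value:
            creds[key] = value
    return creds


def missing_keys(creds: Mapping[str, str]) -> List[str]:
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


def masked(creds: Mapping[str, str]) -> Dict[str, str]:
    """Something safe to show on a stream: key names and value lengths only."""
    return {key: f"<{len(value)} chars>" for key, value in creds.items()}


def load_credentials(path: Path = CREDENTIALS_PATH) -> Dict[str, str]:
    file_values = json.loads(path.read_text()) if path.exists() else {}
    creds = merge_credentials(file_values, os.environ)
    missing = missing_keys(creds)
    if missing:
        # Report which keys are missing, never the values themselves.
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return creds
```

Keeping the file outside the working tree (or listing it in `.gitignore`) is what keeps it out of the repo; printing `masked(creds)` lets you confirm everything loaded without revealing anything.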

What remains is to really flesh out the Twitter client, make it store the relevant data (after seeing what data is actually available), and get what I need.
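As a possible starting point for storing the relevant data, here is a sketch that keeps a handful of fields from each incoming tweet and appends them to a JSON Lines file. It assumes tweets arrive as dictionaries in Twitter’s v1.1 JSON layout (`id_str`, `created_at`, `user.screen_name`, `text`, `entities.hashtags`). Which fields are actually worth keeping is exactly what’s still to be explored, so treat the selection as a placeholder.

```python
import io
import json
from typing import Any, Dict, TextIO


def extract_fields(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only a few fields, assuming the v1.1 tweet JSON layout."""
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return {
        "id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "user": tweet.get("user", {}).get("screen_name"),
        "text": tweet.get("text"),
        "hashtags": [tag.get("text") for tag in hashtags],
    }


def append_jsonl(record: Dict[str, Any], out: TextIO) -> None:
    """One JSON object per line: easy to append to, grep, and re-read later."""
    out.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    sample = {
        "id_str": "1",
        "created_at": "Wed Oct 10 20:19:24 +0000 2018",
        "user": {"screen_name": "iajrz"},
        "text": "Live coding tonight #python",
        "entities": {"hashtags": [{"text": "python"}]},
    }
    buffer = io.StringIO()  # stands in for a real file opened in append mode
    append_jsonl(extract_fields(sample), buffer)
    print(buffer.getvalue(), end="")
```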

It seems it won’t take too long, although it has taken longer than strictly needed because I’ve been explaining things along the way.

If you’re reading this, I want you to know that the process has involved trial and error; for instance, we found a bug in tweepy’s documentation (it’s fixed in master, which hasn’t been published yet), and I had to work out how to do things properly.

I invite you to join me on the next live stream; I’ll post on Twitter (@iajrz) once the date is set.

Retrospective: Live Code Streaming 1

Yesterday I streamed for a few minutes while starting to build an application that consumes the Twitter API.

I’ve not used the Twitter API before, and I’ve not consumed any APIs from Python… the point is to demonstrate how to learn something new on the fly, and how straightforward (or not) things may turn out to be.

Up to now, I’ve found out that app.twitter.com is where you register the application, and I chose tweepy as an SDK.

I read some of the documentation on proper application behavior, such as the backoff protocol and Twitter’s unique HTTP 420 error.
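The backoff rules are easy to get subtly wrong, so here is a sketch of a reconnect schedule. The numbers are my reading of Twitter’s (now legacy) streaming guidelines: linear backoff for network-level errors, exponential backoff for HTTP errors, and a longer exponential backoff starting at one minute for HTTP 420. Treat them as assumptions and check the current documentation rather than trusting this sketch; it is not tweepy’s implementation.

```python
from typing import List, Optional


def backoff_delays(status: Optional[int], attempts: int) -> List[float]:
    """Seconds to wait before each of the next `attempts` reconnects.

    `status` is None for network (TCP/IP) errors, otherwise the HTTP status.
    All constants below are assumptions, not values read from any library.
    """
    delays = []
    for i in range(attempts):
        if status is None:
            # Network error: grow linearly by 0.25 s per attempt, capped at 16 s.
            delay = min(0.25 * (i + 1), 16.0)
        elif status == 420:
            # Rate limited ("Enhance Your Calm"): start at 60 s, double each time.
            # No cap is applied here.
            delay = 60.0 * (2 ** i)
        else:
            # Other HTTP errors: start at 5 s, double each time, cap at 320 s.
            delay = min(5.0 * (2 ** i), 320.0)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    print(backoff_delays(None, 4))
    print(backoff_delays(503, 8))
    print(backoff_delays(420, 3))
```

In a reconnect loop you would `time.sleep()` for each delay in turn, and start the schedule over after a successful connection.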

Reading on-screen is awkward; I’ve tried to keep it lively by commenting on the content and pointing out what catches my attention. Since reading documentation and exploring are part of development, I don’t want to hide them from viewers. There are already a ton of streamers who use tools they know well to work on straightforward problems… I hope to improve the experience by bringing something to the table that I haven’t seen yet.

As I announced some time ago, this stream is scheduled for Wednesdays at 8:00 PM, UTC-4. Next time I’ll have the parameters needed to authenticate to the Twitter API and will go on from there, without showing the actual secret values.

If you’re not very experienced and Spanish is your native language, I recommend you join the stream and the chat, where we can discuss the information and the implementation.

Learn Ruby The Hard Way

I’m working through the book as a break from other work, and it’s pretty good. It’s aimed at people just getting started with programming, which makes it rather easy for me to breeze through the exercises.

What I like most is that the book, like the rest of the “Learn Code the Hard Way” series, is organized around exercises. You type code, then poke at it to break it, fix it, and improve upon it.

A good read, and by proxy so is Learn Python the Hard Way, on which this book is based.

Some Comments On Code Optimization

I often code for fun, building tiny toys to amuse myself. Be it simple board games that play themselves (tic-tac-toe and checkers), or some simulations (I have some Windows 7 gadgets lying around), or… well, you get the idea.

In the simulations, for which I developed a taste thanks to a former coworker, I tend to sit down and consider every little calculation I’m going to perform. I usually run them in an HTML canvas, which both makes them suitable for use as Windows 7 desktop gadgets and lets me have good fun in a healthy environment: it’s easier to e-mail one HTML file than an exe, and the tools for profiling browser JavaScript code are good to be familiar with.

In any case, one of the things I learned while doing this is that the way you write your loops has a tiny impact on performance. It adds up when you calculate an immense number of vectors applied to an immense number of particles in a dynamic array… and as we’re running on top of a VM on top of a browser on top of the OS, every little bit counts. Whenever possible, I write loops in this form:

while (n-- > 0) {
    // do stuff
}

Why? JavaScript implementations, like the Java VM, the CPython runtime, and x86 assembly, have special codes for a handful of numbers, usually including at least 0, -1, and 1 through 5 (the Java VM, for example, has dedicated iconst instructions for -1 through 5). Not a big deal – unless you’re avoiding cache misses. As it’s a cheap optimization to make (it doesn’t hurt readability), I use it whenever it’s logical.

This kind of optimization is nice and clean fun, but not necessarily impactful. The best way to optimize code is:

  1. Write it in a readable way. Forget about optimizations.
  2. Measure the hotspots, that is, measure where time is actually being spent
  3. Optimize that
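Steps 2 and 3 are easy to start on with a profiler. Here is a minimal Python sketch using the standard library’s `cProfile` and `pstats` modules; the workload (a deliberately naive list-membership check) is a made-up stand-in, not code from my simulations.

```python
import cProfile
import io
import pstats


def build_data(n):
    # Step 1: written plainly, no tricks.
    return list(range(n))


def count_common(a, b):
    # Deliberately naive: `x in b` scans the whole list every time,
    # so this is O(len(a) * len(b)).
    return sum(1 for x in a if x in b)


def main(n=2000):
    a = build_data(n)
    b = build_data(n)[::2]  # the even numbers
    return count_common(a, b)


def profile_main(top=5):
    """Step 2: run main() under the profiler and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = main()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    result, report = profile_main()
    print(result)
    print(report)
```

In the report, `count_common` should account for nearly all of the cumulative time, which makes it the one hotspot worth fixing (for example, by turning `b` into a set); `build_data` isn’t worth touching.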

Of course, before ever writing a line of code, some thought must be given to what you’re going to write. The fastest code is the one which does the least, at any given level. In the example above, the optimization is at the bytecode or assembly level; when you choose the wrong data structure or the wrong algorithm, you affect every level. This is the value of that “algorithms and data structures” class or book which you didn’t pay too much mind to at school.

I’m going to put forward two examples of choice, one of data structure and one of algorithm.

An Example of Data Structure Choice

If you need to sort a list of existing data, you may choose to use a list – after all, the amount of data to be sorted could change over time. But different implementations of a list have different characteristics. In Java, for example, you have the LinkedList and ArrayList implementations of List (among others). Any implementation of List can be passed to Collections.sort(List<T> list). These implementations, of course, mirror what you could write by hand if you wanted to. Indeed, I’d urge you to write an ArrayList implementation.
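If you want a head start on that exercise, here is a minimal Python sketch of a growable array in the spirit of Java’s ArrayList. A fixed-size Python list stands in for a raw, contiguous block of memory (Python’s own list is already a dynamic array, so this is purely for illustration), and the capacity doubles when it fills up. Java’s ArrayList grows by about 1.5x instead; any constant growth factor gives the same amortized bound.

```python
class ArrayList:
    """A minimal growable array: contiguous storage plus a size counter.

    The backing list is only ever indexed, never appended to, so it behaves
    like a fixed block of memory.
    """

    def __init__(self, capacity: int = 4):
        self._data = [None] * max(1, capacity)
        self._size = 0

    def __len__(self) -> int:
        return self._size

    def capacity(self) -> int:
        return len(self._data)

    def _check(self, index: int) -> None:
        if not 0 <= index < self._size:
            raise IndexError(index)

    def get(self, index: int):
        self._check(index)
        return self._data[index]  # O(1): straight to the slot

    def set(self, index: int, value) -> None:
        self._check(index)
        self._data[index] = value

    def add(self, value) -> None:
        if self._size == len(self._data):
            self._grow()
        self._data[self._size] = value
        self._size += 1

    def _grow(self) -> None:
        # Growing by a constant factor (not a constant amount) keeps add()
        # amortized O(1): each element is copied O(1) times on average.
        new_data = [None] * (2 * len(self._data))
        for i in range(self._size):
            new_data[i] = self._data[i]
        self._data = new_data
```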

In any case: Which one would you use?

If you chose the ArrayList, pat yourself on the back. To sort data, you need to constantly compare and access elements, and the fastest way to do that is with a contiguous, indexed block of memory. The difference in performance can be observed on a desktop machine sorting random integers.
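Java’s own documentation for sorting a List describes taking advantage of exactly this: the list’s elements are dumped into an array, the array is sorted, and the elements are written back, which avoids the cost of sorting a linked list in place. Here is a Python sketch of the same strategy, with a hand-rolled `Node` class standing in for a linked list; it illustrates the idea and is not the JDK’s code.

```python
from typing import Iterable, List, Optional


class Node:
    """A singly linked list node."""

    def __init__(self, value, next_node: "Optional[Node]" = None):
        self.value = value
        self.next = next_node


def from_iterable(values: Iterable) -> Optional[Node]:
    head = None
    for value in reversed(list(values)):
        head = Node(value, head)
    return head


def to_list(head: Optional[Node]) -> List:
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


def sort_linked_list(head: Optional[Node]) -> None:
    """Sort in place by way of a contiguous array."""
    values = to_list(head)   # one pass to copy the values into an array
    values.sort()            # all the comparisons happen on the array
    node = head
    for value in values:     # one pass to write the sorted values back
        node.value = value
        node = node.next
```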

An Example of Algorithm

I’ll share something very real: an e-mail from GNU grep author Mike Haertel on a FreeBSD mailing list in 2010, in which he dissects why GNU grep is faster than BSD grep. Long story short: GNU grep executes fewer operations to achieve the same goal. Read it through; I could hardly improve upon Mr. Haertel’s exposition.
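A large part of Haertel’s explanation is that GNU grep uses the Boyer-Moore algorithm, which compares the pattern starting from its end and can skip ahead several characters after a mismatch, so it never even looks at many of the input bytes. Below is a sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore (not GNU grep’s code), that also counts character comparisons so you can see how few it makes.

```python
from typing import Tuple


def horspool_search(text: str, pattern: str) -> Tuple[int, int]:
    """Return (index of first match or -1, number of character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0
    # Shift table: if the window's last character is c, slide the window so
    # that the rightmost occurrence of c in pattern[:-1] lines up with it.
    # Characters not in the table allow a full shift of m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    comparisons = 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare right to left.
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        pos += shift.get(text[pos + m - 1], m)
    return -1, comparisons


if __name__ == "__main__":
    # Searching 1000 characters for a pattern that never occurs:
    # only one comparison per 10-character window.
    print(horspool_search("a" * 1000, "b" * 10))
```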

Conclusion

Toy around, and have fun with your micro-optimizations all you like. But be careful with the algorithm and data structure you choose.

Sometimes you’re calling some library or external service only to discard the result if some condition isn’t met. This kind of mistake is especially abundant in inherited codebases which accrue functionality over time.
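Here is a small sketch of that pattern and its fix. `expensive_lookup` is a hypothetical stand-in for the library or service call; the point is only the order of the checks.

```python
# Counts how often the (hypothetical) expensive call actually runs.
CALLS = {"expensive": 0}


def expensive_lookup(order_id: int) -> dict:
    """Hypothetical stand-in for a slow library or network call."""
    CALLS["expensive"] += 1
    return {"order_id": order_id, "discount": 0.1}


def discount_wasteful(order_id: int, is_member: bool) -> float:
    # Pays for the call every time, then may throw the result away.
    info = expensive_lookup(order_id)
    if not is_member:
        return 0.0
    return info["discount"]


def discount_checked(order_id: int, is_member: bool) -> float:
    # Cheap condition first; the expensive call only runs when its result is used.
    if not is_member:
        return 0.0
    return expensive_lookup(order_id)["discount"]
```

The reordering is only safe when the call has no side effects that something else depends on, which is worth confirming in an inherited codebase before moving it.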

Most important: if your code is slow, use a profiler. Optimize the hot spots and leave the rest alone. How good can this be? Read the answer here. Have fun :^)

Want to learn programming?

Derek Sivers has been a great inspiration for me in many aspects. My now page was basically his idea – there’s a whole movement around what people are focused on doing at the moment, and the sort-of community feeling keeps me accountable.

Some time ago, Derek wrote about how nice it would be to just have someone tell you what to do, and the fruit of that line of thought is the “Do This. Directives – part 1” article, which appears to be the first of a series.

I don’t have directives yet; not hard and fast ones. I used to have a page up with things you should read and work through to level up. It was more in the style of How To Become a Hacker.

I’ve not taught many people to code, so I don’t have a crystallized view of what, exactly, would get you to become a competent programmer. While that comes along (I intend to teach many people to code in the near future), here’s a rendition of my advice on becoming a competent programmer:

  • Work through Learn Python the Hard Way. Internalize the mechanism for learning.
  • Read How to Become a Hacker and follow its advice.
  • Work through Nand2Tetris.
  • Work as much as you can through seminal works, like The Art of Computer Programming and Structure and Interpretation of Computer Programs.
  • Join user groups in your vicinity.
  • Get something done: something nice for yourself. By this point you should have done so already.
  • When you make mistakes: Identify them. Catalogue them. Learn to avoid the whole category of the mistake, if possible. Share it, that others may learn to avoid it.
  • Stop saying you’ll do it. Stop wondering whether this has something to do with what you want to do. Stop making to do lists. Start. Now. Hurry.

See you on the other side 🙂

Retrospective: Clean Code – Boy Scouts, Writers, and Mythical Creatures

Yesterday I gave a talk on Clean Code, based on content by Uncle Bob and Geoffrey Gerriets.

I had some technical issues – so I had no access to my presenter notes, damping my performance somewhat… and after I’d taken such pains to learn from Geoffrey’s talk at PyCaribbean on Code Review, Revision and Technical Debt.

The subtitle for my talk was: “Clean Code, Boy Scouts, Writers, and Mythical Creatures”.

It starts out by describing the features of clean code, as presented by Uncle Bob and the handful of developers he interviewed for his book Clean Code, and comparing each group of aspects to physical things we can look up to, like a Tesla Model X, a pleasant beach, or a bike… all of which share traits desirable in our code.

It then moves on to the maxim of “leaving the campground better than we found it”, with a nice example of code taken from the IOCCC that became much more legible merely by reindenting it, highlighting the long-term impact of small incremental changes.

The latter half of the talk was derived from lessons learned at Geoffrey’s talk: the process of a professional writer compared to that of a professional coder, and how they’re alike. The lessons from a writer’s day to day can be applied to our coding: design, write, revise, rewrite, proofread. Some attention was given to the way reviews may be given. The mythical creatures section, which represents the different stages at which developers may find themselves, supports this latter part of the talk by pointing out patterns of behavior that identify what may or may not be important for a certain developer at a certain point in their growth. The advice to treat things that may be beneath a developer’s level as trivia and/or minutiae, as well as the advice to focus on and choose improvements to point out instead of “trouble”, may be the best of this part of the talk.

After realizing I’d burned through the presentation, I posed some questions to the audience, and we discussed some interesting points:

  • How can code comments make code cleaner or dirtier?
  • How can rewrites alter our coding behavior?
  • How can we find time to have a re-writing flow if the management doesn’t know any better?

Mileage varies, of course, so several people pitched in; we didn’t draw any firm conclusions, but we did put forward ideas to try, which was interesting.

In the end, we came out with some good ideas on how to keep code from stagnating… hopefully our future selves will have ever fewer messes to deal with :^)

Retrospective: Reasons why you should love emacs

Last Saturday I was at Dominican College O&M’s campus at La Romana, as one of the speakers for the “OpenSaturday” series of events.

It was a complete success: a full room, an engaged audience, and excellent speakers.

I had the opportunity to give my first talk on emacs.

I delivered part of it using org-tree-slide-mode, which was really cool for the audience and for me, too.

In the second half of the presentation, I used org-mode and demonstrated custom “ToDo” states and timestamps, org-mode HTML export (C-c C-e h H), syntax highlighting, Impatient mode, and Emmet mode. Of course, I also demonstrated having multiple buffers open in several windows at the same time.

It was all in a hurry, because it was a 10-minute talk; I couldn’t demonstrate keyboard macros, which would’ve been nice: I was going to demonstrate extracting the names of HTML form items and generating PHP code to read them from the $_REQUEST superglobal. This makes use of emacs’ ability to include searches and any other command as part of a macro, which I know for a fact several editors can’t do.
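For the curious, here is roughly what that macro would have produced, written as a Python sketch instead of an emacs keyboard macro. The exact shape of the generated PHP lines is an assumption about what such a demo would output, and field names like `items[]` that aren’t valid PHP variable names are not handled.

```python
from html.parser import HTMLParser
from typing import List


class FormNameCollector(HTMLParser):
    """Collect the `name` attribute of input, select, and textarea elements."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELD_TAGS:
            name = dict(attrs).get("name")
            if name and name not in self.names:
                self.names.append(name)


def php_request_lines(form_html: str) -> List[str]:
    """One `$name = $_REQUEST['name'];` line per named form field, in order."""
    collector = FormNameCollector()
    collector.feed(form_html)
    return [f"${name} = $_REQUEST['{name}'];" for name in collector.names]


if __name__ == "__main__":
    form = '<form><input name="email"><textarea name="msg"></textarea></form>'
    print("\n".join(php_request_lines(form)))
```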

The show-stealer was Emmet mode – I actually thought people would’ve been more surprised at noticing that the presentation was within emacs, but they weren’t. As many are CS students who are learning HTML, seeing html>(head>title)+body>div#content>ul>li*5 grow into the corresponding tree blew them away.

I’m planning to enhance that presentation to fill a 45-minute slot featuring keyboard macros, elisp functions + keybindings, and select parts of my .emacs file. Perhaps the presentation will be accompanied by a “Reasons to love Vi” by one of my colleagues, which would be sweet.

In any case, a great Saturday – hopefully things will keep on being fun.

On programming productivity

Measuring programmer productivity is notoriously hard. It’s the topic of numerous publications of every length.

Much of coding can be succinctly quantified and estimated; time should probably be spent automating those tasks, as they are the boring, repetitive, well-defined ones, like creating a CRUD or converting a programming-language-level construct into an interface-level representation such as JSON or XML.
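As one small example of automating the boring part, here is a Python sketch that serializes any dataclass to JSON generically with `dataclasses.asdict` and `json.dumps`, instead of hand-writing a mapping for every type. The `Customer` type is hypothetical, and the `from_json` helper only handles flat dataclasses (no nested dataclass fields).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Customer:
    # Hypothetical domain object, for illustration only.
    id: int
    name: str
    tags: List[str] = field(default_factory=list)


def to_json(obj) -> str:
    """Serialize any dataclass instance without a hand-written mapping."""
    return json.dumps(asdict(obj), sort_keys=True)


def from_json(cls, text: str):
    """Rebuild a flat dataclass from its JSON form."""
    return cls(**json.loads(text))


if __name__ == "__main__":
    print(to_json(Customer(1, "Ada", ["vip"])))
```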

The other part is hard to estimate, mostly because it combines several tasks, like getting to know the domain, figuring out what needs to get done and actually doing it in a polished manner.

Some things are usually oversimplified in attempts to measure programmer productivity, sometimes to hilarious effect: lines of code, time spent sitting at the computer, and number of artifacts produced.

All of those means of measurement can backfire hideously by creating the wrong incentives (lots of boilerplate, woolgathering, overengineering, overestimating work length).

Here are a few important measurements that can be made to help track this elusive statistic:

  • Explain your work to your teams regularly. Have them rate it. Keep a history. State what’s being solved, why it was solved in this particular way, what tradeoffs were involved, any difficulties you ran into, and how you overcame them. Ratings on two indicators are crucial: problem complexity and performance. They should include justifications to help you home in on better practices.
  • Keep track of all the bugs in your code, the stage at which they were noticed, and the time that fixing them required.
  • Keep track of the references to your code, especially if you’re writing tools.
  • Have your peers rate you on helpfulness and knowledgeability.
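The bug-tracking measurement is the easiest of these to automate. Here is a minimal sketch: record each bug with the stage where it was noticed and the time it took to fix, then summarize per stage. The record layout and stage names are hypothetical; any issue-tracker export with similar fields would do.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BugRecord:
    # Hypothetical record layout; the stage names are assumptions.
    bug_id: str
    stage_found: str      # e.g. "review", "qa", "production"
    hours_to_fix: float


def summarize(bugs: List[BugRecord]) -> Dict[str, Tuple[int, float]]:
    """Return {stage: (number of bugs, average hours to fix)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for bug in bugs:
        entry = totals[bug.stage_found]
        entry[0] += 1
        entry[1] += bug.hours_to_fix
    return {stage: (count, hours / count) for stage, (count, hours) in totals.items()}


if __name__ == "__main__":
    history = [
        BugRecord("1", "review", 1.0),
        BugRecord("2", "production", 6.0),
        BugRecord("3", "production", 2.0),
    ]
    for stage, (count, avg) in summarize(history).items():
        print(f"{stage}: {count} bug(s), {avg:.1f} h average to fix")
```

Tracked over time, a shift of bugs toward earlier stages (and shorter fix times) is a concrete signal to set beside the peer ratings.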

If you encounter any unintended side effects or incentives, please let me know. Up to now, the only bug I’ve found in this kind of process is the popularity-contest feel it can sometimes take on. That’s why I slipped the objective numbers in there, to help balance it. If you find other ways to improve on this, let me know too.