Retrospective: Live Code Streaming 2

Today we built on the previous session's progress (see what it was here).

I had my twitter app credentials stored somewhere they wouldn't be streamed, and wrote a bit of code to read and use them without showing them on screen. It's important that those credentials don't make it into a code repo! Afterwards, I used the tweepy API to create a stream following a "track", which is just what we need to get every tweet with a particular hashtag.
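For illustration, here's roughly what that looks like with tweepy's streaming API as of its 3.x releases; the environment variable names and the hashtag are placeholders of my own choosing:

import os
import tweepy

class HashtagListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)  # later: store the relevant fields instead

    def on_error(self, status_code):
        if status_code == 420:  # rate limited; disconnect so we can back off
            return False

# Credentials come from the environment, never from the source tree.
auth = tweepy.OAuthHandler(os.environ["TW_CONSUMER_KEY"],
                           os.environ["TW_CONSUMER_SECRET"])
auth.set_access_token(os.environ["TW_ACCESS_TOKEN"],
                      os.environ["TW_ACCESS_SECRET"])

stream = tweepy.Stream(auth=auth, listener=HashtagListener())
stream.filter(track=["#SomeHashtag"])  # "track" matches tweets with the hashtag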

What remains is to really flesh out the twitter client, make it store the relevant data (see what data is really available), and get what I need.

It seems it won't take much longer; it's taken more time than strictly necessary because I've been pausing to explain things as I go.

If you're reading this, I want you to know that the process has involved trial and error; for instance, we found a bug in tweepy's documentation (it's fixed in master, but not yet in a published release), and I had to work out the proper way to do things.

I invite you to join me for the next live stream; I'll post on my twitter (@iajrz) when the date is set.

Retrospective: Live Code Streaming 1

Yesterday I streamed for a few minutes while starting to build an application that consumes the twitter API.

I've not used the twitter API before, and I've not consumed any APIs from Python… the point is to demonstrate how to learn something new on the fly, and how straightforward (or not) things may turn out.

So far, I've found out that app.twitter.com is where you register the application, and I chose tweepy as the SDK.

I read through some of the documentation on proper application behavior, such as the backoff protocol and twitter's unusual HTTP 420 error.
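The gist of the backoff protocol is to wait exponentially longer between reconnection attempts; HTTP 420 ("Enhance Your Calm") is twitter's signal that the client is being rate limited. A minimal sketch of the idea in Python – the initial delay and cap here are placeholders, the real values are in twitter's docs:

def backoff_delays(initial, cap):
    # Double the wait between reconnection attempts, up to a cap.
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)

# After an HTTP 420, start waiting about a minute and double from there.
for attempt, delay in enumerate(backoff_delays(initial=60, cap=960)):
    print("attempt %d: wait %ds before reconnecting" % (attempt, delay))
    if attempt == 4:
        break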

Reading on-screen is awkward; I've tried to make it lively by commenting on the content and pointing out what catches my attention. Since reading documentation and exploring are part of development, I don't want to hide them from the viewers. There are already a ton of streamers who use their familiar tools to work on straightforward problems… I hope to enhance the experience by bringing to the table something I've not yet seen.

As I announced some time ago, this stream is scheduled for Wednesdays at 8:00PM, UTC-4. Next time I’ll have the needed parameters to authenticate to the twitter API and will go on from there, without showing the actual secret values.

If you're not very experienced and Spanish is your mother tongue, I recommend you join the stream and the chat, where we can discuss the information and the implementation.

Learn Ruby The Hard Way

I'm working through the book as a break from other work, and it's pretty good. It's aimed at someone just getting started in programming, which makes it rather easy for me to breeze through the exercises.

What I like most is that the book, just like the rest of the "Learn Code the Hard Way" series, is organized around exercises. You type code, then poke it to break it, fix it, and improve upon it.

A good read – and by extension so is Learn Python the Hard Way, on which this book is based.

Some Comments On Code Optimization

I often code for fun, building tiny toys to amuse myself. Be it simple board games that play themselves (tic-tac-toe and checkers), or some simulations (I have some Windows 7 gadgets laying around), or… well, you get the idea.

In the simulations, for which I developed a taste thanks to a former coworker, I tend to sit down and consider every little calculation I'm going to be performing. I usually run them in an HTML canvas, which both makes them suitable for use as Windows 7 desktop gadgets and lets me have good fun in a healthy environment: it's easier to e-mail one html file than an exe, and the tools for profiling javascript in the browser are good to be familiar with.

In any case, one of the things I learned while doing this is that the way you write your loops has a tiny impact on performance. It adds up when you're applying an immense number of vectors to an immense number of particles in a dynamic array… and since we're running on top of a VM on top of a browser on top of the OS, every little bit counts. Whenever possible, I write loops in the form

while (n-- > 0) {
    // do stuff
}

Why? Javascript implementations, just like the Java VM, the CPython runtime, and x86 assembly, have special codes to represent a handful of numbers, usually including at least -1, 0, and 1 through 5. Not a big deal – unless you're trying to avoid cache misses. Since it's a cheap optimization to make (it doesn't hurt readability), I use it whenever it's logical.

This kind of optimization is nice and clean fun, but not necessarily impactful. The best way to optimize code is:

  1. Write it in a readable way; forget about optimizations.
  2. Measure the hotspots, that is: measure where the time is actually being spent (see the sketch after this list).
  3. Optimize those hotspots and nothing else.
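For step 2, the standard library's profiler goes a long way in Python. A minimal sketch, where the functions are stand-ins for real code:

import cProfile
import pstats

def hot(n):
    # Stand-in for the code doing the real work.
    return sum(i * i for i in range(n))

def main():
    for _ in range(200):
        hot(10000)

cProfile.run("main()", "profile.out")
# Show the five entries where the most time accumulates.
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)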

Of course, before ever writing a line of code, some thought must be given to what you’re going to write. The fastest code is the one which does the least, at any given level. In the example above, the optimization is at the bytecode or assembly level; when you choose the wrong data structure or the wrong algorithm, you affect every level. This is the value of that “algorithms and data structures” class or book which you didn’t pay too much mind to at school.

I’m going to put forward two examples of choice, one of data structure and one of algorithm.

An Example of Data Structure Choice

If you need to organize a list of existing data, you may choose to use a list – after all, the amount of data to be organized could change over time. But different implementations of a list have different characteristics. In Java, for example, you have the LinkedList and ArrayList implementations of List (among others). Any implementation of List can be passed to Collections.sort(List<T> list). These implementations, of course, mirror what you could write by hand if you wanted to – indeed, I'd urge you to write your own ArrayList implementation as an exercise.

In any case: Which one would you use?

If you chose the ArrayList, pat yourself on the back. To organize data, you need to constantly compare elements, and the fastest way to do that is with a contiguous, indexed block of memory. The difference in performance can be observed on an ordinary desktop sorting random integers.
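A rough Python analogue of the same lesson, since Python is what I'm sketching in: a list is one contiguous, indexed block of references, while collections.deque is a chain of linked blocks, so reaching the middle by index means walking there. The sizes and repetition counts below are arbitrary:

import timeit
from collections import deque

n = 100000
as_list = list(range(n))
as_deque = deque(range(n))
mid = n // 2

# Indexed access into contiguous storage is O(1)...
print(timeit.timeit(lambda: as_list[mid], number=10000))
# ...while the deque must traverse its linked blocks to the middle: O(n).
print(timeit.timeit(lambda: as_deque[mid], number=10000))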

An Example of Algorithm

I'll share something very real: an e-mail from GNU grep author Mike Haertel on a FreeBSD mailing list in 2010, in which he dissects the reasons GNU grep is faster than BSD grep. Long story short: GNU grep executes fewer operations to achieve the same goal. Read it through; I could hardly improve upon Mr. Haertel's exposition.
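The heart of it, as Haertel tells it, is that GNU grep uses Boyer-Moore, skipping over bytes instead of inspecting each one. This is not grep's actual code, but here's a minimal sketch of the simpler Boyer-Moore-Horspool variant, to show where the skipping comes from:

def horspool_find(haystack, needle):
    """Return the index of the first match of needle in haystack, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # How far we may safely shift when the byte aligned with the end
    # of the needle is b.
    shift = {b: m - 1 - i for i, b in enumerate(needle[:-1])}
    i = m - 1
    while i < n:
        # Compare backwards from the alignment point.
        j = 0
        while j < m and haystack[i - j] == needle[m - 1 - j]:
            j += 1
        if j == m:
            return i - m + 1
        i += shift.get(haystack[i], m)  # the bytes we jump over are never examined
    return -1

print(horspool_find(b"the quick brown fox", b"brown"))  # prints 10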

Conclusion

Toy around, and have fun with your micro-optimizations all you like. But be careful with the algorithm and data structure you choose.

Sometimes you’re calling some library or external service only to discard the result if some condition isn’t met. This kind of mistake is especially abundant in inherited codebases which accrue functionality over time.

Most important: if your code is slow, use a profiler. Optimize the hot spots and leave the rest alone. How good can this get? Read the answer here. Have fun :^)

Want to learn programming?

Derek Sivers has been a great inspiration for me in many aspects. My now page was basically his idea – there’s a whole movement around what people are focused on doing at the moment, and the sort-of community feeling keeps me accountable.

Some time ago, Derek wrote about how nice it would be to just have someone tell you what to do, and the fruit of that line of thought is his "Do This. Directives – part 1" article, which appears to be the first of a series.

I don’t yet have directives; not hard and fast ones. I used to have a page up with things you should read and work through, which would level you up. It’s more in the style of How To Become a Hacker.

I've not taught many people to code, so I don't have a crystallized view of what, exactly, would make you a competent programmer. While that comes along – I intend to teach many people to code in the near future – here's a rendition of my advice on becoming one:

  • Work through Learn Python the Hard Way. Internalize the mechanism for learning.
  • Read How to Become a Hacker and follow its advice.
  • Work through Nand2Tetris.
  • Work as much as you can through seminal works, like The Art of Computer Programming and Structure and Interpretation of Computer Programs.
  • Join user groups in your vicinity.
  • Get something done – something nice for yourself. At this point, you should have already.
  • When you make mistakes: identify them. Catalogue them. Learn to avoid the whole category of mistake, if possible. Share them, so that others may learn to avoid them too.
  • Stop saying you'll do it. Stop wondering whether this has something to do with what you want to do. Stop making to-do lists. Start. Now. Hurry.

See you on the other side 🙂

Retrospective: Clean Code – Boy Scouts, Writers, and Mythical Creatures

Yesterday I gave a talk on Clean Code, based on content by Uncle Bob and Geoffrey Gerriets.

I had some technical issues – I lost access to my presenter notes, which dampened my performance somewhat… and after I'd taken such pains to learn from Geoffrey's talk at PyCaribbean on Code Review, Revision and Technical Debt.

The subtitle for my talk was: “Clean Code, Boy Scouts, Writers, and Mythical Creatures”.

It starts out by discussing the features of clean code, as described by Uncle Bob and the practitioners he interviews in his book Clean Code – comparing each group of traits to physical things we can look up to, like a Tesla Model X, a pleasant beach, or a bike… all of which share qualities we'd want in our code.

It then moves on to the maxim of "leaving the campground better than we found it", with a nice example of some code taken from the IOCCC and how much more legible it became merely by reindenting it, throwing into relief the long-term impact of little incremental changes.

The latter half of the talk was derived from lessons learned at Geoffrey's talk: how the process of a professional writer compares to the process of a professional coder, and how alike they are; the lessons of the writer's day-to-day can be applied to our coding: design, write, revise, rewrite, proofread. Some attention was given to the way reviews may be delivered. The mythical creatures – which represent the different stages at which a developer may find themselves – aid this latter part of the talk by pointing out patterns of behavior that reveal what may or may not matter to a developer at a given point in their growth. The advice to treat things beneath a developer's level as trivia and/or minutiae, as well as the advice on focusing on and choosing improvements to point out instead of "trouble", may be the best of this part of the talk.

After I realized I'd burnt through the presentation, I posed some questions to the audience and we discussed some interesting points:

  • How can code comments make code cleaner or dirtier?
  • How can rewrites alter our coding behavior?
  • How can we find time to have a re-writing flow if the management doesn’t know any better?

Mileage varies, of course, so several people pitched in and we didn't draw any firm conclusions – we only presented ideas to try, which was interesting.

In the end, we came out with some good ideas on how to keep code from stagnating… hopefully our future selves will have ever fewer messes to deal with :^)

Retrospective: Reasons why you should love emacs

Last Saturday I was at Dominican College O&M’s campus at La Romana, as one of the speakers for the “OpenSaturday” series of events.

It was a complete success: a full room, an engaged audience, excellent speakers.

I had the opportunity to give my first talk on emacs.

It was delivered using org-tree-slide-mode for one part — which was really cool for the audience and for me, too.

On the second half of the presentation, I used org-mode and demonstrated custom “ToDo” states and timestamps, org-mode html export (C-c C-e h H), syntax highlighting, Immediate mode, and Emmet mode. Of course, I demonstrated having multiple buffers open in several windows at the same time.

It was all a bit rushed, because it was a 10-minute talk; I couldn't demonstrate keyboard macros – which would've been nice, as I was going to demonstrate extracting the html form field names and generating the php code to read them from the $_REQUEST superglobal. This makes use of emacs's ability to use search, and indeed any function, as part of a macro – something I know for a fact several editors can't do.

The show-stealer was Emmet mode – I actually thought people would be more surprised to notice that the presentation was running inside emacs, but they weren't. Since many attendees are CS students who are learning HTML, seeing html>(head>title)+body>div#content>ul>li*5 grow into the corresponding tree blew them away.

I’m planning to enhance that presentation to fill a 45-minute slot featuring keyboard macros, elisp functions + keybindings, and select parts of my .emacs file. Perhaps the presentation will be accompanied by a “Reasons to love Vi” by one of my colleagues, which would be sweet.

In any case, a great Saturday – hopefully things will keep on being fun.

On programming productivity

Measuring programmer productivity is notoriously hard. It's the topic of numerous publications of widely varying length.

Much of coding can be succinctly quantified and estimated; time should probably be spent automating those tasks, as they're the boring, repetitive, well-defined ones, like creating a CRUD or converting some programming-language-level construct into an interface-level representation such as JSON or XML.
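Those well-defined tasks are mechanical enough to automate; for instance, turning a language-level construct into JSON is nearly a one-liner in Python (the User type here is a made-up example):

import json
from collections import namedtuple

# A made-up language-level construct...
User = namedtuple("User", ["name", "age"])
user = User(name="Ana", age=30)

# ...converted into an interface-level representation.
print(json.dumps(user._asdict()))  # {"name": "Ana", "age": 30}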

The other part is hard to estimate, mostly because it combines several tasks, like getting to know the domain, figuring out what needs to get done and actually doing it in a polished manner.

Some things that are usually oversimplified in attempts to measure programmer productivity, sometimes to hilarious effect: number of lines of code, time spent sitting at the computer, and number of artifacts produced.

All of those means of measurement can backfire hideously by creating the wrong incentives (lots of boilerplate, woolgathering, overengineering, overestimating work length).

Here are a few important measurements that can be made to help track this elusive statistic:

  • Explain your work to your teams regularly and have them rate it; keep a history. State what's being solved, why it was solved in this particular way, what tradeoffs were involved, any difficulties you ran into, and how you overcame them. Ratings on two indicators are crucial: problem complexity and performance. They should include justifications, to help you home in on better practices.
  • Keep track of all the bugs in your code, the stage at which they were noticed, and the time that fixing them required.
  • Keep track of the references to your code, especially if you’re writing tools.
  • Have your peers rate you on helpfulness and knowledgeability.

If you encounter any unintended side effects or incentives, please let me know. So far, the only bug I've found in this kind of process is the popularity-contest-like aspect it can sometimes take on – thus the objective numbers I slid in there to help balance it. If you find other ways to improve on this, let me know.

PSA: Secure your build processes

I need to say this because there's too much wailing and gnashing of teeth going on about npm packages that loads of projects depend on.

If you have a project with dependencies, do yourself a favor and have an in-house mirror for those. It’s even more important if you’re a software shop which works primarily with one technology, which I presume is a very common case.

I'm not too node.js savvy, but in the Java and Maven world, we cover our backs using either Nexus Repository (which appears to work with npm, too) or Apache Archiva.

That way, when we "clean install" the last checked-in code for final delivery to the QA and deployment teams, we don't run into crazy issues like the build failing because someone decided to take their code down – or had it taken down by force.

In a Netflix chaos-monkey-like approach, try to foresee and forestall every cause of unreliability at go time – not only this one, but any other kind of externalized service as well. You, your family, significant others, pillow, boss, co-workers, and customers will all be happier for it.

Use LetsEncrypt

I've successfully renewed the SSL certificate for this website, and – this being my first repetition of the maneuver – took the opportunity to automate it.

It’s a really nice way to keep your site secure, and it pushes you towards automating the renewals by having a relatively short certificate life span. Plus, it’s free.

Receiving the alert is very refreshing; I still had 19 days left to renew my certificate, which would've given me plenty of time even if I hadn't had a shell script handy, waiting only for a good chance to be tested and scheduled.

Keeping your client-server communications secure is a must; even though most of what I write here will eventually see the light of day, much damage could be done to my image if the site were compromised – and this is just a personal website. If you have a website where clients log in and entrust their data to you, and for some reason you don't have secure connections enabled, do yourself a favor and fix that problem.