Monthly Archive for January, 2010

That DOM removal thing, again

Earlier this month, I pondered what looked like a bug in JavaScript/DOM/jQuery: removing an element from the DOM with jQuery (either manually with remove() or by setting the html() of its parent to something else) kept most of the data bound to the element around, but removed all event handlers from it. You could then re-insert the element, but its event handlers would be lost.

I then gathered from several sources, such as Stack Overflow, that this is a jQuery issue (or rather, feature) and not a JavaScript one.

The underlying cause is explained by Douglas Crockford:

When a DOM object contains a reference to a JavaScript object (such an event handling function), and when that JavaScript object contains a reference to that DOM object, then a cyclic structure is formed. This is not in itself a problem. At such time as there are no other references to the DOM object and the event handler, then the garbage collector (an automatic memory resource manager) will reclaim them both, allowing their space to be reallocated. The JavaScript garbage collector understands about cycles and is not confused by them. Unfortunately, IE’s DOM is not managed by JScript. It has its own memory manager that does not understand about cycles and so gets very confused. As a result, when cycles occur, memory reclamation does not occur.

A common solution to this problem is to remove the cycles when the element is removed from the DOM. Since event handlers are a major source of cycles in your average jQuery program, removing them when an element is removed from the DOM solves the problem most of the time.

With the release of jQuery 1.4, the new documentation for .remove() makes mention of this fact:

In addition to the elements themselves, all bound events and jQuery data associated with the elements are removed.

The documentation for .html() still makes no mention of this. If you want to remove an element and keep all the goodies you bound to it, jQuery 1.4 provides you with .detach():

The .detach() method is the same as .remove(), except that .detach() keeps all jQuery data associated with the removed elements. This method is useful when removed elements are to be reinserted into the DOM at a later time.
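
To make the difference concrete without needing a browser, here is a minimal sketch of the two behaviours. It is a toy model, not jQuery's implementation: elements are plain objects, handlers live in a side table, and every name in it (makeElement, bindClick, detach, remove) is hypothetical.

```javascript
// Toy model only: this is NOT how jQuery is implemented, and all names are made up.
const handlers = new Map(); // element -> list of click handlers

function attach(el, parent) {
  el.parent = parent;
  parent.children.push(el);
}

function makeElement(parent) {
  const el = { parent: null };
  attach(el, parent);
  return el;
}

function bindClick(el, fn) {
  if (!handlers.has(el)) handlers.set(el, []);
  handlers.get(el).push(fn);
}

function click(el) {
  (handlers.get(el) || []).forEach(fn => fn());
}

// detach: take the element out of its parent, keep its handlers.
function detach(el) {
  const siblings = el.parent.children;
  siblings.splice(siblings.indexOf(el), 1);
  el.parent = null;
}

// remove: detach, then drop the handlers so that nothing keeps them alive.
function remove(el) {
  detach(el);
  handlers.delete(el);
}

const body = { children: [] };
let clicks = 0;

const kept = makeElement(body);
bindClick(kept, () => { clicks += 1; });
detach(kept);
attach(kept, body);
click(kept); // the handler survived the detach

const lost = makeElement(body);
bindClick(lost, () => { clicks += 10; });
remove(lost);
attach(lost, body);
click(lost); // nothing happens: the handler was dropped by remove

console.log(clicks); // 1
```

Only the detached element still reacts to a click after being re-inserted, which is the behaviour the two documentation excerpts describe.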

Shouldn’t Happen…

Design and development is turning the great unknown chaos into tiny bits of controlled functionality with promises about what the result will be, and expectations about what the input should be.

There is an interesting duality between two categories of expectations, depending on whether they are the responsibility of the user, or of the programmer.

User errors are classic mistakes involving incorrect input, such as attempting to load a file that does not have the right format, or visiting a web site that does not exist, or entering an incorrect email address. A program is expected to, at the very least, gracefully handle these situations (because nobody likes errors) and the best programs are actively designed to reduce the possibility of error through appropriate user interface choices.

Programmer errors are the most frequent ones, but most of them are luckily caught by a compiler (or, in the case of the less lucky interpreted languages, the parser). The basic idea is that if you expect a function parameter to be an integer, and you tell your compiler, then as soon as static analysis determines that you will receive a string argument, the universe will collapse… or rather, the build will fail.

Static Analysis

Static analysis can be very smart. It can prove, beyond any doubt, complex properties about complex software written in obscenely low-level code (such as C with inline assembly). The problem is that working with a static analysis tool can add unusual constraints on the developers themselves: the halting problem dictates that no tool can correctly predict the behavior of every program, so any given tool will either have false negatives (undetected bugs) or false positives (safe code reported as dangerous), and the general trend for static analysis tools is to avoid any false negatives at the cost of false positives.

The quality of a static analysis tool is determined by how hard it is to write code without false positives (usually done by manually coding around the blind spots of the tool).

Static analysis tools have two problems. One, they’re not available for every single language and platform out there. Some of us are still using languages with eval(), throwing Java exception-safety out the window because we find it too constraining, doing without those pesky type systems and generally making a childish fuss about those “warning” thingies. Two, static analysis tools can only check constraints that are described by the developer in some form, such as assertions, preconditions, postconditions, type annotations or some other kind of attribute added to the code.

So, if you forget to “assert” it, nobody is going to check it for you. For instance, no tool is going to warn you that you unwittingly leak a credit card number to a third party.
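
For those of us in languages without a compiler to complain to, the closest equivalent is a run-time assertion. The sketch below is only an illustration: assertInteger and splitEvenly are hypothetical names, and the check happens when the code runs, not at build time. It still shows the point: the only properties that get checked are the ones somebody wrote down.

```javascript
// Hypothetical helper: states a precondition explicitly so that it can be checked.
function assertInteger(value, name) {
  if (!Number.isInteger(value)) {
    throw new TypeError(name + " must be an integer, got " + typeof value);
  }
  return value;
}

// Hypothetical function with its expectations spelled out as preconditions.
function splitEvenly(totalCents, people) {
  assertInteger(totalCents, "totalCents");
  assertInteger(people, "people");
  if (people <= 0) throw new RangeError("people must be positive");
  return Math.floor(totalCents / people);
}

console.log(splitEvenly(1000, 3)); // 333

try {
  splitEvenly(1000, "3"); // a string where an integer was expected
} catch (error) {
  console.log(error.message); // people must be an integer, got string
}
```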

The Elephant Statue

In a sense, predicting user errors is the mirror activity of gathering specifications. Both force you to think about all possible situations your software will face, and decide what should happen: maybe you have to display an error, maybe you will have to treat the input in a clever but predictable way, or maybe you will have to rework your process to prevent that situation from happening.

This is akin to creating an elephant statue by starting with a block of stone and carving out everything but the elephant. Deciding what your users can do implicitly defines what your users cannot do. Depending on the situation, you may guide your design with either approach.

DOM removal and events

Let’s try something… go to a page with jQuery enabled (such as this one), and run the following code in your Javascript debugger console (such as Firebug):

var button =
  $('<button>Click me</button>')
  .click(function () { alert('Clicked!'); })
  .appendTo('body');

In case you were wondering, this creates a brand new button, causes it to display a “Clicked!” message box when it’s clicked, and appends it to the document you are viewing.

Click on the button that just appeared: the message box appears. Not very surprising.

Now, run the following code on the same page:

$('body').html('');
button.appendTo('body');

As expected, everything on the page, including the button, disappears. However, the button is still referenced by the button variable, so it sticks around and we can append it back to the document. And indeed, it does appear on the page.

Click on the button again. This time, no message box appears.

I honestly have no idea why.

Interest(ing) rates

The most common way of investing money is putting it in a savings account. You lend a fixed amount of money to someone, and they pay interest over that money at a predetermined rate. Let’s say you lend 1,000 € at an interest rate of 3%, paid every year: at the end of the year, you would receive 30 € as payment for your lending. You would spend these on fine wine or nice clothes and wait until the next year to get another 30 €, and so on.

Savings accounts work on the basis of simple interest: what you get paid is a linear function of both time and money. Lend for half a year? 3% ÷ 2 = 1.5%. Lend for two years? 3% × 2 = 6%.

An important thing to bear in mind is that interest is paid at fixed intervals, for instance at the beginning of January. You don’t have to spend those 30 €: you can leave them on the savings account and earn simple interest on them after a year (3% of 30 € is 0.90 €).

Using this strategy, lending for two years is done at a 6.09% rate instead of 6%, because you get interest on interest. This is known as compound interest: what you get paid is an exponential function of time. Lend for two years? (+3%)² = +6.09%. Lend for three years? (+3%)³ = +9.27%.

The mathematical justification is that, with a 3% interest, your total amount of money is multiplied by 1.03 every year:

1,000 + 30 = 1,000 + 3% of 1,000 = 1,000 + 0.03 × 1,000 = 1.03 × 1,000

So, after two years, the amount is multiplied by 1.03 two times, and so on.

1,060.90 = 1.03 × 1,030 = 1.03 × 1.03 × 1,000

In short, percentages have a multiplicative effect.
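
As a quick check of the arithmetic above, here is a small sketch of both kinds of interest. The function names are mine, and rates are written as fractions (0.03 for 3%).

```javascript
// Simple interest: the gain is linear in time.
function simpleInterest(principal, rate, years) {
  return principal * (1 + rate * years);
}

// Compound interest: the amount is multiplied by (1 + rate) once per year.
function compoundInterest(principal, rate, years) {
  return principal * Math.pow(1 + rate, years);
}

console.log(simpleInterest(1000, 0.03, 2).toFixed(2));   // 1060.00
console.log(compoundInterest(1000, 0.03, 2).toFixed(2)); // 1060.90
console.log(compoundInterest(1000, 0.03, 3).toFixed(2)); // 1092.73
```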

And now, pop quiz: I’ve gained +5% weight over the winter holidays. What percentage of my weight do I have to lose to be back to normal?

If you answered -5%, you missed the point. Multiplicative effect means the total change of weight would be +5% × -5% = 1.05 × 0.95 = 0.9975 = -0.25%. I would be losing too much weight!

The correct answer was 1 ÷ 1.05 = -4.76%.

Similarly, if the number of graduates of a given school increases by +10% on year one and +25% on year two, the total increase is +37.5% and not +35%.
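
The same three results can be reproduced with a couple of helper functions. This is only a sketch: combine and undo are names I made up, and percentages are passed as plain numbers (5 means +5%, -5 means -5%).

```javascript
// Apply several percentage changes one after the other.
function combine(...percents) {
  const factor = percents.reduce((acc, p) => acc * (1 + p / 100), 1);
  return (factor - 1) * 100;
}

// The percentage change that exactly cancels a previous one.
function undo(percent) {
  return (1 / (1 + percent / 100) - 1) * 100;
}

console.log(combine(5, -5).toFixed(2));  // -0.25
console.log(undo(5).toFixed(2));         // -4.76
console.log(combine(10, 25).toFixed(2)); // 37.50
```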

Duality

This is where mathematicians (and computer scientists) use an interesting little concept called duality. Percentages are numbers that are easy to understand, but hard to combine. We can transform them into something that is a little bit harder to understand, but easier to combine.

The traditional way to transform multiplication into addition is to take logarithms (the inverse of exponentiation), due to an interesting property of the exponential function:

exp(a) × exp(b) = exp(a + b)

So, I wish to find a percentage operator (§) such that:

  • we conserve some values, 0§ = 0% and 100§ = 100%
  • applying A§, then B§, is equivalent to applying (A+B)§

Then this uniquely defines an operator which is called exponential percentage:

A§ = B%  ↔  A = 100 × log(1 + B ÷ 100) ÷ log(2)

Some common values:

  0%   = 0§         +100% = +100§     -100% = -∞§       200% = 158.4§
  +1%  = +1.4§      +99%  = +99.2§    -1%   = -1.4§     -99% = -664§
  +10% = +13.7§     +90%  = +92.6§    -10%  = -15.2§    -90% = -332§
  +25% = +32.2§     +75%  = +80.7§    +50%  = +58.4§    -50% = -100§


So, if I gained +5§ weight over the holidays, I can lose -5§ weight and be back to where I started, and if a number increases by 10§, then by 25§, it increases by 35§ overall.

And of course, a yearly interest rate of 4.2§ = 3% compounded over ten years is 42§ = 34%.
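
The conversion formula is easy to turn into code. The sketch below uses two function names of my own choosing and treats percentages as plain numbers (25 means +25%); it shows that compounding becomes plain addition on the § side.

```javascript
// Ordinary percentage -> exponential percentage: A = 100 × log2(1 + B ÷ 100).
function toExponentialPercent(percent) {
  return 100 * Math.log2(1 + percent / 100);
}

// Exponential percentage -> ordinary percentage (the inverse formula).
function toOrdinaryPercent(exponential) {
  return 100 * (Math.pow(2, exponential / 100) - 1);
}

// +10% then +25%: add on the § side, then convert back.
const combined = toOrdinaryPercent(
  toExponentialPercent(10) + toExponentialPercent(25)
);

console.log(toExponentialPercent(25).toFixed(1));  // 32.2
console.log(toExponentialPercent(-50).toFixed(1)); // -100.0
console.log(combined.toFixed(1));                  // 37.5
```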

No Free Lunch

Normal percentage rules make compounding hard, but it’s reasonably easy to estimate a percentage based on a fraction. Exponential percentage rules make compounding easy, but evaluating a percentage based on real figures is harder.

In practice, compounding happens less often than evaluating, so humans use normal percentage rules. And computers are good at compounding through multiplication, so they don’t need exponentiation.

Duality does have some other uses, though. For instance, there’s the duality between two representations of complex numbers:

a + ib = r exp iθ

The cartesian (a,b) notation makes it easier to add numbers, but multiplication is harder:

(a + ib) + (c + id) = (a + c) + i(b + d)

The polar (r,θ) notation makes it easier to multiply numbers, but addition is harder:

r exp iθ × s exp iφ = (r × s) exp i(θ+φ)
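
Here is a small sketch of that duality, with function names of my own: add in cartesian form, multiply in polar form, and convert between the two when the other operation is needed.

```javascript
// Cartesian form { re, im }: addition is component-wise.
function addCartesian(p, q) {
  return { re: p.re + q.re, im: p.im + q.im };
}

// Polar form { r, theta }: multiply the radii, add the angles.
function mulPolar(p, q) {
  return { r: p.r * q.r, theta: p.theta + q.theta };
}

// Conversions between the two dual representations.
function toPolar(z) {
  return { r: Math.hypot(z.re, z.im), theta: Math.atan2(z.im, z.re) };
}

function toCartesian(z) {
  return { re: z.r * Math.cos(z.theta), im: z.r * Math.sin(z.theta) };
}

// Multiplying cartesian numbers by taking a detour through the polar side.
function mulCartesian(p, q) {
  return toCartesian(mulPolar(toPolar(p), toPolar(q)));
}

const sum = addCartesian({ re: 1, im: 1 }, { re: 0, im: 2 });     // 1 + 3i
const product = mulCartesian({ re: 1, im: 1 }, { re: 0, im: 2 }); // (1 + i) × 2i = -2 + 2i

console.log(sum.re, sum.im);                                 // 1 3
console.log(product.re.toFixed(2), product.im.toFixed(2));   // -2.00 2.00
```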

For mathematically-oriented computer scientists, duality is a gold mine, because it lets one reduce a complex problem in one area to a simpler problem in another area (whether simpler means faster, as in the case of FFT, or easier to think about).

The Law of DSLs

There’s one common duality that is fundamental in the computer world: the correspondence between data and code. In a fit of narcissism, let me sit wisely atop a tall mountain to announce Nicollet’s Law of Domain Specific Languages:

Any sufficiently complex data processing algorithm is an interpreter for a small domain-specific language, and the data being processed is a program executed by the interpreter.

In some cases, this law only complicates things further. In many cases, however, the different angle it provides leads to many advantages, one of them being to transform a non-programming concept (such as an accounting file format) into a concept programmers are familiar with (a programming language).

A minimalist language design culture is enough to grasp several interesting concepts about executing code, which can be quite handy when processing data:

1. Compile to Bytecode

Interpreters don’t execute a string of characters. They tokenize that string, turn the tokens into an abstract syntax tree representing operations, functions and variables, then turn that syntax tree into a sequence of small, executable operations. That sequence is then fed into a virtual machine (or further compiled to machine code) to perform the actual operations.

If the input data for your algorithm is very complex, you can begin on the other side: what will the algorithm do with the data? Will it be inserting the data into a database? Constructing a data object from bits and pieces? What you are looking for is a set of atomic operations you can apply to generate the result. Implement these operations, then start working on a translation algorithm to turn the input data into such operations.

There are several common and friendly representations for such atomic bytecode:

Instruction lists are executed in order. This is your classic assembler listing, without the jumps. A typical “parse file and insert into database” algorithm would generate such an instruction list, and every instruction would be an INSERT, DELETE or UPDATE. Works best when you can read the data and generate the instructions in the right order: if you cannot get the list in the right order from the start, consider another approach.

Dependency graphs work like makefiles: you have several instruction lists floating around with relationships between them, indicating that one list has to be executed before another. A topological sort of the graph results in a single classic instruction list you can execute. A multi-file import, where some files contain data needed in other files, can be the way to go.

Nested scopes are the typical extension to instruction lists: every item in a list can be either an instruction, or another list, possibly tagged with some data. This could be a conditional (if this condition is true, execute this list), a loop (though it is best to avoid these) or a context (a “polygon” scope contains “insert vertex” operations that apply to that polygon). You can even allow variables in a let-in fashion (of which the polygon example above is just a special case)! Note that nested scopes can be easily represented as XML.
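
To illustrate the dependency graph representation, here is a minimal sketch that flattens several instruction lists into a single one with a topological sort (Kahn's algorithm). The input shape, the flatten function and the instruction strings are all assumptions made for the example.

```javascript
// lists: { name: { needs: [names of lists to run first], instructions: [...] } }
function flatten(lists) {
  const remaining = new Map();  // name -> number of unmet dependencies
  const dependents = new Map(); // name -> lists waiting for it
  for (const name of Object.keys(lists)) {
    remaining.set(name, lists[name].needs.length);
    for (const dep of lists[name].needs) {
      if (!Object.prototype.hasOwnProperty.call(lists, dep)) {
        throw new Error("unknown dependency: " + dep);
      }
      if (!dependents.has(dep)) dependents.set(dep, []);
      dependents.get(dep).push(name);
    }
  }

  // Start with the lists that depend on nothing.
  const ready = [...remaining.keys()].filter(n => remaining.get(n) === 0);
  const program = [];
  let done = 0;
  while (ready.length > 0) {
    const name = ready.shift();
    program.push(...lists[name].instructions);
    done += 1;
    for (const next of dependents.get(name) || []) {
      remaining.set(next, remaining.get(next) - 1);
      if (remaining.get(next) === 0) ready.push(next);
    }
  }
  if (done !== remaining.size) throw new Error("dependency cycle");
  return program;
}

// A two-file import where items need the categories to exist first.
const program = flatten({
  items:      { needs: ["categories"], instructions: ["INSERT item 1", "INSERT item 2"] },
  categories: { needs: [],             instructions: ["INSERT category A"] },
});

console.log(program);
// [ 'INSERT category A', 'INSERT item 1', 'INSERT item 2' ]
```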

2. Static Analysis

A side-effect of compiling to bytecode is that you get to process the entire file before you actually perform the intended operations. This makes a rollback easier if you notice that there’s an error on the last line of the file: if you make sure that no atomic operation in your target language can fail due to bad input (such as incorrect data values), then you can check your input data for correctness without doing anything to your program state.
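
A minimal sketch of this compile-then-execute idea follows. The input format (one “SET key value” or “DELETE key” record per line) and all function names are invented for the example; the point is that every error is raised by compile, before execute touches the store.

```javascript
// Turn the whole input into instructions; throws on the first bad line.
function compile(text) {
  const program = [];
  text.split("\n").forEach((line, index) => {
    const fields = line.trim().split(/\s+/);
    if (fields[0] === "") return; // skip blank lines
    const [action, key, value] = fields;
    const where = "line " + (index + 1) + ": ";
    if (action !== "SET" && action !== "DELETE") {
      throw new Error(where + "unknown action " + action);
    }
    if (key === undefined) throw new Error(where + "missing key");
    if (action === "SET" && value === undefined) {
      throw new Error(where + "missing value");
    }
    program.push({ action, key, value });
  });
  return program;
}

// Neither atomic operation can fail on a Map, so no rollback is needed here.
function execute(program, store) {
  for (const op of program) {
    if (op.action === "SET") store.set(op.key, op.value);
    else store.delete(op.key);
  }
}

function importText(text, store) {
  const program = compile(text); // all validation happens here
  execute(program, store);       // only reached when the whole file is valid
}

const store = new Map();
importText("SET a 1\nSET b 2\nDELETE a", store);
console.log([...store]); // [ [ 'b', '2' ] ]

try {
  importText("SET c 3\nBOGUS x", store); // error on the last line
} catch (error) {
  console.log(error.message); // line 2: unknown action BOGUS
}
console.log(store.has("c")); // false
```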

Even better, if your compilation process is cheap (linearly traverse a file for parsing) and you have heuristics for predicting how much time and resources your individual instructions require, then you can try to accurately predict the needs of the entire process.

Static analysis also means you can optimize. If, for instance, you’re inserting data into a database and need to resolve names or keys frequently (such as “add this item to list #732”), you can easily construct a table of needed keys (that you can get in one query when the processing starts) using the dependency graph approach. You can also optimize resource allocation by using common register allocation techniques: sort your dependency graph to keep as few resources in memory as possible at any given time.

3. Caching

Try to perform most of the processing offline.

For instance, if you frequently “apply” one file to another, such as a nearly-constant “list of categories” file used to resolve the “category” key in a daily object import, you can benefit from compiling the nearly-constant file to an easily loaded, easily applied format.

You see a cached dictionary that maps keys to categories? I see a DSL that allows dictionary literals as part of the language, and a source file that contains a literal mapping keys to categories, with an interpreter that can apply constant propagation to dictionaries.

Another benefit is when applying changes to mission-critical software. Inserting lots of data into a web database can create a heavy load on the server and make the site unavailable to visitors. It might therefore be preferable to pre-compile the imported data into requests through a process that keeps a light load on the server, then run the requests.

Besides, with proper nested scoping, you can slice an import into several transactions. This keeps the lock count low, allows spreading the transactions over time to reduce the load, and lets you resume the import process if, for some reason, it gets interrupted.
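
As a rough sketch of the resumable part, assume the compiled import is a flat instruction list sliced into fixed-size batches (a simplification of nested scopes) and that runTransaction is a hypothetical callback that applies one batch atomically or throws.

```javascript
// Runs the program batch by batch, starting at `checkpoint`.
// Returns where it stopped, so that a later call can resume from there.
function runInBatches(program, batchSize, checkpoint, runTransaction) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let position = checkpoint;
  try {
    while (position < program.length) {
      const batch = program.slice(position, position + batchSize);
      runTransaction(batch);    // assumed atomic: all of the batch or nothing
      position += batch.length; // only advances after a successful transaction
    }
  } catch (error) {
    return { done: false, checkpoint: position, error };
  }
  return { done: true, checkpoint: position };
}

// Simulated transaction that is interrupted once, on the second batch.
const applied = [];
let interruptOnce = true;
function flakyTransaction(batch) {
  if (applied.length === 2 && interruptOnce) {
    interruptOnce = false;
    throw new Error("interrupted");
  }
  applied.push(...batch);
}

const instructions = ["a", "b", "c", "d", "e"];
const firstRun = runInBatches(instructions, 2, 0, flakyTransaction);
const secondRun = runInBatches(instructions, 2, firstRun.checkpoint, flakyTransaction);

console.log(firstRun.done, firstRun.checkpoint);   // false 2
console.log(secondRun.done, secondRun.checkpoint); // true 5
console.log(applied.join(""));                     // abcde
```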

December 2009 PDF Vulnerability

All file formats follow the same evolution.

  1. They start by grouping together some static content, with some nifty features for presenting and editing that data. Think text files, bitmaps, RTF documents… The file format is reasonably easy to understand, and the reader/writer is so simple that it would take a bad programmer to create vulnerabilities.
  2. Then, they start including plug-ins that let them handle more and more types of contents. This lets you include an image inside an HTML page or an Excel spreadsheet in a Word document. This relies on many plugins for getting things right. It sometimes happens that a given plugin contains a security fault that can then be exploited, for instance Internet Explorer had an issue with images in PNG format. The user would visit a page, that page would display an image, and the computer would be contaminated.
  3. Finally, they need to become interactive, so they include a scripting language of some sort. Excel has a macro system that uses Visual Basic, HTML includes Javascript…

The PDF format followed the same process to end up where it is now. In addition to any static document data (text, vector and raster images) and extended content (Flash animations, videos, reader extension signatures), a PDF can also contain short JavaScript programs that let authors create interactive documents. This means a PDF document on your desktop can:

  • Accept user input (such as checkboxes or text fields). The input can be saved to the disk if the reader supports it and allows it (Acrobat Reader, used by the vast majority of computer users, only allows saving a file if its author purchased a reader extensions license and signed the PDF file with it).
  • Change its layout at will, for instance displaying a “spouse” page only if the “spouse” checkbox was ticked.
  • Be cryptographically signed, and display information about who signed it. This kind of signature is actually accepted as valid legal proof in many countries.
  • Compute a scannable bar code from user input, so that it can be printed, then scanned on the other side with reduced error rates.
  • Send data over the internet. It can even send itself as an attachment to an email.

Needless to say, with all these features, there are inevitably going to be some exploitable security issues in the mix. Being a popular program, like Acrobat Reader, only increases the number of black hat hackers looking for vulnerabilities. One of these is the recent CVE-2009-4324 from December 2009. There are many types of vulnerabilities, their common feature being that they end up executing arbitrary operations on the computer (as opposed to the safe operations Acrobat Reader normally allows). These operations are usually to download or install trojans, so that the attacker can gain complete control over the computer.

CVE-2009-4324 is of the use-after-free kind. In short:

  • it creates a resource (which uses some memory),
  • it frees (destroys) the resource to recycle its memory,
  • it writes something to that memory,
  • it attempts to use the resource

Normally, the program should stop at step four and say “you can’t use the resource, it’s been destroyed”. A bug can cause it to believe that the resource is still there. The programmer probably assumed that the memory still contained a valid resource and did not defend against the memory containing something else… and accessing that as if it were a valid resource executes some code that the attacker wanted to execute. Bingo.
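
The four steps can be imitated in a toy model. JavaScript itself is memory-safe, so the sketch below only simulates the pattern with array slots standing in for memory; allocate, free and the slot contents are all hypothetical and have nothing to do with the actual Acrobat Reader code.

```javascript
// Toy memory manager: slots are "memory", indices are raw "pointers".
const slots = [];
const freeList = []; // freed indices, recycled first

function allocate(content) {
  const index = freeList.length > 0 ? freeList.pop() : slots.length;
  slots[index] = content;
  return index;
}

function free(index) {
  freeList.push(index); // nothing invalidates pointers that are still around
}

// Step 1: create a resource that holds something to call later.
const player = allocate({ play: () => "playing media" });

// Step 2: free it, but keep the stale pointer.
free(player);

// Step 3: something else gets written to the recycled memory.
allocate({ play: () => "attacker-chosen behaviour" });

// Step 4: the stale pointer is used as if the resource were still valid.
const result = slots[player].play();
console.log(result); // attacker-chosen behaviour
```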

In the case of CVE-2009-4324, this happens as part of the Doc.media.newPlayer method which, for performance reasons, was not completely implemented in JavaScript. A bug in JavaScript code can cause the document to misbehave, but it cannot do anything that JavaScript could not already do on its own. The parts that were written in a lower-level language, with direct access to the computer, contained the exploited bug.

The bug causes the processor to start executing code at a different memory location. In an ideal hacker world, that location would be precisely where some nasty code is present. Buffer overflows, when used to rewrite pieces of the stack, do allow such deterministic jumps. However, CVE-2009-4324 only allows a jump to an undetermined location.

The hacker solution is to use heap spray. The basic idea is that you have a short piece of code you want to execute (the payload). You create a block from that payload by adding no-ops (machine instructions that say “skip me”) before the payload. Then, you create lots of these blocks in memory, and trigger the exploit.

The exploit causes the computer to jump to an undetermined memory location. If it falls within the no-op section of any of the blocks you’ve created, you win: the computer skips over the no-ops, reaches the payload and executes it. If not, the program will crash. Too bad…

Quick Test

Here’s a very simple question:

How many times can you subtract 5 from 73, and what is left?

Find out what your answer means by clicking here.

  • An imperative programmer answers, “you can subtract it 14 times and the remainder will be 3.”
  • A functional programmer answers, “you can subtract it as many times as you wish, and you always get 68.”
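
Both answers, as a small sketch with names of my own choosing:

```javascript
// Imperative reading: each subtraction updates the number in place.
function subtractRepeatedly(start, step) {
  let value = start;
  let times = 0;
  while (value >= step) {
    value -= step;
    times += 1;
  }
  return { times: times, left: value };
}

// Functional reading: subtraction is a pure function, and 73 never changes.
const subtract = (a, b) => a - b;
const results = [1, 2, 3].map(() => subtract(73, 5));

console.log(subtractRepeatedly(73, 5)); // { times: 14, left: 3 }
console.log(results);                   // [ 68, 68, 68 ]
```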

Improve

“Nobody’s perfect”, they say. We all wish to improve ourselves in some areas. Want to lose some weight? Become a better dancer? Spend less time debugging your code? Sound smart in meetings?

There are three exceedingly simple steps to improving yourself. These are, in order:

  1. Identify what you are doing, precisely, that you would rather not do anymore.
  2. Find a way to stop doing whatever you identified in step 1.
  3. Gather enough willpower to follow the way you found in step 2 until you succeed.

Now go forth and conquer!

What?

Fine. I said those steps were simple, not that they were easy. They can still be helpful, though: even if you cannot seem to improve, at least you can find out which of these steps is giving you a hard time, and concentrate on that specific area.

You need to stop

There are always two ways of looking at any given improvement. You either see it as “I started doing something”, or you see it as “I stopped doing something”. Improvement is change, and change means something ends and something else begins. You start being a good dancer, you stop trampling your partner’s feet. You stop writing buggy code, you start spending less time debugging.

We bloggers love splitting people into groups: you are either a can’t-start person, or a can’t-stop person. If you want to quit smoking, but always end up lighting another one, you are acting as a can’t-stop person. If you want to exercise daily, but always find a good excuse not to run your laps, you are acting as a can’t-start person. And you had better find out which group you are in.

Because if you are a can’t-start person, and you think of potential improvements in terms of starting to do things, trouble is coming your way.

Step one is all about looking at your problem from the other side. To start doing things right, you need to stop doing them wrong. As you stop making mistakes, what is left usually counts as improvement.


Ask and accept

Some humans are blessed with the ability to see, in a clear and unmistakable fashion, what went wrong about something. Enlightenment comes in a slap-your-forehead moment where the root cause of your problems is discovered.

Most of us experience trouble as a hazy and painful feeling that seems to come from everywhere at once, with no clear cause and no clear idea of how things ended up the way they did. If you cannot really understand what you are doing wrong or why you end up in trouble, consider asking someone else. Whether you get an expert opinion, a different angle on the situation, or simply the benefit of putting your issues into words and sentences, you will get something out of it.

This reminds me of Bob. When I was still a young intern, without much experience in the ways of men, an older developer joined my team. Despite his reasonable technical skills, Bob had a conflictual relationship with our manager, and this had a negative impact on team morale. I had been told our manager disliked it when people were late for work, even though he did not insist on the matter too much, and my experience with him confirmed that information. Since Bob was at least fifteen minutes late every morning, I decided to share that information after a quite gruesome clash, thinking it would help:

“You know, he gets angry when people are late. If you can’t come in earlier, you really should tell him about it.”

“What do you mean? I’m not late.”

“You came in at 9:25 am this morning.”

“No way, I came in at 9:00 am!”

I suspected that Bob, out of pride, would deny being late in the morning, but that he would mull over the notion and come in on time the next day. I was wrong, and he was thirty minutes late the next morning.

It is very important to carefully examine any criticism before rejecting it. When someone you know takes the time to explain that you’re doing something wrong, the least you could do is wonder if they are right, or why they thought they were right. Criticism is free advice, carefully selected to apply to whatever you do that annoys others most.

Even if Bob really was on time every morning, my advice was still worth considering because it indicated that some people thought he was late, which is quite important in a corporate environment.

Mental triggers

Once you have identified what you need to stop doing, you can move on to step two: find a way to prevent it from happening. Since you cannot keep thinking about it all the time (and even if you did, it would reduce the efficiency of whatever else you were doing), you need to find a way to remember about your decision when it actually matters.

If you experience a feeling when you perform the unwanted action, you happen to be in luck, because it is far easier to associate your decision to stop with that feeling.

While I have always been comfortable in one-on-one conversations, I used to have a lot of trouble speaking up in groups, because I tend to take a short time to think before I speak, and this means someone else in the group is going to start speaking before I can gather my thoughts. If there were pauses in the conversation, I could certainly pass off as the wise experienced guy in the corner who only speaks up when nobody knows what to do anymore, but most of the time I looked like the shy silent guy in the corner who has nothing to contribute to the discussion.

I noticed that most people who spoke up in group discussions said things that were mostly irrelevant and generally only served as an anchor to remind others of what the position and motives of that person were. A precious few people, however, seemed to always grab the attention of others and say something interesting and relevant. Among these was Jamie, a nice lady in her mid-thirties. After observing her for a while, I noticed a pattern in her way of speaking up: she would repeat the last sentence that was said.

“…and we should test if the government servers can handle the load.”

“Test if they can handle the load, yes. I just received a report about…”

By repeating the sentence, she was able to start speaking faster than anyone else, and she used that time to think about what she was about to say next. Not to mention that repeating someone’s sentence implies some level of agreement, which is always good to have in a meeting.

So I started training. Whenever I missed my turn in a conversation, I felt frustrated, which reminded me of my decision to use Jamie’s technique. So, on the next try during the same conversation, I did not think silently about what I had to say and instead repeated the last part of what the speaker said. After a while, I did not have to think consciously about it anymore.

Without a feeling to anchor your reminder to, you will have to find something else.

One trick is to artificially increase the time before you can take the unwanted action. For instance, I live in a flat in Paris and my fridge is a ten-second walk away from my desk, so it’s easy for me to stand up and fetch a quick snack without thinking. I once spent my holidays with a friend who lives in a large house in the countryside, where the fridge was two floors below my appointed room and it took a full minute of navigating slippery stairs and cold, narrow corridors before I could get to said snack, which means I never got to eat without thinking. The longer it takes you to start an activity, the greater the chances that you notice “Hey, I’m about to do this, and I decided I wouldn’t do it anymore!” and desist from doing it.

If the activity is continuous (such as “not washing the dishes”) you can try to make the consequences of that activity as obvious as possible. If the house (or source code) is a mess, then a little extra sloppiness goes unnoticed. If the house (or source code) is cleanly arranged and organized, any amount of mess is going to stand out and be very obvious. It is easier to keep a house or project clean than it is to clean it up later on.

The will to go on

Finding enough willpower to enforce your decisions can be hard. A good strategy is to know your weaknesses and exploit them. For example, if you hate mediocrity and often think that “I’m worth better than that”, then thinking of unwanted activities as shameful and worthless can give you that little boost you need.

You can even shame yourself into respecting your decisions by telling other people about it. Ask other people to look over your source code daily, even if it’s not an actual code review, and you will often be too ashamed to leave undocumented methods and badly named variables around.

Photo credit: http://www.flickr.com/photos/avlxyz/ (http://www.flickr.com/photos/avlxyz/2684089255/in/set-72157606283805523/), licensed under CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0/)

