Bad Language!

Having just read of the Tesco Bank and Three data thefts, and now the Linux streaming media issue, I thought it would be worthwhile to publish an article on the nature and state of IT security as a whole, and give a few insights as to why these ongoing breaches just never seem to end. To understand how this situation has come about means talking about a few dirty secrets within the IT industry, subjects which the arbiters of standard and accepted IT practices would rather were not talked about too much.

We need someone to blame

The current stance on this seems to be that we blame the banks and e-commerce outfits for data security breaches. The consequence of suffering a data breach is a heavy fine. Yet, despite a fair number of massive fines being handed out, the security breaches continue unabated. This should tell us something. Indeed, the most likely thing it tells us is that the people being fined are not able to fix the problem, and that is because the mechanisms controlling it are outside of their remit.

Then again, if an e-commerce operator can't be blamed, the hack is typically blamed on users having 'insufficiently complex passwords' or on a 'man in the middle attack' or suchlike. While these may be low-order security issues in their own right, they are not the typical causes of these incidents. It would be more accurate to say that these are straw men put up to deflect attention away from the real security problems.

If the trading firms are unable to fix the problem though, should we instead fine their website developers and maintainers? Well, even if we did, we would likely find that it didn't make a whole lot of difference either. If it had an effect, it would simply be to deter the reputable website developers from working in those fields, for fear of being fined. That would only exacerbate the problem by lowering the calibre of the remaining software engineers available to the traders, and hence making problems even more frequent.

So, I think we have to reckon with the fact that punishing the trading and banking firms, who are basically victims of the hackings just as much as their clients are, is simply a case of victim bashing, and achieves nothing useful. If we want to fix this problem we need to look elsewhere, and the first and foremost thing is to gain a better understanding of why, in the present situation, the trading firms and their software guys can't fix the security problems.

Software vulnerabilities

Firstly, let's take a look at the question of errors in software which make computers vulnerable to hacking. From every software vendor comes a constant stream of patches to deal with the security issues which have been found in supported software. Meanwhile, we are quite sternly warned by the same vendors that the continued use of unsupported or out-of-date software is a serious security risk, thus we had better buy the latest version right away.

If we go to one of the security research websites, we find that the software vulnerability reports are monotonously repetitive. In fact, the vast majority of security-critical flaws in software fall into only three categories: buffer overflows, code injection and cross-site exploits.

Between these, the first two classes of fault are responsible for the vast majority of software hacks. Exact figures are hard to obtain, but it is likely that 90% or more of all intrusions which rely on exploiting a flaw in software are due to one or other of this pair. Of course there are other ways of hacking a computer, but for the moment we're talking about software flaws, and in this category these two simply dominate the scene like no others. Cross-site exploits are somewhat less common and mostly result from extremely careless site design. They are also a serious concern, but one that's much easier to fix.

Software flaws are at the root of most malware incidents. Even if social engineering is used to persuade someone to launch a malicious download, that malicious download only got hosted on the website by way of exploiting a software flaw. The email telling you to download the file probably also arrived courtesy of a computer with a software flaw which allowed it to be turned into a spam relay. So, deal with the software flaws and you deal with the consequences. That's why this class of exploit is so important. Prevent the hacker from gaining a foothold on the system, and you forestall a whole range of other potential attack methods.

Even if it were not for the security aspect, the need to constantly patch software is a major source of wasted time, annoyance and problems for users. If the source of these flaws could be eliminated, the need for patching would be greatly reduced. The time and effort saved worldwide in that way alone makes fixing these two issues worthwhile.

If you talk to programmers about this, they will tell you that there are ways to mitigate buffer overflows and code injection, and that the problem is with programmers who fail to implement such protective measures. Some may even go as far as to imply that programmers who overlook these protections are fools. This is pure hubris. The same guys probably have hundreds of exploits lurking in their own code.

Thing is, we aren't talking about the need to add a line or two of protective code to a program, and job done. On the contrary, we're talking about situations which can arise many hundreds or thousands of times in a large program, and where omitting to protect even just one such instance could permit a hacking incident. When looked at like that, it becomes clear that securing the whole of a large program by way of add-on protective measures is an extremely difficult task, and indeed may never be completely successful. Usually, it is not successful.
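To put some rough numbers on that, here is a small Python sketch. The figures are assumptions made purely for illustration: a hypothetical programmer who forgets the protective check at one site in a thousand, with each site treated as independent. The function name is made up for this example.

```python
def chance_of_at_least_one_miss(sites: int, miss_rate: float) -> float:
    """Chance that at least one of `sites` places is left unprotected.

    Assumes every site is forgotten independently with probability
    `miss_rate` (an illustrative assumption, not a measured figure).
    """
    return 1.0 - (1.0 - miss_rate) ** sites


if __name__ == "__main__":
    # Even a 99.9% careful coder runs out of luck as the program grows.
    for sites in (10, 100, 1000, 10000):
        chance = chance_of_at_least_one_miss(sites, 0.001)
        print(f"{sites:>6} sites: {chance:.1%} chance of at least one hole")
```

With those assumed figures, a program with a thousand such sites is more likely than not to contain at least one unprotected instance, and one is all a hacker needs.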

The other factor that's not often mentioned is that for all practical purposes, the 'Big Two' exploits are each only encountered in ONE family of programming languages.

Code Injection exploits are a peculiarity of the SQL or 'sequel' database language favoured by website content management systems.

We'll look at this issue separately, but the key problem here is that the SQL syntax allows embedding of programming commands inside of data, even data entered by a website visitor. As far as I know it's the only database language to allow this in ordinary queries. Some other languages may allow that in special cases, but the risk of that occurring is small. With SQL, the risk is ever-present.
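For anyone who hasn't seen this in action, the following is a minimal sketch using Python's standard sqlite3 module and an in-memory database. The table, the names and the function lookup_unsafe are all made up for the illustration; the point is only to show visitor-supplied text being read as part of the command.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two made-up users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.execute(
        "INSERT INTO users VALUES ('alice', 'alice-pin'), ('bob', 'bob-pin')"
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # The visitor's text is pasted straight into the command string,
    # so the database cannot tell where the command ends and the data begins.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    conn = make_db()
    # An honest visitor gets only their own record.
    print(lookup_unsafe(conn, "alice"))
    # A hostile visitor closes the quote and adds a condition of their own,
    # which makes the WHERE clause true for every row in the table.
    print(lookup_unsafe(conn, "x' OR '1'='1"))
```

The second call hands back every secret in the table, because the text typed by the visitor has become part of the query itself.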

Code injection exploits are by far and away the most common form of exploit where database-backed websites are concerned. The real irony of the situation is that the problem could be fixed relatively easily if only the resistance of the software industry to change could be overcome. All that's required is to use another database language. How hard can that be?

Buffer overflows are a peculiarity of the C and C++ programming languages which are used to write a great deal of desktop software, including operating systems.

Strictly speaking, it's also possible to create buffer overflows in assembly code, but that is to be expected since in that case the programmer has total control over what goes on. It was recently pointed out to me that, surprisingly, it's also possible to create a buffer overflow risk in that ancient and venerable language Fortran, although the odds of that happening in the wild are insignificant. In real-world scenarios though, it's C and C++ which are the risk vectors.
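The mechanism can be sketched without writing any C at all. The following Python snippet is only a simulation: a 16-byte bytearray stands in for a stretch of memory, with an 8-byte name buffer followed directly by some unrelated data. The layout and the function names are invented for the illustration. The unchecked copy behaves the way an unbounded C string copy does, while the checked copy behaves the way a bounds-checked language does.

```python
NAME_SIZE = 8  # size of the pretend name buffer, in bytes


def fresh_memory() -> bytearray:
    """Pretend memory: an empty 8-byte name buffer, then 8 bytes of other data."""
    return bytearray(b"\x00" * NAME_SIZE + b"SAFEDATA")


def copy_unchecked(memory: bytearray, data: bytes) -> None:
    # Keeps writing until the input runs out, taking no notice of
    # where the name buffer ends (as an unbounded copy in C would).
    for i, byte in enumerate(data):
        memory[i] = byte


def copy_checked(memory: bytearray, data: bytes) -> None:
    # Refuses any input that does not fit inside the name buffer.
    if len(data) > NAME_SIZE:
        raise ValueError("input longer than buffer")
    memory[:len(data)] = data


if __name__ == "__main__":
    mem = fresh_memory()
    copy_unchecked(mem, b"AAAAAAAAEVIL")  # 12 bytes into an 8-byte buffer
    print(bytes(mem[NAME_SIZE:]))  # the neighbouring data has been altered

    mem = fresh_memory()
    try:
        copy_checked(mem, b"AAAAAAAAEVIL")
    except ValueError as err:
        print("rejected:", err)
    print(bytes(mem[NAME_SIZE:]))  # the neighbouring data is untouched
```

In real C the neighbouring bytes might be a return address or a permissions flag, which is what turns an over-long input into a hacking tool. Python itself would stop the write at the very end of the bytearray, which is exactly the sort of built-in check that C lacks.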

So, why haven't these age-old issues been fixed yet?

We all know how it goes. Programmers turn out a software product that's shot through with security holes, spend ages trying to secure it, then eventually decide to declare it 'unsupported' and write a completely new product. Only, they use the same programming language, and maybe even the same compiler as was used to write the previous product. The replacement also turns out to be a security disaster. You might think there was a connection here. There sure is.

If your toaster, washing machine or TV gave constant trouble and had done since new, you'd eventually get fed up with that and go buy another one. More importantly, you'd make sure the new one wasn't the same make as the troublesome one. I don't quite get why, but programmers just don't think like that. A programmer, if they applied the same logic as they apply to programming, would go look for a replacement of the same make, maybe even the same model, and then wonder why it is no better than the last one.

There is no specific reason why these problematic languages have to be used. Desktop software can be written in a whole range of languages, and several other database languages exist besides SQL. Thus, the logical solution to both problems seems to be to change language. Which raises the question:

Why haven't these languages with weak security been ditched long ago?

The reasons seem to be a combination of:

C and SQL have become heavily entrenched in the software industry

If you work for a software company, you probably need to use code libraries as part of your work, and these will almost always have been written in C. That, and you will probably have to write sections of programs that other coders are also working on, which will be written in C. So, nobody's going to let you use a different language.

For smaller software houses there is more freedom of choice in what to use, but you then hit the issue that many language compilers are written in C, so you can quite inadvertently create a program containing buffer overflow vulnerabilities without even writing a single line of C code.

If you buy professional webspace, you find that you have several database packages available to you, but they are all variants of SQL. So basically you have a choice but it isn't a choice. Even if you set up your own webserver, you still have the problem that the major CMS packages don't allow the use of anything other than SQL. In practice, a website has to be either static or SQL-based.

A new development, Transact-SQL or T-SQL, at last offers a solution to this one. However, uptake is entirely voluntary, and the end user has no way of knowing whether the coder used injection-safe methods or not. What is really needed is a platform which no longer supports the vulnerable coding methods.

Their use is perpetuated by centres of learning

If you take Comp Sci at college, you get taught C and SQL as the core languages. You will almost certainly be told how to protect against buffer overflows and code injection, but interestingly, I have heard stories of students who were totally unaware that a change of language would do away with the problem. Why, I wonder, are they not told that? I think it's reasonable to assume that lecturers and professors of Comp Sci are well aware of the situation, but prefer not to draw attention to the elephant in the room.

In the early days of computers, learning establishments favoured the languages which promoted well-structured programming methods and human-readable code, such as Algol and Pascal. The argument behind this choice was that languages which fail to enforce good practices lead to numerous bugs in code. Funny how that principle seems to have been abandoned in focusing on C, which is about as close to the dubious ideal of 'quick and dirty' as it gets.


Talking to programmers about how these vulnerabilities arise, you often get the response that 'Only a fool would allow buffer overflows or code injections to go through unchecked', followed by the brash assertion that they themselves, of course, never, ever make such mistakes. Since the issue at stake here is the simple act of overlooking one or two unchecked data-entry points in millions of lines of code, I think it's reasonable to say that their claims of perfect, flawless coding are the product of an inflated ego.

In principle this is no different from a worker in any other industry who persists in using a faulty and unsafe tool, or taking risky shortcuts on the job, on the basis that "I know it's risky but I can handle it, because I'm so darn good at it." I think we can see who is the fool here. People of this kind usually acquire a Darwin Award by falling from height, electrocuting themselves, blowing themselves up, or whatever. In the software industry that doesn't happen. Nobody even gets sacked for writing code with buffer overflows. So, the risk-taking continues, along with the boasting.

Gaffertape solutions

It's not as if there haven't been attempts to fix the code injection and buffer overflow risks. There have. The fixes developed are in the nature of gaffertape solutions, though. Data execution prevention, address randomization, parameterized queries and so on all provide a partial solution to the respective problem. As the Linux GStreamer issue showed so vividly, each is a far-from-reliable fix which is likely to come back and bite you by failing at an awkward moment.

Like wrapping gaffertape around a broken part, none of them fix the underlying problem. All they do is to stave off the day when you have to fix it properly. Which, you would be better off doing in the first place, by way of changing the worn-out component. Job done properly, equals no more trouble. In this case, job done properly by changing the faulty programming languages.

Is there a way forward with fixing this?

C has become so deeply entrenched that any industry-wide change of language would be extremely costly. If that were not the case, a quick switch to Pascal, Ada, Python or whatever, and problem solved. However, it should in principle be possible to create a replacement language with near-identical syntax, in which most features work similarly, which would be a drop-in replacement for most purposes. The problems would arise where coders have done things like accessing memory locations directly. If buffer overflows are to be prevented, that cannot be allowed. In fact this replacement does already in a sense exist in the form of C#, although the applicability of C# to general programming is hampered by its need for a runtime library of nearly 1GB on all client computers.

There would still exist the problem of numerous code libraries, some very old indeed, written in C and containing buffer overflow vulnerabilities. Nevertheless, making a start in dealing with the problem at source is what's needed right now. The rest can follow.

With SQL the problems are in the command syntax itself, so there is no point in trying to create a same-syntax replacement that doesn't suffer from code injection. Thus, any secure replacement would by definition use fundamentally different commands. However, the scope of the problem is smaller than with C, since it typically only affects the way in which programs written in other languages interface with SQL servers. For example, the Web scripting language PHP has built-in commands to send database calls to a SQL server, and these commands form a core part of any database-backed CMS written in the language. It would probably not be too difficult to create an auto-replace utility which changed SQL calls to equivalent calls in a new syntax. The important point is that any replacement syntax must not accept data input as literals, but only as references to variables.
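To show what 'references to variables' means in practice, here is a minimal sketch using the parameter placeholders of Python's standard sqlite3 module. This is still SQL, so it is an illustration of the principle rather than the replacement language being argued for, and the table and function names are made up for the example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two made-up users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.execute(
        "INSERT INTO users VALUES ('alice', 'alice-pin'), ('bob', 'bob-pin')"
    )
    return conn


def lookup_by_reference(conn: sqlite3.Connection, name: str) -> list:
    # The command text is fixed. The `?` is a reference to the value
    # handed over separately, so that value is only ever treated as data.
    rows = conn.execute("SELECT secret FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db()
    print(lookup_by_reference(conn, "alice"))
    # The classic injection string is now just an odd-looking name
    # which matches nobody in the table.
    print(lookup_by_reference(conn, "x' OR '1'='1"))
```

The catch, of course, is that nothing stops the next coder from going back to pasting strings together, which is why a syntax that only accepts references would be the safer design.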

There is an updated form of SQL known as T-SQL (Transact-SQL) which supports a new form of query with which data may indeed be input as variables. It seems that this does offer a solution to the problem of code injection; however, its availability on typical webhosting is limited. It would also only offer a solution if the vulnerable SELECT query syntax were to be deactivated, otherwise you can guarantee it will continue to be used. Definitely one to watch, anyway.
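As a very rough sketch of what 'deactivating' the vulnerable form might look like, the following Python wrapper refuses to run any statement whose text contains a quote character or a bare number, so that data can only arrive by way of parameters. The names strict_execute and LiteralInQuery are hypothetical, and the check is deliberately crude: it would also reject harmless things like LIMIT 10, and a real platform would need to enforce the rule inside the database engine itself.

```python
import re
import sqlite3

# Crude test for embedded data: any quote character, or a stand-alone number.
_LITERAL = re.compile(r"""['"]|\b\d+\b""")


class LiteralInQuery(ValueError):
    """Raised when a statement carries data inside its command text."""


def strict_execute(conn: sqlite3.Connection, sql: str, params: tuple = ()):
    # Refuse the statement outright rather than trying to clean it up.
    if _LITERAL.search(sql):
        raise LiteralInQuery("data must be passed as parameters, not embedded")
    return conn.execute(sql, params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    strict_execute(conn, "CREATE TABLE users (name TEXT)")
    strict_execute(conn, "INSERT INTO users VALUES (?)", ("alice",))
    print(strict_execute(
        conn, "SELECT name FROM users WHERE name = ?", ("alice",)
    ).fetchall())
    try:
        strict_execute(conn, "SELECT name FROM users WHERE name = 'alice'")
    except LiteralInQuery as err:
        print("refused:", err)
```

With a gatekeeper of this sort in place, the string-pasting style of query simply stops working, so there is no old habit left to fall back on.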

Quality approvals might be one answer

We mentioned the ineffectiveness of fines when applied to banking or e-commerce firms, and that fining their website maintainers would be equally unproductive.

Fines are very effective in other industries, in deterring unsafe practices that lead to accidents. If for example an employee suffers a fall from height due to being issued with worn-out equipment, then the firm responsible is liable to be fined. The fine makes other employers sit up and take notice, those who are also using substandard or worn-out equipment replace it, and the rate of accidents is reduced.

I think we can see why the fines are ineffective here. The aim of fines is to encourage replacement of unsafe tools, and discourage unsafe working practices. The site operators can't easily replace SQL, for the simple reason that few hosting companies and few e-commerce software suppliers offer anything else. When buying desktop or server software, you typically don't know what language that software was written in, though you could hazard a guess at C, and you'd probably be right. Thus, site operators can do little except use the products offered and try their best to patch over the security flaws.

In other spheres, product approvals are a very effective way of promoting the purchase of safer, better products and discouraging the use of unsafe ones. The BSI 'kitemark' logo and the German TÜV badge are two well-respected examples of such. Thus, one way forward out of this mess might be to introduce a quality approval system for software. Security would of course be at the head of such quality concerns, and it would, we presume, be nigh-impossible to gain such an approval for any software product built with a language that allows buffer over-runs or code injection.

Likewise a quality approval for website hosting services would include the rider that no database must be offered to clients which permits code injection. Thus, services offering SQL would find themselves at a disadvantage to those who offered a safer alternative because, by definition, they could not acquire a quality approval.

Having done that, the next step would be to embark on a publicity campaign to make both the public and site operators aware of the advantages of having quality approval. Explain to the public that for banking in particular, using an unapproved site involves taking an unnecessary risk.

Hopefully in the course of time the inability to gain quality approval would result in both of these languages disappearing from the scene, and along with their retirement the two greatest security issues of the IT industry would finally bite the dust.

