HTTPS: Does it protect?

HTTPS certificate information

There has been pressure on webmasters recently to force all websites to use HTTPS. This has even extended to strong-arm tactics such as threatening to downgrade the search engine rankings of sites which don't comply.

The latest browsers now pop up security warnings on pages with password fields that don't use HTTPS.

A somewhat more extreme development is that Chromebooks are, I am told, disabling access to the microphone and camera unless HTTPS is used, even if the use is on an internal LAN.

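The rationale behind this is twofold: to give some assurance that the site is operated by whoever it claims to be, and to prevent the data from being read or modified en route by a man in the middle.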


A man-in-the-middle (MITM) attack refers to a person gaining access to the data stream between the end user and the service provider.

So, MITM is really just phone tapping, but with computer data. Whether 'tapping' of your Internet data is likely to be a problem depends, of course, on whether you trust data carriers such as your ISP. If not, then you need to use the equivalent of a voice scrambler, which is effectively what HTTPS is.

A standard Web connection has a number of security weaknesses.

There is some controversy over what the term MITM means. Some say it only refers to a literal wiretap on Internet cabling; others say that it can refer to any situation where an attacker is able to place a listening device at some point along the data path between the user's keyboard and the destination hard disk. There are an increasing number of precedents in the security industry for the latter being the accepted meaning today, so I think we can reasonably use it here.

The statistics for computer security breaches suggest that actual tapping of a wired data connection is rare. Also, we've been using the wired Web since the 1990s with no significant number of recorded incidents, so why should this suddenly have become a concern? Good question.

However, the rise of the use of wireless hotspots has created the risk of data interception from the radio signals these use, and there is indeed a possible justification for concern in this case. 

 What is HTTPS?

Originally created with banking and e-commerce sites in mind, HTTPS encrypts the data stream between the browser and the webserver. It does this using a public/private key pair, the public half of which is contained in a certificate. The certificate is typically issued by one of a number of certificate signing authorities and, in addition to providing the encryption key, offers a degree of proof that the person or organisation operating the website is who they claim to be.

This answers the two abovementioned concerns:

The scheme is something like this:

HTTPS prevents any attempt to steal your data on the wire.

The classic case for using HTTPS is on banking sites. In such cases, there should be only one data source, the bank's server. When used in that way, it achieves the two stated objectives of ensuring that the data source is what it claims to be, and that the data has not been read or modified en route.

For any security sensitive Web work, these two protections are highly desirable features.  When connecting to your banking site or the like, you should always check for the 'closed padlock' security icon, and preferably also check that the name on the certificate matches the company you think ought to be providing the service. If in doubt, don't enter any passwords, bank card numbers or the like. 

In its original form, this is what HTTPS was intended to do, and if used strictly in this way, it does what it is supposed to.

An important point to realise, though, is that HTTPS only protects data in transit, roughly from the point where it leaves the browser to where it first arrives on the webserver. Before or after those points, it offers no protection whatsoever.

Here's the rub: A fairly exhaustive search of IT security websites turned up only a handful of reported thefts of data from the wire. By contrast, there are countless thousands of reported data thefts from users' own computers and from website servers.

If we compile a list of the most common attack vectors, we see a picture like this emerging: 

Most points of attack are outside the area protected by HTTPS

The important point is that the overwhelming majority of these attack vectors affect either the end user's computer or the website operator's equipment. Attacks on data carriers' equipment are much rarer, probably because their security is generally much better than that of either of the endpoints.

Whilst HTTPS prevents the data being read en route, it does nothing to prevent it from being read either on the user's computer or on the website operator's equipment, which is where the majority of attacks take place.

Most of the endpoint attacks are not MITM attacks, but a few do fall into that category. For example, a hacker who can place a password-sniffing script into the browser, where that script will send any discovered passwords to him, is carrying out a MITM attack. 

An example of a certificated site with data from multiple undeclared sources.

Likewise, an attacker who manages to place a script on the webserver which intercepts the POST variables on all incoming page requests will be able to read any passwords therein, because by the time the data reaches the script interpreter, it has already been automatically decrypted back to plaintext.

Thus, we can see that HTTPS is limited to protecting against a small subset of all attack vectors. Furthermore, it does not even protect against all forms of man-in-the-middle attack.

The current promotional drive aims to get HTTPS rolled out to every single website on the planet. This is a rather different proposition from its original intended use, and brings with it some problems.

The problems here arise not through any fault in HTTPS itself, but when too much is expected of the technology. It is, after all, designed to protect high sensitivity data, but only when that data is on the wire, and only when part of a strictly one-to-one conversation between client and service.

General interest websites differ markedly from high security sites. A major difference is that the user's first request to the site will spawn, in turn, numerous requests to other resources. Advertising figures heavily in this list, though it may also contain javascript libraries, visitor tracking and profiling services, font libraries,  hit counters, social media links, forums, blogs, etc. 

Many of these third party data requests come in the form of javascript. Now,  although javascript cannot easily do damage to the end user's computer, being well sandboxed away from local files, it can read or change any data on the same webpage.

Perhaps surprisingly, this is true even if the javascript was loaded from an entirely different website, even if HTTPS is used, and even if the data in question is a password-entry field on a form which the javascript in question didn't put on the page.
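As a rough illustration of how little effort this takes, here is a minimal sketch. It is not the code used in the demo on this page: `collectPasswordValues` is a name invented for this example, and the `fakePage` object is only a stand-in for the browser's `document` so that the snippet can be run outside a browser. In a real page, a third-party script would simply pass in `document`.

```javascript
// Hypothetical helper: gather the current value of every password field
// that the given document-like object exposes.
function collectPasswordValues(doc) {
  // querySelectorAll is the standard DOM call; the selector matches any
  // <input type="password">, whoever put it on the page.
  const fields = doc.querySelectorAll('input[type="password"]');
  return Array.from(fields, (field) => field.value);
}

// Stand-in for the browser's `document`, for demonstration only.
const fakePage = {
  querySelectorAll(selector) {
    return selector === 'input[type="password"]'
      ? [{ value: "hunter2" }, { value: "second-field" }]
      : [];
  },
};

console.log(collectPasswordValues(fakePage));
```

Nothing in the helper depends on which site served the script or on whether the connection used HTTPS; it only needs to be running in the same page as the fields.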

Ahemm... Who does the most spying? Your ISP? Nope. You'd think that shouldn't be the case. But it is.

The risk is mitigated to a large extent if the javascript is on a different browser tab than the one with the password. It is also mitigated somewhat if the password field is in a popup window, because javascript cannot enumerate browser windows; it has to already have the window's handle to access the contents. However, an advert running in a popup window is a danger to the parent window, since it can access the parent's contents by way of its own opener property.

The issue this raises is due to the sheer number of offsite data pulls that large sites make. Instead of having to worry about the trustworthiness of one site, we have to worry about 20, 40 or more, some of which we've never heard of. If any of these sites has a mind to, it could very easily spy on passwords being typed into the page. All of them are, in very real terms, potential men in the middle.

Even if you explicitly trust all advertisers with your passwords (OK, I hope I didn't make you spill your coffee with that one), that isn't the end of the story. The large adsites serve out data to a vast number of client websites, and some of the larger client sites in turn have millions of subscribers. The sheer number of vulnerable users is bound to make the adsite a prime target for hackers. Why hack a single large website if you can hack an advertiser and plant password-sniffers on many more computers that way?

The hacker need not even load malware onto the hacked advertiser's site. If it is more convenient he could just put a link in the ad code to his own site. Provided this uses HTTPS, even with a free certificate, the browser won't complain about loading content from it.

Therefore, this situation creates a high security risk. So, how is it handled?

Firstly, on an HTTPS site the browser will allow any third party content to be fetched unchallenged so long as the third party is also using HTTPS. The third party need not be using the same credentials as the main site. In fact with so many secondary connections in use, that would be nigh on impossible to arrange.

If a hacker can compromise an advertising server, then he can send password stealing javascript to every browser which visits a site accepting ads from this server. That could potentially be millions of users. The browser 'padlock' info will not even reveal the existence of the connection to this adserver, or the certificate it uses.

A browser will warn you if any of these third parties is not using HTTPS. In the old days that warning did have some value, it being unlikely that a hacker would own a bona fide SSL certificate. Today, with free certificates available, it will rarely arise.

The need to handle a situation where every page request spawns numerous certificate validations means that browsers simply cannot display every certificate involved under the 'padlock' information. So instead, they only display the security information for the actual site requested by the user. It is as if the rest do not exist.

Thus, even if one of the third party sources is a hacker using a free certificate, the browser won't inform you, and the certificate won't be visible under the padlock. 

All in all, a very unsatisfactory situation. There is a multiple risk of MITM attacks from an unknown number of sites, only one of which the user explicitly trusts. The browser remains silent about this risk, in fact creating the impression that all of the data loaded comes from the one trusted site.

So, whilst HTTPS is a useful security feature for highly sensitive data in a controlled environment, it is far less effective in the general browsing arena.

Time for a little demo. After all, seeing is believing. Here is a password field created by this site:

The content in the box below is entirely produced by javascript called from a different domain. In much the same way that advertisers provide content to their client sites.

You'll note that the offsite script can read not only its own password field, but any on the main page too. It doesn't need to be told where the field is, either. It can find it automatically. There's another right at the foot of the page if you want to scroll down and enter something, just to prove the point.

This page is served under an LE certificate from Siteground, whilst the injected javascript is served from Cloudflare under a Comodo certificate. Even this vast difference in the nature of the sources flags no warning.

When you consider that the key advantage of the mass HTTPS rollout is claimed to be protection against an untrustworthy individual at your ISP being able to steal your password, what is the point if the same risk still exists at tens of advertisers?

It might be possible to sandbox the ads on the page so they can't steal passwords, but the issue here is that we have no idea if any given site has implemented such protections, or not. So either way, the reassurance of the certificate info that 'this site is secure' is without any real substance.

It's worth taking a look at a couple of examples of data connections on large websites:

I set a data trace on the browser's Internet connection, and opened a well-known Scottish news site. A respectable journal, this carries no ads. This was the log:

I then opened a well-known international news site which does carry ads:

So in visiting these two sites which I trust, my browser also, without even notifying me, pulled in data from forty-three other sites. All of these 43 additional sites were using HTTPS, but no security information for these sites appeared under the 'padlock' icon. From the user's perspective it was as if they didn't exist.
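For anyone wanting to repeat this kind of count on their own trace, the tally could be done along the following lines. This is only a sketch: `thirdPartyHosts` is a hypothetical helper, the URLs are made-up placeholders rather than entries from either of the traces described above, and it makes the simplifying assumption that any hostname differing from the requested one counts as a separate source (so subdomains of the same site would be counted too).

```javascript
// Hypothetical helper: list the distinct hosts, other than the site the user
// asked for, that appear in a list of requested URLs.
function thirdPartyHosts(requestedSite, requestUrls) {
  const firstParty = new URL(requestedSite).hostname;
  const hosts = new Set();
  for (const raw of requestUrls) {
    const url = new URL(raw);
    // ASSUMPTION: an exact hostname match is what counts as "the same site".
    if (url.hostname !== firstParty) hosts.add(url.hostname);
  }
  return [...hosts].sort();
}

// Made-up trace for illustration only.
const exampleTrace = [
  "https://news.example/index.html",
  "https://news.example/style.css",
  "https://ads.example.net/serve.js",
  "https://fonts.example.org/font.woff2",
  "https://ads.example.net/pixel.gif",
];

console.log(thirdPartyHosts("https://news.example/", exampleTrace));
```

For the placeholder trace this lists two hosts, since repeated requests to the same host are only counted once.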

Most of these sites load javascript into the browser, and this renders it highly likely that they could act as a vehicle for a MITM attack on any logins typed into the page.

If we consider that the connection went through one ISP in a protected state to one trusted and validated site, but 43 other untrusted data sources were involved, then that represents a 2/45 or one in 22.5 success rate at safeguarding the connection from men in the middle. Call it five percent and we're being generous.
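The arithmetic behind that figure can be checked in a few lines. The counts are the ones given in the text; the variable names are just for this example.

```javascript
// Figures from the text: one ISP plus one validated site were protected,
// and 43 further sources were not.
const protectedParties = 2;
const unprotectedSources = 43;
const total = protectedParties + unprotectedSources; // 45

const fraction = protectedParties / total;
const oneIn = total / protectedParties; // 22.5
const percent = (fraction * 100).toFixed(1); // "4.4"

console.log(`1 in ${oneIn}, or about ${percent}%`);
```

So the exact figure is a little under four and a half percent, which is why rounding it up to five is the generous reading.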

There would seem to be a trading standards issue here. A security product with a 5% success rate is not fit for purpose.

I'll just recap that this situation arises not through any fault in the HTTPS protocol, but through its application to an environment which it was not intended to work in, and to do a job which is essentially beyond its capabilities.

So to summarize:

HTTPS is promoted on the strength that it will protect your online data by preventing interception of it, and therefore make the Web a safe place.

There are two distinct issues with this claim:

The vast majority of documented IT security incidents have not been man-in-the-middle attacks. Many of the compromised sites were using HTTPS anyway, and it made not the slightest bit of difference.

That ineffectiveness is not the fault of HTTPS, which was designed with a particular role in mind.  It arises through having unrealistic expectations of it.

When a page includes requests to advertising providers, javascript libraries or the like, HTTPS does not even provide any guarantee of protection against man-in-the-middle attacks.

Again, HTTPS was not designed for use in this kind of environment. It was designed for use in online banking and the like, where the data comes from a single trusted source.

I'd say that the chief concern here is that promotion of HTTPS in this way, when it has so limited an effect on actual security, is bound to create false expectations in the minds of users that it will make their browsing safe and secure. In reality it only attempts to address two out of many security issues, and because of the way in which it is deployed, does not properly achieve even these two objectives.

Thoughts on a resolution

The problem is not the product, but the unrealistic claims being made for it. MITM risk or no, the key concern is that of plaintext password storage on website operators' servers. If we can do something to address that one, we can make a really significant advance in the safe usage of Web services. 

The way to achieve this is for the browser itself to handle the one-way encrypting of a password before any other process is allowed access to it.

I would suggest giving the HTML password field type additional properties of encryption='type' salt='number' where the salt provides for uniqueness of encrypted values. Public key encryption could also be supported, as could a choice of encryption types. When set, the browser sandboxes the password field such that no javascript or other webpage process can read the plaintext contents, only the encrypted version thereof. Additionally, javascript functions which allow the logging of keystrokes are disabled whenever the caret is in a password field.

<input type='password' encryption='sha256' salt='3d2c54b3a7f4cb312c34' >
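To make the idea concrete, here is a rough sketch of the transformation the browser might apply before anything else sees the field. No browser implements these attributes today, so everything here is hypothetical: `hashPasswordField` is an invented name, and the way the salt and password are combined (salt simply prepended) is an assumption, since the proposal above does not define it. The hashing call itself is Node's standard `crypto.createHash`.

```javascript
const { createHash } = require("crypto");

// Hypothetical stand-in for what the browser would do internally.
// ASSUMPTION: the salt is simply prepended to the password before hashing;
// the proposal does not define how the two are combined.
function hashPasswordField(password, salt, algorithm = "sha256") {
  return createHash(algorithm).update(salt + password, "utf8").digest("hex");
}

const salt = "3d2c54b3a7f4cb312c34"; // salt value from the markup above
const submitted = hashPasswordField("correct horse", salt);

// Page scripts and the server would only ever see this value,
// never the plaintext that was typed.
console.log(submitted.length); // 64 hex characters for SHA-256
```

The same password always yields the same value for a given salt, so the server can still compare logins, while a different salt gives an unrelated value.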

Once such a scheme is well established, browser authors could arrange for any use of unencrypted password fields to trigger a security warning.

Such an arrangement would be compatible with sites carrying advertising.

Session cookies should also be encrypted by the browser, to prevent user impersonation. 

The browser coders are keen to say that webmasters should only use official encryption systems, that they should not attempt to devise their own in case they prove to be weak. Well, here is a situation which just begs for the creation of an official encryption system.

Before the point arises, nobody is saying that such a scheme would be unbreakable; no security mechanism is bulletproof. The claim is only that it would be infinitely better than the present situation. The most important benefit of this browser feature is that by providing readily available, persistent encryption it will discourage the storage of passwords as plaintext.

HTTPS could be used as well, if desired.

Lowdown: Does HTTPS protect your browsing against MITM attacks?

YES if the dialog between your browser and the secure website is strictly one-to-one.

NO if the website carries third party content such as advertising, captchas, tracking or profiling, hit counters, etc.

Since the majority of websites do carry third party content, the answer will usually be no.
Which ought to be worrying.
