The Investigatory Powers Act

-and why the ISP logging aspect might be fundamentally broken.

The UK Government recently passed a bill which, among other things, requires ISPs to keep a log of all websites visited by Internet users for up to a year.

Many people have objected to this on grounds of civil rights. I would like to point out a more fundamental issue, namely that it is unlikely that an ISP could produce an accurate log of the websites a person has visited, based on the information they see passing through their routers.

The reason is that very few modern websites exist as a self-contained unit of data. Open a typical business site in your browser, and the initial data is indeed fetched from the webhost which you yourself specified. Part of that data will likely include JavaScript downloader routines which will then seek to download data from other sources, sources which you have not specified. Examples of other sources might be facebook.com, twitter.com, doubleclick.net, google.com and so on.

The important point here is that you didn't even give any permission for those other-site downloads to take place, and you certainly didn't type 'www.doubleclick.net' into your browser. You probably didn't even click a Facebook icon, either. Worse, even if you knew those other-site data fetches were going to happen, unless you are relatively skilled at IT engineering you probably can't do much to stop them from happening anyway.

From the ISP's perspective, viewing the data passing through a connection, all of these requests are basically similar. There is nothing special to distinguish the human user's original, typed-in data request from the flurry of scripted data requests which followed it.  The ISP can tell which customer account each request comes from, and possibly has some idea what type of computer and browser was used, but has no way of knowing if the request was manual or automatic.

With modern browsers able to open several pages simultaneously and many households having more than one computer, it is also not possible for the ISP to determine by timing which of those data requests was the first of a series, and hence the one actually requested. In practice, the stream of data requests for one page will likely overlap the request streams from other pages, so the start of each series cannot be picked out.
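To make the timing problem concrete, here is a minimal sketch of the sort of heuristic an ISP might be tempted to try: treat the first request after a quiet period as the one the user typed. Everything in it is an assumption for illustration: the hostnames, the timestamps, the 2-second gap and the function name burst_starts are all invented, and nothing here is taken from the Act or from any real ISP system.

```python
# Hypothetical idle gap (seconds) after which a new request is assumed
# to be a fresh, user-typed page load. The value is invented.
GAP_SECONDS = 2.0

# Invented traffic for one customer account: (time in seconds, host, typed by user?).
# The third field is ground truth that the ISP does NOT have.
EVENTS = [
    (0.00, "news.example", True),      # user opens a page
    (0.15, "cdn.example", False),      # scripted fetches follow
    (0.40, "ads.example", False),
    (0.90, "shop.example", True),      # second tab or second computer
    (1.05, "tracker.example", False),
    (1.30, "fonts.example", False),
]


def burst_starts(timestamps, gap):
    """Return the indexes of requests that follow an idle period longer than gap."""
    starts = []
    previous = None
    for index, stamp in enumerate(timestamps):
        if previous is None or stamp - previous > gap:
            starts.append(index)
        previous = stamp
    return starts


timestamps = [stamp for stamp, _, _ in EVENTS]
guessed = burst_starts(timestamps, GAP_SECONDS)
typed = [i for i, (_, _, is_typed) in enumerate(EVENTS) if is_typed]
missed = [i for i in typed if i not in guessed]

print("heuristic thinks the user typed:", [EVENTS[i][1] for i in guessed])
print("typed pages the heuristic missed:", [EVENTS[i][1] for i in missed])
```

With this invented data the second page load starts while the first is still fetching its extras, so the heuristic never sees a quiet period before it. Shrinking the gap does not rescue the idea, since the scripted requests are spaced just as closely as the typed one.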

As regards who is responsible for each data request: if you typed the URL in, then that's pretty clear, you are responsible. However, if a JavaScript program on the website you visited downloads the data, then that is not so clear. The script which fetched the data was put there by the Webmaster of the site you visited, so there is a strong argument that the data request is his legal responsibility, not yours.

My question, therefore, is whether ISPs will be able to correctly identify the websites I have visited from the data stream which they see, or whether such logs would include numerous sites which I had NOT in fact visited.

Let's look at a specific example. If I open the website www.scotsman.com in Firefox with a data monitor active, this is what I see in terms of data requests being made over my ISP account:

Host: ss.symcd.com
Host: cdn.taboola.com
Host: www.scotsman.com
Host: nexus.ensighten.com
Host: res.cloudinary.com
Host: b.scorecardresearch.com
Host: secure-uk.imrworldwide.com
Host: apiv1.scribblelive.com
Host: www.google.com
Host: www.google-analytics.com
Host: cse.google.com
Host: widget-cdn.rpxnow.com
Host: widgets-cdn.rpxnow.com
Host: plugin.mediavoice.com
Host: libs.de.coremetrics.com
Host: tmscdn.de.coremetrics.com
Host: s3.amazonaws.com
Host: plugin.mediavoice.com
Host: use.typekit.net
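To put numbers on that list, the sketch below feeds the same Host values into a deliberately naive logger that records every distinct host seen on the account. The logger is purely an assumption for illustration (the names naive_site_log and TYPED_HOST are invented here); it is not a description of how any ISP actually implements the requirement.

```python
# Host headers from the capture above, in the order they appeared.
OBSERVED_HOSTS = [
    "ss.symcd.com",
    "cdn.taboola.com",
    "www.scotsman.com",
    "nexus.ensighten.com",
    "res.cloudinary.com",
    "b.scorecardresearch.com",
    "secure-uk.imrworldwide.com",
    "apiv1.scribblelive.com",
    "www.google.com",
    "www.google-analytics.com",
    "cse.google.com",
    "widget-cdn.rpxnow.com",
    "widgets-cdn.rpxnow.com",
    "plugin.mediavoice.com",
    "libs.de.coremetrics.com",
    "tmscdn.de.coremetrics.com",
    "s3.amazonaws.com",
    "plugin.mediavoice.com",
    "use.typekit.net",
]

# The only address that was actually typed into the browser.
TYPED_HOST = "www.scotsman.com"


def naive_site_log(hosts):
    """Return each distinct host once, in first-seen order."""
    return list(dict.fromkeys(hosts))


log = naive_site_log(OBSERVED_HOSTS)
never_typed = [host for host in log if host != TYPED_HOST]

print(len(OBSERVED_HOSTS), "requests to", len(log), "distinct hosts")
print(len(never_typed), "of those hosts were never typed by the user")
```

Note that the typed host is not even the first entry in the capture, so a rule such as "log the first host in the list" would also get it wrong.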

It may come as a shock to see that one simple page request spawns so many connections to other websites, some of which even an IT guy like me has never heard of. This is actually a minimalist example though, and of a page on a respectable, trusted news site. Elsewhere on the Web, it gets much worse than this.

On some other less-reputable websites, for example a tabloid news site, the mere act of viewing a page may cause your browser to spawn over a hundred unsolicited data connections to other domains. Some of these foreign connections might be to sleazy sites with some illegal content, and you would not necessarily know of that happening.

The key question, as I see it, is whether the ISP would be able to identify and log the one website I have actually visited out of this list.

From my knowledge as an IT professional of how these systems work, I would hazard a guess as NO - they cannot. All of these data requests look broadly similar to the data carrier. As a matter of interest, a typical data request might look like:

+++GET 1489+++
GET /johnstonpress/production/Bootstrap.js?v=190 HTTP/1.1
Host: nexus.ensighten.com
User-Agent: Mozilla/5.0 ( X11; CentOS ************** Firefox/48.0)
Accept: */*
Accept-Language: en-GB,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://www.scotsman.com/
Connection: keep-alive

The important point is that nothing in this 'HTTP GET' request indicates whether the request was made by the computer user, or by a script put on the page by the Webmaster. Sure, the Host value tells us, the end user, that it was not part of the page we originally requested. The ISP has no way of knowing that, though.
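For anyone who wants to check this for themselves, the sketch below splits the captured request into its parts and lists every field it contains. The parser is a simplified one written for this illustration (parse_request is not a library function), and the masked User-Agent value is copied exactly as it appears in the capture.

```python
# The request from the capture above, with the line endings HTTP uses on the wire.
RAW_REQUEST = (
    "GET /johnstonpress/production/Bootstrap.js?v=190 HTTP/1.1\r\n"
    "Host: nexus.ensighten.com\r\n"
    "User-Agent: Mozilla/5.0 ( X11; CentOS ************** Firefox/48.0)\r\n"
    "Accept: */*\r\n"
    "Accept-Language: en-GB,en-US;q=0.7,en;q=0.3\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Referer: http://www.scotsman.com/\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw HTTP/1.1 request into (method, target, version, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in the value survive intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, target, version, headers


method, target, version, headers = parse_request(RAW_REQUEST)

print(method, target, version)
for name, value in headers.items():
    print(f"{name}: {value}")
```

The fields that come out are Host, User-Agent, Accept, Accept-Language, Accept-Encoding, Referer and Connection. None of them records who or what initiated the request.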

The 'Referer' field, by the way, indicates which website the request came from, but this would be http://www.scotsman.com/ regardless of whether I had clicked a link on The Scotsman which took me to nexus.ensighten.com, or a script had accessed that host.  Thus it is not a valid indicator of whether I, the user, wanted to access content on that host, or not. It is also not mandatory to send a Referer header.
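The Referer point can be sketched in a few lines. The rule below ("a Referer from a different host means the request was automatic") is a hypothetical heuristic invented for this illustration, as is the function name guess_user_initiated. The two cases carry identical Host and Referer values, as described above, and differ only in a ground-truth flag that never appears on the wire.

```python
from urllib.parse import urlsplit


def guess_user_initiated(headers):
    """Hypothetical rule: no Referer, or a same-host Referer, means 'typed by the user'."""
    referer = headers.get("Referer")
    if referer is None:
        return True
    return urlsplit(referer).hostname == headers["Host"]


# Host and Referer as they appear in the example request.
HEADERS = {
    "Host": "nexus.ensighten.com",
    "Referer": "http://www.scotsman.com/",
}

# (description, headers the ISP sees, what really happened)
CASES = [
    ("user clicked a link on The Scotsman", dict(HEADERS), True),
    ("script on The Scotsman fetched it", dict(HEADERS), False),
]

results = [
    (label, guess_user_initiated(seen), truth) for label, seen, truth in CASES
]

for label, guess, truth in results:
    verdict = "correct" if guess == truth else "WRONG"
    print(f"{label}: guessed {guess}, actually {truth} ({verdict})")
```

Because the inputs are the same in both cases, any rule based on these headers must give the same answer for both, and so is wrong for at least one of them. This particular rule mislabels the genuine click.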

If the ISP's log of my activities were to contain these additional data accesses which I myself did not make or authorise, then it would be inaccurate. It would not correctly indicate my Internet browsing activity, as the Government has required.

Any surveillance of a person's activities which might subsequently be used in a legal case must be truthful and accurate (police notebooks come to mind as an example). Providing evidence to a court which is known to contain inaccuracies could indeed render the submitter of that evidence liable to prosecution or damages claims.

As a law-abiding citizen I would find it unacceptable for an inaccurate log of my online activities to be kept. If such a log is to be kept, it must accurately reflect the websites which I have intentionally loaded into my browser, and no others.

If for technical reasons that cannot be achieved, then I would argue that my rights to accurate representation at Law are being violated.

In which case, the logging of ISP account traffic must cease forthwith.

As a footnote, the question of whether a given data download was initiated by the real computer user or by a robotic process is extremely important to forensic investigators. I would quote the Julie Amero case as one instance where that was gotten badly wrong. Ms Amero was prosecuted for allegedly downloading pornography on a school computer, when in fact the porn had been downloaded by a malicious script. So, does this issue matter? Yes, is the answer. It most certainly does.


Site: iwrconsultancy Thread: blog/investigatory.htm
