Thursday, September 24, 2015

The Saw Doctors | Warwick Hotel, Galway | Thursday 21 September 1989

Where were you on the evening of Thursday September 21st 1989, at the hour of 10pm? That’s a tad over 26 years ago now. Some readers may not have been born, or were too young to remember. For the rest of you – please stop shouting out your answers at the screen! I can’t hear you! … it was only ever a rhetorical device!

Davy Carton
Me? I was making my way through the doors of the Warwick Hotel in Galway’s Salthill district to see the wonderful Saw Doctors. I was on the guest list, which made me West of Ireland Rock & Roll Royalty. I kinda knew some of the band – A couple of them lived together in some Monkees-esque madcap house with their manager and a few of the members of The Stunning (another quality band of 1980s Galway). They were round the corner from me and their place was always a good bet for anyone searching for interesting conversation, a cuppa, or even a bowl of pasta. On that night I was armed with a borrowed camera and had all of two (yes TWO!) rolls of 24 shot generic 200 ISO film and an invitation from the band to take what photos I liked.

Pearse Doherty, John Donnelly, & Davy Carton

John "Turps" Burke, Davy Carton, & Pearse Doherty

Leo Moran & Pearse Doherty

Davy Carton & Leo Moran

Pearse Doherty

John "Turps" Burke
The lineup was Leo Moran (guitar & vocal), Davy Carton (guitar & vocal), John "Turps" Burke (guitar, mandolin, vocal), Pearse Doherty (bass), John Donnelly (drums), and Tony Lambert (accordion). The only major change from when I’d first seen them perform was that John Donnelly had taken over on drums from Padraig Stevens.


Davy Carton, Pearse Doherty, Leo Moran
My memory is that it was an absolutely rockin’ gig – but my clearest memories are of the stifling heat. It was a warm September evening outside and the dancing, cheering, singing crowd inside were just sweating. Most of my photos were taken from the audience level – front and centre at the stage. Somewhere along the way I decided that I wanted a better look and a different angle, so I insinuated my way up the side of the speaker stack and onto the level of the band. In my efforts to get the perfect shot, I crept further and further forward until I realised that I was in full view of the audience. Although I beat a hasty retreat, it is for this reason I can sport the rather tenuous claim that I shared a stage with The Saw Doctors! I eventually got enough disposable cash to get the films developed and, I think, I showed them to a few of the guys in the band at the time. But for most of the time since they have resided in a cardboard box in my various offices and studies as I’ve moved house. I had meant to share them last year on the 25th anniversary of the gig, and I’ve just missed the 26th anniversary, so I reckon that I’d better share them now, or wait until the 30th anniversary rolls round!

These photos (and my photography generally) were never destined for the pages of Rolling Stone or NME, but they are a record of the night. In putting this little post together, it struck me that the live music scene has changed radically in this last quarter-century. Back in 1989 I was probably the only person in the Warwick that night with a camera and now hold the only photographic record of the event. Today, quite a number of fans would have watched the show through the screens of their phones and High Definition clips would be on YouTube and various social media outlets even before the band had finished packing up their gear. I don't yearn for 'simpler' times, I'm just glad that I was there to create a record of a great band on top form, giving it their all on a long ago September evening.

I've added a few shots to the blog post here, but the entire collection can be seen on my Photobucket page [here] and as a Photobucket Story at the end of this post.

I hope you enjoy!

John Donnelly

From my notes the set list on the night was:

D’ja see tha’ look Sham?
I Wanna’ Fall in Love
The Ways of the World
When You’re Only 21 (But You Feel Like 69)
Infatuation
Presentation Boarder
Shamtown
Step One
The Streets of Galway
Stubborn
Why Do I Always (Want You)
Red Cortina
Don’t Let Me Down
You’re In Love With Someone Else (And I’m in Love With You)
It Won’t Be Tonight
My Kinda’ Girl
Buffs
25 Quid
Pied Piper
Poison
I Useta’ Love Her
N17
That’s What She Said Last Night
Freedom Fighters
Midfield for the Stars

Drive Away

Davy Carton & Leo Moran: High Lords of Rock & Roll!

Wednesday, September 23, 2015

Excavating Irish Archaeology in the 21st century – an imperfect portrait


In February 2014 I published a rather lengthy piece on the recent financial history of commercial archaeology in Northern Ireland. The paper took each of the major companies and proceeded to look at their highs and lows over the period for which ‘Key Financials’ were publicly available. It then sought to average the data to provide an overall impression of the ‘health’ of the commercial archaeology scene here. For the most part, it illustrated a pattern of decay from highs up to 2008, and plunging Net Worth in the years following. At that point, only one company (incorporated in 2005) had posted results for 2013, but it was sufficient to suggest a modest recovery across the sector. By the time the 2013 results for two further companies were made public, it was clear that this one bonanza year couldn’t mask the poor performance of the other two outfits. While there was still an overall uptick in results, it was nowhere near as pronounced as it had previously seemed.

With the publication of both of these posts, I received a variety of responses along one central theme: Norn Iron’s all well and good, but it’d be great to see what the situation is like in the Republic. My thoughts at the time were quite simple – it’s a much larger proposition trying to conduct the same sort of survey for the south … many more companies, more difficult to access the data, and much more complex financial histories … and that was just the start! Beyond that, I’m afraid that my energies just didn’t stretch that far … most of my excavation career was spent in Northern Ireland … this is where my interests lay and where I keep the axes that I still grind. But it was like a nagging itch that wouldn’t go away … a scab that I couldn’t help picking about the edges. Even an exploratory engagement with the data indicated that the format I’d used for the NI companies just wouldn’t work here. There were simply too many companies to go through in any depth and creating a graph and a data table for each was simply impractical. I needed a different approach. As regular readers of this blog may know, I’ve been trying to leverage some of the functionality of the Tableau business intelligence application to present historical data. I’ve used it to look at the spread of early printing presses, the battles of World War I, the battles of the US Civil War, and viewer figures for Game of Thrones (because, why not?). After all this time, it seemed almost counterintuitive to use Tableau to do what it’s meant to do – business!

The finished Dashboard with data from 2001-2014
The Data
First of all, let’s look at the data and how it’s structured, what it is, and how it was collected. I started with a simple search on the Goldenpages.ie website for archaeologists [here]. I searched the CompanyCheck website for companies with ‘Archaeology’ in their title, selecting anything with an Irish address and submitted accounts. I searched for anything I knew should be on the list, but wasn’t. Finally, I was given additional data by a collaborator who has been pursuing research along a similar line to my own … that’s as scientific as it gets, folks! Taken together, this gave me a ‘longlist’ of 159 companies of one kind or another, including sole traders advertising their services. When whittled down to just those companies that have submitted accounts, this list is reduced to 38 companies of various sizes. I removed two further companies that appear on the Goldenpages list, one is a Northern Irish company and the other is GB based. While I would have liked to include their shares of the Irish market in the dataset, there is no way of telling how much of their revenue was generated in the Republic, so it was the pragmatic choice to exclude them. For the remaining companies I used data publicly available from the CompanyCheck website that is, in turn, taken from year-end submissions made to Companies House. The strengths of these data are that they are simple, robust indices of financial health that are relatively uniform from enterprise to enterprise and do not involve having to wade through the frequently labyrinthine accounts themselves. What makes this data difficult to work with is the relatively limited nature of what’s available. The records for the majority of companies still in business may be frequently mined for data across four or five years. However, this gives little indication of what their performance was like in the period beyond this – and some have (or had) been going since the mid and late 1990s. 
Where data survives from earlier times, it invariably comes from businesses that have ceased trading and been dissolved. Inevitably, this skews the data to a significant extent. In some respects, this is reminiscent of the survival rates of archaeological data, where older and older material becomes harder and harder to find as it has statistically fewer chances of survival. Rather than attempting to hide this deficiency, or abandon the project entirely, I want to highlight it as a caveat. For this reason, I’ve included a histogram called ‘Data Points’ on the Dashboard. When data from all of the companies are viewed together it is clear that the numbers of available accounts only really begin to become robust around 2009 (with 16 instances) and peak in 2010 with 26 sets of data. Before 2009 the numbers of Key Financial data points are significantly smaller, reaching a peak of just nine in 2005 and being totally absent for 2003. This just leaves the ‘what’ of the data. What have I got that I’m using to tell this story? These Key Financials are in four parts. The ‘Cash at Bank’ is what it says – the liquid cash at the bank along with cash in hand. ‘Net Worth’ or 'Book Value' is the amount of the Shareholders’ Funds, less the Intangible Assets. The 'Current Liabilities' data is the total sum owed to creditors, bank loans, overdrafts, and all other short-term financial liabilities. The final data point, 'Current Assets', is the total of stocks, debtors, cash, and any other miscellaneous assets due within one year. It is important to note that this latter figure will occasionally differ between data supplied by CompanyCheck and other providers, as not everyone includes the cash balance as part of this figure. The last time I undertook this type of analysis (October 2014), I also used data supplied by another provider that has since become defunct.
I’m sorry that this source is no longer available as it provided an interesting breakdown of the Current Assets figures and clearly showed how much of these assets were in the form of dangerously unreliable monies owed by debtors – hardly much grounds for comfort! At that time, the Northern Irish data indicated that Current Assets were made up of between 43.30% and 95.07% monies owed by debtors. While this seems quite variable, the average for the four NI companies was 73.43% being in the form of debtors. While I do not have the same level of detail for the Irish data, we can safely presume that it’s not much different.
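For readers who think better in code, the four Key Financials described above can be sketched as a small Python record. This is only an illustration: the class name, the field names, the 'XYZ' company code, and every figure in the example are invented for the purpose and are not taken from CompanyCheck or from any real company.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyFinancials:
    """One company's Key Financials for one year (hypothetical layout, values in euro)."""
    tla: str                    # randomly assigned three-letter code
    year: int
    cash_at_bank: float         # liquid cash at the bank plus cash in hand
    net_worth: float            # Shareholders' Funds less Intangible Assets
    current_liabilities: float  # creditors, bank loans, overdrafts, other short-term debts
    current_assets: float       # stocks + debtors + cash + other assets due within a year

    def debtor_share(self, debtors: float) -> float:
        """Fraction of Current Assets made up of monies owed by debtors."""
        return debtors / self.current_assets


# Entirely made-up figures, for illustration only.
example = KeyFinancials("XYZ", 2010, 50_000.0, 120_000.0, 80_000.0, 200_000.0)
share = example.debtor_share(debtors=140_000.0)
print(f"Debtors make up {share:.2%} of Current Assets")
```

The `debtor_share` helper mirrors the kind of breakdown the defunct provider used to supply; without a debtors figure in the current data it cannot be calculated for the Irish companies.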

When I first started looking at this form of data, my instinct was to make it completely anonymous. My interest was primarily in telling the overview story, rather than dwelling on the fates of individual enterprises, however much fun that may be. However, the legal advice I received at that time was that I should be able to prove, if challenged, the source of my data and any conclusions drawn from it. For that reason, I provided an appendix at a slight remove from the text that linked the pseudonym (in that case, the year of incorporation) to the CompanyCheck records I’ve used. While I’ve not used the names of the archaeological companies directly, they could be found. As several companies in the current dataset were incorporated in the same years, I’ve had to come up with some different means of differentiating them. Based on the number of companies that, certainly in the past, were identified by three initials, I used to joke that, should I ever throw caution to the wind and start my own archaeological company, I’d call it TLA … Three Letter Acronym. With this in mind, I’ve assigned each company a randomly-generated TLA of its own that does not correspond to its real name. Anyone wishing to find out the ‘who’s who’ behind the data can find their answers in the appendix I have provided [here]. In this way, I can uniquely group the data while not directly revealing the individual identities behind it, and still satisfy the requirement to allow accessibility for error checking and rebuttal.

The Dashboard
The main image on the dashboard is a line graph that attempts to provide summed data for the entirety of commercial archaeology in the Republic of Ireland from 2001-2014. It is given in absolute values, adding together all the positive and negative results for each of the Key Financial variables for each listed company. As such, it seeks to present a high-level holistic view of the entirety of the southern Irish commercial archaeological sector. Hovering over any point on this image brings out a ‘tooltip’, giving the year, the name of the variable, and its value. Below this, to the left, is a reworking of the upper graph as averages for each of the Key Financials. It seeks to give an idea of an hypothetical ‘average’ company working in the Irish market over this period. Below this again is the Data Points histogram, discussed above, giving details on the numbers of company accounts making up each Key Financials variable. When the user hovers over any of the histogram bars a simple ‘tooltip’ gives both the year and the number of records. In the bottom right quadrant is a data table giving all figures for the selected data. Although it’s pretty redundant, I’ve added reactive grand totals for each column ... mostly because Tableau allows me to do so and I like it.
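The three views described above (summed values, averages, and the count of data points) are all simple aggregations of the same rows by year. The following is a minimal sketch of that idea in plain Python rather than Tableau; the company codes and figures are hypothetical and the row layout is an assumption, not the structure of the actual workbook.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (company, year, cash_at_bank) rows; not real data.
rows = [
    ("AAA", 2009, 100_000),
    ("BBB", 2009, -20_000),
    ("AAA", 2010, 150_000),
    ("BBB", 2010, 30_000),
    ("CCC", 2010, 60_000),
]

# Group the values for each year together.
by_year = defaultdict(list)
for _company, year, value in rows:
    by_year[year].append(value)

summary = {
    year: {
        "absolute": sum(values),     # what the main line graph plots
        "average": mean(values),     # what the Average Values graph plots
        "data_points": len(values),  # what the Data Points histogram plots
    }
    for year, values in sorted(by_year.items())
}

for year, stats in summary.items():
    print(year, stats)
```

Note how a year with fewer sets of accounts can show a lower summed value while the average barely moves, which is why the histogram is worth reading alongside both graphs.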

As is usual for the Tableau environment, the right-hand edge contains the user-defined controls. At the top the ‘Year’ slider allows the user to select a continuous run of years, trimming off data points from either end, to provide a view of the data that’s of most interest and relevance to them. This acts on both of the line graphs, the histogram, and the data table. Below this is the ‘Key Financials’ control that allows the user to select all or just some of the categories outlined above: Cash at Bank, Net Worth, Current Liabilities, and Current Assets. Owing to technical considerations, this only acts on the main line graph (Absolute Values) and not on the Average Values one. Next comes the 'Active ?' control, indicating whether or not the companies in question are still operational. Below this again is the list of companies, coded by their randomly assigned ‘TLA’. The colour key to the Key Financials variables showing ‘which is which’ is located in the bottom right corner of the dashboard. Taken in combination, the user can easily and seamlessly move from the top level view of all companies, over all available years, and all Key Financial variables, right down to a single variable for one company in an individual year … or anywhere in between. Obviously, if a single company is selected, the main line graph and the Average Values graphs will be identical!

An Imperfect Portrait
Given all these caveats, we can be clear that there are myriad ‘known unknowns’ and we can be quite sure there are plenty of the dreaded ‘unknown unknowns’ lurking out there too. But taken at face-value, what story does this data tell? In the first instance, the Cash Story shows a considerable rise to something in excess of €1.8M in 2004, dropping to almost €1.5M the following year. This then continues to rise, eventually peaking at around €6.8M in 2008, only to fall to just over €1.97M in 2009. This 2008 to 2009 period is, obviously, the critical point in terms of the effects of the economic downturn. Personally, I was surprised to see such a rapid recovery in the time following, with rises to €4.75M in 2010, and €6.52M in 2011. These figures decline to €3.5M in 2012 and €2.9M in 2013. Given the relatively small numbers of accounts currently available for 2014 (11 sets of accounts, vs. 19 for 2013), the apparent catastrophic drop to c. €1.86M is not a reality, merely a matter of lack of data. The Average Values graph provides a valuable calibrating influence here, showing the same massive drop from 2008 to 2009 and a much smoother, softer recovery than the main graph. The Average Values graph also shows a marked dip in 2005 that does not appear to be related to a smaller number of available accounts. It would appear that this dip is ‘real’ in the sense that it accurately reflects the available data. However, it would appear to be an artefact of a larger number of relatively poor sets of accounts. While there may be a genuine set of economic reasons and events at work here, it would be best to treat data of this age with caution. One of my collaborators suggests that the 2005 dip reflects 'the period of the dotcom crash and the beginning of the second phase of the celtic tiger fuelled by revenues from construction rather than the export driven first phase between 1998-2004'.
The story of the Current Assets data is broadly parallel to the Cash story throughout this period. There’s a continuous upsurge through the available data, to an all-time peak of €16.8M in 2008, with a fall to €9.58M in 2009, followed by recovery into 2011 and decline through 2012 and 2013. The Average Values data is pretty similar, showing the 2005 dip, the 2009 crash and a steady tail off in the years since, with no obvious signs of growth or recovery. Differences between the two images include the fact that the Average Values don't show an immediate bounce back in 2010, though there is continued decline from 2011 onwards. Also, the Absolute Values show a marked peak in 2008, while the Average Values paint the picture as more of a 2007-2008 plateau. Overall figures for Net Worth, not surprisingly, closely mirror the Current Assets data for both Absolute and Average values. Here we see a relatively gentle rise to c.€2.9M in 2005, reaching its zenith of c.€16.6M in 2008 … all before falling to €7.8M in 2009. Think about it … the best part of €9M was wiped off the value of Irish archaeology as a commercial concern in just one year … that’s a loss of more than 50% of the value of the entire market. Worse than that, the combined Net Worth fell further to c.€6.25M in 2010. Despite making some of this ground back in 2011 (€6.35M), there has been a steady decline to c.€2.19M in 2013. The situation is largely mirrored in the Average Values data, showing the ‘2005 dip’, the 2007-2008 plateau at c.€2M, dropping to €487K in 2009, and down to c.€424K in 2010, and only minor changes and fluctuations in the time since, showing general decline. Current Liabilities tell a different and, frankly, more disturbing story. From remarkably modest levels of debt at the beginning of the century, liabilities increased inexorably over the following decade. By the peak in 2008, liabilities had increased (in absolute terms) to c.€3.28M.
However, by the time the other variables crashed in 2009, Current Liabilities had increased to €4.81M. This is understandable, as although profits and Net Worth etc. may have crashed, company debts would not have reduced in the same manner. While the other variables largely indicate a partial recovery by the following year, the Current Liabilities increased to a massive c.€6.46M. The story in the time since would appear to be one of steady debt reduction, through whatever means. The Average Values graph tells a somewhat different story. Here, Current Liabilities generally increased from the beginning of the decade up to 2008, where they hit €410K. The average liabilities, mirroring the other Key Financials, shrank in 2009 to c.€300K. They remained relatively steady from this until 2011, and have been in decline in the period since, hitting €124K in 2013.
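The year-on-year falls quoted above are simple percentage changes. As a minimal sketch, the snippet below works through the 2008 to 2009 Net Worth and Cash at Bank figures, using the rounded totals given in the text (so the results are approximate); the function and variable names are just illustrative.

```python
def percent_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a fall)."""
    return (after - before) / before * 100.0


# Rounded sector totals quoted in the text, in millions of euro.
net_worth_2008, net_worth_2009 = 16.6, 7.8
cash_2008, cash_2009 = 6.8, 1.97

net_worth_drop = net_worth_2008 - net_worth_2009
net_worth_change = percent_change(net_worth_2008, net_worth_2009)
cash_change = percent_change(cash_2008, cash_2009)

print(f"Net Worth fell by about €{net_worth_drop:.1f}M ({net_worth_change:.0f}%)")
print(f"Cash at Bank changed by about {cash_change:.0f}%")
```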

Well … that’s what the data appears to show … and that’s why it’s an imperfect portrait. No more than ‘normal’ archaeological data, these used here include caveats and lacunae. For all the issues that surround them, it is clear that they do give an overall impression of this period that is consistent with the accepted narrative of growth in the sector up until 2008, followed by a catastrophic crash and some combination of recovery and stagnation in the period since. While the available data undoubtedly capture a good likeness of the crash and its aftermath, the data from before this point is sadly lacking. For this reason, I think that although this exercise has some merit – chiefly in attempting to put actual Euro values on the entirety of the sector and attempting to quantify the extent of the crash – there remains much work to be done to expand and flesh out this portrait of an industry.


The dashboard with data selected only for still-active companies
My original blog post ended there. However, one of my collaborators suggested that I refine the data further to allow the exclusion of companies no longer in existence. In this way, a secondary portrait could be constructed - one of the survivors of the financial crisis who have gone on to dominate what's left of the archaeological sector in the post-Celtic Tiger landscape. These are the companies still in business, still excavating sites, and still providing employment. Screening out all the inactive companies, and the ones where there has been a significant time lapse since the last set of published accounts (the 'Unknowns'), we are left with some 18 companies that have associated data covering the period from 2009 to 2014. In some respects, it is unfortunate that the data doesn't go back even a single year further as it would have neatly captured the 'prelapsarian' state of the archaeological economy. However, this 'After the Gold Rush' information is valuable as it clearly shows the attempts at recovery from the aftermath. From this we can see that the story told by the Cash at Bank shows a recovery to c.€2.8M in 2010, from a low of c.€848K in 2009. This increased to c.€3.6M in 2011, but has been in decline ever since, hitting €2.9M in 2013. The Average Values graph tells a similar story of increase from a low in 2009 (c.€106K) to a high in 2011 (c.€212K), and decreases in the years since to c.€173K in 2013. The Absolute Values of Current Assets tell a broadly similar story of increases to €6.36M in 2010 from €3.32M in the previous year, and continuing decline in the period since, down to c.€4.47M in 2013. The Average Values for this Key Financial, however, tell a quite different story, one of continued decline from €415K in 2009 to €262K in 2013. The Average Values for Net Worth tell a similar, if parallel, story of steady decline from €260K in 2009 to c.€129K in 2013.
In both cases, there is a noticeable uptick for 2014, though this would appear to be an artefact of a smaller number of returns submitted in 2014 (10 sets) as opposed to 2013 (17 sets). It is for the same reason that a degree of caution should be exercised in comparing the 2009 to 2010 results (eight sets of accounts vs. 17 sets). In terms of Absolute Values, the Net Worth metric may be seen to rise from an historic low of c.€2.07M in 2009 to a high of c.€3.13M in the following year, with gradual decline in the period since, to some €2.19M in 2013. The story told by the Current Liabilities data is very similar to the overall view in that it hits c.€1.58M in 2009, but continues to increase in 2010 to €3.88M, declining ever since, eventually hitting €2.35M in 2013. The Average Values data is broadly similar, showing Liabilities at €198K in 2009, increasing to €228K in 2010, but continuously falling, hitting €138K in 2013. One of my collaborators in this project has suggested that this reduction in the overall levels of debt could, in part, reflect companies divesting themselves of expensive fleets of leased motor vehicles, and surrendering leases on office and warehouse space. If this is the case, it raises new and troubling questions about the storage of the physical and digital archives from previous excavations.

Looking at the individual surviving companies - even on the limited timescale available - it is clear that all have suffered some form of financial pain. For some it was in 2010, or 2013, rather than 2009. For some there have been rapid rises and falls, while for others it has been a more steady, gradual decline. Some have shown a remarkable resilience, bouncing back to having their best years on record. For some, the most recent set of accounts show their worst financial years yet. While there are a number of Lamontian 'green shoots' visible, the general story is one of continued decay and stagnation. It is tempting to see the 2008-2009 period as nothing less than an extinction-level event for the Irish commercial archaeological sector. While several of the mega-fauna trundled off into extinction around this time, there has been relatively little evidence of others striving and adapting to fill the recently-voided niche. There's a reason for this ... there's still one of the great predators stomping about the jungle. For example, in 2012, the combined Current Assets of the other 16 companies came in at €2,838,655, while for this one enterprise it was €2,329,920. To put that in context: this one company represents 45% of the total amount. And it's not just this one Key Financial either. In the same year, the Net Worth of this company was €1,774,897, versus €992,845 for the others, combined - 64% of the total. Their Cash at Bank was €1,928,488 (54%) as opposed to €1,594,111 for the rest. Significantly, their Current Liabilities were €673,683 (26%) in comparison with €1,880,273 for the combined remainder of the field. By contrast, 2014 was their worst year on record. Even still, they registered Current Assets of €1,600,066 against €1,431,318 for the combined other eight companies for whom data is available. This figure represents 53% of the total. Similarly, their Net Worth of €1,563,583 (66%) compares favourably against the combined €782,971 for the rest.
The story is the same for the Cash at Bank, with this one company holding €971,251 (52%) against €897,482. The only real change is that this company's Current Liabilities rose to €382,841 (36%) against some €682,148 for the others. In short, this one company is the most cash rich, is worth more, has more assets, and fewer debts than the rest of the sector combined.
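The percentages quoted above are each one company's figure as a share of the sector total (that company plus all the others combined). A minimal sketch of the arithmetic for 2012, using the euro amounts given in the text, is shown below; the function and dictionary names are illustrative only, and the whole-number percentages in the prose may reflect rounding or truncation of the values this prints.

```python
def share_of_total(one_company: float, all_others: float) -> float:
    """One company's figure as a percentage of the whole sector."""
    return one_company / (one_company + all_others) * 100.0


# 2012 figures from the text: (the one large company, the other companies combined).
figures_2012 = {
    "Current Assets": (2_329_920, 2_838_655),
    "Net Worth": (1_774_897, 992_845),
    "Cash at Bank": (1_928_488, 1_594_111),
    "Current Liabilities": (673_683, 1_880_273),
}

shares_2012 = {
    name: share_of_total(one, others)
    for name, (one, others) in figures_2012.items()
}

for name, pct in shares_2012.items():
    print(f"{name}: {pct:.1f}% of the sector total")
```

The same function can be applied to the 2014 figures by swapping in the corresponding pairs of amounts.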

But what does this mean for the future state of commercial archaeology in the Republic of Ireland? Right now, I think it's too early to say. Despite there being some signs of financial recovery, it is extremely unlikely that the sector will see anything like the period around 2007-8 ... at least not in the short to medium term. For now, it seems that the sector is in, if not actual decline, then a period of prolonged stagnation. Only time and additional data will tell. In the meantime, I'll keep collecting data and attempting to paint a better, more complete portrait. In any case, the data and my Dashboard are available to all with an interest, to drill into and find your own stories, present your own portraits, or examine the extinct and future mega-fauna of the Irish archaeological landscape.




Notes
For the best viewing experience of the Tableau dashboard, I would recommend going to Full Screen mode (F11) … there will be less scrolling needed!

The CompanyCheck website gives all data as pounds (£), but it is clear from the more detailed accounts that this is merely a labelling error for the Irish companies, and all are meant to be shown in Euros (€). The obvious exceptions to this should be company accounts for 2001 (as the RoI changed from the Punt to the Euro on January 1 2002). However, of the two sets of accounts currently in the dataset, one is clearly labelled as being calculated in Euros (SRJ) and the other (GSI) does not have the currency listed. In the face of this, I have elected to leave this one example as is and presume that it is calculated in Euros.

In so far as I can ascertain, the CompanyCheck data is correct and without error. The one exception to this is the entry for one company that lists two different sets of data for the year 2010. In the absence of any other evidence, I have assigned the earlier data set to 2009.

Access the dashboard directly at the Tableau server here.

Excavating Irish Archaeology in the 21st century | Appendix | Who’s Who


Based on the legal advice given to me, I have taken the decision to explicitly link the anonymised companies discussed in the main text to their actual names and data. The reason for this is to provide full transparency in where my data is coming from and so that it may be checked for accuracy and inconsistencies. I have also chosen to place it at a slight remove from the main text, so that anyone not wishing to know the details of any individual company does not have to be confronted with it. The companies are listed in the main text by a randomised three letter acronym (TLA) and that format is preserved here:

AJM [Last accessed August 24 2015]
LYG [Last accessed August 24 2015]
RTI [Last accessed August 24 2015]
LCO [Last accessed August 24 2015]
JKD [Last accessed August 24 2015]
XMN [Last accessed August 24 2015]
PJX [Last accessed August 24 2015]
IST [Last accessed August 24 2015]
MOR [Last accessed August 24 2015]
NRE [Last accessed August 24 2015]
FEJ [Last accessed August 24 2015]
SKR [Last accessed August 24 2015]
CTJ [Last accessed August 24 2015]
ECH [Last accessed August 24 2015]
WRS [Last accessed August 24 2015]
QYK [Last accessed August 24 2015]
DKF [Last accessed August 24 2015]
JEA [Last accessed August 24 2015]
WNB [Last accessed August 24 2015]
HJE [Last accessed August 24 2015]
DGL [Last accessed August 24 2015]
KEA [Last accessed August 24 2015]
LCE [Last accessed August 24 2015]
SRJ [Last accessed August 24 2015]
ABW [Last accessed August 24 2015]
QHR [Last accessed August 24 2015]
GSI [Last accessed August 24 2015]
DJY [Last accessed August 24 2015]
EHU [Last accessed August 24 2015]
SJE [Last accessed August 24 2015]
AGK [Last accessed August 24 2015]
KEB [Last accessed August 24 2015]
TRA [Last accessed August 24 2015]
JTD [Last accessed August 24 2015]
MWH [Last accessed August 24 2015]
GHR [Last accessed August 24 2015]
SJT [Last accessed August 24 2015]

Wednesday, September 16, 2015

Digging Dangerous Data | Ashley Madison & the Archaeology of the Now

Introduction
In July 2015 a large amount of data was stolen from Ashley Madison, an online business dedicated to facilitating people who sought to have extramarital affairs. The hackers who stole the data, calling themselves ‘The Impact Team’, attempted to use it for blackmail, insisting that Ashley Madison and fellow Avid Life Media site EstablishedMen.com be permanently shut down. On July 22nd, when the company failed to comply, a sample of the data was released publicly. Obviously, negotiations didn’t go as well as might have been hoped and in August the full dataset was made available on the internet. Late in the same month a further data dump was made available that included a number of corporate emails, including a substantial number from CEO Noel Biderman. Since that time, it is believed that a number of enterprising petty criminals have attempted to use the Personally Identifiable Information (PII) to blackmail alleged account holders.

Let me be clear. There are tangles of legal, ethical, and moral arguments in every sentence and clause in the above paragraph. I don’t hope to offer a solution to any of these … not one! If you think that you can deliver an all-encompassing answer or firmly-established response to any aspect of this in a few sentences, you’ve not thought about it enough - you really haven't. In some respects, all these conversations are academic – the data now exists in the public sphere. The initial data dump was made to the so-called ‘Dark Web’ Tor Network, which can only be accessed with a specialised browser over an encrypted connection routed through a number of proxy services. However, the files quickly migrated to bittorrent, a peer-to-peer transfer protocol. As Alex Hern explains in The Guardian: 'The file is broken up into multiple blocks, which are then shared directly from one downloader’s computer to the next. With no central repository, it is all but impossible to prevent the transfer, although a “magnet” link – a short string of text telling a new downloader how to connect to the “swarm” of files – is still required'. To cut through the technobabble: whether or not you like it, hate it, are embarrassed by it, or whatever – this is data that is not going to go away. As Wikipedia reports: ‘The parent company Avid Life Media, which owns the site, has offered a reward of C$500,000 (£240,000) for information about the Ashley Madison hackers.’ So what? … Once the cops have chased after and caught ’em … they’re punished, they go to jail … and the data is still there, available to everyone who wants it. Private Bradley Manning (now Chelsea Manning) stole something in the region of 750,000 classified and sensitive documents relating to the US military and diplomatic organisations. In 2013 she was sentenced to 35 years for violations of the Espionage Act and will be eligible for parole in 2021. But the documents, passed on to WikiLeaks, are still freely available. 
With regard to both instances, there are questions about the legality and morality of collecting and disseminating the data, but they really matter little in terms of the fact that, once released, it can’t be erased or eradicated. There is no getting this genie back in the bottle.

Now that the Ashley Madison dump is available, what are we going to do about it? And by ‘we’ I mean archaeologists and historians. It may seem counterintuitive, but this is pure archaeology and history territory. If you think that historians are only properly employed in dusty archives, searching through government documents, you don’t understand the range of what their skills can be applied to. By the same token, if you think that someone can only be a ‘real’ archaeologist if they’re covered in mud, carefully uncovering a decorated pottery vessel before the bulldozers track in and destroy the lot, you’re equally mistaken. These are both aspects of what we do, but neither comes anywhere near encompassing the totality of what either profession actually does or can achieve. In essence, both are really about using available data to tell stories at varying scales – from the individual, to countries, to continents. More importantly, both are adept at assessing the quality of and bias in data … basically, we were made for this data and this data was made for us! When the archaeologies and histories of the 21st century are written they will have to incorporate WikiLeaks data if they are to fully understand certain aspects of military and diplomatic events and relationships. In the same way that Cabinet papers released under the thirty-year rule can give a profoundly different insight into public actions and statements, these documents will change how we see historical events. While we’re, generally, a bit more prudish when it comes to stuff involving genitals, the Ashley Madison data is going to be just the same – no self-respecting historian of the future is going to be able to talk about early 21st century society without dealing with this data dump. On top of all that, it’s likely that stolen data dumps will only increase in size and frequency over the next decades and, thus, in importance for future researchers. 
This feeling is reflected in one of the statements by Avid Life Media, saying that this form of hacking and dumping ‘may now be a new societal reality’. Still not convinced? Let me put it another way … what if we excavated a data source that gave information on some aspects of the sex lives of individual builders of Stonehenge or Newgrange? We’d be all over that like a shot! 

Archaeologists in particular have a love of chronology and location and the Ashley Madison data has both in spades … the only difference is that their relative qualities are inverted. It’s usual on a regular archaeological excavation for the chronology to be a bit imprecise, but the location is well tied down and understood. For example, the pottery sherd came from this particular layer in that portion of the ditch on the site that is at that location in this country ... but the radiocarbon date on the charcoal from the same layer gave a range of 60 years. Seem familiar? Here the situation is that we can’t always be sure if the name attached to the account really relates to that specific person in that specific town (more on this later), but we can tie transaction data down to the second. In every real respect, this is data eminently suited to analysis by archaeologists and historians. The only real difference is the passage of time … and even this isn’t as much of a consideration as it might appear. Not too long ago the general working practice in parts of the UK was to machine off the Medieval remains to get to the more important Roman stuff … then that changed and we only machined off the post-Medieval to get to the Medieval, and now that too has changed. A little over 10 years ago I was working on the excavation of a brickworks that went out of business in 1922. Between the famously divisive ‘Transit Van’ excavation and work like Andrew Reinhard’s Atari video game burial dig, the age of what is considered to be ‘acceptable’ for the interests of archaeology and archaeologists has been brought closer and closer to the present time. All I’m proposing is that we change the scale from a couple of decades to a couple of months.

The data dump
With that in mind, it’s probably appropriate that we examine what the data dump (the ‘site’) actually holds and how it is structured. The 9.7Gb compressed archive (I've not accessed the 20Gb email dump) contains a large amount of the site’s internal corporate data along with the user database. The latter contains users’ profile information, including names, street addresses, and birth dates. There are lists of personal details including data on whether the user smokes or drinks and the type of individual they’re hoping to encounter, along with their preferences in terms of sexual desires and the acts they’re willing, interested, or able to perform (via Alex Hern, The Guardian). The majority of interest has centred on the portion of the database that holds users’ email addresses as it is significantly easier to search and interrogate than other portions of the dump. Much has been made of the fact that addresses in the US and UK military along with a variety of companies and seats of learning have all appeared in the database. As Hern and other commentators point out, as much fun as this is, this is a relatively unreliable means of assessing who had an account. In the first instance, Ashley Madison did not require that an address be validated before it could be associated with an account and thus there are multiple cases of patently false email addresses in the data. The Wikipedia article on the data breach notes that ‘people often create profiles with fake email addresses, and sometimes people who have similar names accidentally confuse their email address, setting up accounts for the wrong email address’, before going on to note that accounts could have been set up as part of office pranks. As an aside, I’d just note that if you work in a place that considers creating a fake account with an infidelity-facilitating website an appropriate, fun endeavour, you work with asshats and need to find another job! 
Hern’s piece in The Guardian briefly mentions that a further portion of the database contains details of credit card transactions, but that it does not include sufficient information to steal cash.

The next aspect I’d like to very briefly touch on is the business model used by Ashley Madison. It seems that, unlike many other dating and hook-up sites, Ashley Madison does not require a monthly subscription to keep an account active. Instead, they set a credit cost for men to open chat sessions and send messages to women. It also charges for men to read messages sent by women. There’s a premium rate for men to have a guaranteed affair, and even more money can be spent on sending gifts of animated gifs etc. From what I can gather, the only people paying for anything are men.

Colin Gleeson, writing for the Irish Times on July 21st, notes that ‘Figures released by the website in 2011 said there were more than 40,000 Irish members. However, a graphic called “the global infidelity map”, published on the website’s Twitter account a fortnight ago, outlined its per capita membership in countries around the world. It indicated that 2.5 per cent of the Irish population were members, which equates to approximately 115,000 individuals.’ He also notes that this data indicates that Ireland ranked 10th of the 45 countries with access to the Ashley Madison site. Again, more on this later ...

The Credit Card data
So, who’s using this data and to what ends? As you’d expect, other than the attempted blackmailers, the first ones to pick over the data have been journalists looking for salacious titbits and they’ve certainly found them. Among the most high profile of those outed is the reality TV ‘star’ Josh Duggar (if you’ve never heard of him, be grateful – your life is a better place for it). While I care little for him or his particular views on religion and politics, I’m of the opinion that nothing good can come from shaming people – both the private individuals and the vaguely famous – though that’s not stopped the practice! Annalee Newitz at Gizmodo has been doing some excellent technical work analysing the source code, specifically seeking to understand how ‘bots’ (fake profiles) were used to chat to users and (allegedly) relieve them of cash for the privilege, when they thought they were talking to real women. Data Scientist Zack Gorman published an Ashley Madison Users By Zip Code map on Tableau Public, but only using data for the US (excluding Alaska & Hawaii), though he received some criticism from Reddit users for his actions.

Right now we’re in a situation where the analyses being carried out are exclusively concerned with the US, B-List celebrities, or the prevalence of fake ‘bot’ women. Other than shaming individuals, I’ve got no problem with this … it’s just that while the Irish Times have made some general statements on the topic, there appears to have been little if any attempt to see what the data means for the UK and Ireland generally. It also appears that no one has attempted to use any of the financial data available in the dump to see what light it sheds on our society. As noted above, Alex Hern’s piece in The Guardian mentions ‘a database of credit card transaction information’. I’m not sure if he’s looking at the same data I’m looking at, but what I see is not a database. It’s a collection of 2642 .CSV (comma separated values) files covering the period from March 21st 2008 to June 28th 2015 - by my quick calculation, this indicates that records for only 13 days are missing – pretty dashed comprehensive! Each one appears to represent the totality of the day’s credit/debit card transactions and, I surmise, is a daily download for use in Ashley Madison’s Business Intelligence environment of choice. The reason that they appear to have been ignored until now is that while they are individually easy to work with, their sheer volume means that they take significant effort and energy to sort, search, and collate. As one commentator on Reddit says: ‘to search for a single name in the CC files would be something that would take a lot of time and effort. They are individual files per date, over years, not easily searchable’. But that is exactly what I’ve done. 
Even if opening each file, filtering the data for British and Irish material, copying it to a master file, and closing it down again only took 20 seconds, the whole procedure would have taken me a little under two working days … which is another thing to note about those of us with an archaeological background – we’re tenacious! As numerous commentators have noted, using the dump of account details and email addresses is fraught with problems. These include the free accounts set up by the curious who never progressed with their involvement, those who had accounts deliberately or accidentally set up in their names, or even spouses wanting to verify if their partners were already users of the site. On the other hand, I would argue that if you’ve actually gone so far as to spend money on the services offered, chances are that you’re pretty committed to the notion of infidelity. It is my opinion that – so long as these files are genuine – this is the most accurate and honest way of assessing involvement and participation in the site … and we’ve every reason to believe that these are genuine, including statements by Ashley Madison themselves.
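The two back-of-envelope figures in this section are easy to re-run. What follows is a minimal sketch that uses only the dates, the file count, and the 20-second estimate quoted above; the eight-hour working day is my own assumption. It also shows that the number of missing days shifts by one depending on whether both end dates are counted.

```python
from datetime import date

FILE_COUNT = 2642                  # daily .CSV files in the dump
FIRST_DAY = date(2008, 3, 21)      # earliest file
LAST_DAY = date(2015, 6, 28)       # latest file
SECONDS_PER_FILE = 20              # manual handling estimate from the text
WORKING_DAY_HOURS = 8              # assumption: an eight-hour working day

# Subtracting two dates leaves out one of the end dates.
days_between = (LAST_DAY - FIRST_DAY).days
days_inclusive = days_between + 1  # counts both the first and last day

missing_exclusive = days_between - FILE_COUNT
missing_inclusive = days_inclusive - FILE_COUNT

total_hours = FILE_COUNT * SECONDS_PER_FILE / 3600
working_days = total_hours / WORKING_DAY_HOURS

print("Missing days:", missing_exclusive, "or", missing_inclusive)
print("Hours of manual work:", round(total_hours, 1))
print("Working days:", round(working_days, 2))
```

Either way of counting leaves the picture unchanged: the files cover all but a couple of weeks of a period of more than seven years.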

Credit Card Data Structure
The first thing to explain is how these files are structured. The 2642 .CSV files range in size from 15Kb to 3.4Mb. While there’s variation, and it’s far from being a continuous and unbroken upwards progression, it really does show how the company expanded and grew during the period under review. Early files have only a couple of thousand lines of data and are confined to the company’s core US and Canadian markets. At the other end of the range, the data frequently exceeded 10,000 to 12,000 data rows, representing transactions from most (if not all) of the 53 countries where the company was active. Even if we only graphed the file sizes of each .CSV document we’d have an eloquent testimony of how the company grew and expanded over this period. Whatever the size of the documents, the internal format and layout was always the same. Going by column they were:

Column A: ACCOUNT
This is the account number into which the money is being paid. There are 41 different account numbers each from eight to 10 digits in length.

Column B: ACCOUNT NAME
Again – pretty simple. This is the name associated with the account number (above). Ashley Madison and parent company Avid Life Media (ALM) don’t just have one website and one service. Apart from the main AM site, the same company also runs EstablishedMen.com. In the British and Irish data there are 41 Account Numbers associated with 26 different Account Names. It appears that each Account Name can have multiple Accounts. The Account Names include variations on ADL Media; AMDA; Ashley Madison; Avid Dating Life; CL Media; EM Media; and Swappernet. AMDA is probably not the American Musical and Dramatic Academy, but CL Media would appear to refer to CougarLife, a site for younger men to meet older women and EM Media is almost certainly EstablishedMen, the site for helping younger women meet older men. Swappernet is just what you think it is … if you have no idea what it is I can only reiterate Bartleby’s address to the female board member in Dogma ‘You, on the other hand, are an innocent. You lead a good life. Good for you.’

Column C: AMOUNT
This is the amount of money paid from a credit/debit card to Ashley Madison. There is no indication as to the currency that this is calculated in, but as ALM is a Canadian company, I’m presuming that it’s in Canadian Dollars (C$).

Column D: AUTH CODE
The Authorization Code is issued by the credit card holder’s bank to indicate that the charge has been approved. These codes are usually six or seven digits in length and can be either alphanumeric or plain numeric. Crucially, these codes are only issued when the charge has been approved. Thus, we can be certain that the amounts listed as charged were approved and were paid to Ashley Madison. The only exception to this is when the merchant takes payment without the authorisation code. If an authorization code is not issued, the merchant can receive a "no authorization" chargeback. In a chargeback, any payment the merchant receives is reversed by the card issuer (via eHow).

Column E: AVS
This is the Address Verification System. This is a system used to verify the address of a person claiming to own a credit card. The system will check the billing address of the credit card provided by the user with the address on file at the credit card company (via Wikipedia). From what I can see, this field will be populated if the Authorisation Code (Column D) is populated, but will be blank if the Error Code (Column P) is populated.

Column F: BRAND
The Brand refers to the type of card that was used to make the payment. Each credit/debit card company is represented by an abbreviation. For example, VI represents Visa, while MC (unsurprisingly) stands for MasterCard.

Column G: CARD ENDING
These are the last four digits of the member’s credit card number. I have not conducted any analysis on these and I do not intend to discuss them further.

Column H: CVD
The CVD is the Card Verification Data – the three digits on the back of your card. It is used in ‘card not present’ transactions and was instituted to assist in the fight against card crime. No actual CVD numbers are present here, merely the list of returned transaction codes. Visa and MasterCard, for example, use M for ‘Match’, N for ‘No Match’ and Y for ‘Non Applicable’. American Express, apparently wanting to be a bit different, went with M for ‘Not Applicable’ and Y for ‘Match’ (via Chase). I have not conducted any analysis on these and I do not intend to discuss them further.

Column I: FIRST NAME
This one should be easy. It should contain the first name of the payer. SHOULD … but doesn’t always. True, there are a number of cases where this field is populated by a recognisable first name, but these are the minority. Instead, this column is mostly filled with five to seven digit numbers. In the absence of better understandings, I’m presuming that this is actually the user’s membership number. I’ve used the ‘first’ and ‘last’ names in the following analyses to ensure that, when I’m examining data at the individual level, it is consistent and actually relates to the same individual/account.

Column J: LAST NAME
This column sometimes contains a surname when the first name is in the previous column. Sometimes it’s blank (especially in the earlier data). More often than not it contains the full name or pseudonym of the user. Like the data in the First Name field (Column I) I’ve used it as a visual reference to ensure that individual-level data is correct, and I've created a concatenated field of First and Last Names to create unique account identifiers. These have been used to clearly separate out user data, but at no point are names or pseudonyms referred to in the text.
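For anyone wanting to reproduce the concatenated identifier, here is a minimal sketch. The function name `account_key`, the separator, and the whitespace and case normalisation are all my own assumptions for illustration, and the sample values are made up.

```python
def account_key(first_name: str, last_name: str) -> str:
    """Join the FIRST NAME and LAST NAME fields into a single identifier."""
    # Assumption: stray spaces and letter case should not create
    # a 'new' account, so both fields are trimmed and lower-cased.
    first = first_name.strip().lower()
    last = last_name.strip().lower()
    # A separator keeps ("12", "345") distinct from ("123", "45").
    return f"{first}|{last}"


# Made-up example values, not taken from the data.
print(account_key(" 123456 ", "Example User"))
```

The separator matters because Column I is mostly numeric: without it, two different pairs of values could collapse into the same string.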

Column K: MERCHANT TRANS. ID
The Merchant Transaction ID is a seven or eight digit code (I think) produced by the merchant (Ashley Madison in this instance) to identify each purchase. In a minority of cases this is populated with an alphanumeric, frequently including the words ‘Premium’, ‘Priority’, ‘Refund’ … sometimes in combination. I have not conducted any analysis on these and I do not intend to discuss them further.

Column L: OPTION CODE
This column is blank for all of the British and Irish data. I have not conducted any analysis on these and I do not intend to discuss them further … obviously …

Column M: DATE
The date is given in the format: Day/Month/Year Hour: Minute: Second (e.g. 28/03/2008 00:51:22). No archaeologist has ever worked with better chronological data than this! The only issue I have not been able to work out here is which time zone this is tied to. As Ashley Madison have their corporate headquarters in Toronto, it seems plausible that their systems would use local time - Eastern Time Zone (UTC-5:00), but I can’t be certain.
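Reading this format programmatically is straightforward. The sketch below parses the example timestamp with Python's standard `datetime.strptime`. The fixed UTC-5 offset is purely the assumption discussed above (and it ignores daylight saving), so treat any converted times as provisional.

```python
from datetime import datetime, timedelta, timezone

DATE_FORMAT = "%d/%m/%Y %H:%M:%S"   # Day/Month/Year Hour:Minute:Second

# Assumption: timestamps are Toronto local time at a fixed UTC-5,
# with no allowance for daylight saving.
ASSUMED_OFFSET = timezone(timedelta(hours=-5))


def parse_transaction_time(raw: str) -> datetime:
    """Parse a DATE field and attach the assumed time zone."""
    naive = datetime.strptime(raw.strip(), DATE_FORMAT)
    return naive.replace(tzinfo=ASSUMED_OFFSET)


stamp = parse_transaction_time("28/03/2008 00:51:22")
print(stamp.isoformat())
print(stamp.astimezone(timezone.utc).isoformat())
```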

Column N: TXN ID
The Transaction ID is, essentially the ‘proof of purchase’ for the user. In the current data set they are usually nine to 10 digit numbers. In a minority of cases the ID is a 36 character alphanumeric and appears to correlate with an alphanumeric Merchant Transaction ID. I have not conducted any analysis on these and I do not intend to discuss them further.

Column O: CONF. NO.
I presume that this stands for Confirmation Number and is a seven to 10 digit number. From a quick check of the data, it appears that this is usually identical to the data in the TXN ID Column (Column N). Within the current data set, only 150 entries do not show a match between TXN ID and Conf No and these appear to correlate to instances where the Merchant Transaction ID is an alphanumeric. I have not conducted any analysis on these and I do not intend to discuss them further.

Column P: ERROR CODE
In the vast majority of cases this field is blank as the transaction processed without any error. However, at the bottom of each file there is a portion where uncompleted, failed, or stalled transactions are collated. While a transaction in this section will frequently have most of the rest of the data in place, it is unlikely to have an Authorisation Code (Column D) and an AVS code (Column E). Some may not have a pass flag for the CVD check (Column H). A collaborator on this project informs me that these codes are unique to individual merchants and may relate to various reasons why the transaction failed or was declined. For instance, the AVS check examines the address associated with the card and the user. Ostensibly, this is to prevent fraud but, based on some of my visual examinations of the data, failures may occasionally be due to incorrect home addresses being entered into the system. This may be due to genuine errors or may stem from failed attempts by users to conceal their real world location. Other reasons for transactions to error out may be fraud blocks put in place by the card holder’s bank or lack of funds/credit.

Column Q: AUTH TYPE
The options for what I presume is the Authorisation Type are either ‘Final’ or ‘Undefined’. I have not carried out any further analysis of this data.

Column R: TYPE
The transaction type may be one of five: Authorisation, Settlement, Chargeback, Credits, or Purchases. The most important thing to note here is that most of the data is composed of paired lines. By this I mean that there is one line representing the Authorisation to take the money and a separate line indicating the Settlement. A visual examination of the data indicates that all the other fields (e.g. Account, Amount, Names, Merchant Transaction ID etc.) will be identical, except that one line is flagged as Authorisation and the other is Settlement. Within the current data set there are 27,200 Authorisations and 25,590 Settlements. My presumption is that not all of the Authorisations progressed to Settlements and, thus, when talking about the amount of money spent, I’ll be omitting the Authorisation data. Purchases are a bit odd … there are 10,593 lines flagged as purchases, but all are associated with error codes of one kind or another. I freely admit that I have no idea why this is, and I’ve excluded them from any further discussion of the amounts of money paid. In almost all cases Credits are associated with narrative entries in the Merchant Transaction ID that indicate that they are refunds; some of these are specifically indicated to be the result of fraud. As noted previously, Chargebacks are where Ashley Madison – for whatever reason – incorrectly took money from an account and later had to pay it back.
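The rule of counting only Settlement lines can be sketched as follows. This is an illustration of the approach described above, not the exact procedure I followed in the spreadsheet; the label `Settlement`, the dictionary keys, and the sample rows are assumptions made for the example.

```python
from decimal import Decimal


def settled_total(rows):
    """Sum AMOUNT over rows whose TYPE is Settlement, ignoring the rest."""
    total = Decimal("0")
    for row in rows:
        # Assumption: the TYPE column holds the literal word 'Settlement'.
        if row["TYPE"].strip().lower() != "settlement":
            continue  # skip Authorisation, Purchases, Credits, Chargeback
        total += Decimal(row["AMOUNT"])
    return total


# Made-up rows: one Authorisation/Settlement pair and one errored Purchase.
sample_rows = [
    {"TYPE": "Authorisation", "AMOUNT": "49.00"},
    {"TYPE": "Settlement", "AMOUNT": "49.00"},
    {"TYPE": "Purchases", "AMOUNT": "19.00"},
]
print(settled_total(sample_rows))
```

Using `Decimal` rather than floating point avoids small rounding drift when tens of thousands of currency amounts are added together.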

Column S: TXT_CITY
In every .CSV file Column S is called ‘TXT_CITY%2CTXT_COUNTRY%2CTXT_EMAIL%2CTXT_PHONE%2CTXT_STATE%2CTXT_ADDR1%2CTXT_ADDR2%2CZIP%2CCONSUMER_IP’, but it is clear that the ‘%2C’s are intended to break up the titles for multiple columns. As the title indicates, this is the member’s city of residence.
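The ‘%2C’ is simply the percent-encoded form of a comma, so the fused title can be split back into its nine intended column names with the standard library. A minimal sketch (the function name `split_encoded_header` is mine):

```python
from urllib.parse import unquote

RAW_HEADER = (
    "TXT_CITY%2CTXT_COUNTRY%2CTXT_EMAIL%2CTXT_PHONE%2CTXT_STATE"
    "%2CTXT_ADDR1%2CTXT_ADDR2%2CZIP%2CCONSUMER_IP"
)


def split_encoded_header(raw: str) -> list:
    """Decode percent-encoded commas, then split into column names."""
    return unquote(raw).split(",")


columns = split_encoded_header(RAW_HEADER)
for letter, name in zip("STUVWXYZ", columns):
    print(f"Column {letter}: {name}")
print("Column AA:", columns[-1])
```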
Lines of data for UK & Ireland

Column T: TXT_COUNTRY
For all of the Ashley Madison data, this is given as a two letter identifier code for the country. Ireland is indicated by IE and the United Kingdom (England, Wales, Scotland, & Northern Ireland) is given as GB. This has been my primary field for data selection. I’ve filtered the original .CSV files to only show IE and GB data, then selected and copied the data to a separate file. I’ve removed any data lines that are obviously errors. For example, one recurring purchaser always appeared to identify his country as IE, but the city was in the US. As his email address referenced the American Football team associated with said city, I felt justified in removing him from the data set. However, there may be more included that should not be there. The other side of this is that British and Irish patrons who deliberately or accidentally identified as coming from another country have been overlooked and are not included in this analysis. While this is regrettable, it is my opinion that they make up a relatively insignificant portion of the overall group.
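I did this filtering by hand in a spreadsheet, but the same selection could be scripted. The sketch below assumes the header row has already been repaired so that TXT_COUNTRY is a column in its own right; the function name and the sample lines are invented for illustration.

```python
import csv
import io

WANTED_COUNTRIES = {"IE", "GB"}


def filter_by_country(lines, wanted=WANTED_COUNTRIES):
    """Keep only rows whose TXT_COUNTRY code is in the wanted set."""
    reader = csv.DictReader(lines)
    return [
        row for row in reader
        if row.get("TXT_COUNTRY", "").strip().upper() in wanted
    ]


# Made-up stand-in for one daily file, with a repaired header row.
sample_file = io.StringIO(
    "ACCOUNT,AMOUNT,TXT_CITY,TXT_COUNTRY\n"
    "10000001,49.00,Galway,IE\n"
    "10000002,19.00,Toronto,CA\n"
    "10000003,79.00,Leeds,GB\n"
)

kept = filter_by_country(sample_file)
for row in kept:
    print(row["TXT_CITY"], row["TXT_COUNTRY"])
```

A script like this only does the mechanical part. Spotting entries such as the purchaser with an IE country code and a US city still needs a human eye.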

Column U: TXT_EMAIL
This field contains the email address of the member. While I’ve not used this data in these analyses, it is clear that this column contains a wide variety of obviously fake email addresses along with many that, at least, appear to be genuine. Much has been made of the various email addresses of government and university sector workers that appear to have signed up, but as Ashley Madison did not appear to require email verification it is best to treat all addresses here with some degree of suspicion.

Column V: TXT_PHONE
The phone number of the member. As there are only two numbers used for all of the British and Irish data: 12121212 and 111222333, I think it’s safe enough to regard them as fake. They have not been used in these analyses.

Column W: TXT_STATE
Presumably intended mostly for US customers, the British and Irish data contains a bewildering array of two and three letter acronyms, full and abbreviated county names, country names, and even a few entries composed solely of numbers. I’ve not examined this data in any depth.

Column X: TXT_ADDR1 and Column Y: TXT_ADDR2
These contain the addresses of the site’s clients. Again, there’s a variety of data captured here, and not all of it can be accepted at face value. The data ranges from numerics (1-6 digits), through gibberish alphabetics and alphanumerics, to obviously/likely fake addresses, though the vast majority are ostensibly real/plausible addresses. Indeed, the combined geographical and personal data is frequently coherent and sufficient to identify individuals with a reasonable degree of certainty. I’ve only used this data to attempt to ‘weed out’ occasional entries that have been misassigned a country code.

UK & Ireland valid transactions

Column Z: ZIP
The data here includes a variety of obviously faked alphabetic, numeric, and alphanumeric entries. The Irish data includes various versions of Dublin postcodes, but also town and county names and country designations. Interestingly, there are a small number of postcodes beginning with BT, indicating a Northern Irish origin. I would note here that it’s bizarrely saddening to see that sectarian divides are still prevalent, even on a site devoted to infidelities. I’ve seen a number of Londonderrys that are in GB, and a few Derrys in IE, and I've considered each to be of the country they claim and have not altered the data. With regard to the UK, the data does appear to be dominated by valid (or valid looking) postcodes. I’ve entered a few into Google Maps and most return real-world locations, though that is no guarantee that the transactions refer to these exact places. Beyond these few manual checks, I have not used this data in these analyses.

Column AA: CONSUMER_IP
My guess is that this data was collected automatically by the Ashley Madison system, where available. IP addresses feature relatively frequently in the data, but I’ve made no use of them.

Preliminary Data Analyses for the Republic of Ireland and the UK
Having copied all the relevant data I can find into a separate spreadsheet (and attempting to remove incorrect entries where I can spot them), I’m left with 63,918 rows of data. Of these, 6,075 relate to Irish transactions, while 57,843 are associated with UK accounts. But these numbers only refer to account activity and payment transactions. What we really need is to get an idea as to how many active accounts there actually are. Colin Gleeson’s piece in the Irish Times gives figures of 40,000 and 115,000. I’m not saying that there are not 40,000 Irish people who’ve signed up to a free account out of curiosity or genuine intent. What I can tell you is that there are 1251 accounts that identify as Irish (including a number from Northern Ireland) that paid actual money to Ashley Madison. There are also 2501 accounts of UK origin (also including a number from Northern Ireland). Obviously, there is a question of where infidelity begins … is it in the thought or in the deed? If it is when you hand over money to Ashley Madison, that’s an awful lot fewer than I might have expected. For anyone reading this thinking ‘there are twice as many British accounts as Irish ones’ the sobering thought is that there are huge differences in the relative populations of the two countries! Just based on the latest figures for population for the Republic and the UK (I’m using the entire population as I couldn’t find any figures for the adult segment alone) 0.004% of the UK have paid money to Ashley Madison, as opposed to 0.0273% of the Irish population.
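The per-capita comparison is simple division, sketched below. The account counts are the ones given above; the two population figures are round numbers that I am assuming purely for illustration, so the output will differ slightly from percentages calculated on official figures.

```python
def percent_of_population(accounts: int, population: int) -> float:
    """Express a count of paying accounts as a percentage of a population."""
    return accounts / population * 100


IRISH_ACCOUNTS = 1251
UK_ACCOUNTS = 2501

# Assumptions: rounded whole-population figures, not official statistics.
ASSUMED_ROI_POPULATION = 4_600_000
ASSUMED_UK_POPULATION = 64_600_000

irish_pct = percent_of_population(IRISH_ACCOUNTS, ASSUMED_ROI_POPULATION)
uk_pct = percent_of_population(UK_ACCOUNTS, ASSUMED_UK_POPULATION)

print(f"Ireland: {irish_pct:.4f}%")
print(f"UK:      {uk_pct:.4f}%")
print(f"Irish rate is about {irish_pct / uk_pct:.1f} times the UK rate")
```

Whatever exact population figures are used, the Irish rate comes out several times higher than the UK one, which is the point of the comparison.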

UK & Ireland Revenue (C$)
And just what has been paid? In the timescale under review it appears that there were 25,590 transactions where money was paid to Ashley Madison and this amounted to C$2,400,245.80. No matter how you look at it, that’s a lot of money! This can be broken down as follows: C$2,151,348.91 from 23,237 transactions in the UK and C$248,896.89 from 2,353 Irish transactions. The average spend per UK transaction was C$92.58, while it was C$105.78 for their Irish counterparts.

Next question: Who’s getting paid? Well, Ashley Madison, right? While everything, one presumes, eventually goes back to the parent company, Avid Life Media, it goes via a variety of sub-companies as recorded in the Account Name (Column B). There are four account names that are variations on the name ‘ADL Media’ that brought in C$220,763.42 (C$198,343.73 from the UK and C$22,419.69 from Ireland). Avid Dating Life Inc brought in C$33,705.01 (C$2,327.69 from Ireland and C$31,377.31 from the UK). Two different accounts associated with the name ‘AMDA’ (still probably not the American Musical and Dramatic Academy) received C$39,877.00, all of it from the UK. There are twelve separate account names based on ‘Ashley Madison’ that received C$1,864,889.24 from UK accounts and C$224,149.51 from Irish accounts (Total: C$2,089,038.75). While I’m not at all clear on the specific services provided by each of these corporate divisions, it’s pretty safe to assume that the ‘Ashley Madison’ account names relate to its core ‘have an affair’ business. We would appear to be on firmer ground in assuming that CL Media refers to CougarLife. This account name received two payments (both from the UK) totalling C$238. However, both have associated error codes, indicating that the transactions were not completed. EM Media (EstablishedMen?) did rather nicely, thank you very much, taking in C$16,841.67 from 214 transactions from 150 unique accounts, all from the UK. Poor old Swappernet brought in only C$19.95 from a single UK transaction in 2013, which may go some way to explaining why it appears to have been shut down.
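As a sanity check, the per-name totals quoted in this paragraph can be added up and compared with the overall revenue figure of C$2,400,245.80. This is a minimal sketch using only numbers from the text; CL Media is left out because both of its payments carried error codes.

```python
from decimal import Decimal

# Totals per group of account names, in C$, as quoted in the text.
TOTALS_BY_NAME = {
    "ADL Media": Decimal("220763.42"),
    "Avid Dating Life Inc": Decimal("33705.01"),
    "AMDA": Decimal("39877.00"),
    "Ashley Madison": Decimal("2089038.75"),
    "EM Media": Decimal("16841.67"),
    "Swappernet": Decimal("19.95"),
}

REPORTED_TOTAL = Decimal("2400245.80")

grand_total = sum(TOTALS_BY_NAME.values(), Decimal("0"))
difference = REPORTED_TOTAL - grand_total

for name, amount in TOTALS_BY_NAME.items():
    share = amount / grand_total * 100
    print(f"{name:<22} C${amount:>13,.2f}  {share:5.1f}%")
print("Difference from reported total:", difference)
```

The same loop also makes plain just how dominant the ‘Ashley Madison’ account names are within the overall revenue.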

Account Names paid by UK & Irish members (Values in C$)
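As a quick sanity check, the account-name totals quoted above can be added back up and compared with the overall figure of C$2,400,245.80. This is only a sketch using the rounded numbers from the text; the grouping labels are mine, and CL Media is deliberately left out on the assumption that its two errored payments never counted towards the total.

```python
from decimal import Decimal

# Revenue per account-name group as quoted in the text (C$).
# CL Media is omitted: both of its payments carried error codes.
groups = {
    "ADL Media": Decimal("220763.42"),
    "Avid Dating Life Inc": Decimal("33705.01"),
    "AMDA": Decimal("39877.00"),
    "Ashley Madison": Decimal("2089038.75"),
    "EM Media": Decimal("16841.67"),
    "Swappernet": Decimal("19.95"),
}
quoted_total = Decimal("2400245.80")

# Sum of the groups and any gap between that and the quoted overall total.
group_sum = sum(groups.values())
difference = quoted_total - group_sum

# Each group's share of the overall total, as a percentage to one decimal place.
shares = {
    name: (amount / quoted_total * 100).quantize(Decimal("0.1"))
    for name, amount in groups.items()
}

print(group_sum, difference)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The groups reconcile with the overall total to the cent, which is at least consistent with the errored CL Media payments having been excluded from it.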
Just to round out the preliminary examination of the financial data, it is interesting to look at the cards used to pay for Ashley Madison’s services. For both Ireland and the UK, the leading brands are, in order, Visa (VI), MasterCard (MC), and (probably) American Express (AM).

Credit Card brands used in valid transactions. Top: overview. Bottom Left: UK. Bottom Right: RoI
Adding in the time dimension allows us to gain a number of different insights. For example, looking only at the yearly revenues shows a story of a company doing relatively well from 2008 to 2012, making a marked improvement in 2013, but simply accelerating beyond expectations in 2014 and 2015 – even more so when one considers that the 2015 data only goes up to June! Even when broken out into UK and Ireland data, a pretty similar story emerges for both. However, breaking it down by financial quarter unveils a different narrative. Now it’s clear that the major uptick in overall revenues only begins in Q4 of 2013. While there is still a vast hike in revenue, it is clear that there was a major hiccup in Q4 2014. There must have been a significant rethink of policy and direction at that point, as the Q1 2015 figures were the best ever achieved. As not all of the Q2 2015 data is available, there is a marked downturn in the latest figures. In this instance, breaking it down again by country tells quite different stories. The UK figures largely mirror the overall picture – natural as they make up the lion’s share of the data. Like the parent data, there’s the first major surge in Q4 2013 and the Q4 2014 plunge. However, the UK data shows a continued increase in the latest figures that is out of step with the overall picture. This can be explained with reference to the Irish data where Q2 2015 has shown an unprecedented slump. Other differences in the Irish data include a much more marked increase in Q4 2013, followed by an immediate collapse in the following quarter. Increasing the level of resolution to monthly shows a much more fraught series of peaks and troughs of surging and collapsing revenue streams. While the UK data shows a peak in revenues in April 2015, it seems that the Irish data climaxed in January 2015 and, despite repeated attempts at revival (most notably in March and May 2015), was becoming increasingly flaccid.
This level of resolution can be increased to weekly, where the data is reduced to staccato lunges, thrusts, and general throbbing. At this resolution, it is clear that the Irish market had been in serious decline for some time before the data breach. This increase in resolution can be charted down to the minute and second, but the data becomes remarkably difficult to visualise clearly, and my already-unravelling ability to refrain from genital-based imagery goes into even steeper decline.

Annual Revenues. Top: overview. Bottom Left: UK. Bottom Right: RoI
Annual Revenues by Quarter. Top: overview. Bottom Left: UK. Bottom Right: RoI
Annual Revenues by Month. Top: overview. Bottom Left: UK. Bottom Right: RoI
Annual Revenues by Week. Top: overview. Bottom Left: UK. Bottom Right: RoI
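The changes of resolution described above (yearly, quarterly, monthly, weekly) were done by drilling down in Tableau, but the underlying idea is just re-bucketing the same payments under a coarser or finer date label. Here is a rough sketch of that idea in plain Python. The sample rows are entirely made up, the function names are hypothetical, and the weekly buckets assume the ISO week convention, which may not match how Tableau numbers its weeks.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical sample rows: (settlement date, amount in C$). Not real data.
sample = [
    (date(2013, 11, 5), Decimal("49.00")),
    (date(2013, 12, 30), Decimal("19.00")),
    (date(2014, 1, 2), Decimal("79.00")),
    (date(2014, 1, 3), Decimal("19.00")),
]


def period_key(day, resolution):
    """Return a sortable label for the period that contains `day`."""
    if resolution == "year":
        return f"{day.year}"
    if resolution == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if resolution == "month":
        return f"{day.year}-{day.month:02d}"
    if resolution == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    raise ValueError(f"unknown resolution: {resolution}")


def revenue_by(rows, resolution):
    """Sum the amounts in `rows` per period at the requested resolution."""
    totals = defaultdict(Decimal)
    for day, amount in rows:
        totals[period_key(day, resolution)] += amount
    return dict(sorted(totals.items()))


for resolution in ("year", "quarter", "month", "week"):
    print(resolution, revenue_by(sample, resolution))
```

With these sample rows, the payment dated 30 December 2013 counts towards 2013 in the yearly and quarterly views but falls into ISO week 2014-W01 in the weekly one, so totals at different resolutions need not line up neatly at year boundaries.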
If we remove the 2015 data (as it does not include second half results) it is clear that across the years, business got better by the quarter from slow starts in Q1 (Jan, Feb, & Mar) to the best results in Q4 (Oct, Nov, & Dec). I’m not going to speculate on why that should be, but both the UK and Irish data show the same results, if with slightly different emphases. Again excluding the 2015 data, it appears that for the UK there was a visible rise in expenditure (presumably correlated with a rise in actual infidelity) in May, August, and September. Basically, as the year went on there was an increase in payments made to Ashley Madison. The Irish, being different, also show a general increase towards year end, but the peak months are July and November … I’m not even going to try and explain why …

Annual Revenues by Quarter. Top: overview. Bottom Left: UK. Bottom Right: RoI
Annual Revenues by Month. Top: overview. Bottom Left: UK. Bottom Right: RoI
We’re on a roll! Looking at when in the month is the most lucrative for Ashley Madison, we can clearly see that the UK picture (including the 2015 data) shows peaks on the 2nd, 11th, 20th, 22nd, and 27th, though the trend is towards falling revenue across the whole month. The Irish data, again apparently loath to follow their neighbours, shows a series of small peaks of similar size on the 2nd, 4th, 13th, 18th, and 21st before taking a break in preparation for what can only be described as an earth-shattering climax on the 29th. Overall, the trend is towards increased financial activity across the month. There are even differences between the two populations in terms of when they commit their greatest expenditures. The UK data indicates a preference for Thursdays and Fridays, while the Irish data shows distinct preferences for Wednesdays and Saturdays. Putting it all together, if you're from the UK, you're more likely to be spending money at Ashley Madison on Thursday August 11th, while Irish users are more likely to start laying out the cash on Saturday November 29th. The chronology of this data is so fine that discussions can be formed at the second-level … this is chronology like no archaeologist has ever dealt with before … and it’s wonderful …

Annual Revenues by Day of Month. Top: overview. Bottom Left: UK. Bottom Right: RoI
(owing to a stupid labeling error, the chart says 'ex 2015' but does include 2015 data)
Annual Revenues by Day of Week. Top: overview. Bottom Left: UK. Bottom Right: RoI
Perhaps less wonderful is the locational data. As I said above, archaeologists are familiar with (usually) tightly controlled locational data, be it at the site or context level. We’re not so good with the idea that a site might have been in Scotland, but was actually in Killarney. And that’s kinda’ what we’ve got here. I’ve used the Country and City fields to attempt to map locations. The biggest issue here is that Ashley Madison, in an attempt to shield their members’ privacy, did not require email verification. They also did not appear to enforce any form of robust location control or checking. Thus, anyone wishing to hide their real world locations could do so without problem. That is why there are a number of data entries that claim to be from Ireland, but have, say, the city given as New York and include a New York address. These I’ve attempted to weed out. Less easy to spot and eradicate are instances where someone claimed to be in Ireland in the Country field and then gave a valid Irish address that may not have been their own. There are certainly plenty of real addresses in this data, but whether any of them are associated with the people who live at those addresses is quite another matter. The other issue is one of my own making and/or laziness … Tableau is good at spotting real world places and generating mappable Latitudes and Longitudes for them … but not perfect. Thus, there are over 3000 City codes that it has been unable to place. With more time and patience than I possess, it would have been possible to interrogate each one and manually add co-ordinates. Thus, the map data is heavily skewed towards larger, more recognisable urban areas. There are 173 unique values in the City field for Ireland and another 819 for the UK. I’m not going to discuss the UK data here, as I’m much less familiar with British place names, however in the Irish data a number of things are visible.
Firstly, this field can be as specific as an individual Townland or it can be as general as the County name. This is compounded by a number of Irish Counties that have the same names as their major urban centres (e.g. Galway, Limerick, and Dublin). While I could have attempted to augment the data with postcode and address-level indicators, I felt that it was way too much trouble and potentially too revealing of where active Ashley Madison account holders really were. While it’s deeply flawed and heavily biased towards the larger urban centres, the data is still worth examining at this level. The map (using only Settlement data) shows a clear preference for the south-east of England, the Midlands, southern Wales, and the north-east, along with the central belt of Scotland. In terms of the Island of Ireland, there is a clear preference for Dublin, Meath, the east coast, and eastern Ulster. If the marks are resized by the amount of expenditure (i.e. again Settlement data only), it is clear that in Ireland, Dublin leads the way. In the UK there are notable ‘hot spots’ in Glasgow, Middlesbrough, Salisbury, and Poole, but all are dwarfed by the activities of London. As I say, this locational aspect of the data deserves much more work and adjustment, but I’m content to leave this for further students and researchers.

All Tableau-mappable locations for UK & Ireland
All Tableau-mappable locations for UK & Ireland with marks re-sized to reflect relative expenditure
The final aspect of the data I wanted to examine here was at the individual-level. The first thing to say is that I’ll not be revealing any names or exact real world locations. As many commentators have noted, they may not refer to the actual people named. Even if they were guaranteed accurately to identify everyone who spent money on this site, I find little interest in naming and shaming any of them. Although there are multiple issues with linking the cities, addresses, postcodes, and even the countries to the names in the data set to real world people, I’ve found that they are remarkably consistent across transactions. By this I mean that the details given to make a payment in, say, July 2008 will be the same as when the member makes a further payment the following month. We may not be able (or want to) identify individuals, but the activities of individuals can be coherently tracked through the data. When you first graph this data (again, Settlements only) it appears like there’s just a small smudge in the bottom left hand corner of the page. It’s an issue with attempting to graph so many users against such a variety of amounts of money. There are a few people who’ve paid an awful lot and so many that have paid very little, they basically cancel each other out. To be fair, things are not much better when you separate out the data by country either. Instead it’s probably better to concentrate on a small number of case studies within the data to examine generalities. The first thing to note about looking at the respective top 20 accounts by amount spent is that Ireland and the UK are vastly different. The top spending UK Account managed to relieve himself of C$182,000. Admittedly, the top Irish account wasn’t terribly far behind, spending C$119,300. 
But, while the UK accounts pretty much step down along a gentle curve, the Irish ones simply drop between first and second place – the next most generous Irish account spent C$7,502, as opposed to the second place UK account that shelled out C$112,900. By the time you get to the 20th placed Irish account you’re talking about a spend of ‘only’ C$690. It’s really not much when compared to the 20th placed UK account that forked over C$10,870. For this top 20 group the averages are quite telling too. The UK average here is C$40,933, while the Irish average is a ‘mere’ C$7,728. For the entire dataset the average spend is C$255 – C$310 for Irish accounts and C$250 for UK ones. Not that I’m condoning it, but that sounds pretty reasonable for an affair! I became intrigued by what some of the lower spending customers were spending and I note that there are 1,132 examples of people paying exactly C$19. This is an important and much vaunted figure in that this was the price charged by Ashley Madison for their ‘Full Delete’ service … that, it seems, they didn’t actually carry out. With 275 examples from Ireland and the remaining 857 from the UK, that’s C$21,508 that Ashley Madison got for (allegedly) doing very little. When examined on an account, or individual level, it is clear that many of these are the only payments ever made to Ashley Madison. These may range from people desperately trying to erase a ‘prank’ account created by co-workers to people deciding that, now that they’ve had a bit of a look about and a think about it, Ashley Madison and the services they offer are not for them. Whatever about the latter group, I do sincerely hope that the former group immediately followed this up by finding a job where they’re not surrounded by asshats. Below this C$19 marker there are 9,009 transactions, across 91 price points, where amounts were settled with Ashley Madison, ranging from C$18.75 to C$8.08.
All together, these totalled C$135,022.25 … an awful lot of money to make in small transactions.
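The ‘Full Delete’ arithmetic above is simple enough to check by hand, but for completeness here is a small sketch that recomputes it, along with the average size of the sub-C$19 payments. It uses only the counts and totals quoted in the text; the variable names are mine, and the average is derived from those quoted figures rather than taken from the data itself.

```python
from decimal import Decimal

FULL_DELETE_PRICE = Decimal("19")        # C$ price quoted in the text
payments = {"Ireland": 275, "UK": 857}   # counts of exactly-C$19 payments, from the text

# Number of exactly-C$19 payments and the revenue they represent.
count = sum(payments.values())
revenue = count * FULL_DELETE_PRICE

# Payments below the C$19 mark, as quoted in the text.
small_total = Decimal("135022.25")
small_count = 9009
small_average = (small_total / small_count).quantize(Decimal("0.01"))

print(count, revenue, small_average)
```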

All of the UK & Ireland account-level data by expenditure. So much data there's only a small smudge visible!
I next wanted to look at how individual-level accounts spent their money … basically, I wanted to see if they spent it all in a short financial thrust or gently smouldered over a much longer timescale … (Sorry! I had to take a break from typing as I started to hallucinate Barry White). Anyway, Mr White successfully exorcised, I took a look at this for the UK and Ireland and was rather confused by what I found. According to the data, our biggest spender in the UK – lashing out a cool C$182K – did it in only two payments of C$91K in August and September of 2014. But he’s not alone. The second placed account paid out C$72,900 in March 2015, followed by two payments of C$20K each in the two months following. Even in the eighth and ninth spots, they made their entire spends over two transactions in a single month. I will admit that my first reaction was one of disbelief and shock - it seems implausible to me that so much money could be spent on a caprice over such a small span of time. I'm afraid that this aspect of life is beyond my experience (both the affairs and that amount of disposable cash), so I'm not able to comment on whether this is reasonable or plausible. For these apparent highest of high rollers, I did wonder if they were not the victims of fraud. This may yet prove to be the case, but there is no direct evidence of it that I can see in the data available. The Irish data is only slightly less strange. The top payer managed to get through C$119,300 in 12 payments, varying between C$2,000 and C$12,800, in the period from November 2014 to March 2015. The third, fourth, and sixth ranked accounts all paid out within a single calendar month (C$6,396, C$4,900, and C$1,120 respectively). Only the second, eighth, and ninth ranked accounts paid their cash out in smaller sums and over a relatively lengthy time frame. The second ranked account shows Settlement activity in the period from April 4th 2011 to August 20th 2012.
The account holder spent C$7,866.99 over 105 transactions, with amounts ranging from C$20 to C$249. As an aside, there is a person of the same name, same address, but different account number, and different email (with a different credit card) that spent C$720 over 11 transactions in the period from July 3rd 2014 to April 4th 2014. Thus, it would seem that individual accounts may not hold all the events of a single individual. One way or another, this aspect of the data requires further detailed study and scrutiny.

Top 20 accounts by expenditure for UK (Left) and Ireland (Right)
Breakdown of expenditure by Top 10 UK accounts by month
Breakdown of expenditure by Top 10 Irish accounts by month
This is as far as I’ve taken these analyses, but I think that there’s much more there to be found and investigated. But beyond these direct analyses, is there much that this form of data analysis can bring to our understanding of modern British and Irish society? I think that the answer has to be a resounding yes. This is exactly the type of data that doesn’t make it into standard historical narratives precisely because it’s usually inaccessible and impossible to quantify. For this reason alone, I believe that it is worthy of study and consideration. Certainly in terms of the Irish data (north and south), it lends an important nuance to traditional, conservative narratives about how ideas of ‘family’ and sexuality are understood and presented. For the British data as much as the Irish, at its simplest level, it gives the lie to so many standard stereotypes about the denizens of these islands being sexually repressed and conservative. It was probably always thus, the difference now is that the internet has provided the means to connect and now we have the ability to analyse and interrogate the data. One way or another – whatever the morality or legality of taking or using this data – it is here and cannot be suppressed. Not only that, it will become more frequent and increasingly normalised in the years to come. Here’s the challenge for archaeologists, historians, and anyone interested in the current state of our culture: how will we react to this data, how will we use it responsibly, and what insights will we achieve?

Where do we go from here? In the first instance, I think it’s important to note that I’ve only undertaken a small amount of ‘excavation’ on one small portion of this digital site – just the credit/debit card data for the British and Irish members ... But there is so much more … not just in terms of card transactions, but in the context of the data dump as a whole. I do think that we need to move the analysis of this archive away from the outing of individual people to an atmosphere of genuine research at the societal level. Think about it like undertaking a research dig on an ancient city – there’s room for one team to investigate the houses of the wealthy, while other projects look at the dwellings of the middle classes and the workshops of the artisans. There’s still enough room for other groups again to look at the aqueducts and the bathhouses … you just want to keep the treasure hunters away from digging out the shiny stuff, shorn of all context. I think that the archaeology metaphor holds when we consider that as large and comprehensive as this data dump is, it is still just one ‘site’ (in both the archaeological and internet senses). I hesitate to be seen to advocate for more hacking of personal data, but should other related types of site become available – for example, eHarmony, Tinder, Grindr, Christian Mingle, Match.com, singlemuslim.com and any of the plethora of niche sites that are out there – we may just be able to begin to create a landscape archaeology of the digital realm. Should these be combined with data from other (non dating/hookup) digital sources, like Facebook, Twitter, YouTube, MumsNet, Academia.edu and Instagram, then we could begin producing some genuinely deep and interesting insights at all levels from the individual to the planet as a whole ... and that, dear readers, is the true essence of archaeology and history!

Notes
For a project like this I’d normally direct the reader to a Tableau presentation where they can play about with the data and make their own discoveries and bring out the particular things that interest them. I have not done that in this case, and for a number of reasons. In the first place, I feel that despite my interest in it and argument that it is a valid field for research, it is still too sensitive to release en masse – even if I did anonymise the names and remove all even vaguely personal information. The second reason is Tableau themselves. They do give off a very Geek-friendly vibe, but their heavy handed response to users analysing portions of the WikiLeaks data and uploading it to Tableau Public makes me think it’s better to be overly cautious here (see also here | here | here | here | here). Although Tableau have changed their policy in the time since (here | here), I’m not inclined to risk it. Instead, I just used Tableau Public to make the visualisations and screen-grabbed them … it's not perfect, but it works!

I thought about writing to Visa and suggesting that they attempt to capitalise on this data by taking out advertising to say: ‘Visa: Your card of choice for infidelities!’ ... but I have a feeling that they’d not go for it. I even had a rough draft of a script for a TV advert … it could have been great …

In doing research for this post, I stumbled upon the photography of Georgios Makkas and his 'Archaeology of Now' project looking at abandoned shopfronts in Greece. His work has a haunting quality that eloquently shows the effects of the recent economic downturn on the post-war Greek dream of family shop-ownership. See his work here and here

I just wanted to add that while the prevailing narrative regarding the Ashley Madison data is one of heterosexual infidelities, there are actually six categories of membership:
1: Attached Female Seeking Males
2: Attached Male Seeking Females
3: Single Male Seeking Attached Females
4: Single Female Seeking Attached Males
5: Attached Male Seeking Males
6: Attached Female Seeking Females

While the majority of coverage has centred on the first two categories, this ignores a (probably) significant proportion of the membership. In particular, the final two categories (individuals either in heterosexual relationships seeking homosexual encounters, or in long-term homosexual relationships looking for affairs) have been absent from the discussion. I’ve not been able to find evidence in the credit card data to make differentiations at this level of detail, but I do think it’s worthy of study and further research.