Feed aggregator

Why 2k?

Information Quality Trainwrecks - January 20, 2010 - 11:34
IT media sources are reporting that reports of the demise of the Y2k bug may have been premature. (see also here) Systems affected included Spam control software and other security software from a leading vendor, network equipment from leading vendors as well as credit card payment systems in Germany and Australia, as well as (it seems) Windows [...]

Personal Data – an Asset we hold on Trust

There has been a bit of a scandal in Ireland with the discovery that Temple St Children’s Hospital has been retaining blood samples from children indefinitely without the consent of parents.

The story broke in the Sunday Times just after Christmas and has been picked up as a discussion point on sites such as Boards.ie.  TJ McIntyre has also written about some of the legal issues raised by this.

Ultimately, at the heart of the matter is a fundamental issue of Data Protection compliance and a failure to treat Personal Data (and Sensitive Personal Data at that) as an asset (something of value) that the Hospital held and holds on trust for the data subject. It is not the Hospital’s data. It is not the HSE’s data. It is my child’s data, and (as I’m of a certain age) probably my data and my wife’s data and my brothers’ data and my sisters-in-law’s data…

It’s of particular interest to me as I’m in the process of finishing off a tutorial course on Data Protection and Information Quality for a series of conferences at the end of February (if you are interested in coming, use the discount code “EARLYBIRD” up to the end of January to get a whopper of a discount). So many of the issues that this raises are to the front of my mind.

Rather than simply write another post about Data Protection issues, I’m going to approach this from the perspective of Information as an Asset which has a readily definable Life Cycle at various points in which key decisions should be taken by responsible and accountable people to ensure that the asset continues to have value.

Another aspect of how I’m going to discuss this is that, after over a decade working in Information Quality and Governance, I am a firm believer in the mantra: “Just because you can doesn’t mean you should“. I’m going to show how an Asset Life Cycle perspective can help you develop some robust structures to ensure your data is of high quality and you are less likely to fall foul of Data Protection issues.

And for anyone who thinks that Data Protection and Data Quality are unrelated issues, I direct you to the specific wording in the heading of Chapter 2, Section 1 of the Directive 95/46/EC.

The Information Asset Life Cycle

Information, just like any other asset, has a life cycle through which it needs to be managed. Just because you can keep throwing files into a filing cabinet (or, in Data Protection terms, a “relevant filing system“) or hold it forever in electronic storage doesn’t mean you should.

POSMAD model (thanks to Danette McGilvray)

The key stages in the Information Asset Life Cycle are listed below, and are mapped to the 8 Data Protection Principles in the diagram opposite as well.

  • Plan
  • Obtain
  • Store and Share
  • Maintain
  • Apply
  • Dispose

The answers given to the questions asked at each of these stages affect the organisation’s ability to meet its obligations under the Data Protection Act, as can be seen from the mapping of DP Principles to POSMAD Life Cycle stages.

Just like any other asset, information needs to be managed.

When you are hiring new staff, you plan for what type of people you will need in your business. You’ll need to have strategies for obtaining those staff, planning in place for how to store them (offices, desks etc.). You will (ideally) want to maintain them (training, rewards and recognition, staff retention), and be sure you can Apply them to their job (right tools, adequate resources etc.). Inevitably, you are also going to have to have answers to the question of what you will do with your staff when you no longer need them and need to dispose of them (retirement, redundancy, etc.).

With Information, you need to Plan what type of data you will be capturing and why, and who you’ll share it with. You’ll need to define policies and methods for obtaining that data (e.g. standardised testing protocols such as the Guthrie Card and the structured information captured on hospital test request forms). You’ll need to plan how that data will be stored and shared so it can be found when needed, and can be readily linked to other data etc. You’ll need to give some consideration to how you will maintain that data (e.g. keeping personal data up to date for as long as you are holding it), and you’ll need to have a clear plan and protocol for how to dispose of the data when it is no longer needed.

Note that in this context, disposal does not necessarily mean the destruction of the data. It could simply be a process or policy that defines when data has become “excessive” and mandates the anonymising of all the data to a level suitable for statistical reporting and study but that is no longer “personal data” in the meaning of the Data Protection Acts.
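
As a rough illustration of disposal by anonymising, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `SampleRecord` fields, the ten-year retention period, and the choice to aggregate by region and result. It is not a description of how any hospital system works, and a real policy would also need to consider whether small regional groups could still identify individuals.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class SampleRecord:
    # Hypothetical fields, invented for this illustration only.
    name: str
    address: str
    region: str
    collected_on: date
    result: str


def is_excessive(record: SampleRecord, today: date, retention_years: int) -> bool:
    """True once the record is older than the (assumed) retention period."""
    age_in_days = (today - record.collected_on).days
    # 365 days per year is a deliberate approximation for the sketch.
    return age_in_days > retention_years * 365


def dispose_by_anonymising(records, today, retention_years=10):
    """Return (records still retained, anonymised counts per region and result)."""
    retained = []
    regional_counts = Counter()
    for record in records:
        if is_excessive(record, today, retention_years):
            # Name, address and exact date are dropped here;
            # only the region and the test result survive, as a count.
            regional_counts[(record.region, record.result)] += 1
        else:
            retained.append(record)
    return retained, regional_counts
```

The point of the design is that the disposal rule is written down and testable: once a record passes the retention threshold, the personal fields never leave the function.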

Of course, just as you have measures that help you track and manage the effectiveness with which you are managing other assets (e.g. tracking equipment services and outage statistics for office equipment or purchase volumes for stationery and paper clips), you should ideally look to develop some metrics that help you know how well you are answering the various questions at each stage in the Information Asset Life Cycle.

Finally, you need to bear in mind that the Data Protection Acts now cover personal data held in formats other than electronic files. Paper-based data (e.g. patient information associated with a Guthrie Card) is protected by the Data Protection Acts when it is held in a “Relevant Filing System”, so your paper filing needs to bear up to the scrutiny of the Information Asset Life Cycle.

The Temple Street Situation

What appears to have happened in the Temple St situation is that there was a failure to properly plan for the management of a key information asset.

  • While there was a plan to provide a national scheme for testing, some key questions were not asked at the planning stage. As a result, the answers are not necessarily forthcoming to parents when faced with the test.

    Good Data Protection requires Planning

    For example, parents do not always know that the test is being done in Temple St. Certainly, parents were not aware that the data was to be held indefinitely. Ultimately, if Aldi (see image) are able to tell me what my personal data (CCTV images) is to be used for and who I can contact if I have a query, then the HSE and its sub-division and constituent hospitals should have been able to do the same.

  • There appears to have been no thought given to the conditions or criteria for reasonable disposal (either outright disposal or anonymising) of the data. This gave rise to a situation where a Data Protection breach was inevitable.
  • Parents (acting as legal guardians of their children) were not given the opportunity to opt in or opt out of being part of further scientific research using the blood samples taken from children.
  • Likewise, children and adults (and the DPA does not require someone to be an adult to be protected) who have data on file in Temple St. do not appear to have a clear mechanism available whereby they can request that their data be blocked from use in processes other than that for which it was initially provided. (I’ve looked at the list of Data Controllers and Data Processors registered with the Data Protection Commissioner. Temple St. is listed as the “Children’s University Hospital”, and it’s not immediately clear from the register whether HSE Dublin North East or Dublin Mid-Leinster covers Temple St.) [Thanks to Hugh Jones, the lead trainer on the Irish Computer Society's Data Protection course, for the correction.]
  • The personal data attached to the blood samples will, in a large number of cases, be woefully out of date and inaccurate – in itself a breach of the Data Protection Act – as there is likely no process to keep that data up to date. If the data is not being maintained, it is not needed and is excessive to the stated purposes for which it is being used. If the personal data (e.g. name and address) on samples that are 26 years old is being used for specific purposes, then one would suggest that any action taken on foot of such analysis is likely to be wrong.

A number of commenters on Boards.ie and elsewhere have suggested that the personal data could be excised when the blood test data or samples were being used. That misses the fundamental point. There is a point in time beyond which my daughter’s name and address cease to have any actionable value in an analysis of blood tests taken from her. If you don’t need it, then the data you are holding is excessive (in the meaning of the DPA and the EU Data Protection Directives) and should be permanently removed. Aggregation of samples into clusters based on geographic region would likely allow for data sets suitable for scientific and statistical analysis.

But then we are right back to the Plan stage of the POSMAD model. What was intended to be done with this data? What are the stated purposes for which it is being captured? At what point does the data become excessive for those purposes? What is the plan to dispose of the excessive data (while retaining data appropriate to the stated purposes)?

Conclusion

Information is an Asset. It needs to be managed and protected as such. Adopting an Asset Life Cycle approach to planning, with your Data Protection Duties forming the basis for your Key Questions can help your organisation properly plan to manage your Asset in a compliant manner. Furthermore, the Data Protection rules are enablers rather than restrictions as they provide a clear framework within which you can manage the expectations and worries of Data Subjects so that you get better up-take and happy and informed consent to you having the relevant data for as long as you plan to need it.

A key message coming from “DPA-Aware” bloggers affected by this issue is that if they had been asked if they minded anonymised data being held for research purposes they would not necessarily have objected. Investing time in planning and in valuing the Personal Data Asset which the HSE holds on Trust for the affected Data Subjects (they don’t own it) would have avoided the negative reaction and push back from rightly concerned parents and Civil Rights groups.

Want to learn practical approaches to avoiding this type of boo boo in your organisation? Why not come along to my tutorial at the 2010 IDQ Seminar Series event in Dublin on the 22nd and 23rd of February. Use the code “EarlyBird” before Jan 31st to get big savings on the event. If you are interested in just my tutorial, use the code “DoBlog” when registering.

Categories: Personal Blog

Slovak Police accidentally cause Terror Alert in Dublin

Information Quality Trainwrecks - January 7, 2010 - 15:12
The Irish and International media have been busy the past few days covering the story of the horrendously botched security test by Slovakian Border Police which resulted in 90 grams of high explosive RDX finding its way to Dublin from Bratislava in the backpack of an unsuspecting Slovakian electrician who was travelling back to Ireland [...]

In the ocean…

Information Quality Trainwrecks - January 7, 2010 - 14:34
Poor geographic data gives rubbish results.  Recently I read in Railways Africa: “Electronic files of Southern California’s rail system can be downloaded from the Federal Railroad Administration’s website,” the Los Angeles Times says, “but are riddled with errors, placing hundreds of rail crossings at points far from any railroad tracks. Many are shown in the Pacific [...]

Perhaps they should have checked their listings twice?

Information Quality Trainwrecks - December 21, 2009 - 13:17
The Irish Sunday Independent reports this past weekend that the Irish State Broadcaster RTE is facing legal action from its erstwhile privately owned competitor TV3  arising from what are described as “significant and egregious” errors in the listings published for TV3’s programmes over the Christmas period in the RTE owned listing’s magazine “The RTE Guide”. [...]

IAIDQ Information Quality Blog Carnival (updated)

Information Quality Trainwrecks - December 15, 2009 - 11:37
A little later than we had planned, IQTrainwrecks.com is proud to publish the December edition of the IAIDQ’s Blog Carnival for Information Quality, a retrospective on blog posts that appeared in November. [Edit: We'd actually missed one submission when we posted this. A horrendous oversight given the importance of the discussion. Apologies to Dylan Jones and [...]

Information Quality – Every Little Helps

Information Quality Trainwrecks - December 15, 2009 - 10:57
[Thanks to Tony O'Brien for sending this one in to us recently. For those of you not familiar with Tesco and their marketing slogans, this is their corporate website.] ManagementToday.com has a great story (from 25th November) of how six bicycles purchased by Tesco from a supplier came with an apparent £1 million (US$1.62 million) price tag. Some red [...]

Who then is my customer?

Two weeks ago I had the privilege of taking part in the IAIDQ’s Ask the Expert Webinar for World Quality Day (or as it will now be known, World Information Quality Day).

The general format of the event was that a few of the IAIDQ Directors shared stories from their personal experiences or professional insights and extrapolated out what the landscape might be like in 2014 (the 10th anniversary of the IAIDQ).

A key factor in all of the stories that were shared was the need to focus on the needs of your information customer, and the fact that the information customer may not be the person who you think they are. More often than not, failing to consider the needs of your information customers can result in outcomes that are significantly below expectations.

One of my favourite legal maxims is Lord Atkin’s definition of the ‘neighbour’ to whom you owe a legal duty of care. He describes your ‘neighbour’ as being anyone who you should reasonably have in your mind when undertaking any action, or deciding not to take any action. While this defines a ‘neighbour’ from the point of view of litigation, I think it is also a very good definition of your “customer” in any process.

Recently I had the misfortune to witness first hand what happens when one part of an organisation institutes a change in a process without ensuring that the people who they should have reasonably had in their mind when instituting the change were aware that the change was coming.

My wife had a surgical procedure and a drain was inserted for a few days. After about 2 days, the drain was full and needed to be changed. The nurses on the ward couldn’t figure out how to change my wife’s drain because the drain that had been inserted was a new type which the surgical teams had elected to go with but which the ward nurses had never seen before.

For a further full day my wife suffered the indignity of various medical staff attempting to figure out how to change the drain.

  1. There was no replacement drain of that type available on the ward. The connections were incompatible with the standard drain that was readily available to staff on the ward and which they were familiar with.
  2. When a replacement drain was sourced and fitted, no-one could figure out how to actually activate the magic vacuum function of it that made it work. The instructions on the device itself were incomplete.

When the mystery of the drain fitting was eventually solved, the puzzle of how to actually read the amount of fluid being drained presented itself, which was only of importance as the surgeon had left instructions that the drain was to be removed once the output had dropped below a certain amount. The device itself presented misleading information, appearing to be filled to one level but when emptied out in fact containing a lesser amount (an information presentation quality problem one might say).

The impacts of all this were:

  • A distressed and disturbed patient increasingly worried about the quality of care she was receiving.
  • Wasted time and resources pulling medical staff from other duties to try and solve the mystery of the drain
  • A very peeved and increasingly irate quality management blogger growing more annoyed at the whole situation.
  • Medical staff feeling and looking incompetent in front of a patient (and the patient’s family)

Eventually the issues were sorted out and the drain was removed, but the outcome was a decidedly sub-optimal one for all involved. And it could have been easily avoided had there been proper communication about the change to the ward nurses and the doctors in the department from the surgical teams when they changed their standard. Had the surgical teams asked the question of who should they have in their minds to communicate with when taking an action, surely the post-op nurses should have featured in there somewhere?

I would be tempted to say “silly Health Service” if I hadn’t seen exactly this type of scenario play out in day to day operations and flagship IT projects during the course of my career. Whether it is changing the format of a spreadsheet report so it can’t be loaded into a database or filtered, changing a reporting standard, changing meta-data or reference data, or changing process steps, each of these can result in poor quality information outcomes and irate information customers.

So, while information quality is defined from the perspective of your information customers, you should take the time to step back and ask yourself who those information customers actually are before making changes that impact on the downstream ability of those customers to meet the needs of their customers.

Categories: Personal Blog

No smoke without ire – Life Insurance Overcharging in Ireland

Information Quality Trainwrecks - October 30, 2009 - 09:27
RTE News in Ireland ran a story last night on overcharging by Irish Life Assurance companies arising from a mis-classification of customers as smokers. (link to the item is here, but you may not be able to access it if you are not in Ireland). On foot of two complaints, the Irish Financial Services Ombudsman investigated [...]

Bank of Ireland – again

The Irish Times today reports that Bank of Ireland are again investigating incidents of double charging of customers who use LASER cards.

I wrote about this last month (see the archives here), picking up on a post from Tuppenceworth.ie earlier in the summer. I won’t be writing anything more about the issue (at least not for now).

Looking back through my archives I found the picture below in a post that I’d written back in May when Simon on Tuppenceworth first raised his issue with BOI’s Laser Cards.

Categories: Personal Blog

What’s in a name?

Mrs DoBlog and I are anxiously awaiting the arrival of a mini-DoBlog any day now. So we have spent some time flicking through baby name books seeking inspiration for a name other than DoBlog 2.0.

In doing so I have been yet again reminded of the challenges faced by information quality professionals when trying to unpick a concatenated string of text in a field that is labelled “Name”. The challenges are manifold:

  • Name formats differ from culture to culture – and it is not a Western/Asian divide as some people might assume at first.
  • Master Data for name spellings is notoriously difficult to obtain. My wife and I compared spellings of some common names in two books of baby names and the variations were staggering, with a number of spellings we are very familiar with (including my own name) not listed in either.
  • Often Family Names (surnames) can be used as Given Names (first names) such as Darcy (D’Arcy) or Jackson (Jackson) or Casey.
  • Often people pick names for their children based on where they were born or where they were conceived (Brooklyn Beckham, the son of footballer David Beckham is a good example).
  • Non-name words can appear in names, such as “Meat Loaf” or “Bear Grylls”.
  • Douglas Adams famously named a character in the Hitchhiker’s Guide to the Galaxy after one of the “dominant life forms” – a car called a “Ford Prefect”.
  • Names don’t always fit into an assumed varchar(30) or even varchar(100) field.
  • It is possible to have a one character Given name and a one character Family name.
  • Two character Family names are more common than we think.
  • Unicode characters, hyphens, spaces, apostrophes are all VALID in names – particularly if they are diacritical marks which change the meaning of words in particular languages.
  • And then you have people who change their names to silly things to be “different” or “special”,  but who create interesting statistical challenges for data profilers and parsing tools.
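
To make a few of those points concrete, here is a minimal Python sketch of a deliberately permissive name check. The function name `looks_like_a_name` and the set of allowed punctuation are assumptions made for this example, not a standard. It imposes no minimum or maximum length, accepts letters from any script along with combining diacritical marks, and rejects only empty strings and characters outside that set. Rejecting digits is itself an assumption that may not hold everywhere.

```python
import unicodedata

# Punctuation the list above treats as valid in names; the exact set is an assumption.
ALLOWED_PUNCTUATION = {" ", "-", "'", "\u2019", "."}


def looks_like_a_name(text: str) -> bool:
    """Permissive check: reject only empty strings and clearly non-name characters."""
    stripped = text.strip()
    if not stripped:
        return False
    for char in stripped:
        category = unicodedata.category(char)
        # "L*" covers letters in any script, "M*" covers combining diacritical marks.
        if category[0] in ("L", "M") or char in ALLOWED_PUNCTUATION:
            continue
        return False
    return True
```

With this check, `looks_like_a_name("D’Arcy")`, `looks_like_a_name("Meat Loaf")` and a one-character name such as `looks_like_a_name("O")` all return `True`, while `looks_like_a_name("")` returns `False`. Note what it cannot do: it says nothing about whether “Ford Prefect” is a person or a car.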

Among the examples I found flicking through one of our baby name books last evening were “Alpha” and “Beta”. Personally I think it sends the wrong signals to name your children after letters of the Greek alphabet, but I’m sure it is helpful if you have had twins to keep them in order.

I also found “Bairn” given as a Scots Gaelic name for a baby girl. I had to laugh at this as “Bairn” is actually a Scots dialect word for Child. Even Wikipedia recognises this and has a redirect from “Bairn” to “child“.  But it does remind me of the terribly sexist “joke” where the father asks the doctor after the birth whether it is a boy or a child his wife has just delivered.

The trouble with names, from an information quality point of view, is that they are inherently personal things which people have a strong attachment to. So getting spellings wrong can have negative effects on your business and your relationship with your customers (like my on-going gripe with Vodafone). But often companies need to accept the “fuzziness” of identity in order to match records and meet the needs of Anti-money laundering or similar regulations or simply to create a single view of their customers. But the EU Data Protection regulations require organisations to hold data accurately – with accuracy being defined from the point of view of the data subject.

So, when your head has stopped spinning from managing all the Alphas, Betas, Brooklyns, and Ford Prefects, as an Information Quality practitioner you are faced with juggling the needs of Customer Intimacy, the demands of Data Protection, and a range of other legal requirements when you are deciding how to clean your name data up.

Jim Harris’ excellent series of posts on Data Profiling  gives a great run through of how data profiling tools can help you figure out what is in those strings of text in that field labelled “Name”. However, you should exercise caution in your assumptions about what a name might be and might look like.

For example, allegedly the longest Name in the world is

Mr. Adolph Blaine Charles David Earl Frederick Gerald Hubert Irvin John Kenneth Lloyd Martin Nero Oliver Paul Quincy Randolph Sherman Thomas Uncas Victor William Xerxes Yancy Zeus Wolfeschlegelsteinhausenbergerdorffwelchevoralternwarengewissenschaftschaferswessenschafewarenwohlgepflegeundsorgfaltigkeitbeschutzenvonangreifeudurchihrraubgierigfeindewelchevoralternzwolftausendjahresvorandieerscheinenerscheinenvanderersteerdemenschderraumschiffgebrauchlichtalsseinursprungvonkraftgestartseinlangefahrthinzwischensternaitigraumaufdersuchenachdiesternwelchegehabtbewohnbarplanetenkreisedrehensichundwohinderneurassevonverstandigmenschlichkeitkonntefortpflanzenundsicherfeuenanlebenslanglichfreudeundruhemitnicheinfurchtvorangreifenvonandererintelligentgeschopfsvonhinzwischenternartigraum Senior

Now, that’s 802 characters long (including the Mr). It also doesn’t fit very easily into a <given name><middle_initial><Family_name> format which most of us would probably start with as our template for parsing a name string. Note also that he was a “senior”, so there is another one of this name out there somewhere. Perhaps he just goes by the name “Mr Adolph Ingram”. I’d also hate to see what a matching process would make of that name (how many match keys would need to be created?).
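
To see how quickly that starting template runs out of road, here is a minimal Python sketch of a naive parser. The names `ParsedName` and `naive_parse` are hypothetical, and the rules (two tokens, or three tokens with a single-letter middle) are simply the template taken literally, not a recommended approach.

```python
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    given: str
    middle_initial: Optional[str]
    family: str


def naive_parse(full_name: str) -> Optional[ParsedName]:
    """Parse 'Given M Family' or 'Given Family'; return None for anything else."""
    tokens = full_name.split()
    if len(tokens) == 2:
        return ParsedName(tokens[0], None, tokens[1])
    if len(tokens) == 3 and len(tokens[1].rstrip(".")) == 1:
        return ParsedName(tokens[0], tokens[1].rstrip("."), tokens[2])
    return None  # the template has no slot for this shape of name
```

`naive_parse("Ford Prefect")` happily returns a given name of “Ford” and a family name of “Prefect”, which is structurally valid and tells you nothing about whether it is right. `naive_parse("Siddig el Fadil")` returns `None`, as would the 802 character name above. A parser like this fails in both directions: it accepts things it should question and rejects perfectly real names.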

There are some interesting comments about this name on the Stackoverflow.com website. Some of them are helpful pointers to the different structures of names that exist out there. Others show the risks that are run in designing and developing systems based on a particular cultural bias or perception of what a name is (a lot of commenters refer to US government forms and how people with long names usually have a form that they use for “official purposes”. This is not necessarily a “safe” assumption… not every government form in each country is the same. Indeed, many Irish Government forms don’t have enough space for my address or my wife’s first name and compound family name).

In a case that will be put up on IQTrainwrecks.com, the impact of cultural assumptions about what a “valid” name is can be seen in this story of a woman who had trouble boarding a plane because of her name.

Fans of Star Trek: Deep Space 9 will recall how the actor who played Doctor Bashir changed his credit name from Siddig el Fadil to “Alexander Siddig” because (it is claimed) fans couldn’t pronounce Siddig el Fadil properly. The full version of his name would be an interesting challenge for a data profiling tool in the hands of an Information Quality professional and certainly challenges the <Given_name><middle_name><family_name> format used in Anglo-Saxon cultures.

Wikipedia has an interesting page of references for unusual and long names which I would recommend at the very least as a tool to blow away any assumptions you have of what’s in a name.

No set of name Master Data in a reference dictionary will ever be complete or fully accurate. When balancing the needs for accuracy and correctness of data versus the needs to match and consolidate data (either for internal business purposes like CRM or for legally mandated purposes such as AML or PEP processes), you need to give some thought to how you will weight and manage your priorities within the data. Furthermore, assumptions you might make about the “correct” structure of a name could actually create information quality problems for you.

For now, Mrs DoBlog and I will continue to see if we can find a name that fits the impending arrival. But it has been made a lot harder because of my insights into the fun a name can cause for an information quality team.

I’m angling for something very traditional and Irish…. just to really confuse people and break Soundex keys.
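
For anyone wondering how an Irish name breaks a Soundex key, here is a minimal Python sketch of American Soundex as it is commonly described (first letter kept, consonants mapped to digits, adjacent duplicates collapsed, padded to four characters). The names “Niamh” and “Caoimhe” and their rough phonetic spellings are illustrative choices for this example only. The sketch also silently drops accented letters, which is one more limitation of the technique.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return a four-character Soundex key, or '' if there are no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    previous_code = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(letter, "")
        if code and code != previous_code:
            key += code
        previous_code = code  # vowels reset this, so a repeat after a vowel counts
    return (key + "000")[:4]
```

`soundex("Robert")` and `soundex("Rupert")` both give `R163`, which is the kind of match Soundex was built for. But `soundex("Niamh")` gives `N500` while a spelling closer to how it sounds, “Neev”, gives `N100`, and “Caoimhe” (`C500`) and “Keeva” (`K100`) do not even share a first letter. The key follows English spelling conventions, so a name whose spelling follows Irish ones will not match how people hear it.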

Categories: Personal Blog