Briggs (2007) explained that as organizations have learned to quickly amass large volumes of data to support decision making, ensuring that the data is reliable has become increasingly challenging. Briggs’ study focused on how the Humberside Police force in the U.K. had addressed those challenges. Humberside’s vision was not only to improve its data warehousing system to produce better intelligence, but also to standardize the data extracted from a number of disparate internal and external databases, with the goal of sharing vital police information among more than 43 jurisdictions throughout the U.K.
The Humberside Police force consists of more than 4,000 employees, has a jurisdiction that covers an area of approximately 3,500 square km in England and Wales, and serves to protect about 1,000,000 citizens (Humberside Police, 2009). According to the BBC (2006), Humberside ranked among the worst performing police forces in the U.K. in the 2005/06 assessment period. Humberside realized that key personnel did not have timely and accurate data at the time of need, concluded that it needed a data cleansing system, and recognized that choosing one required an understanding of the unique data-cleansing challenges faced by law enforcement organizations. As Briggs (2007) explained, for example, suspects are generally motivated to deliberately provide false or misleading data. They are also typically on the move, making it difficult to match geo-location data with the correct identity. Further, in emergency situations during which reaction time is crucial, data is often jotted down quickly and therefore incorrectly or incompletely. To address these issues, Humberside chose Informatica® Data Quality software because of its scalability and compatibility with the existing infrastructure.
Key Business Tasks Supported
According to Briggs (2007), resolving identities and validating the existence of addresses and vehicle types were the main tasks in need of support at Humberside. In addition, the emergency system and the domestic violence and child protection databases were in need of cleansing and standardizing. Informatica enabled technical staff to automate these tasks by using the product’s configurable business rules. By eliminating much of the redundant manual validation effort that was otherwise needed, the system allowed law enforcement personnel, now equipped with better intelligence, to respond to situations more quickly and appropriately.
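The idea of configurable business rules can be illustrated with a minimal sketch. It is not Informatica’s rule syntax or API, which Briggs (2007) does not describe. The `Rule` class, the `validate` function, the field names, and the registration pattern (a simplified version of the current-style U.K. format of two letters, two digits, and three letters) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumption: simplified current-style U.K. registration format
# (two letters, two digits, optional space, three letters).
# Real validation would have to cover older and special formats too.
REG_PATTERN = re.compile(r"^[A-Z]{2}[0-9]{2} ?[A-Z]{3}$")


@dataclass
class Rule:
    """A named check; `check` returns True when the record passes."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical rule set; in a configurable system, technical staff
# would add or edit entries here without touching the engine below.
RULES = [
    Rule("registration_format",
         lambda r: bool(REG_PATTERN.match(r.get("registration", "")))),
    Rule("postcode_present",
         lambda r: bool(r.get("postcode", "").strip())),
]


def validate(record: dict, rules: list[Rule] = RULES) -> list[str]:
    """Return the names of the rules that the record fails."""
    return [rule.name for rule in rules if not rule.check(record)]
```

Under these assumptions, a record that passes every rule yields an empty list, while a record with a lower-case registration and a blank postcode yields both rule names. Separating the rule list from the engine is what would let domain experts specify what counts as valid while technical staff maintain the rules.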
Key Business Users Supported
All law enforcement personnel from traffic officers to criminal investigators, dispatchers, and emergency response teams could be supported even across jurisdictions when the data is cleansed, verified, parsed, and transformed in a standardized way. By relieving officers of the tasks of trying to make sense of unstandardized data extracted from disparate systems, a police force would be more effective at protecting citizens, assisting victims, and catching criminals.
Description of General Architecture
According to Briggs (2007), Humberside’s infrastructure includes an internal warehouse built on Microsoft® SQL Server™ and Microsoft BizTalk® Server. The latter enables the former to interact with external systems. This type of infrastructure provides clients with a common and consistent interface for browsing, searching, and retrieving metadata and a uniform experience for users across different lines of the business (Microsoft, 2014).
In addition to the Informatica data quality tools that cleanse the data being ported into the Microsoft-based warehouse, Humberside uses the Hewlett-Packard® (HP) Autonomy search engine, which enables users to quickly perform needed searches on the cleansed data. Autonomy uses advanced signal processing to identify and understand patterns, concepts, and context. It is also adaptable, meaning the search engine model mixes new information with a growing body of indexed data that is dynamically updated based on how users consume and contribute information (HP, 2014).
Business Needs Driving the Data Warehouse Decision
In the case of Humberside, the business need was to improve staff performance at enforcing the law and protecting its citizens. Although the need for data warehousing and data cleansing was not explicitly stated as a factor in the poor performance assessment, the timing of the assessment and of the case study suggests that Humberside recognized lack of data integrity as a primary obstacle to successful performance.
Key Business Objectives
Throughout the U.K., the key objectives were to improve law enforcement performance along multiple dimensions including reducing and investigating crime, promoting safety, providing assistance to and focusing on citizens, using resources efficiently, and promoting local community involvement. Inasmuch as law enforcement staff must be able to rely on the data they are given to meet those objectives, the primary goal of Humberside was to implement a data cleansing solution.
The greatest benefits expected from implementing a data cleansing solution were that it would reduce the time law enforcement staff needed to spend on manual data validation and provide them with more reliable data and, ultimately, better intelligence. Having timely and reliable information at the time of need to support decision making would further lead to improved performance. Benefits would also accrue to the community, as officers could devote more time to reducing crime and protecting citizens.
Briggs (2007) made no mention of any training provided to or needed by the users of the Informatica and Autonomy tools. However, she did mention that the users are experts in identifying the information they need, while those responsible for defining and editing the business rules have the needed technical skills. It is presumed that the technical staff had some form of training in the use and administration of the system to support users. The key point in this case is that implementation of the system required a team effort by technical and non-technical staff, suggesting that learning was a continuous process with best practices evolving from that process.
Outside Services Used
There is also no mention of any outside services used in the implementation of the data cleansing solution. It stands to reason, however, that the technical staff would have negotiated vendor technical support at the time of purchase of the tools and would avail themselves of that support if needed.
Perspective on Success
Given the improved performance in the 2006/07 assessment year as reported by the BBC (2007), it appears that Humberside is on the right track. However, the study provides insufficient data to form an opinion as to whether Humberside actually achieved what it set out to accomplish. Although Briggs (2007) indicated that 98% of Humberside’s vehicle registration numbers conformed to a national standard after cleansing, compared to 83% prior to cleansing, there is no way to determine whether that improvement led to improved staff performance or whether it was cost-effective.
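A conformance figure of the kind Briggs (2007) reported can be thought of as the share of values that match a standard format, measured before and after cleansing. The sketch below shows one way such a figure might be computed. The pattern, the `cleanse` and `conformance_rate` functions, and the four sample values are hypothetical and are not Humberside’s data or method.

```python
import re

# Assumption: simplified current-style U.K. registration format
# (two letters, two digits, a space, three letters).
REG_PATTERN = re.compile(r"^[A-Z]{2}[0-9]{2} [A-Z]{3}$")


def cleanse(reg: str) -> str:
    """Upper-case, drop non-alphanumerics, and re-insert the standard space."""
    compact = re.sub(r"[^A-Za-z0-9]", "", reg).upper()
    return f"{compact[:4]} {compact[4:]}" if len(compact) == 7 else compact


def conformance_rate(values: list[str]) -> float:
    """Percentage of values that match the assumed standard format."""
    if not values:
        return 0.0
    matches = sum(1 for v in values if REG_PATTERN.match(v))
    return 100.0 * matches / len(values)


# Made-up sample purely for illustration.
sample = ["AB51 CDE", "ab51cde", "xy07-fgh", "UNKNOWN"]
before = conformance_rate(sample)                       # 25.0
after = conformance_rate([cleanse(v) for v in sample])  # 75.0
```

In this made-up sample, cleansing repairs formatting problems but cannot recover the value that was never a registration, which is one reason conformance rarely reaches 100%. The metric also measures only format, not whether the value is correct for the vehicle in question, so it supports the point above: a higher conformance rate does not by itself demonstrate better staff performance.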
Any system that is relied upon and fails to work can be anything from a mere nuisance to a catastrophe, depending on how critical the system is. How disruptive a period of downtime is also depends on the volume of data the system processes, the frequency of use, and the number of users. In the case of Humberside, any amount of downtime could result in loss of life if emergency responders are unable to obtain the data they need. In any event, downtime caused by problems with the system itself, as opposed to electrical outages and acts of nature, could be avoided with adequate vendor support.
Major Success Factors
The major success factors for Humberside need to be identified and quantified. Since police performance is a primary concern, the reasons for less than optimal performance need to be investigated and, if possible, the extent to which lack of data integrity is a significant causal factor determined. Because lack of data integrity is known to have impeded staff’s ability to perform their duties, improving data integrity is a desired goal. At a minimum, Humberside could establish a baseline by tracking the number of data errors encountered per query and then establish a formal plan to reduce that number over a specified period of time.
The main lesson is one that has been known for decades: garbage in, garbage out (GIGO), meaning that any information extracted from a database is only as good as the data that was entered into it. As Briggs (2007) pointed out, data cleansing is no small task; however, it is necessary in order to enable users to rely on the data to support their decisions. Further, as is true with any system implementation plan designed to enhance job performance, it is essential to identify and monitor measurable success factors.
A possible enhancement to a data cleansing system is a logging feature that tracks the number of data errors encountered over time. Such a feature could be used to monitor error rates to ensure they are within acceptable parameters established by the organization.
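A minimal sketch of such a logging feature follows. It assumes that each query reports how many data errors it encountered and that the organization has set an acceptable number of errors per query. The `ErrorRateLog` class, its method names, and the threshold are hypothetical and do not come from the Informatica product or from Briggs (2007).

```python
from collections import defaultdict
from datetime import date


class ErrorRateLog:
    """Tracks data errors per query by day against an acceptable threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold          # max acceptable errors per query
        self._queries = defaultdict(int)    # day -> number of queries logged
        self._errors = defaultdict(int)     # day -> total errors encountered

    def record(self, day: date, errors_found: int) -> None:
        """Log one query and the number of data errors it encountered."""
        self._queries[day] += 1
        self._errors[day] += errors_found

    def rate(self, day: date) -> float:
        """Average errors per query on the given day (0.0 if no queries)."""
        queries = self._queries.get(day, 0)
        return self._errors.get(day, 0) / queries if queries else 0.0

    def days_over_threshold(self) -> list[date]:
        """Days on which the error rate exceeded the acceptable threshold."""
        return sorted(d for d in self._queries if self.rate(d) > self.threshold)
```

A log of this kind would also supply the baseline of errors per query suggested earlier: the rates from the first weeks of use become the baseline, and later rates show whether a reduction plan is working.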
Desired Additional Materials to Aid in Analysis
Although the title of the article “Cleaner Data Allows Better Policing” may sound somewhat intuitive, there is insufficient factual evidence supported by research to reach the implied conclusion. While lack of data integrity may impede performance of duties, it may not be significant when compared to other influential factors such as population demographics, urban/rural makeup of the jurisdiction, available funding, political and economic climate, and number of police per capita.