Insurance companies venturing into digital insurance encounter typhoon-scale volumes of data that challenge even the most adept to extract business value. How can we weather these data storms? More importantly, how can we turn the data into information that drives more revenue or reduces expenses?
Historically, companies have established structured data stores and data warehouses to hold data for searching. But these are proving increasingly costly to maintain, and they cannot handle the large volumes of media- and internet-related data.
Some companies have begun to use open-source Hadoop clusters to store and analyse this media- and internet-related (unstructured) data. A few have started to expand the use of this Big Data technology into more operational areas, with increasing success.
Early challenges to this adoption included security and technical skills, but better security tools and Big Data Platform as a Service (BDPaaS) are addressing them. Meanwhile, techniques such as Master Data Management (MDM) combined with “schema on read” are enabling new ways to manage and access data.
In this article we review the business situation for data in insurance and the technologies employed to manage that data, and end by looking at some of the benefits and experiences thus far with Hadoop, especially for broader enterprise purposes.
Digital insurance = Data abundance
Digital insurance is marked by an abundance of data from many sources. Primary sources include internal systems; internet and social media data associated with customers and prospects; and market data.
Insurance companies have long used the data from their existing policy, commission, claims, and finance systems. Too often even this data has been fragmented across the different systems, yielding little added value.
Introducing digital channels increases the volume of digitised data coming from customers. Online sales increase sales-related data, while customer and agency self-service increases digital data and simultaneously improves the company's efficiency. Likewise, straight-through claims processing increases digital data volumes while providing opportunities to reduce claims processing costs and fraud.
With digital insurance comes the flood of associated data, such as:
• How someone actually uses the company’s website and reacts to online advertising
• How far along in a sales process a potential customer progresses
• What someone chooses to eat, how much the person exercises, or how healthy the person is (from personal fitness devices, phones, watches, and other wearables)
• How a customer drives (from usage based insurance systems)
• What is happening in the life of a customer or identified prospect (from Facebook, other social media, etc).
Much of this data is still unavailable or unused by Asian insurance companies, despite the value it can potentially deliver to the business.
The internet provides an abundance of data on how people think, feel, buy and interact with companies and each other over electronic media.
This data can yield insights for product development, pricing, and positioning. Combining it with internal and market data could provide insights into existing customers, driving better customer service and increased sales to those customers – but few companies achieve this goal today.
Converting data to information – Data management options
So what does it take to obtain these insights from data to add value to the business, for sales and other processes? Querying the data through the individual application systems has not been the answer. Likewise, having separate capabilities for internet and for operational data has not provided an integrated view.
To store, manage, and process the data from these various sources, companies have used evolving technologies, as depicted in Diagram 1 below.
Companies begin with operational data stores (ODS), usually SQL databases, into which data from the operational application(s) is transferred with minimal transformation. This data can be used to build operational reports – originally printed, now tabular and highly visual. This approach tends to have some critical drawbacks:
• Often tied directly to one or more underlying application systems, but with little linking of the data
• Limited cleansing and processing of the data
• Limited retention of historical data, especially detailed data.
Data warehouse/data mart using SQL database
Some companies combined the data from various applications or ODS into a single SQL structured data warehouse (usually with data marts for each major department). Data was cleansed before being placed into the warehouse, usually as part of an Extract, Transform, Load (ETL) process. Data warehouses originally used attached hard disks or network storage; these gave way to data storage appliances offering faster access and additional tools for managing the data.
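The ETL pattern described above can be sketched in a few lines of Python. This is a minimal illustration only – the field names, source rows, and cleansing rules are hypothetical, and a real pipeline would use a dedicated ETL tool rather than an in-memory database:

```python
import sqlite3

# Extract: rows as they might arrive from a source claims system
# (all identifiers and values here are illustrative).
claims_system = [("C-100", "  MOTOR ", 2500.0), ("C-101", "home", 800.0)]

# The target warehouse table, with its schema fixed up front
# ("schema on write").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (claim_id TEXT PRIMARY KEY, line TEXT, amount REAL)"
)

# Transform: cleanse the data *before* it is loaded, as in a classic
# ETL pipeline (here, trimming whitespace and normalising case).
cleansed = [(cid, line.strip().lower(), amt) for cid, line, amt in claims_system]

# Load: insert the cleansed rows into the warehouse table.
conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", cleansed)
conn.commit()

rows = conn.execute(
    "SELECT claim_id, line FROM claims ORDER BY claim_id"
).fetchall()
```

The key point is the ordering: the structure and quality rules are applied on the way in, which is what makes the warehouse costly to revise when source systems or business requirements change.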
This approach has worked reasonably well. Companies are able to query related data from different systems, and obtain tabular and visual reports across the enterprise.
But this approach also has serious limitations:
• Need for constant revision as underlying systems change along with changing business requirements – intensive and costly to maintain
• Inappropriate for unstructured data such as media and internet data
• Difficult and expensive to scale for very large data volumes.
Open-source Hadoop cluster
Increasingly, groups, or “clusters”, of low-cost processors and storage running open-source Hadoop software are used to manage data. These open-source Hadoop clusters have been particularly effective for storing the vast, generally unstructured data from social media and the internet.
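The processing model these clusters popularised – MapReduce – can be illustrated in plain Python. This toy example is not the Hadoop API itself; it simply shows the map and reduce steps applied to unstructured text such as social-media posts (the posts are invented):

```python
from collections import defaultdict

# Hypothetical unstructured records, e.g. customer posts or reviews.
posts = [
    "great claims service",
    "claims process was slow",
    "great service overall",
]

# Map: each record independently emits (key, value) pairs - here,
# (word, 1) for every word. On a cluster, this step runs in parallel
# across the machines holding the data.
mapped = [(word, 1) for post in posts for word in post.split()]

# Shuffle/reduce: pairs are grouped by key and the values combined -
# here, summed into word counts.
counts = defaultdict(int)
for word, n in mapped:
    counts[word] += n
```

Because each map task touches only its own slice of the data, the same pattern scales from three posts to billions simply by adding low-cost machines – the economic argument behind Hadoop clusters.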
Most early implementations of Hadoop clusters have been in corporate data centres. Recently companies have begun using cloud computing for Hadoop clusters, especially as uses have extended beyond individual departments and into the broader enterprise.
Companies can also obtain this capability as a service using Big Data Platform as a Service (BDPaaS). BDPaaS solutions can be deployed in companies’ own data centres, in private clouds, or in public clouds. Any of these solutions can be secure; even public cloud solutions can be PCI and HIPAA accredited for financial and health institutions.
Early concerns about data reliability, integrity and security have been addressed with data replication and a combination of open-source and vendor tools. Access can now be secured down to the individual data element, per user account.
Improved concepts for data management
Incorporating two concepts can lead to a logical data warehouse where data is stored and accessed in ways that better meet most enterprise requirements.
Appropriately implemented Master Data Management (MDM) enables companies to define the source of truth for data and a common means for using data across applications.
The use of “schema on read” rather than “schema on write” provides significantly more flexibility and lets companies derive value from data faster. Schema on write means applying the structure to the data when it is written to the database, the approach most commonly used for SQL databases. Schema on read means applying the structure and enforcing quality only when the data is extracted from storage for use in analytics, reports, and applications. In both cases a common data model forms the basis for the schema.
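The schema-on-read half of this contrast can be sketched as follows. Raw records land in storage exactly as received; the structure, type conversions, and quality rules are applied only when the data is read. The record layout and rules below are hypothetical:

```python
import json

# Raw event records landed "as is" in a data lake file - no structure
# was enforced when they were written (schema on read).
raw_records = [
    '{"policy_id": "P-001", "premium": "1200", "channel": "web"}',
    '{"policy_id": "P-002", "premium": "950"}',
    '{"policy_id": "P-003", "premium": "n/a", "channel": "agent"}',
]

def read_with_schema(lines):
    """Apply the schema and quality rules at read time, not write time."""
    for line in lines:
        rec = json.loads(line)
        try:
            premium = float(rec["premium"])
        except (KeyError, ValueError):
            continue  # quality rule: drop records with unusable premiums
        yield {
            "policy_id": rec["policy_id"],
            "premium": premium,
            "channel": rec.get("channel", "unknown"),  # default at read time
        }

clean = list(read_with_schema(raw_records))
```

Because the rules live in the reading code rather than the storage layout, a new field or a changed quality rule needs no restructuring of the stored data – the flexibility the article attributes to schema on read.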
Hadoop clusters for multiple business purposes
To date Hadoop clusters have primarily been used for internet related data, especially for analytical purposes by data scientists. Recently, though, some insurance companies have ventured into operational uses, with excellent results:
• A large US life insurer combined the data from over 70 systems across the US into a Hadoop cluster with a NoSQL database, and linked it to their CRM system so that customer service representatives could access all the policies, claims data, etc for each customer. They built the system and started deriving value within six months.
• A mid-tier UK general insurance company decided to forgo a central SQL data store and simply use Hadoop for the data from its various application systems. Increased sales paid for the development within six months, and the company saved millions of pounds on claims fraud within 18 months.
• A US auto insurer used Hadoop for the storage and analytics associated with telemetry data from its usage-based insurance (UBI) customers, using the data to assess customer behaviour and revise its products.
Hadoop/BDPaaS as data warehouse alternative for insurance
Many insurance companies in Asia are now implementing or considering implementing data warehouses using SQL data stores. Is this the right approach? Building and maintaining SQL databases, even on data appliances, is costly for large volumes of data and takes a long time to implement – often more than a year, during which the company gains no value from the data warehouse.
Hadoop clusters combined with new data management techniques and visualisation/analytical tools that can extract data from multiple sources provide significant benefits:
• Low cost storage solution
• Fast delivery of business value, often less than six months
• Rapid, phased implementation to get to overall capability
• Easily modifiable to accommodate business and technical changes.
Asian companies have leapfrogged in other technology areas; perhaps they can do so in data management and analytics as well.
Dr Michael Kelly is a Consulting Partner with CSC, based in Singapore.