Gartner: Open source data quality software focuses on data profiling

Jeff Kelly, News Editor

Open source data quality software could be a good fit for companies looking for an inexpensive way to conduct data profiling -- but that's about it, according to Gartner.

While open source vendors like JasperSoft and Talend have enjoyed significant success in business intelligence (BI), data integration and other data management domains, they are just starting to explore the data quality market, according to Ted Friedman, an analyst with the Stamford, Conn.-based research firm and author of a recent report on the topic.

"The significant increase in interest in [open source] data integration seems to be spilling over into the related field of data quality," Friedman said.

Not surprisingly for a new entrant to the market, however, open source data quality software and applications tend to be less mature than their open source data management cousins, he said.

They rarely address more than one functional requirement, he said, and most lack more sophisticated data quality capabilities like data matching and monitoring.

But that doesn't mean open source data quality software can't benefit some organizations. A handful of open source data quality products on the market are adequate for basic data profiling, according to Friedman.

Data profiling involves collecting and analyzing statistics on the quality of a data set in order to identify problem areas. It is often the first step in a data quality project.
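
As a rough illustration of what basic profiling amounts to, the sketch below computes a few per-column statistics (missing values, distinct values, most frequent value) over a small set of records. It is not taken from any product mentioned here; the function names `profile_column` and `profile_table` and the sample `customers` records are hypothetical, and real tools report far more than this.

```python
from collections import Counter


def profile_column(values):
    """Return basic quality statistics for one column of raw values."""
    total = len(values)
    # Assumption: None and blank strings both count as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "total": total,
        "missing": total - len(present),
        "distinct": len(counts),
        # (value, frequency) of the most frequent value, or None if empty.
        "most_common": counts.most_common(1)[0] if counts else None,
    }


def profile_table(rows):
    """Profile every column found in a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return {
        col: profile_column([row.get(col) for row in rows])
        for col in columns
    }


# Hypothetical sample records, invented for this sketch.
customers = [
    {"name": "Ann", "state": "CT", "zip": "06902"},
    {"name": "Bob", "state": "CT", "zip": ""},
    {"name": "Cy", "state": "ct", "zip": "06902"},
    {"name": None, "state": "NJ", "zip": "08852"},
]

for column, stats in profile_table(customers).items():
    print(column, stats)
```

Even on four records, the statistics point to problem areas: one blank zip code, one missing name, and a "state" column with three distinct values because "CT" and "ct" are stored inconsistently.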

Companies undertaking a broad data quality initiative can distribute open source data profiling software to multiple users in various departments, because data profiling is often recommended during a project's early phases, Friedman wrote in his report, co-authored with fellow analyst Andreas Bitterer.

Data profiling is also useful "for educational or initial assessment purposes and to assist in developing requirements for data transformation and data migration projects," according to the report.

Open source data quality's low price tag is offset by a number of factors, however. Friedman said the software lacks business user-friendly interfaces, meaning that it requires significant technical expertise to use. There is also generally little in the way of support from the vendors.

The "most advanced" of the open source vendors offering data quality tools is Talend, based in Los Altos, Calif., Friedman said. While better known for its open source data integration software, Talend recently released the Talend Open Profiler, available for free download, as well as a commercial data quality product that includes some limited data cleansing and matching capabilities.

Another recent entrant to the market is the Denmark-based DataCleaner, whose software "consists of a quick download and an easy installation, including some sample data that allows you to try out the profiling functionality," Friedman said.

Other vendors offering open source data quality software include Toronto-based SQL Power, and Infosolve, based in South Brunswick, N.J.

Still, open source data quality software vendors have a long way to go if they want to grab their portion of the $500 million data quality market. And that could take significant time.

"It will be well beyond 2012 before open source data quality platforms have broadly caught up in terms of their capabilities with the commercial data quality tool vendors and are considered a viable alternative for enterprise-wide usage," Friedman wrote.

As open source data quality offerings do mature, Friedman said he would not be surprised to see an acquisition or two by larger data integration vendors, as the two technologies -- data quality and data integration -- continue to converge.
