By CMS News Desk
February 25, 2010 06:00 AM EST
IBM on Thursday announced it is working with the British Library on a project that will preserve and analyse terabytes of information on the Web before it is lost forever.
The new analytics software project, called IBM BigSheets, helps extract, annotate and visually analyse vast amounts of Web information using a Web browser. IBM's new technology prototype is helping the British Library archive and preserve massive amounts of Web pages, and then unlock the virtual door to its archives for generations to come.
IBM's new analytics technology is helping the British Library speed up the archival process before Web data is lost forever. The Web is changing rapidly: new pages are created every day, causing an explosion of data that disappears almost as quickly as it is published. Recent research estimates the average life expectancy of a Web site at just 44 to 75 days. In turn, 10 percent of Web pages on the UK domain are lost every six months.
"IBM BigSheets does for big data what spreadsheets did for personal computing," said Rod Smith, vice president, Emerging Internet Technologies, IBM. "Within a matter of minutes, researchers, academics and students will be able to search many terabytes of archived Web pages from the UK domain, analyse the results and effortlessly visualise the results of the search."
Preserving Data for Generations to Come
Each year more than six million searches are generated by the British Library online catalogue, and nearly 400,000 people visit the British Library reading rooms, looking for information. The British Library receives a copy of every physical publication produced in the UK and Ireland, amounting to more than 150 million maps, manuscripts, musical scores, newspapers and magazines that it must archive. Beyond just the physical assets, the British Library has been archiving selected Web pages from the UK domain since 2004. With BigSheets, users of the Library will be able to access vast archives of historic Web sites, and easily research and analyse their queries and visualise the results of the search.
"We estimate the UK Web space will contain over 11 million Web sites by 2011. To take on the enormous challenge of capturing this content, we need a system capable of taking the UK Web Archive to Web-scale," said Helen Hockx-Yu, Web Archiving Programme Manager, The British Library. "IBM can help us analyse the web archive containing millions of pages and unlock embedded knowledge which otherwise is difficult to discover using traditional search methods."
Whether it's someone interested in their own genealogy or a student working on a project for school, people need help making sense of this growing sea of information on the Web. For example, the 2005 election marked the first attempts by UK politicians to use the Web as a campaigning tool. With the use of Web campaigns expected to explode during the 2010 election, the 2005 collection will enable researchers studying the evolution of politics and the Web to access hugely valuable primary source material.
BigSheets: The Technical Foundation
This year, the amount of digital information is expected to reach 988 exabytes, which is equivalent to a stack of books from the Sun to Pluto and back. The Web is exploding with data and business professionals want to access that data -- both structured and unstructured -- to get better insights into their business. IBM BigSheets is an insight engine that helps businesses draw insights from very large data sets easily and in a timely manner. By building on top of the Apache Hadoop framework, IBM BigSheets is able to process large amounts of data quickly and efficiently.
IBM BigSheets is a new technology prototype. Users can explore and generate new data insights using a Web application, and the IBM software then publishes Web 2.0 standard data feeds that British Library patrons can search.
BigSheets is an extension of the mashup paradigm that integrates gigabytes, terabytes, or petabytes of unstructured data from Web-based repositories; collects a wide range of unstructured Web data stemming from user-defined seed URLs; extracts and enriches that data using an unstructured information management architecture; and lets the user explore and visualise this data in specific, user-defined contexts. For example, users can see search results in a pie chart and look at the data in a tag cloud.
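The collect, extract and explore flow described above can be illustrated with a toy sketch. This is not IBM's implementation: it assumes a handful of in-memory pages in place of a crawl from seed URLs, and it imitates the map and reduce steps of a Hadoop-style job in plain Python. The names `PAGES`, `map_terms`, `reduce_counts` and `tag_cloud` are hypothetical, chosen only for illustration. The term counts it produces are the kind of data that could feed a tag cloud or pie chart.

```python
import re
from collections import Counter

# Hypothetical stand-in for pages collected from seed URLs. A real system
# would crawl these from the Web and store them on a cluster.
PAGES = {
    "http://example.org/a": "Election campaign news: the election campaign online",
    "http://example.org/b": "Campaign archive and election results",
}

# Very small, illustrative stopword list (an assumption, not from the article).
STOPWORDS = {"the", "and", "of", "a"}


def map_terms(text):
    """Map step: emit one lowercase term per word, skipping stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def reduce_counts(term_lists):
    """Reduce step: sum term occurrences across all pages."""
    totals = Counter()
    for terms in term_lists:
        totals.update(terms)
    return totals


def tag_cloud(pages, top=3):
    """Return the `top` most frequent terms with their counts."""
    counts = reduce_counts(map_terms(text) for text in pages.values())
    return counts.most_common(top)


if __name__ == "__main__":
    for term, weight in tag_cloud(PAGES):
        print(term, weight)
```

In a real deployment the map and reduce steps would run in parallel across many machines; here they run sequentially, which is enough to show the shape of the computation.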
Copyright © 2010 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
CMS News Desk trawls the websites, information sources, press wires, and blogs of the world for timely and insightful news and views on every aspect of Web content management and CMS systems.