Disclaimer: The individual owning this blog works for Oracle in Germany. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
Digging into Apache Hadoop
Thursday, October 2, 2008
Comments
Have you seen the OpenSolaris/Hadoop live CD? http://opensolaris.org/os/project/livehadoop/
Yes, I saw it, but I want to integrate it into my already running testbed.
(Your spam prevention is preventing valid comments, BTW. It took me way too many tries to get around it.)
I'm not sure I understand what you mean about not being able to work with compressed files. The documentation at hadoop.apache.org/core/docs/current/native_libraries.html describes how to turn on native compression in Hadoop so that it can read/use gzip, lzo, etc, compression as part of the MR job.
Sorry for the hassle with the spam prevention, but with a less stringent one I would spend my free time deleting spam ...
I read in a presentation that you can't split compressed files into shards. That sounded logical, as you can't take 10 MB out of a gz file and gunzip it. I have to admit that I'm in the early stages of digging into Hadoop. I will dig further into the documentation ...
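To make the point about slicing a gz file concrete, here is a minimal Python sketch (not Hadoop code, and the sample data and the helper name `try_decompress` are made up for illustration). It compresses some text as a single gzip stream and then tries to decompress a byte range taken from the middle, roughly what a shard boundary inside the file would look like.

```python
import gzip
import zlib

# Sample data, compressed as one gzip stream (illustrative only).
data = b"".join(b"line %d of a sample log file\n" % i for i in range(10_000))
compressed = gzip.compress(data)


def try_decompress(chunk: bytes) -> str:
    """Try to gunzip a byte range on its own and report what happens."""
    try:
        gzip.decompress(chunk)
        return "ok"
    except (OSError, EOFError, zlib.error) as exc:
        return "failed: " + type(exc).__name__


# The whole file decompresses; a slice starting in the middle does not,
# because it lacks the gzip header and the decoder state built up so far.
middle = compressed[len(compressed) // 2:]
whole_result = try_decompress(compressed)
middle_result = try_decompress(middle)
print("whole file:", whole_result)
print("middle slice:", middle_result)
```

This only shows why an arbitrary byte range of a gzip stream cannot be processed in isolation; it says nothing about how Hadoop itself reads compressed input.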
You might be interested in Hive, Facebook's "alternative" to HBase, as well - it seems to provide a better interface (SQL-like, rather than the HBase shell).
Also, the way we got around the compression issue (our files were tarred and gzipped) was to extract and resize the archives on the fly -- to meet the shard size I believe. I wasn't personally involved in that part, so you'll have to forgive me if I've got it wrong!
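One way such a re-sharding step could look is sketched below in Python. This is only an assumption about the general idea, not the setup described above: it handles a plain gzip file of line-oriented text rather than a tar archive, and the function name `reshard_gzip`, the shard file naming, and the `shard_bytes` parameter are all hypothetical.

```python
import gzip
import os
from typing import List


def reshard_gzip(src_path: str, out_dir: str, shard_bytes: int) -> List[str]:
    """Stream-decompress src_path and write uncompressed shards of roughly
    shard_bytes each, always breaking on a line boundary."""
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    out = None
    written = 0
    with gzip.open(src_path, "rb") as src:
        for line in src:
            # Start a new shard once the current one has reached the target size.
            if out is None or written >= shard_bytes:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, "shard-%05d.txt" % len(paths))
                paths.append(path)
                out = open(path, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths
```

Because the input is read as a stream, the whole archive never has to fit in memory, and each resulting shard can be processed independently.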
Check out CloudBase:
http://cloudbase.sourceforge.net It is a data warehouse system built on top of Hadoop's MapReduce architecture that allows one to query terabytes and petabytes of data using ANSI SQL. It comes with a JDBC driver, so one can use third-party BI tools and reporting frameworks to connect directly to CloudBase. CloudBase creates a database system directly on flat files and converts input ANSI SQL expressions into map-reduce programs for processing flat files. It has an optimized algorithm to handle joins and plans to support table indexing in the next release.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License.