Disclaimer: The individual owning this blog works for Oracle in Germany. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
Digging into Apache Hadoop
Thursday, October 2, 2008

Comments
Have you seen the OpenSolaris/Hadoop live CD? http://opensolaris.org/os/project/livehadoop/
Yes ... I saw it ... but I want to integrate it into my already running testbed.
(Your spam prevention is preventing valid comments, BTW. It took me way too many tries to get around it.)
I'm not sure I understand what you mean about not being able to work with compressed files. The documentation at hadoop.apache.org/core/docs/current/native_libraries.html describes how to turn on native compression in Hadoop so that it can read and use gzip, LZO, and other compression formats as part of the MR job.
Sorry for the hassle with the spam prevention, but with less stringent spam prevention I would spend my free time deleting spam ...
I read in a presentation that you can't split compressed files into shards. That sounded logical, as you can't cut 10 MB out of a gz file and gunzip it. I have to admit that I'm in the early stages of digging into Hadoop. I will dig further into the documentation ...
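The point about gz files can be checked with a few lines of Python. This is only a minimal sketch using the standard `gzip` module, not anything Hadoop-specific; the payload and the helper name `try_gunzip` are made up for illustration. It shows that a slice taken from inside a single gzip stream cannot be decompressed on its own, while the whole stream can.

```python
import gzip
import zlib

# Build a compressible payload and gzip it as one single stream
# (hypothetical data, standing in for a large log file).
payload = b"".join(b"line %d of some log data\n" % i for i in range(50_000))
compressed = gzip.compress(payload)


def try_gunzip(chunk: bytes) -> str:
    """Return 'ok' if chunk decompresses as gzip, else the error class name."""
    try:
        gzip.decompress(chunk)
        return "ok"
    except (OSError, EOFError, zlib.error) as exc:
        return type(exc).__name__


# The complete stream decompresses fine.
whole_result = try_gunzip(compressed)

# A slice from the middle has no gzip header and no usable
# decompressor state, so it cannot be read in isolation.
third = len(compressed) // 3
middle_result = try_gunzip(compressed[third:2 * third])

print("whole file:  ", whole_result)
print("middle slice:", middle_result)
```

This is why a worker that is handed only a byte range of a plain gzip file has nothing to start from: it would need the stream from its beginning.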
You might be interested in Hive, Facebook's "alternative" to HBase, as well - it seems to provide a better interface (SQL-like, rather than the HBase shell).
Also, the way we got around the compression issue (our files were tarred and gzipped) was to extract and resize the archives on the fly -- to meet the shard size I believe. I wasn't personally involved in that part, so you'll have to forgive me if I've got it wrong!
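One way such a resizing step could look is sketched below. This is an assumption about the general idea only, not the setup described in the comment: it ignores the tar layer, works in memory, and the function name `rechunk_gzip` and the size limit are hypothetical. It reads one gzip stream line by line and writes it back out as several independently gzipped pieces, each holding at most a target amount of uncompressed data.

```python
import gzip
import io
from typing import List


def rechunk_gzip(data: bytes, target_bytes: int) -> List[bytes]:
    """Split one gzip stream into independently gzipped parts.

    Each part holds whole lines only and at most target_bytes of
    uncompressed data (a single longer line still gets its own part).
    """
    parts: List[bytes] = []
    buffered: List[bytes] = []
    size = 0
    with gzip.open(io.BytesIO(data), "rb") as stream:
        for line in stream:
            # Close the current part before it would exceed the target size.
            if buffered and size + len(line) > target_bytes:
                parts.append(gzip.compress(b"".join(buffered)))
                buffered, size = [], 0
            buffered.append(line)
            size += len(line)
    if buffered:
        parts.append(gzip.compress(b"".join(buffered)))
    return parts


# Hypothetical input: one gzip stream of short text lines.
original = b"".join(b"record %d\n" % i for i in range(2_000))
parts = rechunk_gzip(gzip.compress(original), target_bytes=1_000)
print(len(parts), "independently readable parts")
```

Because every part is a complete gzip stream of its own, each one can be handed to a separate worker and decompressed without the others.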
Check out CloudBase: http://cloudbase.sourceforge.net
It is a data warehouse system built on top of Hadoop's MapReduce architecture that allows one to query terabytes and petabytes of data using ANSI SQL. It comes with a JDBC driver, so one can use third-party BI tools and reporting frameworks to connect directly to CloudBase. CloudBase creates a database system directly on flat files and converts input ANSI SQL expressions into map-reduce programs for processing flat files. It has an optimized algorithm to handle joins and plans to support table indexing in the next release.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License.