Disclaimer: The individual owning this blog works at Sun Microsystems GmbH in Germany, a subsidiary of Oracle. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
Digging into Apache Hadoop
Thursday, October 2, 2008
Comments
Have you seen the OpenSolaris/Hadoop live CD? http://opensolaris.org/os/project/livehadoop/
Yes, I saw it, but I want to integrate it into my already running testbed.
(Your spam prevention is preventing valid comments, BTW. It took me way too many tries to get around it.)
I'm not sure I understand what you mean about not being able to work with compressed files. The documentation at hadoop.apache.org/core/docs/current/native_libraries.html describes how to turn on native compression in Hadoop so that it can read/use gzip, lzo, etc, compression as part of the MR job.
Sorry for the hassles with the spam prevention, but with a less stringent spam prevention I would spend my free time deleting spam ...
I read in a presentation that you can't split compressed files into shards. That sounded logical, as you can't take 10 MB out of the middle of a gz file and gunzip it. I have to admit that I'm in the early stages of digging into Hadoop. I will dig further into the documentation ...
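The gz point is easy to check with a small experiment. What follows is only a minimal sketch in plain Python (standard library, no Hadoop involved), and the sample data and the helper name `decodes_as_gzip` are made up for illustration. It gzips some records as a single stream, cuts the compressed bytes in half, and asks a gzip decoder to start on each half.

```python
import gzip
import zlib

# Hypothetical sample data: 20,000 text records compressed as ONE gzip stream.
raw = b"".join(f"record {i}: some payload text\n".encode() for i in range(20000))
compressed = gzip.compress(raw)


def decodes_as_gzip(chunk: bytes) -> bool:
    """Return True if a gzip decoder can start decoding at the first byte of chunk."""
    try:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        zlib.decompressobj(wbits=31).decompress(chunk)
        return True
    except zlib.error:
        return False


midpoint = len(compressed) // 2
first_half = compressed[:midpoint]    # begins with the gzip header
second_half = compressed[midpoint:]   # begins somewhere inside the deflate stream

# The first half is accepted (it yields a truncated prefix of the data).
print("first half decodable: ", decodes_as_gzip(first_half))
# The second half is rejected: there is no header to start from.
print("second half decodable:", decodes_as_gzip(second_half))
```

The first half only "works" in the sense that the decoder accepts the header and returns the beginning of the data. A worker handed the second half has nothing to start from, which is why a single gz file normally ends up being processed as one piece.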
You might be interested in Hive, Facebook's "alternative" to HBase, as well - it seems to provide a better interface (SQL-like, rather than the HBase shell).
Also, the way we got around the compression issue (our files were tarred and gzipped) was to extract and resize the archives on the fly -- to meet the shard size I believe. I wasn't personally involved in that part, so you'll have to forgive me if I've got it wrong!
Check out CloudBase: http://cloudbase.sourceforge.net
It is a data warehouse system built on top of Hadoop's MapReduce architecture that allows one to query terabytes and petabytes of data using ANSI SQL. It comes with a JDBC driver, so one can use third-party BI tools and reporting frameworks to connect directly to CloudBase. CloudBase creates a database system directly on flat files and converts input ANSI SQL expressions into map-reduce programs for processing flat files. It has an optimized algorithm to handle joins and plans to support table indexing in the next release.
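To give a rough idea of what "converting SQL into map-reduce programs" can mean, here is a minimal sketch in plain Python. It is not CloudBase code and makes no claim about how CloudBase plans queries. The rows, the column layout (`host,status,bytes`) and all function names are assumptions for illustration. It evaluates the equivalent of `SELECT host, SUM(bytes) FROM logs GROUP BY host` over flat-file lines using a map step, a shuffle step and a reduce step.

```python
from collections import defaultdict

# Hypothetical flat-file rows in the form "host,status,bytes".
ROWS = [
    "web1,200,512",
    "web2,404,128",
    "web1,200,2048",
    "web3,500,64",
    "web2,200,1024",
]


def map_phase(line):
    """Emit (group key, value) pairs: the GROUP BY column and the summed column."""
    host, _status, size = line.split(",")
    yield host, int(size)


def shuffle(pairs):
    """Collect all values that share a key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Apply the aggregate (SUM) to one group."""
    return key, sum(values)


def run_query(rows):
    pairs = (pair for line in rows for pair in map_phase(line))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(groups.items()))


result = run_query(ROWS)
print(result)  # {'web1': 2560, 'web2': 1152, 'web3': 64}
```

In a real cluster the map and reduce functions would run on different nodes and the shuffle would be done by the framework. The sketch only shows how the pieces of the SQL statement line up with the phases.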
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License.