Entries tagged as technology
Tuesday, October 14. 2008
Yesterday I got a new mobile. It's one of those super-duper all-inclusive smartphones. There is even a GPS receiver in the device. Additionally, the phone includes all those hip three- to six-letter-acronym technologies. But the best feature of the mobile doesn't need any of this arcane technological wizardry.
Before I tell you about this feature, I have to describe one of my habits: I use my mobile as an alarm clock at home, as the ringing of a telephone is a more efficient high-priority sleep interrupt than the ringing of an alarm clock. I assume that in my brain the ringing of a telephone is hardwired to "an important message", whereas the sound of an alarm clock is hardwired to "waking up and leaving the warm bed".
Okay ... the absolute best feature of the E71 is: When you press the middle of the cursor button on the locked mobile, it displays the time white on black (so it doesn't dazzle you) in a font size you can read even with small eyes at 05:00 in the morning, when you have to catch the red-eye train ...
Thursday, October 2. 2008
I'm exploring Apache Hadoop at the moment. This looks like a really interesting technology. What's Hadoop? Hmm ... to explain it in a really simplified manner: It's a distributed, highly available datastore. Okay ... yawn ... no big deal so far.
But there is an interesting twist in Hadoop. Let's assume you have vast amounts of log files: a pile of data several terabytes in size. You want to know the URLs of the top 10 pages of your website.
The standard old-school approach to this problem is to start an analyser on a big server that pulls all the data to itself via block- or file-based protocols. Of course this approach has several bottlenecks: the size of the network pipes, the amount of computing power in a single box, the amount of IOPS in a single server, the amount of IOPS of a single storage system attached to the server, and the amount of memory in a single server.
But now think about this problem in HPC terms: You could divide the task into several smaller ones. Let's assume 64 MB shards. You could compute the result for each shard on a separate node. To stay with our example: This step prints out the pageviews of every URL in its shard. These fragments of the final result are then collected and reduced to the final result, for example by adding up the pageviews of a URL across all shards. By using these concepts you can spread the task of analysing the log files over thousands of nodes working in parallel. You get your answer in minutes, not hours or days. The advantage of doing so is that you bring the data-intensive parts of the computation to the data instead of bringing the data to the computation.
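To make the two steps a bit more tangible, here is a minimal single-process sketch in Python. It is not Hadoop code and doesn't use any Hadoop API: the function names, the tiny shard size and the log format (URL as the first whitespace-separated field of each line) are assumptions made purely for illustration. On a real cluster each call to the map step would run on the node that holds the shard.

```python
from collections import Counter
from functools import reduce


def split_into_shards(lines, shard_size):
    """Cut the log into shards (stand-in for the 64 MB blocks)."""
    return [lines[i:i + shard_size] for i in range(0, len(lines), shard_size)]


def map_shard(shard):
    """Map step: count the pageviews of every URL within one shard."""
    counts = Counter()
    for line in shard:
        fields = line.split()
        if fields:
            # Assumption: the URL is the first field of a log line.
            counts[fields[0]] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: add up the pageviews of a URL from two partial results."""
    return left + right


def top_pages(lines, shard_size=3, n=10):
    """Run map on every shard, reduce the fragments, return the top n URLs."""
    partial_results = [map_shard(s) for s in split_into_shards(lines, shard_size)]
    total = reduce(reduce_counts, partial_results, Counter())
    return total.most_common(n)


if __name__ == "__main__":
    # Hypothetical log lines in the assumed "URL status" format.
    log = ["/a 200", "/b 200", "/a 200", "/c 404", "/a 200", "/b 200", "/a 200"]
    for url, views in top_pages(log, shard_size=3, n=2):
        print(url, views)
```

The map step never needs to see more than its own shard, and the reduce step only sees small per-URL counters. That is the whole trick: the big data stays where it is, only the small fragments travel.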
Such algorithms are called MapReduce. The concept was introduced by Google, and as the core competency of Google is analysing big piles of data, such a mechanism is quite handy. I have several use cases in mind for such a solution: commercial data warehousing, billing of large heaps of Call Data Records, mass conversion jobs ... and so on ...
What has all this stuff to do with Hadoop? Hadoop is an open-source implementation of these concepts. The Hadoop Wiki writes: "Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework." It does the housekeeping, the separation of the data into shards, and the distribution of the analysing tasks to the servers. You can view it as an API/command-line-controlled grid engine for data distribution and data processing.
It consists of the Hadoop Core and the Hadoop Distributed File System (it's not a POSIX filesystem integrated into the VFS; you can think of it like FTP: you need a client or an API to use it). There is even a scripting language that helps you write the analysing jobs. This language is called Pig. Additionally there is an effort to implement a database for structured data on top of Hadoop: HBase.
At the moment there are some gotchas in this technology. For example, you can't work with compressed files in it, as the shards of a gzip file aren't decompressible separately (okay, it's a problem of gzip, but it prevents you from working with it). (Update: I'm not entirely sure about this. It seems that you can work with block-based compression like bz2, and gzip support is in development, at least according to the Pig documents.) But here the ZFS on-the-fly compression can be very helpful. I think I will create a Hadoop testbed with multiple zones on one of my systems this weekend.
Wednesday, June 4. 2008
Nice presentation about the southernmost installation of SamFS: "Ein Robot auf hoher See - SAM-FS bei 70 Grad Süd" (sorry, only in German, but nice photos). It's about a SamFS installation on the Polarstern, a German research vessel. I found this presentation, given by Dr. Hans Pfeiffenberger from the Alfred Wegener Institute for Polar and Marine Research, after a hint from a colleague. This is storage under extreme conditions.
Tuesday, October 23. 2007
A fully automatic Triple-A goes berserk and kills 9 soldiers in South Africa.
A robot cannon began wildly and autonomously firing its huge gun in South Africa last Friday, killing 9 soldiers and wounding 14. The Oerlikon GDF-005 antiaircraft gun suddenly began uncontrollably shooting as it swung back and forth, spraying hundreds of high-explosive 35mm cannon shells all over the place
This is a good example of why software should never ever control the actual firing of bullets. In wartime it may be a good tradeoff to design a trigger-happy robot for anti-mortar/anti-shell artillery, as it's probable that more people are killed by mortar rounds than by a berserk robot. But in peacetime, without incoming rounds? At the very least it wasn't a wise choice to put live ammunition into this robot.
Wednesday, September 5. 2007
I've got some really good photos of the maiden flight of the A380 MSN11. You will find them at aviation.c0t0d0s0.org (here and here).
Sunday, September 2. 2007
I wonder whether Airbus will now retrofit the A380 with parking sensors? Airbus A380 slightly damaged in Bangkok
Saturday, September 1. 2007
Sometimes it looks like Boeing is the IBM of the aircraft industry: bold statements galore, but nothing behind them. Interesting article in the Seattle Times: Boeing may acknowledge further 787 delays next week. I remember dozens of articles in the media that tried to tell the world that Boeing is so much better at constructing the 787 than Airbus is at building the A380. But: Because of temporary fasteners, the airframe of the 787 at rollout wouldn't have been able to fly even if all other systems had been in place and functional. Boeing partly dismantled the first 787 immediately after its rollout to allow mechanics to install systems including electrical wiring, hydraulic tubes and the flight-deck instrumentation and also to replace temporary fasteners. And according to other reports, switching on the power in the plane is still weeks away. I simply do not believe that they will deliver their first 787 in May 2008. I would bet on May 2009. In the end, it's like with the Power6 at 4.7 GHz.
Thursday, August 23. 2007

A second A380 can be found here; even an A340 looks a little bit small in relation to the lady.
(found on Google Maps via Google Sightseeing)
Thursday, August 16. 2007
The first delivery of the SIA A380 is on October 15. Maybe the doomsaying will finally stop then ...
Sunday, August 12. 2007

Two impressive points in this video: the crosswind landing and the sheer size of the lady ...
Sunday, August 12. 2007
The Chicago Tribune speculates in "Takeoff appears delayed for 787" about delays due to software problems. I don't think that the software will be the biggest problem of the 787. As far as I know, nobody has long-term experience with a large commercial aircraft made of plastic. Until now all "plastic aircraft" were military ones: pulled into the hangar at night, airborne for relatively short times, and with "no-matter-the-cost" maintenance. But what about an aircraft that parks on the apron and rushes from one airport to another?
Thursday, July 26. 2007
I reported about the separated ZIL a few days ago. The problem with the described NVRAM PCI card is that you can't do a cluster failover with such a device. How do you want to fail over the separate log when the log is on a card in the failed server? Sun had a product called Prestoserve that was used to accelerate NFS and databases. It was static RAM with a battery. It was great for benchmarks, but suffered from the cluster problem.
Thus you should use some external device that can fail over with the rest of your storage. The obvious choice would be a RAM-based solid state disk (SSD). But these are quite expensive: You need the RAM, you need a hard disk to keep the data persistent when power fails, and you need a rechargeable battery or a capacitor that's able to power the SSD until all data is written to the hard disk.
A flash-based SSD would be a more sensible choice, as flash is non-volatile memory by nature. Such a disk costs you approximately $400. But most people think "Oh no, wear will destroy it within a few days". Experiences with el-cheapo CF cards reinforce this assumption.
But let's calculate with the specifications of a leading-brand flash disk. Let's assume: A 32 GB flash-based SSD is specified for 2,000,000 write cycles. We have a sustained stream of 40 MB per second (a conservative assumption). The wear leveling is perfect (perhaps supported by a separate ZIL algorithm that treats the flash SSD as a cyclic buffer). Okay, a little math:
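The calculation can be sketched in a few lines of Python. This is a back-of-the-envelope sketch, not vendor data: it assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), perfect wear leveling, and that one write cycle means erasing and rewriting the full disk once. The function name is made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def wear_lifetime_years(capacity_bytes, write_cycles, write_rate_bytes_per_s):
    """Years until every cell has used up its write cycles.

    Assumes perfect wear leveling: the stream is spread evenly over
    the whole disk, so the disk is simply overwritten again and again.
    """
    total_writable_bytes = capacity_bytes * write_cycles
    seconds = total_writable_bytes / write_rate_bytes_per_s
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    capacity = 32 * 10**9   # 32 GB, decimal units assumed
    cycles = 2_000_000      # specified write cycles
    rate = 40 * 10**6       # 40 MB per second, sustained

    # One full overwrite of the disk at this rate takes 800 seconds.
    print("seconds per full overwrite:", capacity / rate)
    print("years until worn out:", round(wear_lifetime_years(capacity, cycles, rate), 1))
```

With these assumptions one full overwrite takes 800 seconds, and 2,000,000 of them take 1.6 billion seconds, which is roughly 50 years.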

So this flash SSD wouldn't fail from wear within the usable life of the storage and the server, even when you write 40 MB to it every second. I'm sure that a flash disk won't run for such a long time, but that's not a wear problem; it's the problem that modern electronics don't have the build quality of former times.
Based on these considerations, a flash SSD would be an interesting choice for the separated ZIL. Or at least: Wear isn't a reason for not using a flash SSD.
PS: There is one point I'm not perfectly sure about, but I interpret the 2 million write cycles as the ability to erase and write the full disk 2 million times.
Sunday, July 15. 2007
Scientia writes in his blog about a little-reported fact: Although announced in 2006, a Core 2 Duo with more than 3 GHz is still not available.
Tuesday, May 15. 2007
Mads pointed me to this good article written by Greg regarding software patents: "Will we stop pursuing software patents on our software? Can't do that yet. That's simply because our competitors will still go for them, and unless our system changes, we'd have fewer "trading stamps" and end up paying even higher rates to indemnify the users of our software." In the end, the patent system went wild somewhere in the past in the software space. Even if you don't like it, you have to go for it, as others would do it. And nobody will step back first, as all participants fear that the other market participants would use this as a competitive weapon. It's the classic Mexican standoff.
Thursday, May 3. 2007
It's a well-known problem in the computer industry that the time needed for filesystem checking will sooner or later reach unacceptable dimensions. This was one reason why we developed ZFS. A number of mechanisms in the filesystem ensure an always-consistent state. The Linux community sees this problem as well.
But this solution looks more like a kludge: ChunkFS divides a filesystem into up to 256 chunks that get transparently merged into one filesystem visible to users and applications. Every chunk is a filesystem of its own. The idea behind this concept is that you only need to check a few chunks and not the whole filesystem. This idea has some major drawbacks. First, I assume that in practice the fault isolation of ChunkFS won't reach the level you need to save substantial fsck time.
It's only a short thought experiment, but: The more write load you put on the filesystem, the more chunks will be in a "dirty" state. And the more write load you have on a filesystem, the more probable an inconsistent state will be, as the probability of interrupting a write operation in progress rises with the number of write operations. So in my personal opinion you end up with several "dirty" chunks and thus you won't get such a big advantage. The more you need a mechanism to shorten fsck time, the less ChunkFS would help you.
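The thought experiment can be turned into a toy model. To be clear, this is my own simplification and not how ChunkFS actually tracks dirty chunks: I assume that at the moment of the crash a given number of writes are in flight, and that each of them hits one of the chunks uniformly at random and independently of the others. The function name is made up for this sketch.

```python
def expected_dirty_chunks(num_chunks, writes_in_flight):
    """Expected number of chunks touched by at least one in-flight write.

    Toy model: every write lands in a uniformly random chunk,
    independently of all other writes.
    """
    # Probability that one particular chunk is missed by all writes.
    p_clean = (1 - 1 / num_chunks) ** writes_in_flight
    return num_chunks * (1 - p_clean)


if __name__ == "__main__":
    chunks = 256  # upper limit of chunks mentioned above
    for writes in (1, 16, 64, 256, 1024):
        dirty = expected_dirty_chunks(chunks, writes)
        print(f"{writes:5d} writes in flight -> {dirty / chunks:.0%} of the chunks to check")
```

In this model the share of the filesystem you have to check grows steadily with the write load and approaches 100%. A real workload with good locality would look better than this, one that sprays small writes all over the filesystem would not.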
The biggest advantage may be the parallel fsck-ing of the chunks, but this would put a huge load on the storage systems. It would also be interesting to see how the dependencies between chunks get resolved (Imagine: Chunk A needs a consistent Chunk B, Chunk B needs a consistent Chunk A. How do you solve this conflict without risking the consistency of the whole filesystem?). Besides this, it introduces new classes of problems, like the creation of unique inodes across several filesystems or the mentioned problem of inter-chunk dependencies.
In the end, there is only one solution to the problem of growing fsck run times: making filesystem checking obsolete altogether. The most reasonable way to do this is copy-on-write and transactional writes. NetApp saw the problem and invented WAFL, Sun saw the problem and invented ZFS. I think it's time for the Linux community to find a real solution and to step back from developing a questionable kludge.
PS: Section 8 of this shows a problem that seems to be common within the Linux development community: vast misunderstandings about the inner workings of ZFS. There is no filesystem checking in ZFS, as the filesystem doesn't need it. You can scrub the filesystem (in the widest and farthest sense similar to fsck, but this can be done online, and it checks the validity of data and metadata by means of the checksums). As the people in the Linux community tend to be intelligent to downright brilliant, I don't understand these misunderstandings ...