Kosmos Filesystem Release
Web search engines are required to process large volumes of data. This entails having a scalable backend storage infrastructure built on commodity hardware (such as a cluster of PCs running Linux). To address this infrastructure need, we at Kosmix have developed the Kosmos Distributed Filesystem (KFS). We have released KFS as an open-source project under the terms of the Apache 2.0 license. The initial release is KFS version 0.1, and it is currently in “alpha”. The source code as well as pre-built binaries are available for download at the project site.
In a nutshell, KFS virtualizes disk storage on a cluster of machines, providing a global namespace. Files are striped across nodes in the cluster and are replicated for fault tolerance and availability. KFS includes a client library that enables user applications to read and write files stored in KFS.
KFS supports the familiar filesystem interfaces and programming model. The functionality of the KFS API is similar to the model exposed by operating systems such as Linux. To illustrate:
• When a file is created, the filename is visible in the global namespace.
• As data is written to a block of a file, it gets flushed out to the set of servers storing that block. Once data has been written to the servers, it can be read by other processes.
• For writing/reading, a process can seek to any point in the file and read/write from there.
• Files can be opened for writing multiple times.
• Data can be appended to existing files by opening the file for writing in append mode.
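The semantics listed above can be illustrated with a minimal sketch. It does not use the KFS client library, whose API is not shown in this post; it uses Python's ordinary local-file calls as a stand-in for the programming model, and the function name and file name are hypothetical.

```python
import os
import tempfile


def posix_style_demo():
    """Walk through create, seek/overwrite, and append on a local file.

    This is an analogy for the model described above, not KFS code.
    """
    path = os.path.join(tempfile.mkdtemp(), "example.dat")

    # Creating the file makes its name visible in the namespace.
    with open(path, "wb") as f:
        f.write(b"0123456789")
    visible = os.path.exists(path)

    # A writer can seek to any offset and write from there.
    with open(path, "r+b") as f:
        f.seek(4)
        f.write(b"XY")
        f.seek(0)
        after_overwrite = f.read()

    # Opening in append mode adds data at the end of the existing file.
    with open(path, "ab") as f:
        f.write(b"!")

    # A reader can likewise seek to any offset and read from there.
    with open(path, "rb") as f:
        f.seek(8)
        tail = f.read()

    return visible, after_overwrite, tail
```

An application written against this kind of interface needs little conceptual change to target a filesystem that offers the same operations.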
When blocks of a file are striped across nodes in the cluster, KFS stores individual blocks of a file as files in the underlying file system (such as XFS on Linux). To guard against disk corruption, checksums are computed on the blocks and verified on each read. If disk corruption is detected by a checksum mismatch, the system discards the corrupted block and uses re-replication to recover the lost data.
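The verify-on-read and re-replication idea can be sketched as follows. This is an illustration under stated assumptions, not the KFS implementation: the post does not say which checksum KFS uses, so CRC-32 from Python's `zlib` is swapped in, each replica is modeled as an in-memory dictionary, and all function names are hypothetical.

```python
import zlib


class CorruptBlockError(Exception):
    """Raised when a block's data does not match its stored checksum."""


def write_block(store, block_id, data):
    # Keep the checksum computed at write time next to the data.
    store[block_id] = (data, zlib.crc32(data))


def read_block(store, block_id):
    data, expected = store[block_id]
    if zlib.crc32(data) != expected:
        # Checksum mismatch: discard the corrupted copy.
        del store[block_id]
        raise CorruptBlockError(block_id)
    return data


def read_with_recovery(replicas, block_id):
    """Read from the first valid replica and restore copies found missing or bad."""
    needs_copy = []
    good = None
    for store in replicas:
        if block_id not in store:
            needs_copy.append(store)
            continue
        try:
            good = read_block(store, block_id)
            break
        except CorruptBlockError:
            needs_copy.append(store)
    if good is None:
        raise IOError("no valid replica of block %r" % (block_id,))
    # Re-replicate: copy the good data to replicas that lost the block.
    for store in needs_copy:
        write_block(store, block_id, good)
    return good
```

In this sketch recovery happens inline during the read; a real system could just as well schedule re-replication in the background.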
Each file stored in KFS is typically replicated 3-way. Depending on application needs, the degree of replication for files can be changed on-the-fly.
KFS also contains rudimentary support for block rebalancing. To help with better disk utilization across nodes, the system may periodically migrate data from over-utilized to under-utilized nodes.
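One way to picture this kind of rebalancing is a greedy planner that moves one block at a time from the most-utilized node to the least-utilized one. The sketch below is an assumption-laden illustration: the thresholds, the uniform block size, and the function name are hypothetical, and the post does not describe how KFS actually chooses what to migrate.

```python
def plan_rebalance(used, capacity, high=0.8, low=0.5, block_size=1):
    """Return a list of (source, destination) moves, one block per move.

    used and capacity map node name -> amount, in the same units as
    block_size. A node is over-utilized above `high` and under-utilized
    below `low` (both hypothetical thresholds).
    """
    used = dict(used)  # work on a copy; the caller's data is untouched
    moves = []
    while True:
        util = {node: used[node] / capacity[node] for node in used}
        src = max(util, key=util.get)
        dst = min(util, key=util.get)
        # Stop when nothing is over-utilized or nothing is under-utilized.
        if util[src] <= high or util[dst] >= low:
            break
        # Never push the destination itself past the high threshold.
        if (used[dst] + block_size) / capacity[dst] > high:
            break
        used[src] -= block_size
        used[dst] += block_size
        moves.append((src, dst))
    return moves
```

For example, with three nodes of capacity 10 holding 9, 2, and 5 units, the planner proposes a single move from the first node to the second and then stops, because no node remains above the high threshold.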
The KFS client library provides support for job placement systems. For instance, a job scheduler can determine the location(s) of a byte range within a file and schedule jobs appropriately.
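A scheduler could use such location information roughly as sketched below. The KFS call that returns block locations is not named in this post, so the sketch takes a plain dictionary from block index to node names as input; the 64 MB block size and both function names are assumptions made for illustration.

```python
from collections import Counter

BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size, not stated in the post


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Return the indices of the blocks that cover [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def pick_node(block_locations, offset, length, block_size=BLOCK_SIZE):
    """Pick the node that stores the most blocks of the requested byte range.

    block_locations maps block index -> list of node names holding a replica
    (a stand-in for whatever the client library would report).
    """
    counts = Counter()
    for index in blocks_for_range(offset, length, block_size):
        counts.update(block_locations.get(index, ()))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Running a job on the node returned by `pick_node` lets it read most of its input from local disk rather than over the network.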
KFS is implemented in C++. In addition to C++ applications, KFS also contains support for Java (via JNI) and Python applications.
To enable a large class of applications to evaluate KFS, we have integrated KFS as the backing store for other open-source projects:
• Hadoop: Hadoop is an open-source project that provides a Map/Reduce implementation. It contains a Filesystem API that allows alternate implementations to be used as the backing store. Currently, the choices for a backing store are the local filesystem, HDFS, and S3. As a new alternative to these choices, KFS is integrated with Hadoop using Hadoop’s Filesystem API. This allows existing Hadoop Map/Reduce applications to use KFS seamlessly: changing some Hadoop configuration parameters is enough to make KFS the backing store. We have submitted the necessary “glue” code to the Hadoop code-base; it will be included in the next Hadoop release.
• Hypertable: Hypertable is an open-source project (being developed at Zvents Inc.) that provides a Bigtable-style interface. KFS is integrated with Hypertable as the backing store.
We are releasing KFS with the intent of providing useful storage infrastructure software. It is our hope that KFS will meet the storage needs of various projects. We would be happy to work with anyone interested in using KFS. Please try out KFS, give us your feedback on what works and what you would like to see added, and possibly contribute to KFS!

October 1st, 2007 at 10:37 am
Hypertable has not been released yet?
October 1st, 2007 at 10:50 am
Harish,
No, Hypertable has not been released yet. Please check with either Doug or Ethan at Zvents for their release dates.
Sriram
October 1st, 2007 at 12:58 pm
Hi, this is Ethan, CEO of Zvents. Hypertable has not yet been released. It will be available under an open-source license within the next 60 days. Email me (firstname at company) to discuss in more detail.
October 6th, 2007 at 1:32 pm
Good chance that we will use KFS with our local Bangalore client. Will be very interesting to see how we can manage to pull out something interesting for them.
Will keep you guys in the loop! Thanks for bringing this out for others to use. Much appreciated!
Are you guys based out of Bangalore too?
Gyani
October 21st, 2007 at 4:32 pm
Are there plans to make KFS more POSIX compliant? Mostly with regard to permissions.
October 22nd, 2007 at 1:57 pm
Craig,
Re: permissions, it is not currently in the plans, though it would be a very useful capability to add to KFS.
Sriram
November 17th, 2007 at 5:57 pm
I learned about the Kosmos Distributed File System from http://blog.powerset.com/. Is Powerset involved in the system?
November 26th, 2007 at 9:12 am
Hello,
I’m a final-year student in a computing degree and I’m hoping to do a distributed file system for my final-year project. So I’ll be very grateful if you can provide me a detailed report and the source code.
Thank you..
November 26th, 2007 at 10:44 am
Rukshan,
The source code and doc are available on the project page on sourceforge:
http://sourceforge.net/projects/kosmosfs/
Sriram
December 3rd, 2007 at 4:44 pm
coone:
No, Powerset isn’t involved in Kosmos Filesystem.
Sriram