SQL-DFS: A Massive Small File Storage System Based on HDFS

Ma Zhiqiang (马志强); Yang Shuangtao (杨双涛); Yan Rui (闫瑞); Zhang Zeguang (张泽广)

doi:10.11936/bjutxb2015060040

Abstract

To address the high NameNode memory occupancy that arises when the Hadoop distributed file system (HDFS) stores small files, this paper analyzes the basic architecture of HDFS and proposes SQL-DFS, a file system built on a metadata storage cluster. A small-file processing module added to the NameNode migrates small-file metadata from NameNode memory to the metadata storage cluster, and a relational database cluster provides fast reads and writes of that metadata. The small-file read path is also optimized so that file clients issue fewer requests to the NameNode. Part of the verification of DataNode file blocks is delegated to the metadata storage cluster, which further reduces the load on the NameNode. Finally, HDFS and SQL-DFS experimental platforms were built, and the two architectures were compared on small-file reads and writes. The results show that SQL-DFS clearly outperforms the original HDFS architecture in both file average cost (FAC) and memory occupancy, offers better small-file storage capability, and can be used to store massive numbers of small files.
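The metadata migration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `MetadataRouter`, the table `small_file_meta`, the 1 MiB threshold, and the use of an in-memory SQLite database as a stand-in for the relational metadata storage cluster are all assumptions made for illustration. Small-file metadata goes to the SQL table, larger files stay in a dictionary that plays the role of NameNode memory, and a lookup returns the block location in a single call.

```python
import sqlite3

# Hypothetical cutoff for "small" files (1 MiB); the abstract gives no value.
SMALL_FILE_THRESHOLD = 1 * 1024 * 1024


class MetadataRouter:
    """Toy stand-in for a NameNode with a small-file processing module."""

    def __init__(self, threshold=SMALL_FILE_THRESHOLD):
        self.threshold = threshold
        # Metadata of larger files stays in "NameNode memory".
        self.in_memory = {}
        # SQLite stands in for the relational metadata storage cluster.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE small_file_meta ("
            " path TEXT PRIMARY KEY,"
            " size INTEGER NOT NULL,"
            " datanode TEXT NOT NULL,"
            " block_id TEXT NOT NULL)"
        )

    def put(self, path, size, datanode, block_id):
        """Record metadata: small files go to SQL, the rest to memory."""
        if size < self.threshold:
            self.db.execute(
                "INSERT OR REPLACE INTO small_file_meta VALUES (?, ?, ?, ?)",
                (path, size, datanode, block_id),
            )
            self.in_memory.pop(path, None)
        else:
            self.in_memory[path] = (size, datanode, block_id)
            self.db.execute(
                "DELETE FROM small_file_meta WHERE path = ?", (path,)
            )

    def locate(self, path):
        """Return (size, datanode, block_id), or None if the path is unknown."""
        if path in self.in_memory:
            return self.in_memory[path]
        return self.db.execute(
            "SELECT size, datanode, block_id FROM small_file_meta"
            " WHERE path = ?",
            (path,),
        ).fetchone()
```

In this sketch the in-memory dictionary grows only with the number of large files, which mirrors the memory saving the abstract reports for the NameNode; a real deployment would also need replication, transactions across nodes, and block verification, none of which are modeled here.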
