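As data keeps growing, much of it turns from hot to cold and is rarely or never accessed again. Storage demand keeps climbing while compute demand grows far more slowly. How can these two be decoupled? HDFS Archival Storage is designed for exactly this case.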
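HDFS Archival Storage is a feature introduced in Hadoop 2.6.0 as part of Hadoop's heterogeneous storage support. It stores data on different media according to policy, decouples storage capacity from compute capacity, and lets cold data be archived to machines that have cheap, high-density storage but modest compute power.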
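Starting with Hadoop 2.6.0, several storage policies are available. Some machines can be designated as archive servers that hold cold data on cheap high-density disks, while others act as hot-data servers backed by regular disks or SSDs. The policy names are defined as follows: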
public static final String MEMORY_STORAGE_POLICY_NAME = "LAZY_PERSIST";
public static final String ALLSSD_STORAGE_POLICY_NAME = "ALL_SSD";
public static final String ONESSD_STORAGE_POLICY_NAME = "ONE_SSD";
public static final String HOT_STORAGE_POLICY_NAME = "HOT";
public static final String WARM_STORAGE_POLICY_NAME = "WARM";
public static final String COLD_STORAGE_POLICY_NAME = "COLD";
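At present Hadoop 2.6.0 supports three of these: HOT, WARM and COLD. Hot data is stored entirely on DataNode storage directories tagged [DISK] (untagged directories default to [DISK]). Cold data is stored entirely on directories tagged [ARCHIVE]; such nodes can be cheap machines with weak compute power but high storage density. Warm data sits in between: some replicas are kept on [DISK] and the rest on [ARCHIVE]. SSD is a storage medium supported from Hadoop 2.7.0 onward.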
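As a rough illustration of the placement rules just described, the sketch below maps each of the three policies to the storage types its replicas would target. The function name `expected_storage_types` is made up for this example. The WARM rule used here (one replica on DISK, the remainder on ARCHIVE) follows the Hadoop Archival Storage documentation, and the fallback storage types that HDFS uses when the preferred type is unavailable are deliberately left out.

```python
from typing import List


def expected_storage_types(policy: str, replication: int) -> List[str]:
    """Return the storage type each replica should land on under a policy.

    Simplified model: fallback storages used when the preferred
    type is unavailable are not modeled.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    policy = policy.upper()
    if policy == "HOT":
        return ["DISK"] * replication
    if policy == "COLD":
        return ["ARCHIVE"] * replication
    if policy == "WARM":
        # One replica stays on DISK, the remaining ones go to ARCHIVE.
        return ["DISK"] + ["ARCHIVE"] * (replication - 1)
    raise ValueError(f"unsupported policy: {policy}")


if __name__ == "__main__":
    for name in ("HOT", "WARM", "COLD"):
        print(name, expected_storage_types(name, 3))
```

With a replication factor of 3, a WARM file would therefore be expected to keep one replica on a [DISK] directory and two on [ARCHIVE] directories.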
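Once the DataNode storage directories have been tagged and the DataNodes restarted, storage policies can be assigned to HDFS paths, and the hdfs mover tool then migrates the data accordingly. Note that a DataNode directory without a storage-type tag defaults to [DISK], and an HDFS path without a storage policy is reported as unspecified; new files created under such a path are stored on [DISK].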
Edit dfs.datanode.data.dir in hdfs-site.xml and prefix the existing path with a storage type, as follows:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[ARCHIVE]file:///opt/hadoop/dfs.data</value>
</property>
In Hadoop 2.6.0 the configurable storage types are [DISK] and [ARCHIVE]; [SSD] and [RAM_DISK] are supported from Hadoop 2.7.0 onward.
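To make the tagging convention concrete, here is a minimal sketch that splits a dfs.datanode.data.dir value into (storage type, location) pairs and applies the [DISK] default for untagged entries. It is an illustration only, not Hadoop's own parser: `parse_data_dirs` is a hypothetical helper, and upper-casing the tag is a choice made for this sketch.

```python
import re
from typing import List, Tuple

# Storage types named in the text above.
KNOWN_TYPES = {"DISK", "ARCHIVE", "SSD", "RAM_DISK"}
_PREFIX = re.compile(r"^\[(\w+)\](.*)$")


def parse_data_dirs(value: str) -> List[Tuple[str, str]]:
    """Split a dfs.datanode.data.dir value into (storage_type, location) pairs."""
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _PREFIX.match(entry)
        if match:
            # This sketch upper-cases the tag before validating it.
            storage_type = match.group(1).upper()
            location = match.group(2).strip()
        else:
            # No tag: the directory is treated as DISK.
            storage_type, location = "DISK", entry
        if storage_type not in KNOWN_TYPES:
            raise ValueError(f"unknown storage type: {storage_type}")
        result.append((storage_type, location))
    return result
```

For the configuration shown above, `parse_data_dirs("[ARCHIVE]file:///opt/hadoop/dfs.data")` yields a single ARCHIVE entry, while a bare path such as `/data/1` (a made-up example) would come back as DISK.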
Restart the DataNode so that the change takes effect:
sh hadoop-daemon.sh stop datanode
sh hadoop-daemon.sh start datanode
Set the storage policy on an HDFS path, then read it back to confirm:
hadoop dfsadmin -setStoragePolicy /tmp/lp_test COLD
hadoop dfsadmin -getStoragePolicy /tmp/lp_test
Run the mover to migrate the data:
hdfs mover
or
hdfs mover -p /tmp/lp_test
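By default hdfs mover scans from the root directory and migrates every block whose current storage type does not match the storage policy of its path. With -p, only the listed HDFS files or directories are examined; with -f, the paths are read from a local file that lists the HDFS files or directories to migrate.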
Check where the blocks are stored:
hadoop fsck /tmp/lp_test/ -files -blocks -locations
Test environment: five virtual machines, four of which are DataNodes.
1. After tagging one DataNode as [ARCHIVE] and setting the HDFS path /tmp/lp_test to HOT, what happens when hdfs mover runs?
16/09/27 09:47:17 INFO balancer.Dispatcher: Successfully moved blk_1073760906_20104 with size=57521882 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.108:50010:DISK through xxx.xx.x.110:50010
16/09/27 09:47:23 INFO balancer.Dispatcher: Successfully moved blk_1073760895_20093 with size=57521882 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.108:50010:DISK through xxx.xx.x.111:50010
16/09/27 09:47:28 INFO balancer.Dispatcher: Successfully moved blk_1073760887_20085 with size=57521892 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.108:50010:DISK through xxx.xx.x.110:50010
16/09/27 09:47:30 INFO balancer.Dispatcher: Successfully moved blk_1073760905_20103 with size=57521892 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.109:50010:DISK through xxx.xx.x.111:50010
16/09/27 09:47:31 INFO balancer.Dispatcher: Successfully moved blk_1073760907_20105 with size=57521882 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.109:50010:DISK through xxx.xx.x.110:50010
16/09/27 09:47:33 INFO balancer.Dispatcher: Successfully moved blk_1073759183_18381 with size=79634354 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.109:50010:DISK through xxx.xx.x.111:50010
16/09/27 09:47:41 INFO balancer.Dispatcher: Successfully moved blk_1073760893_20091 with size=134217728 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.109:50010:DISK through xxx.xx.x.110:50010
16/09/27 09:47:44 INFO balancer.Dispatcher: Successfully moved blk_1073759178_18376 with size=134217728 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.109:50010:DISK through xxx.xx.x.110:50010
16/09/27 09:47:44 INFO balancer.Dispatcher: Successfully moved blk_1073760883_20081 with size=134217728 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.108:50010:DISK through xxx.xx.x.111:50010
16/09/27 09:47:45 INFO balancer.Dispatcher: Successfully moved blk_1073759181_18379 with size=134217728 from xxx.xx.x.111:50010:ARCHIVE to xxx.xx.x.108:50010:DISK through xxx.xx.x.111:50010
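As the log shows, once the files under the path carry the HOT policy, hdfs mover moves their blocks from ARCHIVE to DISK. The storage directory on the ARCHIVE machine nevertheless still holds many blocks, which indicates that files that already existed and have no policy are not migrated. From this we can conclude: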
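1) Files that existed before the upgrade have no storage policy defined and may reside on DataNodes of any storage type;
2) A DataNode directory without a storage-type tag defaults to [DISK];
3) For paths that do carry a storage policy, blocks must be placed strictly according to the DataNode storage types, and hdfs mover migrates blocks to bring them into line.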
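When a mover run produces many such log lines, it can help to total the moved bytes per source and target storage type. The sketch below does that for lines in the Dispatcher format shown in the log excerpt for this experiment. The regular expression is written against that excerpt only and is an assumption about the format; other Hadoop versions may log differently. `summarize_moves` is a name chosen for this example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "Successfully moved ..." Dispatcher lines from the excerpt.
MOVE_LINE = re.compile(
    r"Successfully moved (?P<block>blk_\d+_\d+) with size=(?P<size>\d+) "
    r"from (?P<src>\S+):(?P<src_type>[A-Z_]+) "
    r"to (?P<dst>\S+):(?P<dst_type>[A-Z_]+) through"
)


def summarize_moves(lines: Iterable[str]) -> Counter:
    """Sum moved bytes per (source storage type, target storage type) pair."""
    moved = Counter()
    for line in lines:
        match = MOVE_LINE.search(line)
        if match:
            key = (match.group("src_type"), match.group("dst_type"))
            moved[key] += int(match.group("size"))
    return moved
```

Feeding it the ten lines above would produce a single key, `("ARCHIVE", "DISK")`, which matches the observation that every logged move went from the ARCHIVE node to a DISK node.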
2. After creating a new path and uploading a file, how are the blocks distributed?
Run the following commands to upload a file to a newly created HDFS path:
hadoop fs -mkdir /tmp/lp_test2
hadoop fs -copyFromLocal 00002.txt /tmp/lp_test2/
Then check how the file's blocks are distributed:
hadoop fsck /tmp/lp_test2 -files -blocks -locations
The main part of the output is:
/tmp/lp_test2/00002.txt 6005783952 bytes, 45 block(s): OK
0. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761093_20294 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.109:50010]
1. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761094_20295 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.109:50010]
2. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761095_20296 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.108:50010]
3. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761096_20297 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.108:50010]
4. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761097_20298 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.109:50010]
5. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761098_20299 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.108:50010]
6. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761099_20300 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.109:50010]
7. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761100_20301 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.109:50010]
8. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761101_20302 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.110:50010]
9. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761102_20303 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.109:50010]
10. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761103_20304 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.108:50010]
11. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761104_20305 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.110:50010]
12. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761105_20306 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.110:50010]
13. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761106_20307 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.108:50010]
14. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761107_20308 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.110:50010]
15. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761109_20310 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.108:50010]
16. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761110_20311 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.108:50010]
17. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761111_20312 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.108:50010]
18. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761112_20313 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.109:50010]
19. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761113_20314 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.109:50010]
20. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761115_20316 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.110:50010]
21. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761116_20317 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.110:50010]
22. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761118_20319 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.110:50010]
23. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761119_20320 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.108:50010]
24. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761120_20321 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.109:50010]
25. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761122_20323 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.108:50010]
26. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761123_20324 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.110:50010]
27. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761124_20325 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.108:50010]
28. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761125_20326 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.109:50010]
29. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761126_20327 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.108:50010]
30. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761127_20328 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.109:50010]
31. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761128_20329 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.109:50010]
32. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761129_20330 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.109:50010]
33. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761130_20331 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.108:50010]
34. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761131_20332 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.110:50010]
35. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761132_20333 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.108:50010]
36. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761133_20334 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.110:50010]
37. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761134_20335 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.110:50010]
38. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761135_20336 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.109:50010]
39. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761136_20337 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.110:50010]
40. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761137_20338 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.109:50010]
41. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761138_20339 len=134217728 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.109:50010]
42. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761139_20340 len=134217728 repl=2 [xxx.xx.x.109:50010, xxx.xx.x.110:50010]
43. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761140_20341 len=134217728 repl=2 [xxx.xx.x.110:50010, xxx.xx.x.108:50010]
44. BP-822385924-xxx.xx.x.107-1470286871711:blk_1073761141_20342 len=100203920 repl=2 [xxx.xx.x.108:50010, xxx.xx.x.110:50010]
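All block replicas of the file landed on the three [DISK] machines xxx.xx.x.108 through xxx.xx.x.110, and the [ARCHIVE] machine xxx.xx.x.111 received none of the new file's blocks. This shows that a new file with no storage policy defined still has its blocks stored on [DISK] DataNodes.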
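Checking 45 block lines by eye is error-prone, so a small script can tally replicas per DataNode instead. The sketch below assumes the fsck block-line format shown in the output for /tmp/lp_test2, where the replica locations appear as a bracketed, comma-separated list of host:port addresses after `repl=N`. Later Hadoop releases print locations in a different format, so the pattern would need adjusting there. `replicas_per_datanode` is a hypothetical helper name.

```python
import re
from collections import Counter

# A block line ends like: "len=134217728 repl=2 [host1:50010, host2:50010]"
REPLICA_LIST = re.compile(r"repl=\d+ \[([^\]]*)\]")


def replicas_per_datanode(fsck_output: str) -> Counter:
    """Count how many block replicas each DataNode address holds."""
    counts = Counter()
    for hosts in REPLICA_LIST.findall(fsck_output):
        for host in hosts.split(","):
            host = host.strip()
            if host:
                counts[host] += 1
    return counts
```

If the [ARCHIVE] node's address never shows up as a key in the returned counter, no replica of the file was placed on it.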
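Suppose many Hive tables on the cluster are partitioned by day and a sizable share of the data is N years old. In principle such data will rarely or never be used again. Marking it as cold and archiving it to cheap machines with weak compute power frees a large amount of storage on the compute nodes: storage capacity grows while compute capacity stays the same. The cold data can also be compressed and kept with fewer replicas to save even more space on the archive servers. If it later needs to be accessed frequently again, changing the storage policy and running hdfs mover brings it back. SSDs and memory can likewise be used for hot data in the future to speed up data access and, in turn, computation.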
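A hedged sketch of how the cold partitions might be picked out: given a list of partition directories, it builds a setStoragePolicy command for each daily partition older than a cutoff. Everything about the layout is an assumption for illustration, including the `dt=YYYY-MM-DD` directory naming and the sample warehouse paths; the function only returns command strings and does not execute anything.

```python
from datetime import date
from typing import Iterable, List


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def cold_policy_commands(partition_dirs: Iterable[str], today: date,
                         max_age_years: int) -> List[str]:
    """Build setStoragePolicy commands for partitions older than the cutoff."""
    cutoff = years_before(today, max_age_years)
    commands = []
    for path in partition_dirs:
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if not name.startswith("dt="):
            continue  # not a daily partition directory in the assumed layout
        try:
            day = date.fromisoformat(name[len("dt="):])
        except ValueError:
            continue  # skip names that are not valid dates
        if day < cutoff:
            commands.append(f"hadoop dfsadmin -setStoragePolicy {path} COLD")
    return commands
```

After the generated commands have been reviewed and run, `hdfs mover -p` on the same paths would carry out the actual block migration.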
Reposted from: http://zgjil.baihongyu.com/