A small CephFS problem in a Ceph cluster deployed by Rook

Category: Servers · Published: 5 years ago

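In a Ceph cluster deployed with Rook, once CephFS was configured, reading and writing files failed inside a directory with a fairly long name.

The symptoms were as follows.

With the ceph-fuse client: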

root@ceph3:/mnt/test/volumes/kubernetes/kubernetes/kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0# cat fox 
cat: fox: Input/output error

With the kernel client:

root@ceph3:/mnt/test/volumes/kubernetes/kubernetes/kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0# cat fox 
cat: fox: File name too long

Problem analysis

Enable debug logging for the ceph-fuse client:

# ceph-fuse -m 10.10.15.89:6790,10.10.15.198:6790 /mnt/test/ -n client.admin --keyring=./yangguanjun/keyring --debug-client=20

Then check the log under /var/log/ceph/. It reports rd_err = -36 and wr_err = -36:

2018-12-05 18:42:04.908612 7fd5955bc700  3 client.5213 ll_read 0x565512bf8760 0x10000000007  0~4096 
2018-12-05 18:42:04.910114 7fd599dc5700 10 client.5213 ms_handle_connect on 172.16.1.54:6800/33886 
2018-12-05 18:42:04.910887 7fd5955bc700 10 client.5213 check_pool_perm on pool 2 ns fsvolumens_kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0 rd_err = -36 wr_err = -36 
2018-12-05 18:42:04.910907 7fd5955bc700 10 client.5213 check_pool_perm on pool 2 ns fsvolumens_kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0 rd_err = -36 wr_err = -36

The Linux header include/uapi/asm-generic/errno.h contains this definition:

#define ENAMETOOLONG    36  /* File name too long */
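
As a quick cross-check, the same mapping can be looked up with Python's standard errno module instead of reading the kernel header. This is only a convenience sketch: describe_errno is a hypothetical helper, and the value 36 is the Linux one, so the result differs on other platforms.

```python
import errno
import os


def describe_errno(neg_err: int) -> str:
    """Turn a negative error code from a Ceph log (e.g. -36) into 'NAME: message'."""
    code = abs(neg_err)
    # errno.errorcode maps numeric values to symbolic names on this platform
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # rd_err / wr_err value seen in the ceph-fuse client log
    print(describe_errno(-36))
```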

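Both ceph-fuse and the kernel client hit the same error, so it is not specific to either client. The likely culprit is somewhere on the OSD side.

Searching the Ceph source for OSD-side checks turns up the following:

File: osd/PrimaryLogPG.cc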

/** do_op - do an op
 * pg lock will be held (if multithreaded)
 * osd_lock NOT held.
 */
void PrimaryLogPG::do_op(OpRequestRef& op)
{
...
    // object name too long?
    if (m->get_oid().name.size() > cct->_conf->osd_max_object_name_len) {
        dout(4) << "do_op name is longer than "
                << cct->_conf->osd_max_object_name_len
                << " bytes" << dendl;
        osd->reply_op_error(op, -ENAMETOOLONG);
        return;
    }
    if (m->get_hobj().get_key().size() > cct->_conf->osd_max_object_name_len) {
        dout(4) << "do_op locator is longer than "
                << cct->_conf->osd_max_object_name_len
                << " bytes" << dendl;
        osd->reply_op_error(op, -ENAMETOOLONG);
        return;
    }
    if (m->get_hobj().nspace.size() > cct->_conf->osd_max_object_namespace_len) {
        dout(4) << "do_op namespace is longer than "
                << cct->_conf->osd_max_object_namespace_len
                << " bytes" << dendl;
        osd->reply_op_error(op, -ENAMETOOLONG);
        return;
    }
...
}

So raise the OSD debug log level:

[root@rook-ceph-tools /]# ceph tell osd.0 config set debug_osd 5
Set debug_osd to 5/5
[root@rook-ceph-tools /]# ceph tell osd.1 config set debug_osd 5
Set debug_osd to 5/5

Then reproduce the problem and grep the OSD logs:

# grep "longer than" *
rook-ceph-osd-0-85f5bf454f-64w7d-ceph3.log:2018-12-05 11:14:38.864707 7fbe34194700  4 osd.0 pg_epoch: 24 pg[2.15( empty local-lis/les=21/22 n=0 ec=20/20 lis/c 21/21 les/c/f 22/22/0 21/21/20) [0,1] r=0 lpr=21 crt=0'0 mlcod 0'0 active+clean] do_op namespace is longer than 64 bytes
rook-ceph-osd-0-85f5bf454f-64w7d-ceph3.log:2018-12-05 11:14:38.864853 7fbe3819c700  4 osd.0 pg_epoch: 24 pg[2.15( empty local-lis/les=21/22 n=0 ec=20/20 lis/c 21/21 les/c/f 22/22/0 21/21/20) [0,1] r=0 lpr=21 crt=0'0 mlcod 0'0 active+clean] do_op namespace is longer than 64 bytes

The corresponding check in the code is:

if (m->get_hobj().nspace.size() > cct->_conf->osd_max_object_namespace_len)
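
To see why this check fires, it helps to measure the namespace that appears in the ceph-fuse log. The sketch below mirrors the comparison in Python, not the OSD code itself. The function name is made up for illustration, and the limit of 64 is the value reported in the OSD log above. The namespace turns out to be 70 bytes long, which is over the limit.

```python
# Limit reported by the OSD log ("namespace is longer than 64 bytes")
OSD_MAX_OBJECT_NAMESPACE_LEN = 64


def namespace_too_long(nspace: str, limit: int = OSD_MAX_OBJECT_NAMESPACE_LEN) -> bool:
    """Mirror the OSD-side comparison: byte length strictly greater than the limit."""
    return len(nspace.encode("utf-8")) > limit


# Namespace copied verbatim from the ceph-fuse client log
ns = "fsvolumens_kubernetes-dynamic-pvc-9adbb10c-f86a-11e8-96ff-9247c38478e0"

if __name__ == "__main__":
    print(len(ns), namespace_too_long(ns))
```

The namespace is the fixed prefix fsvolumens_ (11 bytes) plus the volume name (23 bytes of kubernetes-dynamic-pvc- and a 36-byte UUID), so every volume named this way exceeds a 64-byte limit.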

Now look at the OSD configuration:

[root@rook-ceph-osd-0-85f5bf454f-64w7d-ceph3 ceph]# cat ceph.conf
...
osd max object name len      = 256
osd max object namespace len = 64
...
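
With more than a couple of OSDs, reading each ceph.conf by eye gets tedious. Here is a minimal sketch that pulls the two limits out of a config file's text with Python's configparser. It assumes the options live in a [global] section (the excerpt above elides the section header) and that the file is simple enough for configparser to parse. read_osd_limits is a hypothetical helper, not part of Ceph or Rook.

```python
import configparser

WANTED = ("osd_max_object_name_len", "osd_max_object_namespace_len")


def read_osd_limits(conf_text: str, section: str = "global") -> dict:
    """Return the object name/namespace length limits found in conf_text."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    limits = {}
    for key, value in parser.items(section):
        # Option names in this file are written with spaces; normalise to underscores
        norm = key.replace(" ", "_")
        if norm in WANTED:
            limits[norm] = int(value)
    return limits


# Assumed layout: the two lines from the article under a [global] header
SAMPLE = """
[global]
osd max object name len      = 256
osd max object namespace len = 64
"""

if __name__ == "__main__":
    print(read_osd_limits(SAMPLE))
```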

(o゜▽゜)o☆ [BINGO!] That is the cause. But why is this setting there in the first place?

The Rook source contains the following code, in pkg/daemon/ceph/osd/device.go:

func writeConfigFile(cfg *osdConfig, context *clusterd.Context, cluster *cephconfig.ClusterInfo, location string) error {
    cephConfig := cephconfig.CreateDefaultCephConfig(context, cluster, cfg.rootPath)
    if isBluestore(cfg) {
        cephConfig.GlobalConfig.OsdObjectStore = config.Bluestore
    } else {
        cephConfig.GlobalConfig.OsdObjectStore = config.Filestore
    }
    cephConfig.CrushLocation = location

    if cfg.dir || isFilestoreDevice(cfg) {
        // using the local file system requires some config overrides
        // http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/#not-recommended
        cephConfig.GlobalConfig.OsdMaxObjectNameLen = 256
        cephConfig.GlobalConfig.OsdMaxObjectNamespaceLen = 64
    }
    ...
}

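So these two settings are added whenever an OSD is backed by a directory or uses FileStore.

Ceph's FileStore documentation explains that on ext4 the limit on xattr length restricts FileStore. If a deployment is known to use only short object names, FileStore can still be run on ext4 by setting these two parameters: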

osd max object name len = 256
osd max object namespace len = 64

Reference: http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/#not-recommended

In our setup, cluster.yaml configures the OSDs like this:

...
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    config:
      storeType: bluestore
    nodes:
    - name: ceph2
      devices:
      - name: sdb1
    - name: ceph3
      devices:
      - name: sdb1

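Rook, however, does not support partitions as OSD devices. When a partition is configured, Rook's device check skips every disk that has partitions and falls back to creating the OSD in the directory /var/lib/rook/osd<id>/. Because the OSD is then directory-backed, the cfg.dir branch shown earlier applies, and the two length limits are written into ceph.conf even though storeType is bluestore. The directory looks like this: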

# ll  /var/lib/rook/osd0/
total 3344
drwxr--r-- 3 root root         4096 Dec  5 16:44 ./
drwxr-xr-x 5 root root         4096 Dec  5 16:44 ../
lrwxrwxrwx 1 root root           34 Dec  5 16:44 block -> /var/lib/rook/osd0/bluestore-block
lrwxrwxrwx 1 root root           31 Dec  5 16:44 block.db -> /var/lib/rook/osd0/bluestore-db
lrwxrwxrwx 1 root root           32 Dec  5 16:44 block.wal -> /var/lib/rook/osd0/bluestore-wal
-rw-r--r-- 1 root root            2 Dec  5 16:44 bluefs
-rw-r--r-- 1 root root 472432779264 Dec  5 19:22 bluestore-block
-rw-r--r-- 1 root root   1073741824 Dec  5 16:44 bluestore-db
-rw-r--r-- 1 root root    603979776 Dec  5 19:27 bluestore-wal
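
A mistake like this can be caught before deploying by checking whether the names listed under devices are partitions. The sketch below relies on the Linux sysfs layout, where a partition's entry under /sys/class/block contains a file named partition and a whole disk's entry does not. is_partition is a hypothetical helper and is unrelated to Rook's own device detection.

```python
import os


def is_partition(dev_name: str, sysfs_root: str = "/sys/class/block") -> bool:
    """Return True if the kernel lists dev_name (e.g. 'sdb1') as a partition.

    A name that does not exist at all also returns False, so check for
    existence separately if that distinction matters.
    """
    return os.path.exists(os.path.join(sysfs_root, dev_name, "partition"))


if __name__ == "__main__":
    # Device names from the cluster.yaml in the article
    for name in ("sdb1", "sdb"):
        print(name, "partition" if is_partition(name) else "not a partition")
```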

Solutions

Temporary workaround

Raise osd_max_object_namespace_len to a larger value.

[root@rook-ceph-tools /]# ceph tell osd.0 config set osd_max_object_namespace_len 256
Set osd_max_object_namespace_len to 256
[root@rook-ceph-tools /]# ceph tell osd.1 config set osd_max_object_namespace_len 256
Set osd_max_object_namespace_len to 256
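
The command has to be repeated for every OSD. A small sketch that builds the same command line for a list of OSD ids can save typing on larger clusters. The id list and the helper name are assumptions for illustration; the command itself is the one used above. The script only prints the commands, so they can be reviewed before being run in the rook-ceph-tools pod.

```python
import shlex


def namespace_len_commands(osd_ids, new_len=256):
    """Build one 'ceph tell osd.N config set ...' command line per OSD id."""
    cmds = []
    for osd_id in osd_ids:
        argv = ["ceph", "tell", f"osd.{osd_id}", "config", "set",
                "osd_max_object_namespace_len", str(new_len)]
        cmds.append(shlex.join(argv))
    return cmds


if __name__ == "__main__":
    # The article's cluster has osd.0 and osd.1
    for line in namespace_len_commands([0, 1]):
        print(line)
```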

Recommended fix

Run the Ceph OSDs on BlueStore and, in the cluster.yaml of the corresponding Rook Ceph cluster, give each OSD a whole disk.

...
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    config:
      storeType: bluestore
    nodes:
    - name: ceph2
      devices:
      - name: sdb
    - name: ceph3
      devices:
      - name: sdb

Note: the specified disk must have a GPT header, and all of its partitions must be deleted.

