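When using CephFS, every client establishes a connection to the MDS in order to obtain CephFS metadata. If there are multiple active MDS daemons, a single client may hold connections to several of them.
Ceph provides the client/session subcommands to query and manage these connections. One of them lets you manually disconnect a misbehaving CephFS client. For example, running # ceph tell mds.2 client evict disconnects all clients connected to MDS rank 2.
So what is the impact of running client evict, and can the client recover afterwards? This article focuses on those two questions.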
Command format
Reference: http://docs.ceph.com/docs/master/cephfs/eviction/
Test environment: Ceph Mimic 13.2.1
1. List all clients/sessions
Use the client ls or session ls command to list all clients that have a connection to MDS rank [id]:
    # ceph tell mds.0 client ls
    2018-09-05 10:00:15.986 7f97f0ff9700 0 client.25196 ms_handle_reset on 192.168.0.26:6800/1856812761
    2018-09-05 10:00:16.002 7f97f1ffb700 0 client.25199 ms_handle_reset on 192.168.0.26:6800/1856812761
    [
        {
            "id": 25085,
            "num_leases": 0,
            "num_caps": 5,
            "state": "open",
            "replay_requests": 0,
            "completed_requests": 0,
            "reconnecting": false,
            "inst": "client.25085 192.168.0.26:0/265326503",
            "client_metadata": {
                "ceph_sha1": "5533ecdc0fda920179d7ad84e0aa65a127b20d77",
                "ceph_version": "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)",
                "entity_id": "admin",
                "hostname": "mimic3",
                "mount_point": "/mnt/cephfuse",
                "pid": "44876",
                "root": "/"
            }
        }
    ]
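The most important fields are:
- id: the client's unique id
- num_caps: the number of caps the client holds
- inst: the client's IP address and port connection information
- ceph_version: the ceph-fuse version on the client; for a kernel client the field is kernel_version
- hostname: the client's host name
- mount_point: the mount point on the client host
- pid: the pid of the client's ceph-fuse process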
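To make the field list concrete, here is a minimal Python sketch that pulls those fields out of the JSON part of the client ls output. It assumes the JSON array has already been separated from the ms_handle_reset log lines; the function name summarize_sessions and the shortened SAMPLE string are illustrative only and are not part of Ceph.

```python
import json

# JSON part of the `client ls` output shown above (log lines removed).
SAMPLE = """
[
  {
    "id": 25085,
    "num_leases": 0,
    "num_caps": 5,
    "state": "open",
    "inst": "client.25085 192.168.0.26:0/265326503",
    "client_metadata": {
      "ceph_version": "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)",
      "hostname": "mimic3",
      "mount_point": "/mnt/cephfuse",
      "pid": "44876"
    }
  }
]
"""


def summarize_sessions(raw_json):
    """Return one small dict per session with the fields discussed above."""
    summary = []
    for session in json.loads(raw_json):
        meta = session.get("client_metadata", {})
        summary.append({
            "id": session["id"],
            "state": session["state"],
            "num_caps": session["num_caps"],
            # inst is "<entity name> <address>"; keep only the address part.
            "addr": session["inst"].split()[-1],
            "hostname": meta.get("hostname"),
            "mount_point": meta.get("mount_point"),
            # ceph-fuse reports ceph_version, a kernel client kernel_version.
            "version": meta.get("ceph_version") or meta.get("kernel_version"),
        })
    return summary


for row in summarize_sessions(SAMPLE):
    print(row)
```

The id value is what client evict id=... expects, and the address part of inst is the value that later shows up in the OSD blacklist.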
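2. Evict a specific client
A specific client connection can be evicted by specifying its id.
If there are multiple active MDS daemons, an evict issued on a single MDS rank is also propagated to the other active MDS daemons.
    # ceph tell mds.0 client evict id=25085
After the evict, check on the corresponding host: the client's mount point is no longer accessible, and the client log shows that the client was blacklisted:
    root@mimic3:/mnt/cephfuse# ls
    ls: cannot open directory '.': Cannot send after transport endpoint shutdown
    root@mimic3:~# vim /var/log/ceph/ceph-client.admin.log
    ...
    2018-09-05 10:02:54.829 7fbe732d7700 -1 client.25085 I was blacklisted at osd epoch 519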
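3. Check the Ceph OSD blacklist
Evicting a client also adds the client to the OSD blacklist (see the code analysis later in this article):
    root@mimic1:~# ceph osd blacklist ls
    listed 1 entries
    192.168.0.26:0/265326503 2018-09-05 11:02:54.696345
Blacklisting prevents the evicted client's in-flight data from being written, which would affect data consistency. The entry is valid for 1 hour.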
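As a rough illustration of the 1-hour validity, the following Python sketch parses the blacklist ls text shown above and computes how long an entry still has to live. The helper names parse_blacklist_ls and remaining are hypothetical, and the "now" value is simply the timestamp of the client log line from step 2, used here as an assumption for the time of the evict.

```python
from datetime import datetime

# Text output of `ceph osd blacklist ls` as shown above.
SAMPLE = """listed 1 entries
192.168.0.26:0/265326503 2018-09-05 11:02:54.696345"""


def parse_blacklist_ls(output):
    """Parse blacklist ls text into (address, expiry datetime) pairs."""
    entries = []
    for line in output.splitlines():
        parts = line.split()
        # Entry lines are "<addr> <date> <time>"; the "listed N entries"
        # header also has three tokens but no "/" in the first one.
        if len(parts) == 3 and "/" in parts[0]:
            expiry = datetime.strptime(parts[1] + " " + parts[2],
                                       "%Y-%m-%d %H:%M:%S.%f")
            entries.append((parts[0], expiry))
    return entries


def remaining(expiry, now):
    """Time left until the blacklist entry expires."""
    return expiry - now


addr, expiry = parse_blacklist_ls(SAMPLE)[0]
# Assumption: the client log line "I was blacklisted" marks the evict time.
now = datetime(2018, 9, 5, 10, 2, 54, 829000)
print(addr, "expires in", remaining(expiry, now))
```

With the timestamps from this test run the remaining time comes out just under one hour, which matches the 1-hour validity stated above.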
4. Try to recover the evicted client
Remove the entry for the just-evicted client from the Ceph OSD blacklist:
    root@mimic1:~# ceph osd blacklist rm 192.168.0.26:0/265326503
    un-blacklisting 192.168.0.26:0/265326503
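In the output above, the blacklist entry is the same address that appears in the session's inst field (client.25085 192.168.0.26:0/265326503). A small Python sketch, assuming that correspondence holds, can derive the blacklist rm command from inst. The function names are hypothetical; the sketch only builds the argument list and does not run anything.

```python
def blacklist_addr_from_inst(inst):
    """Return the address part of an inst string "<entity name> <address>"."""
    name, addr = inst.split()
    return addr


def unblacklist_command(inst):
    """Build the argument list for `ceph osd blacklist rm <address>`."""
    return ["ceph", "osd", "blacklist", "rm", blacklist_addr_from_inst(inst)]


cmd = unblacklist_command("client.25085 192.168.0.26:0/265326503")
print(" ".join(cmd))
```

Recording inst before the evict is useful because the session disappears from client ls afterwards, as the later listings show.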
Then check on the corresponding host whether the client works again. It does:
    root@mimic3:~# cd /mnt/cephfuse
    root@mimic3:/mnt/cephfuse# ls
    perftest
With Ceph Luminous 12.2.7, however, the evicted client does not recover immediately; it recovers only after some time.
("mds_session_autoclose": "300.000000")
    root@luminous2:~# ceph osd blacklist rm 192.168.213.25:0/1534097905
    un-blacklisting 192.168.213.25:0/1534097905
    root@luminous2:~# ceph osd blacklist ls
    listed 0 entries
    root@luminous2:/mnt/cephfuse# ls
    ls: cannot open directory '.': Cannot send after transport endpoint shutdown
After waiting for a while (300s), the session becomes normal again:
    root@luminous2:/mnt/cephfuse# ls
    perftest
When the evict is tested against a CephFS kernel client, the client cannot recover:
    root@mimic3:~# cd /mnt/cephfs
    -bash: cd: /mnt/cephfs: Permission denied
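5. Evict all clients
If no client id is given after the evict command, all clients connected to that MDS rank are evicted.
If there are multiple active MDS daemons, an evict issued on a single MDS rank is also propagated to the other active MDS daemons.
    # ceph tell mds.0 client evict
Use this command with great care and never run it by mistake; its impact is considerable.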
6. The session kill command
The session subcommands also include a kill command, which is more thorough than evict:
    root@mimic1:~# ceph tell mds.0 session kill 104704
    2018-09-05 15:57:45.897 7ff2157fa700 0 client.25742 ms_handle_reset on 192.168.0.26:6800/1856812761
    2018-09-05 15:57:45.917 7ff2167fc700 0 client.25745 ms_handle_reset on 192.168.0.26:6800/1856812761
    root@mimic1:~# ceph tell mds.0 session ls
    2018-09-05 15:57:50.709 7f44eeffd700 0 client.95370 ms_handle_reset on 192.168.0.26:6800/1856812761
    2018-09-05 15:57:50.725 7f44effff700 0 client.95376 ms_handle_reset on 192.168.0.26:6800/1856812761
    []
    root@mimic1:~# ceph osd blacklist ls
    listed 1 entries
    192.168.0.26:0/1613295381 2018-09-05 16:57:45.920138
Remove the OSD blacklist entry:
    root@mimic1:~# ceph osd blacklist rm 192.168.0.26:0/1613295381
    un-blacklisting 192.168.0.26:0/1613295381
    root@mimic1:~# ceph osd blacklist ls
    listed 0 entries
After that, the client connection does not recover:
    root@mimic3:~# cd /mnt/cephfuse
    root@mimic3:/mnt/cephfuse# ls
    ls: cannot open directory '.': Cannot send after transport endpoint shutdown
After session kill, the session cannot be recovered, so this command must also be used with care.
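The recovery results observed in sections 4 and 6 can be collected in a small lookup table. The Python sketch below only encodes what was seen in these particular tests after the OSD blacklist entry was removed; it is not a statement about Ceph behaviour in general, and the names OBSERVED and recovers are illustrative.

```python
# (command, client) -> what was observed after removing the blacklist entry.
OBSERVED = {
    ("client evict", "ceph-fuse, Mimic 13.2.1"):
        "recovers right away",
    ("client evict", "ceph-fuse, Luminous 12.2.7"):
        "recovers after waiting about 300s",
    ("client evict", "kernel client"):
        "does not recover",
    ("session kill", "ceph-fuse, Mimic 13.2.1"):
        "does not recover",
}


def recovers(command, client):
    """True if the mount became usable again in the tests described above."""
    return OBSERVED[(command, client)].startswith("recovers")


for (command, client), result in OBSERVED.items():
    print(f"{command:13} | {client:27} | {result}")
```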
Code analysis
Based on the Ceph Mimic 13.2.1 source code.
The code that performs client evict is shown below; it shows that an OSD blacklist entry is added:
    bool MDSRank::evict_client(int64_t session_id,
        bool wait, bool blacklist, std::stringstream& err_ss,
        Context *on_killed)
    {
      ...
      // Look up the session with the given id
      Session *session = sessionmap.get_session(
          entity_name_t(CEPH_ENTITY_TYPE_CLIENT, session_id));

      // Define the function that kills the MDS session
      auto kill_mds_session = [this, session_id, on_killed]() {
        assert(mds_lock.is_locked_by_me());
        Session *session = sessionmap.get_session(
            entity_name_t(CEPH_ENTITY_TYPE_CLIENT, session_id));
        if (session) {
          if (on_killed) {
            server->kill_session(session, on_killed);
          } else {
            C_SaferCond on_safe;
            server->kill_session(session, &on_safe);
            mds_lock.Unlock();
            on_safe.wait();
            mds_lock.Lock();
          }
        }
        ...
      };

      // Define the function that adds the OSD blacklist entry
      auto background_blacklist = [this, session_id, cmd](std::function<void ()> fn) {
        ...
        Context *on_blacklist_done = new FunctionContext([this, session_id, fn](int r) {
          objecter->wait_for_latest_osdmap(
            new C_OnFinisher(
              new FunctionContext(...), finisher)
          );
        });
        ...
        monc->start_mon_command(cmd, {}, nullptr, nullptr, on_blacklist_done);
      };

      auto blocking_blacklist = [this, cmd, &err_ss, background_blacklist]() {
        C_SaferCond inline_ctx;
        background_blacklist([&inline_ctx]() { inline_ctx.complete(0); });
        mds_lock.Unlock();
        inline_ctx.wait();
        mds_lock.Lock();
      };

      // Depending on the arguments, kill the MDS session and add the OSD blacklist entry
      if (wait) {
        if (blacklist) {
          blocking_blacklist();
        }

        // We dropped mds_lock, so check that session still exists
        session = sessionmap.get_session(entity_name_t(CEPH_ENTITY_TYPE_CLIENT,
              session_id));
        ...
        kill_mds_session();
      } else {
        if (blacklist) {
          background_blacklist(kill_mds_session);
        } else {
          kill_mds_session();
        }
      }
      ...
    }
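The branch at the end of evict_client decides the order of the two actions. The following Python sketch is a simplified model of just that branch, written for illustration: it ignores locking, error handling and everything hidden behind the "..." elisions, and the step strings and the function name evict_steps are made up for this example.

```python
def evict_steps(wait, blacklist):
    """Model the final wait/blacklist branch of evict_client as a step list."""
    steps = []
    if wait:
        if blacklist:
            # blocking_blacklist(): add the entry and wait for completion.
            steps.append("add OSD blacklist entry (blocking)")
        # mds_lock was dropped, so the session is looked up again.
        steps.append("re-check that the session still exists")
        steps.append("kill MDS session")
    else:
        if blacklist:
            # background_blacklist(kill_mds_session): the kill is passed in
            # as the function to run once blacklisting is done.
            steps.append("add OSD blacklist entry (background)")
            steps.append("kill MDS session (from blacklist callback)")
        else:
            steps.append("kill MDS session")
    return steps


for wait in (True, False):
    for blacklist in (True, False):
        print(f"wait={wait}, blacklist={blacklist}: {evict_steps(wait, blacklist)}")
```

In this model, whenever blacklist is true the blacklist step comes before the session kill, which fits the purpose described earlier of keeping the evicted client's in-flight writes from reaching the OSDs.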
The function is called from the following places:
    Cscope tag: evict_client
       #   line  filename / context / line
       1   1965  mds/MDSRank.cc <<handle_asok_command>>
                 bool evicted = evict_client(strtol(client_id.c_str(), 0, 10), true,
       2   2120  mds/MDSRank.cc <<evict_clients>>
                 evict_client(s->info.inst.name.num(), false,
       3    782  mds/Server.cc <<find_idle_sessions>>
                 mds->evict_client(session->info.inst.name.num(), false, true,
       4   1058  mds/Server.cc <<reconnect_tick>>
                 mds->evict_client(session->info.inst.name.num(), false, true, ss,
1. handle_asok_command: handles client evict issued from the command line
2. evict_clients: evicts clients in bulk
3. find_idle_sessions: evicts clients whose sessions are in the stale state
4. reconnect_tick: after the MDS recovers it waits for clients to reconnect, and evicts the clients after a 45s timeout
Related parameters
The configuration parameters related to MDS sessions are:
    # ceph daemon mgr.luminous2 config show | grep mds_session_
    "mds_session_autoclose": "300.000000",
    "mds_session_blacklist_on_evict": "true",
    "mds_session_blacklist_on_timeout": "true",
    "mds_session_timeout": "60.000000",
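The values in the config show output are all strings. As a small illustration, this Python sketch turns the grep fragment above into a dict with typed values; the helper names parse_config_fragment and typed are hypothetical, and the fragment is copied from the output above.

```python
import json

# The grep output shown above, as one string.
FRAGMENT = '''
"mds_session_autoclose": "300.000000",
"mds_session_blacklist_on_evict": "true",
"mds_session_blacklist_on_timeout": "true",
"mds_session_timeout": "60.000000",
'''


def typed(value):
    """Convert a config string to bool or float where that is possible."""
    if value in ("true", "false"):
        return value == "true"
    try:
        return float(value)
    except ValueError:
        return value


def parse_config_fragment(text):
    """Wrap the `"key": "value",` lines in braces and parse them as JSON."""
    raw = json.loads("{" + text.strip().rstrip(",") + "}")
    return {key: typed(value) for key, value in raw.items()}


config = parse_config_fragment(FRAGMENT)
print(config)
```

The mds_session_autoclose value of 300 seconds is the same figure as the waiting time seen in the Luminous test in section 4.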
There are also some client-related ones:
    "client_reconnect_stale": "false",
    "client_tick_interval": "1.000000",
    "mon_client_ping_interval": "10.000000",
    "mon_client_ping_timeout": "30.000000",
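Handling after client evict
As the tests above show, an evicted client is added to the OSD blacklist with a timeout of 1 hour; during that time the client cannot access CephFS.
However, once the blacklist entry is removed with the command ceph osd blacklist rm <entry>, the ceph-fuse client on Mimic 13.2.1 can access CephFS again right away, just as it could before the evict.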
Method 1: remove the blacklist entry
    root@mimic1:~# ceph tell mds.0 client evict id=25085
    2018-09-05 11:07:43.580 7f80d37fe700 0 client.25364 ms_handle_reset on 192.168.0.26:6800/1856812761
    2018-09-05 11:07:44.292 7f80e8ff9700 0 client.25370 ms_handle_reset on 192.168.0.26:6800/1856812761
    root@mimic1:~# ceph tell mds.0 client ls
    2018-09-05 11:05:23.527 7f5005ffb700 0 client.25301 ms_handle_reset on 192.168.0.26:6800/1856812761
    2018-09-05 11:05:23.539 7f5006ffd700 0 client.94941 ms_handle_reset on 192.168.0.26:6800/1856812761
    []
    root@mimic1:~# ceph osd blacklist rm 192.168.0.26:0/265326503
    un-blacklisting 192.168.0.26:0/265326503
    root@mimic1:~# ceph tell mds.0 client ls
    2018-09-05 11:07:57.884 7fe07b7f6700 0 client.95022 ms_handle_reset on 192.168.0.26:6800/1856812761
    2018-09-05 11:07:57.900 7fe07c7f8700 0 client.25400 ms_handle_reset on 192.168.0.26:6800/1856812761
    []
Then, after the mount point directory is accessed again on the client host, the session becomes normal:
    root@mimic1:~# ceph tell mds.0 client ls
    2018-09-05 11:06:31.484 7f6c6bfff700 0 client.94971 ms_handle_reset on 192.168.0.26:6800/1856812761
    2018-09-05 11:06:31.496 7f6c717fa700 0 client.94977 ms_handle_reset on 192.168.0.26:6800/1856812761
    [
        {
            "id": 25085,
            "num_leases": 0,
            "num_caps": 4,
            "state": "open",
            "replay_requests": 0,
            "completed_requests": 0,
            "reconnecting": false,
            "inst": "client.25085 192.168.0.26:0/265326503",
            "client_metadata": {
                "ceph_sha1": "5533ecdc0fda920179d7ad84e0aa65a127b20d77",
                "ceph_version": "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)",
                "entity_id": "admin",
                "hostname": "mimic3",
                "mount_point": "/mnt/cephfuse",
                "pid": "44876",
                "root": "/"
            }
        }
    ]
Method 2: wait 1 hour
By default, the OSD blacklist entry added by client evict expires after 1 hour. Checking again after 1 hour shows that the entry is gone and the session can become normal again:
    root@mimic1:~# ceph osd blacklist ls
    listed 0 entries
Then, after the mount point directory is accessed again on the client host, the session becomes normal:
    root@mimic3:~# cd /mnt/cephfuse/
    root@mimic3:/mnt/cephfuse# ls
    perftest
Check the MDS sessions:
    root@mimic1:~# ceph tell mds.0 session ls
    2018-09-05 13:56:26.630 7fae7f7fe700 0 client.95118 ms_handle_reset on 192.168.0.26:6801/1541744746
    2018-09-05 13:56:26.642 7fae94ff9700 0 client.25496 ms_handle_reset on 192.168.0.26:6801/1541744746
    [
        {
            "id": 25085,
            "num_leases": 0,
            "num_caps": 1,
            "state": "open",
            "replay_requests": 0,
            "completed_requests": 0,
            "reconnecting": false,
            "inst": "client.25085 192.168.0.26:0/265326503",
            "client_metadata": {
                "ceph_sha1": "5533ecdc0fda920179d7ad84e0aa65a127b20d77",
                "ceph_version": "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)",
                "entity_id": "admin",
                "hostname": "mimic3",
                "mount_point": "/mnt/cephfuse",
                "pid": "44876",
                "root": "/"
            }
        }
    ]