Summary: When using CephFS, each client establishes a connection with an MDS to obtain CephFS metadata. If there are multiple active MDS daemons, a client may establish connections with several of them. Ceph provides client/session subcommands to manage these connections, including evicting clients; this article looks at the impact of client evict and whether it can be recovered from.
When using CephFS, each client establishes a connection with an MDS to obtain CephFS metadata. If there are multiple active MDS daemons, a client may establish connections with several of them.
Ceph provides client/session subcommands to query and manage these connections. Among them is a command to manually disconnect clients when a CephFS client is misbehaving, for example:
# ceph tell mds.2 client evict
which disconnects all clients connected to mds rank 2.
So what is the impact of running client evict, and can it be undone? This article focuses on exactly that.
Command format
Reference: http://docs.ceph.com/docs/master/cephfs/eviction/
Test environment: Ceph Mimic 13.2.1
1. List all clients/sessions
The client/session ls command lists all clients that have a connection with mds rank [id]:
# ceph tell mds.0 client ls
2018-09-05 10:00:15.986 7f97f0ff9700 0 client.25196 ms_handle_reset on 192.168.0.26:6800/1856812761
2018-09-05 10:00:16.002 7f97f1ffb700 0 client.25199 ms_handle_reset on 192.168.0.26:6800/1856812761
[
{
"id": 25085,
"num_leases": 0,
"num_caps": 5,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.25085 192.168.0.26:0/265326503",
"client_metadata": {
"ceph_sha1": "5533ecdc0fda920179d7ad84e0aa65a127b20d77",
"ceph_version": "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)",
"entity_id": "admin",
"hostname": "mimic3",
"mount_point": "/mnt/cephfuse",
"pid": "44876",
"root": "/"
}
}
]
The most important fields are (a small jq filtering sketch follows the list):
- id: unique client id
- num_caps: the number of caps held by the client
- inst: the client's ip address and port
- ceph_version: the ceph-fuse version on the client; for a kernel client this is kernel_version instead
- hostname: the client's hostname
- mount_point: the mount point used by the client on its host
- pid: the pid of the client's ceph-fuse process
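To pull just these fields out of the JSON, something like the following can be used (a minimal sketch, assuming the jq utility is installed on the admin node; the ms_handle_reset log lines normally go to stderr and are discarded by 2>/dev/null):
# ceph tell mds.0 client ls 2>/dev/null | jq '.[] | {id, num_caps, inst, hostname: .client_metadata.hostname, mount_point: .client_metadata.mount_point}'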
2. Evict a specific client
A specific client connection can be evicted by giving its id.
If there are multiple active MDS daemons, an evict issued against a single MDS rank is also propagated to the other active MDS daemons.
# ceph tell mds.0 client evict id=25085
After evicting the client, the mount point on the corresponding host can no longer be accessed:
root@mimic3:/mnt/cephfuse# ls
ls: cannot open directory '.': Cannot send after transport endpoint shutdown
root@mimic3:~# vim /var/log/ceph/ceph-client.admin.log
...
2018-09-05 10:02:54.829 7fbe732d7700 -1 client.25085 I was blacklisted at osd epoch 519
3. Check the ceph osd blacklist
Evicting a client also adds it to the osd blacklist (see the code analysis below):
root@mimic1:~# ceph osd blacklist ls
listed 1 entries
192.168.0.26:0/265326503 2018-09-05 11:02:54.696345
The client is put on the osd blacklist so that its in-flight writes cannot reach the OSDs and break data consistency; the entry is valid for 1 hour.
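That one-hour window is the monitor's default blacklist expiry, controlled by mon_osd_blacklist_default_expire (3600 seconds by default). A sketch for checking the value via the monitor admin socket, assuming a monitor named mon.mimic1:
# ceph daemon mon.mimic1 config get mon_osd_blacklist_default_expire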
4. Try to recover an evicted client
Remove the entry belonging to the just-evicted client from the ceph osd blacklist:
root@mimic1:~# ceph osd blacklist rm 192.168.0.26:0/265326503
un-blacklisting 192.168.0.26:0/265326503
Checking on the corresponding host whether the client works again: the client is back to normal!
root@mimic3:~# cd /mnt/cephfuse
root@mimic3:/mnt/cephfuse# ls
perftest
When testing Ceph Luminous 12.2.7, however, the client could not be recovered immediately after the evict; it only came back after waiting for a while.
("mds_session_autoclose": "300.000000")
root@luminous2:~# ceph osd blacklist rm 192.168.213.25:0/1534097905
un-blacklisting 192.168.213.25:0/1534097905
root@luminous2:~# ceph osd blacklist ls
listed 0 entries
root@luminous2:/mnt/cephfuse# ls
ls: cannot open directory '.': Cannot send after transport endpoint shutdown
After waiting a while (300s), the session became normal again:
root@luminous2:/mnt/cephfuse# ls
perftest
Testing evict against a cephfs kernel client: the client cannot be recovered at all!
root@mimic3:~# cd /mnt/cephfs
-bash: cd: /mnt/cephfs: Permission denied
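In this test the only way to bring a kernel client back appears to be a fresh mount. A hypothetical recovery sketch, assuming the mount point /mnt/cephfs, a monitor at 192.168.0.24:6789 and an admin secret file (adjust to the actual environment):
# umount -f /mnt/cephfs
# mount -t ceph 192.168.0.24:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret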
5. Evict all clients
If the evict command is run without a specific client id, all clients connected to that MDS rank are evicted.
If there are multiple active MDS daemons, the evict issued against a single MDS rank is also propagated to the other active MDS daemons.
# ceph tell mds.0 client evict
Use this command with caution and never by mistake; its impact is significant!
6. The session kill command
The session subcommands also include kill, which is even more drastic than evict:
root@mimic1:~# ceph tell mds.0 session kill 104704
2018-09-05 15:57:45.897 7ff2157fa700 0 client.25742 ms_handle_reset on 192.168.0.26:6800/1856812761
2018-09-05 15:57:45.917 7ff2167fc700 0 client.25745 ms_handle_reset on 192.168.0.26:6800/1856812761
root@mimic1:~# ceph tell mds.0 session ls
2018-09-05 15:57:50.709 7f44eeffd700 0 client.95370 ms_handle_reset on 192.168.0.26:6800/1856812761
2018-09-05 15:57:50.725 7f44effff700 0 client.95376 ms_handle_reset on 192.168.0.26:6800/1856812761
[]
root@mimic1:~# ceph osd blacklist ls
listed 1 entries
192.168.0.26:0/1613295381 2018-09-05 16:57:45.920138
Remove the osd blacklist entry:
root@mimic1:~# ceph osd blacklist rm 192.168.0.26:0/1613295381
un-blacklisting 192.168.0.26:0/1613295381
root@mimic1:~# ceph osd blacklist ls
listed 0 entries
After that, the client connection does not recover!
root@mimic3:~# cd /mnt/cephfuse
root@mimic3:/mnt/cephfuse# ls
ls: cannot open directory '.': Cannot send after transport endpoint shutdown
A session removed with session kill cannot be recovered at all, so this command must also be used with caution!
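If a killed ceph-fuse session really has to be brought back, remounting the client seems to be the only way. A minimal sketch, assuming the /mnt/cephfuse mount point used in this test:
# fusermount -u /mnt/cephfuse
# ceph-fuse /mnt/cephfuse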
Code analysis
Based on the Ceph Mimic 13.2.1 code.
The code executed for client evict is shown below; note that it adds an osd blacklist entry:
bool MDSRank::evict_client(int64_t session_id,
bool wait, bool blacklist, std::stringstream& err_ss,
Context *on_killed)
{
...
// look up the session with the given id
Session *session = sessionmap.get_session(
entity_name_t(CEPH_ENTITY_TYPE_CLIENT, session_id));
// lambda that kills the mds session
auto kill_mds_session = [this, session_id, on_killed]() {
assert(mds_lock.is_locked_by_me());
Session *session = sessionmap.get_session(
entity_name_t(CEPH_ENTITY_TYPE_CLIENT, session_id));
if (session) {
if (on_killed) {
server->kill_session(session, on_killed);
} else {
C_SaferCond on_safe;
server->kill_session(session, &on_safe);
mds_lock.Unlock();
on_safe.wait();
mds_lock.Lock();
}
}
...
};
// lambda that adds the client to the OSD blacklist
auto background_blacklist = [this, session_id, cmd](std::function<void ()> fn) {
...
Context *on_blacklist_done = new FunctionContext([this, session_id, fn](int r) {
objecter->wait_for_latest_osdmap(
new C_OnFinisher(
new FunctionContext(...), finisher)
);
});
...
monc->start_mon_command(cmd, {}, nullptr, nullptr, on_blacklist_done);
};
auto blocking_blacklist = [this, cmd, &err_ss, background_blacklist]() {
C_SaferCond inline_ctx;
background_blacklist([&inline_ctx]() {
inline_ctx.complete(0);
});
mds_lock.Unlock();
inline_ctx.wait();
mds_lock.Lock();
};
// kill the mds session and/or add the OSD blacklist entry, depending on the arguments
if (wait) {
if (blacklist) {
blocking_blacklist();
}
// We dropped mds_lock, so check that session still exists
session = sessionmap.get_session(entity_name_t(CEPH_ENTITY_TYPE_CLIENT,
session_id));
...
kill_mds_session();
} else {
if (blacklist) {
background_blacklist(kill_mds_session);
} else {
kill_mds_session();
}
}
...
}
This function is called from the following places:
Cscope tag: evict_client
# line filename / context / line
1 1965 mds/MDSRank.cc <<handle_asok_command>>
bool evicted = evict_client(strtol(client_id.c_str(), 0, 10), true,
2 2120 mds/MDSRank.cc <<evict_clients>>
evict_client(s->info.inst.name.num(), false,
3 782 mds/Server.cc <<find_idle_sessions>>
mds->evict_client(session->info.inst.name.num(), false, true,
4 1058 mds/Server.cc <<reconnect_tick>>
mds->evict_client(session->info.inst.name.num(), false, true, ss,
1. handle_asok_command: handles the client evict command from the command line
2. evict_clients: evicts clients in batch
3. find_idle_sessions: evicts the client of a session that has become stale
4. reconnect_tick: after an MDS restarts it waits for clients to reconnect, and evicts those that have not reconnected within the 45s timeout
Related parameters
The configuration parameters related to mds sessions are:
# ceph daemon mgr.luminous2 config show | grep mds_session_
"mds_session_autoclose": "300.000000",
"mds_session_blacklist_on_evict": "true",
"mds_session_blacklist_on_timeout": "true",
"mds_session_timeout": "60.000000",
There are also some client-related ones:
"client_reconnect_stale": "false", "client_tick_interval": "1.000000", "mon_client_ping_interval": "10.000000", "mon_client_ping_timeout": "30.000000",
Handling after a client evict
As the tests above show, after a client is evicted it is added to the osd blacklist with a 1-hour expiry; during that window the client cannot access CephFS.
However, once the blacklist entry is removed with the command ceph osd blacklist rm <entry>,
the client can immediately access CephFS again, exactly as before.
Method 1: rm blacklist
root@mimic1:~# ceph tell mds.0 client evict id=25085
2018-09-05 11:07:43.580 7f80d37fe700 0 client.25364 ms_handle_reset on 192.168.0.26:6800/1856812761
2018-09-05 11:07:44.292 7f80e8ff9700 0 client.25370 ms_handle_reset on 192.168.0.26:6800/1856812761
root@mimic1:~# ceph tell mds.0 client ls
2018-09-05 11:05:23.527 7f5005ffb700 0 client.25301 ms_handle_reset on 192.168.0.26:6800/1856812761
2018-09-05 11:05:23.539 7f5006ffd700 0 client.94941 ms_handle_reset on 192.168.0.26:6800/1856812761
[]
root@mimic1:~# ceph osd blacklist rm 192.168.0.26:0/265326503
un-blacklisting 192.168.0.26:0/265326503
root@mimic1:~# ceph tell mds.0 client ls
2018-09-05 11:07:57.884 7fe07b7f6700 0 client.95022 ms_handle_reset on 192.168.0.26:6800/1856812761
2018-09-05 11:07:57.900 7fe07c7f8700 0 client.25400 ms_handle_reset on 192.168.0.26:6800/1856812761
[]
Then, after accessing the mount point directory on the client host again, the session becomes normal:
root@mimic1:~# ceph tell mds.0 client ls
2018-09-05 11:06:31.484 7f6c6bfff700 0 client.94971 ms_handle_reset on 192.168.0.26:6800/1856812761
2018-09-05 11:06:31.496 7f6c717fa700 0 client.94977 ms_handle_reset on 192.168.0.26:6800/1856812761
[
{
"id": 25085,
"num_leases": 0,
"num_caps": 4,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.25085 192.168.0.26:0/265326503",
"client_metadata": {
"ceph_sha1": "5533ecdc0fda920179d7ad84e0aa65a127b20d77",
"ceph_version": "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)",
"entity_id": "admin",
"hostname": "mimic3",
"mount_point": "/mnt/cephfuse",
"pid": "44876",
"root": "/"
}
}
]
Method 2: wait 1 hour
By default, the osd blacklist entry added by evict expires after 1 hour; once that hour has passed, the session can become normal again:
root@mimic1:~# ceph osd blacklist ls
listed 0 entries
Then, after accessing the mount point directory on the client host again, the session becomes normal:
root@mimic3:~# cd /mnt/cephfuse/
root@mimic3:/mnt/cephfuse# ls
perftest
Check the mds sessions:
root@mimic1:~# ceph tell mds.0 session ls
2018-09-05 13:56:26.630 7fae7f7fe700 0 client.95118 ms_handle_reset on 192.168.0.26:6801/1541744746
2018-09-05 13:56:26.642 7fae94ff9700 0 client.25496 ms_handle_reset on 192.168.0.26:6801/1541744746
[
{
"id": 25085,
"num_leases": 0,
"num_caps": 1,
"state": "open",
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.25085 192.168.0.26:0/265326503",
"client_metadata": {
"ceph_sha1": "5533ecdc0fda920179d7ad84e0aa65a127b20d77",
"ceph_version": "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)",
"entity_id": "admin",
"hostname": "mimic3",
"mount_point": "/mnt/cephfuse",
"pid": "44876",
"root": "/"
}
}
]