Tensorflow 错误集锦

栏目: 编程工具 · 发布时间: 6年前

内容简介:首先保证在一个IP为192.168.1.100的机器上启动ps或worker进程:因为该进程启动位置是192.168.1.100,但是运行参数中指定的task_index为1,对应的IP地址是ps_hosts或worker_hosts的第二项(第一项的task_index为0),也就是192.168.1.101,和进程本身所在机器的IP不一致。
2018-12-05 22:18:24.565303: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:3376}
2018-12-05 22:18:24.565372: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:3330, 1 -> localhost:3331}
2018-12-05 22:18:24.569212: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:3376
2018-12-05 22:18:26.170901: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:3376}
2018-12-05 22:18:26.170969: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:3330, 1 -> localhost:3331}
2018-12-05 22:18:26.174856: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:3330
2018-12-05 22:18:27.177003: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job ps -> {0 -> localhost:3376}
2018-12-05 22:18:27.177071: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:222] Initialize GrpcChannelCache for job worker -> {0 -> localhost:3330, 1 -> localhost:3331}
2018-12-05 22:18:27.180980: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:381] Started server with target: grpc://localhost:3331
2018-12-05 22:18:34.625459: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:0
2018-12-05 22:18:34.625513: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1
2018-12-05 22:18:36.231936: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2018-12-05 22:18:36.231971: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:1
2018-12-05 22:18:37.235899: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2018-12-05 22:18:37.235952: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:worker/replica:0/task:0

首先保证 job_name,task_index,ps_hosts,worker_hosts 这四个参数都是正确的,考虑以下这种情况是不正确的:

在一个IP为192.168.1.100的机器上启动ps或worker进程:

--job_name=worker
--task_index=1
--ps_hosts=192.168.1.100:2222,192.168.1.101:2222
--worker_hosts=192.168.1.100:2223,192.168.1.101:2223

因为该进程启动位置是192.168.1.100,但是运行参数中指定的task_index为1,对应的IP地址是ps_hosts或worker_hosts的第二项(第一项的task_index为0),也就是192.168.1.101,和进程本身所在机器的IP不一致。

另外一种情况也会导致该问题的发生,从TensorFlow-1.4开始,分布式会自动使用环境变量中的代理去连接,如果运行的节点之间不需要代理互连,那么将代理的环境变量移除即可,在脚本的开始位置添加代码:

注意这段代码必须写在import tensorflow as tf或者import moxing.tensorflow as mox之前

import os
os.enrivon.pop('http_proxy')
os.enrivon.pop('https_proxy')

— 摘自( https://bbs.huaweicloud.com/blogs/463145f7a1d111e89fc57ca23e93a89f )


以上所述就是小编给大家介绍的《Tensorflow 错误集锦》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

深度探索Linux操作系统

深度探索Linux操作系统

王柏生 / 机械工业出版社 / 2013-10-15 / 89.00

《深度探索linux操作系统:系统构建和原理解析》是探索linux操作系统原理的里程碑之作,在众多的同类书中独树一帜。它颠覆和摒弃了传统的从阅读linux内核源代码着手学习linux操作系统原理的方式,而是基于实践,以从零开始构建一个完整的linux操作系统的过程为依托,指引读者在实践中去探索操作系统的本质。这种方式的妙处在于,让读者先从宏观上全面认清一个完整的操作系统中都包含哪些组件,各个组件的......一起来看看 《深度探索Linux操作系统》 这本书的介绍吧!

HTML 编码/解码
HTML 编码/解码

HTML 编码/解码

RGB CMYK 转换工具
RGB CMYK 转换工具

RGB CMYK 互转工具