Translated from: https://stackoverflow.com/questions/30741061/parallel-python-iteration
I want to create instances of a class based on the values in a pandas.DataFrame. Here is what I have so far:
import itertools
import multiprocessing as mp
import pandas as pd

class Toy:
    id_iter = itertools.count(1)

    def __init__(self, row):
        self.id = next(self.id_iter)  # built-in next() works on Python 2 and 3
        self.type = row['type']

if __name__ == "__main__":
    table = pd.DataFrame({
        'type': ['a', 'b', 'c'],
        'number': [5000, 4000, 30000]
    })
    for index, row in table.iterrows():
        [Toy(row) for _ in range(row['number'])]
Multiprocessing attempt
I have been able to parallelize this (to some extent) by adding the following:
pool = mp.Pool(processes=mp.cpu_count())
m = mp.Manager()
q = m.Queue()
for index, row in table.iterrows():
    pool.apply_async([Toy(row) for _ in range(row['number'])])
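This seems to be faster when the numbers in row['number'] are much larger than the length of table. In my real case, however, table is thousands of rows long and each row['number'] is relatively small.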
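A note on the snippet above: the list comprehension is evaluated in the parent before apply_async is called, so the pool receives a finished list rather than a function to run. A minimal sketch of the usual calling pattern follows. It assumes a hypothetical helper create_toys, uses plain dicts in place of DataFrame rows, and uses multiprocessing.pool.ThreadPool only so that it runs anywhere; ThreadPool gives no CPU speedup, but mp.Pool exposes the same apply_async method.

```python
from multiprocessing.pool import ThreadPool  # same apply_async API as mp.Pool


class Toy:
    def __init__(self, row):
        self.type = row['type']


def create_toys(row):
    # Hypothetical helper: build and return the toys for one row.
    return [Toy(row) for _ in range(row['number'])]


def build_all(rows, workers=4):
    with ThreadPool(workers) as pool:
        # Pass the callable and its arguments separately.
        pending = [pool.apply_async(create_toys, (row,)) for row in rows]
        # .get() waits for each task and re-raises any exception from the worker.
        return [toy for res in pending for toy in res.get()]


# Plain dicts stand in for DataFrame rows (an assumption of this sketch).
rows = [{'type': 'a', 'number': 3}, {'type': 'b', 'number': 2}]
toys = build_all(rows)
print(len(toys))
```

Calling .get() on each result matters: without it, an exception raised inside a worker stays hidden in the AsyncResult object.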
It seems smarter to split the table into cpu_count() chunks and iterate within each chunk. But now we are at the edge of my Python skills.
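The chunking idea can be sketched with the standard library alone. The function name split_evenly is hypothetical, and the rows are plain dicts; with pandas, table.to_dict('records') would give a list in this shape.

```python
def split_evenly(items, n_chunks):
    """Split a list into n_chunks consecutive parts whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


# Seven illustrative rows split into three chunks.
rows = [{'type': t, 'number': n} for t, n in zip('abcdefg', range(1, 8))]
chunks = split_evenly(rows, 3)
print([len(c) for c in chunks])
```

Each chunk can then be handed to one worker, so the per-task overhead is paid once per chunk instead of once per row.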
I have tried things that make the Python interpreter scream at me, such as:
pool.apply_async(
    for index, row in table.iterrows():
        [Toy(row) for _ in range(row['number'])]
)
And also things that "cannot be pickled":
Parallel(n_jobs=4)(
    delayed(Toy)([row for _ in range(row['number'])]) \
    for index, row in table.iterrows()
)
Edit
This may have gotten me a little closer, but I am still not there. I create the class instances in a separate function,
def create_toys(row):
    [Toy(row) for _ in range(row['number'])]
....
Parallel(n_jobs=4, backend="threading")(
    (create_toys)(row) for i, row in table.iterrows()
)
But I am told that a 'NoneType' object is not iterable.
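A likely cause: create_toys has no return statement, so it returns None, and (create_toys)(row) calls it immediately instead of handing the function to the worker pool. The sketch below shows both points. It swaps in the standard library's concurrent.futures.ThreadPoolExecutor as a stand-in for joblib's threading backend, and the name create_toys_broken and the plain-dict rows are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class Toy:
    def __init__(self, row):
        self.type = row['type']


def create_toys_broken(row):
    [Toy(row) for _ in range(row['number'])]  # list is built, then discarded


def create_toys(row):
    return [Toy(row) for _ in range(row['number'])]  # hand the list back


# Plain dicts stand in for DataFrame rows (an assumption of this sketch).
rows = [{'type': 'a', 'number': 2}, {'type': 'b', 'number': 1}]

print(create_toys_broken(rows[0]))  # prints None

with ThreadPoolExecutor(max_workers=4) as executor:
    # Pass the function itself; the executor calls it once per row.
    per_row = list(executor.map(create_toys, rows))

toys = [toy for chunk in per_row for toy in chunk]
print(len(toys))
```

With joblib, the equivalent step is to wrap the function as delayed(create_toys)(row), so that Parallel receives a deferred call rather than a result.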
It is a bit unclear to me what output you expect. Do you just want one big list of the form
[Toy(row_1) ... Toy(row_n)]
where each Toy(row_i) appears with multiplicity row_i.number?
Based on the answer mentioned by @JD Long, I think you can do something like this:
import numpy as np

def process(df):
    L = []
    for index, row in df.iterrows():  # iterate over the chunk passed in, not the global table
        L += [Toy(row) for _ in range(row['number'])]
    return L

table = pd.DataFrame({
    'type': ['a', 'b', 'c']*10,
    'number': [5000, 4000, 30000]*10
})

p = mp.Pool(processes=8)
split_dfs = np.array_split(table, 8)
pool_results = p.map(process, split_dfs)
p.close()
p.join()

# merging parts processed by different processes
result = [a for L in pool_results for a in L]
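One caveat with the approach above: each worker process has its own copy of Toy.id_iter, so toys built in different workers can end up with the same id. A possible workaround, sketched below, is to compute each chunk's first id in the parent and pass it to the worker. The two-argument Toy constructor and the helpers process_chunk and chunk_start_ids are hypothetical, rows are plain dicts, and ThreadPool is used only so the sketch runs anywhere; mp.Pool has the same starmap method.

```python
from itertools import accumulate
from multiprocessing.pool import ThreadPool  # same starmap API as mp.Pool


class Toy:
    def __init__(self, toy_id, row):  # hypothetical variant that takes its id
        self.id = toy_id
        self.type = row['type']


def process_chunk(start_id, rows):
    # Build all toys for one chunk, numbering them from start_id upward.
    toys, next_id = [], start_id
    for row in rows:
        for _ in range(row['number']):
            toys.append(Toy(next_id, row))
            next_id += 1
    return toys


def chunk_start_ids(chunks, first_id=1):
    totals = [sum(row['number'] for row in chunk) for chunk in chunks]
    # Running sum of the preceding chunks' totals gives each chunk's first id.
    return [first_id + offset for offset in accumulate([0] + totals[:-1])]


chunks = [
    [{'type': 'a', 'number': 2}, {'type': 'b', 'number': 1}],
    [{'type': 'c', 'number': 3}],
]
starts = chunk_start_ids(chunks)
with ThreadPool(2) as pool:
    parts = pool.starmap(process_chunk, zip(starts, chunks))
toys = [toy for part in parts for toy in part]
print([t.id for t in toys])
```

Because every id is derived from its chunk's starting offset, no counter has to be shared between workers.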