2017-05-23 52 views
2

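Checking data from a CSV list

Hi, I'm new to Python and I'm trying to build up my knowledge by writing a function I can actually use. The function creates a list of 6 random numbers drawn from the range 1 to 59. I've cracked that part; the next part is the tricky one. I now want to check a CSV file for the numbers in the random set and print a notification if two or more numbers from the set are found. So far I have tried print (df[df[0:].isin(luckyDip)]), which is partly successful: it checks the numbers in the DataFrame and shows the ones that match, but it also shows the rest of the DataFrame as NaN, which is hard to read and not what I want.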

I'm just looking for some pointers on what to do next, or even just something to search Google for. Below is the code I've been playing with.

import random 
import pandas as pd 

url ='https://www.national-lottery.co.uk/results/euromillions/draw-history/csv' 
df = pd.read_csv(url, sep=',', na_values=".") 

lottoNumbers = [1,2,3,4,5,6,7,8,9,10, 
      11,12,13,14,15,16,17,18,19,20, 
      21,22,23,24,25,26,27,28,29,30, 
      31,32,33,34,35,36,37,38,39,40, 
      41,42,43,44,45,46,47,48,49,50, 
      51,52,53,54,55,56,57,58,59] 
luckyDip = random.sample(lottoNumbers, k=6) #Picks 6 numbers at random 
print (sorted(luckyDip))  
print (df[df[0:].isin(luckyDip)]) 

Answers

0

If you just want to flatten the array and drop the NaN values, you can add this to the end of your code:

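import numpy as np

# Flatten the masked frame to 1-D, then keep only the non-NaN (matching) values
matches = df[df[0:].isin(luckyDip)].values.flatten().astype(np.float64)
print(matches[~np.isnan(matches)])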
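The NaN cells appear because indexing a DataFrame with a boolean DataFrame keeps the original shape and blanks out every non-matching cell. A minimal sketch of an alternative that never creates the NaN cells, by applying the mask to the underlying NumPy array instead. The small `draws` frame and its column names are hypothetical stand-ins for the downloaded CSV, which may contain non-numeric columns that need excluding first:

```python
import pandas as pd

# Hypothetical stand-in for the draw-history CSV: three draws, three ball columns.
draws = pd.DataFrame(
    {"Ball 1": [3, 19, 7], "Ball 2": [19, 23, 30], "Ball 3": [41, 34, 52]}
)
lucky_dip = [19, 23, 34]

# Boolean frame with the same shape as draws: True where a cell is one of the picks.
mask = draws.isin(lucky_dip)

# Indexing the raw array with the boolean array returns only the matching cells (1-D).
matched = draws.to_numpy()[mask.to_numpy()]
print(matched.tolist())
```

Because no NaN is ever introduced, the matched values keep their integer type rather than being converted to floats.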
0

Not as elegant as @ayhan's solution, but this works:

import random 
import pandas as pd 

url ='https://www.national-lottery.co.uk/results/euromillions/draw-history/csv' 
df = pd.read_csv(url, index_col=0, sep=',') 

lottoNumbers = range(1, 60) 

tries = 0 
while True: 
    tries+=1 
    luckyDip = random.sample(lottoNumbers, k=6) #Picks 6 numbers at random 

    # subset of balls 
    draws = df.iloc[:,0:7] 

    # True where there is match 
    matches = draws.isin(luckyDip) 

    # Gives the sum of Trues 
    sum_of_trues = matches.sum(axis=1)

    # you are looking for matches where sum_of_trues is 6 
    final = sum_of_trues[sum_of_trues == 6] 
    if len(final) > 0:
        print("Took", tries)
        print(final)
        break

The result looks like this:

Took 15545 
DrawDate 
16-May-2017 6 
dtype: int64 
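The loop above runs until some random pick fully matches a past draw, which can take a long time. A hedged sketch of the same idea wrapped in a function with an upper bound on attempts and an optional seed for reproducible runs. The name `tries_until_match` and all of its parameters are hypothetical, not part of the answer:

```python
import random
import pandas as pd

def tries_until_match(draws, needed, pool=range(1, 60), k=6,
                      max_tries=100_000, seed=None):
    """Pick k numbers repeatedly until some row of `draws` contains at least
    `needed` of them. Returns (tries, sorted pick), or None if max_tries runs out."""
    rng = random.Random(seed)  # separate generator so a seed makes runs repeatable
    for tries in range(1, max_tries + 1):
        pick = rng.sample(pool, k)
        # Count matches per row and stop as soon as any row has enough.
        if (draws.isin(pick).sum(axis=1) >= needed).any():
            return tries, sorted(pick)
    return None

# Tiny hypothetical example: the pool is exactly the six numbers in the only draw,
# so the very first pick must match all six.
demo = pd.DataFrame([[1, 2, 3, 4, 5, 6]])
result = tries_until_match(demo, needed=6, pool=range(1, 7), k=6, seed=0)
print(result)
```

With the real data, `draws` would be the ball columns only, as in `df.iloc[:,0:7]` above.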
0

You can build on what you already have by counting the non-null values in each row, then showing only the rows with two or more matches.

match_count = df[df[0:].isin(luckyDip)].notnull().sum(axis=1) 
print(match_count[match_count >= 2]) 

This gives you the index value of each matching row and its number of matches.

Example output:

6  2 
26 2 
40 3 
51 2 
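The same count can also be taken straight from the boolean mask, since `True` sums as 1. The sketch below does that on a small hypothetical frame and restricts the check to the ball columns, so that an unrelated numeric column cannot produce a false match. The column names here are assumptions; the real CSV's columns are not shown in this thread:

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration only.
df = pd.DataFrame(
    {
        "Ball 1": [3, 19, 7, 1],
        "Ball 2": [19, 23, 23, 2],
        "Ball 3": [41, 34, 34, 5],
        "DrawNumber": [19, 23, 34, 50],  # non-ball column that should not count
    }
)
lucky_dip = [19, 23, 34, 50, 51, 52]

# Only look at the ball columns, then count the True cells in each row.
ball_cols = [c for c in df.columns if c.startswith("Ball")]
match_count = df[ball_cols].isin(lucky_dip).sum(axis=1)

# Keep the rows with two or more matches.
hits = match_count[match_count >= 2]
print(hits)
```

Here rows 1 and 2 are reported with 3 and 2 matches; row 0 has a single match and row 3 has none, even though its `DrawNumber` value is one of the picks.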

If you also want the matching values from those rows, you can add:

index = match_count[match_count >= 2].index 
matches = [tuple(x[~pd.isnull(x)]) for x in df.loc[index][df[0:].isin(luckyDip)].values] 
print(matches) 

Example output:

[(19.0, 23.0), (19.0, 41.0), (19.0, 23.0, 34.0), (23.0, 28.0)]
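The values come back as floats (19.0 rather than 19) because the masked frame contains NaN, which forces a float type. A minimal sketch of a helper that uses a set intersection per row instead, so the numbers stay integers. The function name `matched_numbers`, its parameters, and the sample frame are hypothetical:

```python
import pandas as pd

def matched_numbers(draws, picks, min_matches=2):
    """Return {row index: sorted matching numbers} for rows of `draws`
    that share at least `min_matches` values with `picks`."""
    picks = set(picks)
    result = {}
    for idx, row in draws.iterrows():
        # Numbers that appear both in this draw and in the picks.
        common = sorted(picks.intersection(row.tolist()))
        if len(common) >= min_matches:
            result[idx] = common
    return result

# Hypothetical ball columns only; the real CSV has more columns.
draws = pd.DataFrame(
    {"Ball 1": [3, 19, 7, 1], "Ball 2": [19, 23, 23, 2], "Ball 3": [41, 34, 34, 5]}
)
found = matched_numbers(draws, [19, 23, 34, 50, 51, 52])
print(found)
```

Iterating row by row is slower than the vectorized mask on large frames, but a draw-history file is typically small enough that this should not matter.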