2017-10-09 39 views
1

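Hi, I have a data file, train.dat, that looks like this. I am trying to create one variable that holds the i-th value of the column containing -1 or 1, and another variable that holds the values of the column containing the strings. How do I split the data into separate variables in pandas?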

So far I have tried this:

df=pd.read_csv("train.dat",delimiter="\t", sep=',') 
# print(df.head()) 


# separate names from classes 
vals = df.ix[:,:].values 
names = [n[0][3:] for n in vals] 
cls = [n[0][0:] for n in vals] 
print(cls) 

But the output looks all jumbled. Any help would be appreciated. I am a beginner in Python.

+0

Please post your data sample as text, not as an image.

Answers

1

If the character after the numeric value is a tab, you're fine. All you need is:

import io  # io.StringIO stands in for the file, for demonstration
import pandas as pd

ratings = ("-1\tThis movie really sucks.\n"
           "-1\tRun colored water through a reflux condenser and call it a science movie?\n"
           "+1\tJust another zombie flick? You'll be surprised!")

df = pd.read_csv(io.StringIO(ratings), sep='\t',
                 header=None, names=['change', 'rating'])
  • Passing header=None ensures that the first line is interpreted as data.
  • Passing names=['change', 'rating'] supplies some (reasonable) column headers.
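
Once the file is read this way, the two variables the question asks for can be taken straight from the columns. The following is only a sketch under the assumption that the data really is tab-separated; the variable names cls and names are borrowed from the question's attempt, and the sample rows stand in for train.dat.

```python
import io
import pandas as pd

# Sample rows standing in for train.dat (assumed to be tab-separated)
ratings = ("-1\tThis movie really sucks.\n"
           "+1\tJust another zombie flick? You'll be surprised!")

df = pd.read_csv(io.StringIO(ratings), sep="\t",
                 header=None, names=["change", "rating"])

# One plain Python list per column
cls = df["change"].tolist()    # the -1 / +1 labels
names = df["rating"].tolist()  # the review texts

# The i-th label and the i-th text belong to the same row
i = 1
print(cls[i], names[i])
```

With a real file, io.StringIO(ratings) would be replaced by the path "train.dat".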

Of course, the character is not a tab :D.

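import io  # io.StringIO stands in for the file, for demonstration
import pandas as pd

ratings = ("-1 This movie really sucks.\n"
           "-1 Run colored water through a reflux condenser and call it a science movie?\n"
           "+1 Just another zombie flick? You'll be surprised!")

# The data contains no tabs, so sep='\t' reads each whole line into one column
df = pd.read_csv(io.StringIO(ratings), sep='\t',
                 header=None, names=['stuff'])

# The first two characters are the label; the text starts after the space
df['change'], df['rating'] = df.stuff.str[:2], df.stuff.str[3:]
df = df.drop('stuff', axis=1)  # drop returns a new DataFrame, so reassign it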

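A workable option is to read each whole rating line into one temporary column, split the string, assign the parts to two columns, and finally drop the temporary column.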
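
If the label is not always exactly two characters wide, fixed-position slicing becomes fragile. As an alternative sketch (not part of the original answer), the temporary column can be split once on the first space with Series.str.split. It assumes a single space separates the label from the text; the column name stuff is reused from the code above.

```python
import io
import pandas as pd

# Sample rows standing in for train.dat (assumed: label, one space, then text)
ratings = ("-1 This movie really sucks.\n"
           "+1 Just another zombie flick? You'll be surprised!")

# A separator that never occurs in the data keeps each line in one column
df = pd.read_csv(io.StringIO(ratings), sep="\t",
                 header=None, names=["stuff"])

# Split only on the first space; expand=True returns a two-column DataFrame
parts = df["stuff"].str.split(" ", n=1, expand=True)
df["change"] = parts[0].astype(int)  # "-1" / "+1" become -1 / 1
df["rating"] = parts[1]
df = df.drop(columns="stuff")

print(df)
```

Converting the label with astype(int) makes later comparisons such as df["change"] == 1 work on numbers rather than strings.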