2014-01-10

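So, I'm working on a project where I have to sort a large 34 MB text file full of song data. Each line of the file has a year, a unique number, an artist and a song. What I can't figure out is how to efficiently sort the data into other text files: I want to sort by artist name and by song name. Sadly, this is all I have. How can I sort this data?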

#Opening the file to read here
with open('tracks_per_year.txt', 'r', encoding='utf8') as in_file:
    #Creating 'lists' to put information from array into
    years = []
    uics = []
    artists = []
    songs = []

    #Filling up the 'lists'
    for line in in_file:
        # rstrip('\n') keeps the line's newline out of the song title
        year, uic, artist, song = line.rstrip('\n').split("<SEP>")
        years.append(year)
        uics.append(uic)
        artists.append(artist)
        songs.append(song)
        print(year)
        print(uic)
        print(artist)
        print(song)

#Sorting:
with open('artistsort.txt', 'w', encoding='utf8') as artist:
    for x in range(1, 515576):
        if artists[x] == artists[x-1]:
            artist.write(years[x])
            artist.write(" ")
            artist.write(uics[x])
            artist.write(" ")
            artist.write(artists[x])
            artist.write(" ")
            artist.write(songs[x])
            artist.write("\n")


with open('Onehitwonders.txt', 'w', encoding='utf8') as ohw:
    for x in range(1, 515576):
        if artists[x] != artists[x-1]:
            ohw.write(years[x])
            ohw.write(" ")
            ohw.write(uics[x])
            ohw.write(" ")
            ohw.write(artists[x])
            ohw.write(" ")
            ohw.write(songs[x])
            ohw.write("\n")

Keep in mind I'm a beginner, so please try to keep your explanations simple. If you have other ideas, I'd be glad to hear them too. Thanks!

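You shouldn't be using 'range' for this. If the number of entries in the file changes, it will break your logic. You can use 'for line in artists:' to make sure you always iterate over every line. – IanAuld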
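A minimal sketch of the comment's point, assuming the four parallel lists built in the question (the sample values here are made-up placeholders): deriving the loop bound with len() keeps the loop covering every record no matter how many lines the file has, and zip() walks the lists in step without any index at all.

```python
# Made-up sample data standing in for the lists built from tracks_per_year.txt
years = ["1981", "1934", "2004"]
uics = ["uic1", "uic2", "uic3"]
artists = ["artist1", "artist1", "artist2"]
songs = ["song1", "song2", "song3"]

# Compare each record with the one before it without hard-coding 515576:
# range(1, len(artists)) adapts to however many lines were read.
same_as_previous = []
for x in range(1, len(artists)):
    if artists[x] == artists[x - 1]:
        same_as_previous.append((years[x], uics[x], artists[x], songs[x]))

# zip() pairs up the lists element by element, so no index is needed
records = list(zip(years, uics, artists, songs))
print(len(records))
```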

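@IanAuld Thanks for the suggestion, but that's what I did at first. The problem was that nothing got written to the artistsort.txt file that way, and the one-hit-wonders file became too large (~32 MB). – Bobbert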

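That has nothing to do with the 'for' loop. As in your previous question, there is a problem in your logic that prevents anything from being written to that file. The 'for' loop only iterates over your data; what actually happens to the data is decided by the logic inside it. – IanAuld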
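To illustrate that the logic, not the loop, decides what gets written: comparing each artist only with the previous line cannot identify one-hit wonders unless the data is already grouped by artist. Below is a sketch that counts tracks per artist with collections.Counter instead. It assumes a "one-hit wonder" means an artist with exactly one line in the file (that definition is an assumption); the file name comes from the question and the records are made up.

```python
from collections import Counter

# Made-up records in the question's field order: (year, uic, artist, song)
records = [
    ("1981", "uic1", "artist1", "song1"),
    ("2004", "uic3", "artist2", "song3"),
    ("1934", "uic2", "artist1", "song2"),
]

# Count how many tracks each artist has, regardless of line order
tracks_per_artist = Counter(artist for _, _, artist, _ in records)

# Assumed definition: an artist with exactly one track is a one-hit wonder
one_hit_wonders = [r for r in records if tracks_per_artist[r[2]] == 1]

with open("Onehitwonders.txt", "w", encoding="utf8") as ohw:
    for year, uic, artist, song in one_hit_wonders:
        ohw.write(" ".join((year, uic, artist, song)) + "\n")
```

Because the counts are computed over the whole list first, the result does not depend on whether the input happens to be sorted.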

Answers

You could load the data into a dictionary-based structure, i.e. for each artist and song:

data = {artist_name: {song_name: {'year': year, 'uid': uid},
                      ...},
        ...}
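A sketch of filling that structure, assuming each line has the form year<SEP>uid<SEP>artist<SEP>song as in the question. The sample lines stand in for the file; with the real file you would loop over the open file object instead. One caveat of this design: keying on the song name keeps only the last entry if an artist has two tracks with the same title.

```python
# Made-up lines standing in for tracks_per_year.txt (format assumed from the question)
lines = [
    "2004<SEP>uic3<SEP>artist2<SEP>song3\n",
    "1981<SEP>uic1<SEP>artist1<SEP>song1\n",
    "1934<SEP>uic2<SEP>artist1<SEP>song2\n",
]

data = {}
for line in lines:
    # rstrip removes the trailing newline so it doesn't end up in the song name
    year, uid, artist, song = line.rstrip("\n").split("<SEP>")
    # setdefault creates the inner dict the first time an artist is seen
    data.setdefault(artist, {})[song] = {"year": year, "uid": uid}

print(data["artist1"])
# {'song1': {'year': '1981', 'uid': 'uic1'}, 'song2': {'year': '1934', 'uid': 'uic2'}}
```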

Then, when you write the output, use sorted to put them in alphabetical order:

for artist in sorted(data):
    for song in sorted(data[artist]):
        # use data[artist][song] to access details, e.g. details['year']
        details = data[artist][song]
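A sketch of that output loop writing a file, assuming the nested-dictionary layout above; the output name artistsort.txt comes from the question and the sample data is made up.

```python
# Made-up data in the nested layout: artist -> song -> details
data = {
    "artist2": {"song3": {"year": "2004", "uid": "uic3"}},
    "artist1": {
        "song2": {"year": "1934", "uid": "uic2"},
        "song1": {"year": "1981", "uid": "uic1"},
    },
}

with open("artistsort.txt", "w", encoding="utf8") as out:
    # sorted() on a dict iterates over its keys in alphabetical order
    for artist in sorted(data):
        for song in sorted(data[artist]):
            details = data[artist][song]
            out.write(f"{details['year']} {details['uid']} {artist} {song}\n")
```

sorted() compares strings by code point, so names starting with an uppercase letter come before lowercase ones; passing key=str.casefold is one way to ignore case.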
Try something like this:

from operator import attrgetter

class Song:
    def __init__(self, year, uic, artist, song):
        self.year = year
        self.uic = uic
        self.artist = artist
        self.song = song

songs = []

with open('tracks_per_year.txt', 'r', encoding='utf8') as in_file:
    for line in in_file:
        # Strip the trailing newline so it is not stored in the song title
        year, uic, artist, song = line.rstrip('\n').split("<SEP>")
        songs.append(Song(year, uic, artist, song))

# Sort by artist first, then by song title, and write one track per line
with open('artistsort.txt', 'w', encoding='utf8') as out_file:
    for song in sorted(songs, key=attrgetter('artist', 'song')):
        out_file.write(" ".join((song.year, song.uic, song.artist, song.song)))
        out_file.write("\n")
Thanks a lot for this idea. The only part I don't get is 'for song in sorted(songs, key=attrgetter('artist', 'song')):'. Mind explaining? – Bobbert

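The built-in Python function 'sorted()' returns a new sorted list built from the 'songs' list. The optional 'key' argument is a function that computes a sort key from each element of 'songs'. Here, 'attrgetter('artist', 'song')' returns a function that fetches the 'artist' and 'song' attributes of each 'Song' object as a tuple, so the list is sorted by artist and then by song title. – vmario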
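A small demonstration of that key, using the Song class from the answer with made-up values: attrgetter('artist', 'song') returns a tuple, and tuples compare element by element, so the song title only matters when two artists tie. The lambda version is an equivalent spelling.

```python
from operator import attrgetter

class Song:
    def __init__(self, year, uic, artist, song):
        self.year = year
        self.uic = uic
        self.artist = artist
        self.song = song

songs = [
    Song("2004", "uic3", "artist2", "song3"),
    Song("1981", "uic1", "artist1", "song2"),
    Song("1934", "uic2", "artist1", "song1"),
]

get_key = attrgetter("artist", "song")
print(get_key(songs[0]))  # ('artist2', 'song3')

by_attrgetter = sorted(songs, key=get_key)
# Same ordering, building the (artist, song) tuple by hand
by_lambda = sorted(songs, key=lambda s: (s.artist, s.song))

print([(s.artist, s.song) for s in by_attrgetter])
# [('artist1', 'song1'), ('artist1', 'song2'), ('artist2', 'song3')]
```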

You can't beat pandas for simplicity. To read your file:

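import pandas as pd

# The file has no header row, so supply column names; a multi-character
# separator such as '<SEP>' is handled by the Python parsing engine
data = pd.read_csv('tracks_per_year.txt', sep='<SEP>', engine='python',
                   header=None, names=['year', 'uic', 'artist', 'song'])
data
#    year   uic   artist   song
# 0  1981  uic1  artist1  song1
# 1  1934  uic2  artist2  song2
# 2  2004  uic3  artist3  song3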

Then, to sort by a particular column and write the result to a new file, just do:

# DataFrame.sort() has been removed from pandas; sort_values() replaces it
data.sort_values('year').to_csv('year_sort.txt', index=False)
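The question asked for artist-then-song order rather than year, and sort_values accepts a list of columns for that. Below is a sketch assuming the same four column names as above. The in-memory io.StringIO stands in for the real file, passing engine='python' explicitly avoids the warning pandas gives when a multi-character separator forces that engine, and the output name artist_song_sort.txt is hypothetical.

```python
import io

import pandas as pd

# In-memory stand-in for tracks_per_year.txt (no header row)
raw = io.StringIO(
    "2004<SEP>uic3<SEP>artist2<SEP>song3\n"
    "1981<SEP>uic1<SEP>artist1<SEP>song2\n"
    "1934<SEP>uic2<SEP>artist1<SEP>song1\n"
)

data = pd.read_csv(
    raw,
    sep="<SEP>",
    engine="python",  # multi-character separators are parsed by the Python engine
    header=None,      # the file has no header line
    names=["year", "uic", "artist", "song"],
)

# Sort by artist, then by song within each artist
sorted_data = data.sort_values(["artist", "song"])
sorted_data.to_csv("artist_song_sort.txt", index=False)
```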