How do I sort this data?

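So, I'm working on a project in which I have to sort a large 34 MB text file full of song data. Each line of the text file has a year, a unique number, an artist, and a song. What I can't figure out is how to efficiently sort the data into other text files. I want to sort by artist name and song name. Sadly, this is all I have: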

#Opening the file to read here
with open('tracks_per_year.txt', 'r', encoding='utf8') as in_file:
    #Creating 'lists' to put information from array into
    years=[]
    uics=[]
    artists=[]
    songs=[]

    #Filling up the 'lists'
    for line in in_file:
        year,uic,artist,song=line.split("<SEP>")
        years.append(year)
        uics.append(uic)
        artists.append(artist)
        songs.append(song)
        print(year)
        print(uic)
        print(artist)
        print(song)

#Sorting:
with open('artistsort.txt', 'w', encoding='utf8') as artist:

    for x in range(1,515576):

        if artists[x]==artists[x-1]:
            artist.write(years[x])
            artist.write(" ")
            artist.write(uics[x])
            artist.write(" ")
            artist.write(artists[x])
            artist.write(" ")
            artist.write(songs[x])
            artist.write("\n")


with open('Onehitwonders.txt', 'w', encoding='utf8') as ohw:

    for x in range(1,515576):

        if artists[x]!= artists[x-1]:
            ohw.write(years[x])
            ohw.write(" ")
            ohw.write(uics[x])
            ohw.write(" ")
            ohw.write(artists[x])
            ohw.write(" ")
            ohw.write(songs[x])
            ohw.write("\n")

Keep in mind that I'm a novice, so please try to keep your explanations simple. If you have any other ideas, I'd love to hear them too. Thanks!

2014-01-10 Bobbert

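You shouldn't use 'range' for this. If the number of entries in the file changes, it will break your logic. You can use 'for line in artists:' to make sure you always iterate over every line. – IanAuld

@IanAuld Thanks for the suggestion, but that is what I did at the start. The problem was that nothing got written to artistsort.txt that way, and the one-hit wonders file became too large (~32 MB). – Bobbert

That has nothing to do with the 'for' loop. As in your previous question, there is a problem in your logic that prevents anything from being written to that file. The for loop only iterates over your data; it is what comes after it that decides what actually happens to your data. – IanAuld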
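To make the point from the comments above concrete, here is a minimal sketch of looping over the data itself rather than over a hard-coded range. It assumes the goal is to separate artists with several tracks from artists with exactly one (the "one-hit wonders" of the question). The helper names parse_line and split_by_artist_count, and the sample lines, are made up for illustration.

```python
from collections import Counter

def parse_line(line):
    """Split one '<SEP>'-delimited line into (year, uic, artist, song)."""
    year, uic, artist, song = line.rstrip("\n").split("<SEP>")
    return year, uic, artist, song

def split_by_artist_count(lines):
    """Return (multi_track, one_hit) as two lists of parsed records."""
    records = [parse_line(line) for line in lines if line.strip()]
    # Count the tracks per artist; this works for any number of lines.
    counts = Counter(artist for _, _, artist, _ in records)
    multi_track, one_hit = [], []
    for record in records:  # iterate over the data, no range() needed
        artist = record[2]
        if counts[artist] > 1:
            multi_track.append(record)
        else:
            one_hit.append(record)
    return multi_track, one_hit

# Hypothetical sample lines in the same format as tracks_per_year.txt
sample = [
    "1981<SEP>uic1<SEP>artist1<SEP>song1\n",
    "1975<SEP>uic2<SEP>artist1<SEP>song2\n",
    "1997<SEP>uic3<SEP>artist2<SEP>song3\n",
]
multi_track, one_hit = split_by_artist_count(sample)
print(len(multi_track), len(one_hit))
```

Unlike comparing artists[x] with artists[x-1], counting does not depend on the lines already being grouped by artist.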

You could import the data into a dictionary-based structure, i.e. for each artist and song:

data = {artist_name: {song_name: {'year': year, 'uid': uid}, 
         ... }, 
     ...}

Then, when you output, use sorted to put them in alphabetical order:

for artist in sorted(data):
    for song in sorted(data[artist]):
        # data[artist][song] holds the details ('year' and 'uid')
        details = data[artist][song]
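As a rough end-to-end sketch of this idea, the following builds the nested dictionary from '<SEP>'-delimited lines and then walks it in sorted order. The function names build_index and sorted_rows and the sample lines are illustrative assumptions, not part of the original answer.

```python
def build_index(lines):
    """Build {artist: {song: {'year': ..., 'uid': ...}}} from '<SEP>' lines."""
    data = {}
    for line in lines:
        year, uid, artist, song = line.rstrip("\n").split("<SEP>")
        # setdefault creates the inner dict the first time an artist appears
        data.setdefault(artist, {})[song] = {'year': year, 'uid': uid}
    return data

def sorted_rows(data):
    """Yield output rows ordered by artist name, then by song name."""
    for artist in sorted(data):
        for song in sorted(data[artist]):
            details = data[artist][song]
            yield " ".join([details['year'], details['uid'], artist, song])

# Hypothetical sample lines in the same format as tracks_per_year.txt
sample = [
    "2004<SEP>uid3<SEP>Beta Band<SEP>Zebra\n",
    "1981<SEP>uid1<SEP>Alpha Trio<SEP>Yellow\n",
    "1990<SEP>uid2<SEP>Alpha Trio<SEP>Apple\n",
]
rows = list(sorted_rows(build_index(sample)))
for row in rows:
    print(row)
```

One caveat of this layout: if the same artist has two entries with an identical song title, the later one overwrites the earlier one, because song titles are used as dictionary keys.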

2014-01-10 16:05:11 jonrsharpe

Try something like this:

from operator import attrgetter

class Song:
    def __init__(self, year, uic, artist, song):
        self.year = year
        self.uic = uic
        self.artist = artist
        self.song = song

songs = []

with open('tracks_per_year.txt', 'r', encoding='utf8') as in_file:
    for line in in_file:
        # Strip the trailing newline so the song title does not carry it along
        year, uic, artist, song = line.rstrip("\n").split("<SEP>")
        songs.append(Song(year, uic, artist, song))
        print(year)
        print(uic)
        print(artist)
        print(song)

with open('artistsort.txt', 'w', encoding='utf8') as artist:
    # Sort by artist first, then by song title within each artist
    for song in sorted(songs, key=attrgetter('artist', 'song')):
        artist.write(song.year)
        artist.write(" ")
        artist.write(song.uic)
        artist.write(" ")
        artist.write(song.artist)
        artist.write(" ")
        artist.write(song.song)
        artist.write("\n")

2014-01-10 16:27:18 vmario

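Thanks a lot for the idea. The only part I don't get is "for song in sorted(songs, key=attrgetter('artist', 'song')):". Mind explaining? – Bobbert

The built-in Python function 'sorted()' returns a new sorted list built from the 'songs' list. The optional 'key' argument is a function that returns the sort key for each element of 'songs'. In this case, 'attrgetter' returns the 'artist' and 'song' fields of the 'Song' object. – vmario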
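A small sketch may make that explanation easier to follow. It reuses a Song class like the one in the answer with a few made-up entries, and shows that attrgetter('artist', 'song') simply produces an (artist, song) tuple for each object, which sorted() then compares.

```python
from operator import attrgetter

class Song:
    def __init__(self, year, uic, artist, song):
        self.year = year
        self.uic = uic
        self.artist = artist
        self.song = song

# Made-up entries, deliberately out of order
songs = [
    Song("2004", "uic3", "Beta Band", "Zebra"),
    Song("1981", "uic1", "Alpha Trio", "Yellow"),
    Song("1990", "uic2", "Alpha Trio", "Apple"),
]

get_key = attrgetter('artist', 'song')

# Calling the key function on one object returns a tuple of its fields
key_of_first = get_key(songs[0])

# sorted() calls get_key on every element and orders by the resulting tuples:
# first by artist, and by song title when the artists are equal
by_artist_then_song = sorted(songs, key=get_key)

# The same ordering written with a lambda instead of attrgetter
same_order = sorted(songs, key=lambda s: (s.artist, s.song))

titles = [(s.artist, s.song) for s in by_artist_then_song]
print(titles)
```

Note that plain string comparison is case-sensitive, so uppercase letters sort before lowercase ones. If that matters for the data, a key such as lambda s: (s.artist.lower(), s.song.lower()) is one way around it.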

You can't beat the simplicity of pandas. To read your file:

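import pandas as pd

# '<SEP>' is longer than one character, so pandas treats it as a regular
# expression, which needs the Python parsing engine. The file is assumed to
# have no header row, hence the explicit column names.
data = pd.read_csv('tracks_per_year.txt', sep='<SEP>', engine='python',
                   header=None, names=['year', 'uic', 'artist', 'song'],
                   encoding='utf8')
data
#    year   uic   artist   song
# 0  1981  uic1  artist1  song1
# 1  1934  uic2  artist2  song2
# 2  2004  uic3  artist3  song3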

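Then, to sort by a specific column and write the result to a new file, just do:

# DataFrame.sort() was removed from later pandas releases; sort_values() replaces it
data.sort_values(by='year').to_csv('year_sort.txt')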

2014-01-10 16:35:00 elyase
