2017-04-26

Python SQLite: updating a column

I want to update the album_title column based on the already-populated artist_name column.

I can either update the whole album_title column repeatedly inside the loop, so that every row ends up with the last album_title:

for tag in albums:
    for album in tag:
        cur.execute('INSERT OR IGNORE INTO Albums (album_title) VALUES (?)', (album,))

        for artist in artists:
            artist = artist.string
            cur.execute('INSERT OR IGNORE INTO Artists(artist_name) VALUES (?)', (artist,))
            cur.execute('UPDATE Artists SET album_title=? WHERE artist_name=?', (album, artist))

Or I can get only the last row updated, with the correct album_title:

for tag in albums:
    for album in tag:
        cur.execute('INSERT OR IGNORE INTO Albums (album_title) VALUES (?)', (album,))

        for artist in artists:
            artist = artist.string
            cur.execute('INSERT OR IGNORE INTO Artists(artist_name) VALUES (?)', (artist,))

        cur.execute('UPDATE Artists SET album_title=? WHERE artist_name=?', (album, artist))

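I understand why these problems occur, but I can't work out how to achieve what I want: updating each row with the correct album title. The album_title values will always be in the same order as the artist_name values.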
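If the two sequences really do line up one-to-one, one possible approach is to walk them in lockstep with zip() instead of nesting one loop inside the other. The sketch below is only an illustration under that assumption: it uses an in-memory database, a simplified Artists table, and three artist/album pairs taken from the desired output as stand-ins for the scraped values.

```python
import sqlite3

# Stand-in data (assumption): in the real script these would come from the
# scraped page, already in matching order.
artist_names = ["Sylvan Esso", "Mew", "Caddywhompus"]
album_titles = ["What Now", "Visuals", "Odd Hours"]

# In-memory database with a simplified version of the Artists table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Artists ("
    "id INTEGER PRIMARY KEY, artist_name TEXT UNIQUE, album_title TEXT)"
)

# zip() yields one (artist, album) pair per step, so each UPDATE only
# touches the row of the artist that belongs to that album.
for artist, album in zip(artist_names, album_titles):
    cur.execute("INSERT OR IGNORE INTO Artists (artist_name) VALUES (?)", (artist,))
    cur.execute(
        "UPDATE Artists SET album_title = ? WHERE artist_name = ?",
        (album, artist),
    )
conn.commit()

rows = cur.execute(
    "SELECT artist_name, album_title FROM Artists ORDER BY id"
).fetchall()
```

Note that zip() stops at the shorter sequence, so this only gives correct results when every album has exactly one matching artist entry.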

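I have seen that updating columns is covered extensively here, but I can't apply those answers to my own tangled loops. If my problem comes from a poorly structured data retrieval, I would be happy to hear how to fix it.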

Full code:

from urllib.request import Request, urlopen 
from urllib.parse import urlparse 
from urllib.parse import urljoin 
from bs4 import BeautifulSoup 

import urllib.error 
import sqlite3 
import json 
import time 
import ssl 


#connect/create database 
conn = sqlite3.connect('pitchscraper.sqlite') 
#create way to talk to database 
cur = conn.cursor() 

#create table 
cur.execute(''' 
    CREATE TABLE IF NOT EXISTS Master (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, album_title TEXT UNIQUE, artist_name TEXT UNIQUE)''') 

cur.execute(''' 
    CREATE TABLE IF NOT EXISTS Albums (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, album_title TEXT UNIQUE)''') 

cur.execute(''' 
    CREATE TABLE IF NOT EXISTS Artists (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, artist_name TEXT UNIQUE, album_title TEXT, FOREIGN KEY(album_title) REFERENCES Albums(album_title))''') 



#open and read page 
req = Request('http://pitchfork.com/reviews/albums/?page=1', headers={'User-Agent': 'Mozilla/5.0'}) 
pitchpage = urlopen(req).read() 


#parse with beautiful soup 
soup = BeautifulSoup(pitchpage, "lxml") 
albums = soup('h2') 
artists = soup.find_all(attrs={"class" : "artist-list"}) 


for tag in albums:
    for album in tag:
        cur.execute('INSERT OR IGNORE INTO Albums (album_title) VALUES (?)', (album,))

        for artist in artists:
            artist = artist.string
            cur.execute('INSERT OR IGNORE INTO Artists(artist_name) VALUES (?)', (artist,))
            cur.execute('UPDATE Artists SET album_title=? WHERE artist_name=?', (album, artist))


print() 


conn.commit() 

Failing output:

+------+-------------------------------------------+-------------+
| id   | artist_name                               | album_title |
+------+-------------------------------------------+-------------+
| "1"  | "Sylvan Esso"                             | "Odd Hours" |
| "2"  | "Mew"                                     | "Odd Hours" |
| "3"  | "Tara Jane O’Neil"                        | "Odd Hours" |
| "4"  | "Real Life Buildings"                     | "Odd Hours" |
| "5"  | "Bruce Springsteen and the E Street Band" | "Odd Hours" |
| "6"  | "Ravyn Lenae"                             | "Odd Hours" |
| "7"  | "Tee Grizzley"                            | "Odd Hours" |
| "8"  | "Shugo Tokumaru"                          | "Odd Hours" |
| "9"  | "Woods"                                   | "Odd Hours" |
| "10" | "Formation"                               | "Odd Hours" |
| "11" | "Valgeir Sigurðsson"                      | "Odd Hours" |
| "12" | "Caddywhompus"                            | "Odd Hours" |
+------+-------------------------------------------+-------------+

Desired output:

+------+-------------------------------------------+-------------------------------+
| id   | artist_name                               | album_title                   |
+------+-------------------------------------------+-------------------------------+
| "1"  | "Sylvan Esso"                             | "What Now"                    |
| "2"  | "Mew"                                     | "Visuals"                     |
| "3"  | "Tara Jane O’Neil"                        | "Tara Jane O'Neil"            |
| "4"  | "Real Life Buildings"                     | "Significant Weather"         |
| "5"  | "Bruce Springsteen and the E Street Band" | "Hammersmirth Odeon, London"  |
| "6"  | "Ravyn Lenae"                             | "Midnight Moonlight EP"       |
| "7"  | "Tee Grizzley"                            | "My Moment"                   |
| "8"  | "Shugo Tokumaru"                          | "TOSS"                        |
| "9"  | "Woods"                                   | "Love is Love"                |
| "10" | "Formation"                               | "Look at the Powerful People" |
| "11" | "Valgeir Sigurðsson"                      | "Dissonance"                  |
| "12" | "Caddywhompus"                            | "Odd Hours"                   |
+------+-------------------------------------------+-------------------------------+

Show some sample data and the desired results.

@CL. I've added two screenshots for you.

Show the desired results. (See [How to format SQL tables in a Stack Overflow post?](https://meta.stackexchange.com/q/96125))

Answer
albums = soup('h2') 
artists = soup.find_all(attrs={"class" : "artist-list"}) 

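The problem is that the artists list contains all the artists on the page, so the inner loop runs over every artist for every album.

You have to extract the list of artists from each album, inside the loop.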
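A rough sketch of what that could look like follows. The markup is made up for illustration: the `review` container class and the `<li>` items are assumptions, not Pitchfork's actual HTML, and `extract_pairs` is a hypothetical helper. The idea is to find the element that wraps one album, then search for its artists only inside that element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (assumption): each review sits in its own container
# that holds both the artist list and the album title.
SAMPLE_HTML = """
<div class="review"><ul class="artist-list"><li>Sylvan Esso</li></ul><h2>What Now</h2></div>
<div class="review"><ul class="artist-list"><li>Mew</li></ul><h2>Visuals</h2></div>
"""


def extract_pairs(html):
    """Return (artist_name, album_title) pairs, one per artist of each album."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for review in soup.find_all("div", class_="review"):
        album = review.find("h2").get_text(strip=True)
        # Search inside this review only, not the whole page.
        for artist_list in review.find_all(class_="artist-list"):
            for item in artist_list.find_all("li"):
                pairs.append((item.get_text(strip=True), album))
    return pairs


pairs = extract_pairs(SAMPLE_HTML)
```

Each pair can then be passed to the existing INSERT and UPDATE statements, so an album title is only ever written to the rows of its own artists.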

Not sure I understand; could you elaborate?