2012-07-08 78 views

Scrapy SQLite3 error?

I wrote a spider that needs to store its items in a SQLite3 database, but I get an error every time. Please help, I'm stuck. Here is the spider code:

response_fld=response.url 
text_input=hxs.select("//input[(@id or @name) and (@type = 'text')]/@id ").extract() 
pass_input=hxs.select("//input[(@id or @name) and (@type = 'password')]/@id").extract()  
file_input=hxs.select("//input[(@id or @name) and (@type = 'file')]/@id").extract() 

Output in JSON format:

{"pass_input": ["tbPassword"], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/Signup.aspx", "text_input": ["tbUsername"]} 
{"pass_input": [], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/default.aspx", "text_input": []} 
{"pass_input": ["tbPassword"], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/login.aspx", "text_input": ["tbUsername"]} 
{"pass_input": [], "file_input": [], "response_fld": "http://testaspnet.vulnweb.com/Comments.aspx?id=0", "text_input": []} 

Pipeline code:

import sqlite3 
from os import path 

class SQLiteStorePipeline(object): 

    def __init__(self):
        self.conn = sqlite3.connect('./project.db')
        self.cur = self.conn.cursor()

    def process_item(self, domain, item):
        self.cur.execute("insert into links (link) values(item['response_fld'][0]")
        self.cur.execute("insert into inputs (input_name) values(item['text_input'][0];")
        self.cur.execute("insert into inputs (input_name) values(item['pass_input'][0];")
        self.cur.execute("insert into inputs (input_name) values(item['file_input'][0];")
        self.conn.commit()
        return item

    def handle_error(self, e):
        log.err(e)

Error:

File "/home/abdallah/isa/isa/pipelines.py", line 22, in process_item 
    self.cur.execute("insert into links (link) values(item['response_fld'][0]") 
sqlite3.OperationalError: near "['response_fld']": syntax error 

Database schema:

CREATE TABLE "Targets" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "domain" TEXT); 

CREATE TABLE "Links" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "link" TEXT, "target" INT, FOREIGN KEY (target) REFERENCES Targets(id)); 

CREATE TABLE "Input_Types" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "type" TEXT); 

CREATE TABLE "Inputs" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "input_name" TEXT, "link_id" INT, "input_type" INT, FOREIGN KEY (input_type) REFERENCES Input_Types(id)); 

Answer


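This SQL query is invalid, because the Python expression sits inside the string literal and reaches SQLite as literal text: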

self.cur.execute("insert into links (link) values(item['response_fld'][0]") 

According to the docs, you should pass the value as a query parameter instead:

self.cur.execute("insert into links (link) values(?)", (item['response_fld'][0],)) 
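To see the placeholder form in isolation, here is a minimal sketch. It assumes an in-memory database and a cut-down version of the "Links" table, so it does not touch ./project.db; the `link` variable is just sample data taken from the JSON output in the question.

```python
import sqlite3

# In-memory database (an assumption for this sketch, not the asker's ./project.db).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Cut-down version of the "Links" table from the question's schema.
cur.execute('CREATE TABLE "Links" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "link" TEXT)')

# Sample value copied from the question's JSON output.
link = "http://testaspnet.vulnweb.com/login.aspx"

# The ? placeholder is bound to the value in the tuple;
# the value is never pasted into the SQL text itself.
cur.execute("insert into links (link) values (?)", (link,))
conn.commit()

rows = cur.execute("select link from links").fetchall()
print(rows)
```

The second argument must be a sequence, which is why a single value is written as a one-element tuple with a trailing comma.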

self.cur.execute("insert into links (link) values(?)", (item['response_fld'][0],)) gives exceptions.TypeError: 'MySpider' object has no attribute '__getitem__' – 2012-07-08 17:34:48


@right.sowrd, that is a different problem. Your 'process_item' signature is wrong. According to the docs, it should be 'def process_item(self, item, spider)', not 'def process_item(self, domain, item)' – warvariuc 2012-07-08 18:01:03
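The effect of the swapped parameters can be reproduced without Scrapy. The sketch below assumes the pipeline is called positionally as `process_item(item, spider)`; `FakeSpider`, `WrongPipeline` and `RightPipeline` are hypothetical stand-ins, not part of the asker's project.

```python
class FakeSpider(object):
    """Hypothetical stand-in for the real spider object."""


class WrongPipeline(object):
    # Signature from the question: the second positional argument
    # (the spider) ends up bound to the name `item`.
    def process_item(self, domain, item):
        return item["response_fld"]


class RightPipeline(object):
    # Documented signature: item first, spider second.
    def process_item(self, item, spider):
        return item["response_fld"]


item = {"response_fld": "http://testaspnet.vulnweb.com/login.aspx"}
spider = FakeSpider()

try:
    WrongPipeline().process_item(item, spider)
    wrong_result = "no error"
except TypeError:
    # Indexing the spider object fails, as in the reported traceback.
    wrong_result = "TypeError"

right_result = RightPipeline().process_item(item, spider)
print(wrong_result, right_result)
```

The exact wording of the TypeError differs between Python versions, so the sketch only checks the exception type.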


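Thanks a lot, that works for some items, but there are still errors. First: only 'h' is what gets stored in the database. And when trying to store the other items I get 'exceptions.IndexError: list index out of range' – 2012-07-08 18:17:50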
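Both symptoms are consistent with the JSON output in the question, assuming the items really have that shape: 'response_fld' is a plain string, so `[0]` picks its first character ('h' from 'http://...'), and the input fields are lists that are often empty, so `[0]` raises IndexError. A minimal sketch of one way around this follows. `store_item` is a hypothetical helper, the tables are cut-down versions of the question's schema, and filling `link_id` from `cursor.lastrowid` is an assumption about how the tables are meant to relate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for ./project.db
cur = conn.cursor()
cur.execute('CREATE TABLE "Links" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, "link" TEXT)')
cur.execute('CREATE TABLE "Inputs" ("id" INTEGER PRIMARY KEY AUTOINCREMENT, '
            '"input_name" TEXT, "link_id" INT)')


def store_item(cur, item):
    """Hypothetical helper: store one scraped item."""
    # response_fld is a string, so pass it whole instead of indexing it.
    cur.execute("insert into links (link) values (?)", (item["response_fld"],))
    link_id = cur.lastrowid  # id of the row just inserted into Links

    # Loop over each list; an empty list simply inserts nothing,
    # so there is no [0] lookup that could raise IndexError.
    for field in ("text_input", "pass_input", "file_input"):
        for name in item[field]:
            cur.execute(
                "insert into inputs (input_name, link_id) values (?, ?)",
                (name, link_id),
            )


# Two items copied from the question's JSON output.
items = [
    {"pass_input": ["tbPassword"], "file_input": [],
     "response_fld": "http://testaspnet.vulnweb.com/Signup.aspx",
     "text_input": ["tbUsername"]},
    {"pass_input": [], "file_input": [],
     "response_fld": "http://testaspnet.vulnweb.com/default.aspx",
     "text_input": []},
]
for item in items:
    store_item(cur, item)
conn.commit()

links = [row[0] for row in cur.execute("select link from links order by id")]
inputs = [row[0] for row in cur.execute("select input_name from inputs order by id")]
print(links, inputs)
```

Inside a pipeline, the body of `store_item` would go in `process_item(self, item, spider)` with `self.cur` in place of `cur`.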