2011-10-24 74 views
8

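Saving Django models from a Scrapy project

I have a Scrapy project, and I am trying to save the output items as objects built from my Django model definitions (I am not using DjangoItem).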

I am importing the Django settings as specified here:

def setup_django_env(path): 
    import imp, os 
    from django.core.management import setup_environ 

    f, filename, desc = imp.find_module('settings', [path]) 
    project = imp.load_module('settings', f, filename, desc)  

    setup_environ(project) 

setup_django_env(PATH_TO_DJANGO_PROJECT) 

In my Scrapy project, I have a pipeline class that processes every item and saves it to the database:

from my_django_project.apps.my_books.models import Book, Category, Image 

class DjangoPipeline(object): 

    def process_item(self, item, spider): 
        category = Category.objects.get(name='Horror')
        book = Book(name='something', category=category)
        book.save()
        image = Image(name='something', book=book)
        image.save()
        return item

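However, something strange happens: on the first item I get an error (see below), while all the remaining items are processed fine. Say I have 7 items to save: the first one raises the error and the other 6 are saved.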

Traceback (most recent call last): 
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/scrapy/middleware.py", line 54, in _process_chain
    return process_chain(self.methods[methodname], obj, *args)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/scrapy/utils/defer.py", line 65, in process_chain
    d.callback(input)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/internet/defer.py", line 243, in callback
    self._startRunCallbacks(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/internet/defer.py", line 312, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/twisted/internet/defer.py", line 328, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/Users/ale/djcode/books/lib/scraper/scraper/djangopipeline.py", line 34, in process_item
    selected_category = Category.objects.get(name='Horror')
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/manager.py", line 132, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/query.py", line 333, in get
    clone = self.filter(*args, **kwargs)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/query.py", line 550, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/query.py", line 568, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1131, in add_q
    can_reuse=used_aliases)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1026, in add_filter
    negate=negate, process_extras=process_extras)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1182, in setup_joins
    field, model, direct, m2m = opts.get_field_by_name(name)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 291, in get_field_by_name
    cache = self.init_name_map()
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 321, in init_name_map
    for f, model in self.get_all_related_m2m_objects_with_model():
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 396, in get_all_related_m2m_objects_with_model
    cache = self._fill_related_many_to_many_cache()
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/options.py", line 410, in _fill_related_many_to_many_cache
    for klass in get_models():
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/loading.py", line 167, in get_models
    self._populate()
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/loading.py", line 61, in _populate
    self.load_app(app_name, True)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/db/models/loading.py", line 76, in load_app
    app_module = import_module(app_name)
  File "/Users/ale/virtualenvs/books/lib/python2.6/site-packages/django/utils/importlib.py", line 35, in import_module
    __import__(name)
exceptions.ImportError: No module named my_books 

If I do something like this instead, all 7 items get saved:

from my_django_project.apps.my_app.models import Book, Category, Image 

class DjangoPipeline(object): 

    def process_item(self, item, spider): 
        try:
            category = Category.objects.get(name='something')
        except:
            category = Category.objects.get(name='something')
        book = Book(name='something', category=category)
        try:
            book.save()
        except:
            book.save()
        image = Image(name='something', book=book)
        try:
            image.save()
        except:
            image.save()
        return item

I don't know what I am doing wrong. Can anyone help me?

Thanks!

+0

When you reference my_django_project, do you literally mean that name, or did you replace it with the name of your own project, as in from mysite.apps import *? – emschorsch

+0

I replaced that reference with the name of my project :) – Alex

+0

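Hi Alex, I am trying to do what you did and I am running into problems. It seems you have figured this out, so I hope you would be willing to look at my [question](http://stackoverflow.com/questions/14686223/scrapy-project-cant-find-django-core-management) and offer some advice. Thanks! – GChorn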

Answers

4

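I had the same problem and found a solution. At least, it worked for me.

In my case the problem was in the Django project's settings.py file: I had added my app to the INSTALLED_APPS tuple by its short name instead of its FQN (fully qualified name).

Applied to your example, you have probably added my_books to INSTALLED_APPS rather than my_django_project.apps.my_books.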
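
A quick way to check this is sketched below. It is not part of Django or Scrapy, and `find_unimportable_apps` is a name made up for illustration. The traceback in the question shows Django's load_app calling import_module(app_name) for each INSTALLED_APPS entry, so the sketch simply tries the same import for every entry and reports the ones that fail. It assumes you run it in the same Python environment (same sys.path) as the Scrapy project.

```python
import importlib


def find_unimportable_apps(installed_apps):
    """Return the entries of installed_apps that cannot be imported.

    Hypothetical helper: it mirrors the import_module(app_name) call
    visible in the traceback, but is not a Django API.
    """
    missing = []
    for app_name in installed_apps:
        try:
            importlib.import_module(app_name)
        except ImportError:
            # The dotted name is not reachable from the current sys.path.
            missing.append(app_name)
    return missing


# Possible usage with the names from the question (both are assumptions
# about the asker's layout):
# find_unimportable_apps(['my_books', 'my_django_project.apps.my_books'])
```

Any name this returns would presumably need to be written in INSTALLED_APPS in a form that is importable from where Scrapy runs, for example the fully qualified one.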

+0

Thanks for the answer. I will try it with my code as soon as I can. – Alex

0

I remember that missing __init__.py files can cause some strange problems. Do you have them in all of your modules?
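
If you want to double-check, here is a rough sketch that lists directories containing .py files but no __init__.py. `dirs_missing_init` is a hypothetical helper name, and the heuristic is an assumption: a directory holding only scripts would also be reported even though it was never meant to be a package.

```python
import os


def dirs_missing_init(root):
    """List directories under root that hold .py files but no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        has_python = any(name.endswith('.py') for name in filenames)
        if has_python and '__init__.py' not in filenames:
            missing.append(dirpath)
    return sorted(missing)


# Possible usage, reusing the variable from the question:
# print(dirs_missing_init(PATH_TO_DJANGO_PROJECT))
```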

+0

Yes, I have them in all of my modules :) – Alex