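Optimizing performance of Postgresql database writes in Django?

I have a Django 1.1 application that needs to import data from some large JSON files every day. To give an idea of the scale, one of these files is over 100 MB and has 90K entries that are imported into a Postgresql database.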
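The problem I'm experiencing is that the import takes a really long time, on the order of hours. I expected that writing that many entries to the database would take a while, but certainly not that long, which makes me think I'm doing something inherently wrong. I've read similar stackexchange questions, and the proposed solutions suggest using the `transaction.commit_manually` or `transaction.commit_on_success` decorators to commit in batches instead of on every `.save()`, which I'm already doing.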
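
To make "commit in batches" concrete, here is a minimal sketch of the pattern outside of Django. It uses the standard-library `sqlite3` module only so that it runs on its own; the `translation` table, the `create_table` and `insert_in_batches` helpers, and the batch size of 1000 are all assumptions for illustration, not part of my application.

```python
import sqlite3

# Hypothetical two-column table, used only for this illustration.
INSERT_SQL = "INSERT INTO translation (language, translated) VALUES (?, ?)"


def create_table(conn):
    """Create the illustrative table."""
    conn.execute("CREATE TABLE translation (language TEXT, translated INTEGER)")


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows, committing once per batch instead of once per row.

    Returns the number of commits that were issued.
    """
    cur = conn.cursor()
    pending = 0   # rows written since the last commit
    commits = 0
    for row in rows:
        cur.execute(INSERT_SQL, row)
        pending += 1
        if pending >= batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Flush the final, partially filled batch.
        conn.commit()
        commits += 1
    return commits
```

With 2500 rows and a batch size of 1000 this issues three commits rather than 2500. Note that batching only reduces the number of commits; it does not reduce the number of statements sent to the database.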
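As I said, I'm wondering whether I'm doing anything wrong (e.g. is the batch I commit too large? too many foreign keys?), or whether I should move away from Django models for this function and use the DB API directly. Any ideas or suggestions?

Here are the basic models I'm dealing with when importing the data (I've removed some fields from the original code for the sake of simplicity):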

    from django.db import models
    # Assumed: `_` is the usual lazy-translation alias.
    from django.utils.translation import ugettext_lazy as _

    # Release and Language are defined elsewhere (omitted here).

    class Template(models.Model):
        template_name = models.TextField(_("Name"), max_length=70)
        sourcepackage = models.TextField(_("Source package"), max_length=70)
        translation_domain = models.TextField(_("Domain"), max_length=70)
        total = models.IntegerField(_("Total"))
        enabled = models.BooleanField(_("Enabled"))
        priority = models.IntegerField(_("Priority"))
        release = models.ForeignKey(Release)

    class Translation(models.Model):
        release = models.ForeignKey(Release)
        template = models.ForeignKey(Template)
        language = models.ForeignKey(Language)
        translated = models.IntegerField(_("Translated"))

And here's the bit of code that seems to take ages to complete:

    from django.db import transaction

    @transaction.commit_manually
    def add_translations(translation_data, lp_translation):

        releases = Release.objects.all()

        # There are 5 releases
        for release in releases:

            # translation_data has about 90K entries
            # this is the part that takes a long time
            for lp_translation in translation_data:
                try:
                    language = Language.objects.get(
                        code=lp_translation['language'])
                except Language.DoesNotExist:
                    continue

                translation = Translation(
                    template=Template.objects.get(
                        sourcepackage=lp_translation['sourcepackage'],
                        template_name=lp_translation['template_name'],
                        translation_domain=\
                            lp_translation['translation_domain'],
                        release=release),
                    translated=lp_translation['translated'],
                    language=language,
                    release=release,
                    )

                translation.save()

            # I realize I should commit every n entries
            transaction.commit()

            # I've also got another bit of code to fill in some data I'm
            # not getting from the json files

            # Add missing templates
            languages = Language.objects.filter(visible=True)
            languages_total = len(languages)

            for language in languages:
                templates = Template.objects.filter(release=release)

                for template in templates:
                    try:
                        translation = Translation.objects.get(
                            template=template,
                            language=language,
                            release=release)
                    except Translation.DoesNotExist:
                        translation = Translation(template=template,
                                                  language=language,
                                                  release=release,
                                                  translated=0,
                                                  untranslated=0)
                        translation.save()

                transaction.commit()

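
One thing I suspect may matter: the inner loop runs one `Language.objects.get()` and one `Template.objects.get()` for every entry, so with about 90K entries and 5 releases that is roughly 900K SELECTs on top of up to 450K INSERTs. Below is a rough, Django-free sketch of the alternative I'm considering, where the lookups are loaded once into dicts and then resolved in memory. The `build_lookup` and `resolve_entries` names and the plain-dict rows are hypothetical stand-ins for my models, and `templates` is assumed to hold the templates of a single release.

```python
def build_lookup(rows, key_fields):
    """Map a tuple of key-field values to the row (a dict) holding them."""
    return {tuple(row[f] for f in key_fields): row for row in rows}


def resolve_entries(entries, languages, templates):
    """Resolve each JSON entry to (template_id, language_id, translated).

    `languages` and `templates` are lists of dicts standing in for rows
    that were fetched from the database once, up front.
    """
    lang_by_code = build_lookup(languages, ("code",))
    tmpl_by_key = build_lookup(
        templates, ("sourcepackage", "template_name", "translation_domain"))

    resolved = []
    for entry in entries:
        language = lang_by_code.get((entry["language"],))
        if language is None:
            # Mirrors the `except Language.DoesNotExist: continue` branch.
            continue
        # A missing template raises KeyError, much as Template.objects.get()
        # would raise Template.DoesNotExist in the code above.
        template = tmpl_by_key[(entry["sourcepackage"],
                                entry["template_name"],
                                entry["translation_domain"])]
        resolved.append((template["id"], language["id"], entry["translated"]))
    return resolved
```

In Django terms I imagine the dicts would be built from something like `Language.objects.all()` and `Template.objects.filter(release=release)`, but I haven't measured whether this is where the time actually goes.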
Have a look at this recent answer, which may have some useful general tips. http://stackoverflow.com/questions/9407442/optimise-postgresql-for-fast-testing/9407940#comment11914305_9407940 – 2012-02-24 03:11:00
The second part of the question has been split off into a [follow-up question](http://stackoverflow.com/q/9447506/939860) – 2012-03-05 01:36:39