2017-12-18

Suppose I have text like the following, from which I want to retrieve multiple levels of a data structure:

In [1]: import re 
In [2]: with open('text.md', 'r') as f: 
    ...:  cont = f.read() 
In [3]: cont 
Out[3]: '- ## First steps[¶](https://docs.djangoproject.com/en/2.0/#first-steps)\n\n Are you new to Django or to programming? This is the place to start!\n\n - **From scratch:** [Overview](https://docs.djangoproject.com/en/2.0/intro/overview/) | [Installation](https://docs.djangoproject.com/en/2.0/intro/install/)\n - **Tutorial:** [Part 1: Requests and responses](https://docs.djangoproject.com/en/2.0/intro/tutorial01/) | [Part 2: Models and the admin site](https://docs.djangoproject.com/en/2.0/intro/tutorial02/) | [Part 3: Views and templates](https://docs.djangoproject.com/en/2.0/intro/tutorial03/) | [Part 4: Forms and generic views](https://docs.djangoproject.com/en/2.0/intro/tutorial04/) | [Part 5: Testing](https://docs.djangoproject.com/en/2.0/intro/tutorial05/) | [Part 6: Static files](https://docs.djangoproject.com/en/2.0/intro/tutorial06/) | [Part 7: Customizing the admin site](https://docs.djangoproject.com/en/2.0/intro/tutorial07/)\n - **Advanced Tutorials:** [How to write reusable apps](https://docs.djangoproject.com/en/2.0/intro/reusable-apps/) | [Writing your first patch for Django](https://docs.djangoproject.com/en/2.0/intro/contributing/)\n\n ## The model layer[¶](https://docs.djangoproject.com/en/2.0/#the-model-layer)\n\n Django provides an abstraction layer (the “models”) for structuring and manipulating the data of your Web application. Learn more about it below:\n\n - **Models:** [Introduction to models](https://docs.djangoproject.com/en/2.0/topics/db/models/) | [Field types](https://docs.djangoproject.com/en/2.0/ref/models/fields/) | [Indexes](https://docs.djangoproject.com/en/2.0/ref/models/indexes/) | [Meta options](https://docs.djangoproject.com/en/2.0/ref/models/options/) | [Model class](https://docs.djangoproject.com/en/2.0/ref/models/class/)\n - **QuerySets:** [Making queries](https://docs.djangoproject.com/en/2.0/topics/db/queries/) | [QuerySet method reference](https://docs.djangoproject.com/en/2.0/ref/models/querysets/) | [Lookup expressions](https://docs.djangoproject.com/en/2.0/ref/models/lookups/)\n - **Model instances:** [Instance methods](https://docs.djangoproject.com/en/2.0/ref/models/instances/) | [Accessing related objects](https://docs.djangoproject.com/en/2.0/ref/models/relations/)\n - **Migrations:** [Introduction to Migrations](https://docs.djangoproject.com/en/2.0/topics/migrations/) | [Operations reference](https://docs.djangoproject.com/en/2.0/ref/migration-operations/) | [SchemaEditor](https://docs.djangoproject.com/en/2.0/ref/schema-editor/) | [Writing migrations](https://docs.djangoproject.com/en/2.0/howto/writing-migrations/)\n - **Advanced:** [Managers](https://docs.djangoproject.com/en/2.0/topics/db/managers/) | [Raw SQL](https://docs.djangoproject.com/en/2.0/topics/db/sql/) | [Transactions](https://docs.djangoproject.com/en/2.0/topics/db/transactions/) | [Aggregation](https://docs.djangoproject.com/en/2.0/topics/db/aggregation/) | [Search](https://docs.djangoproject.com/en/2.0/topics/db/search/) | [Custom fields](https://docs.djangoproject.com/en/2.0/howto/custom-model-fields/) | [Multiple databases](https://docs.djangoproject.com/en/2.0/topics/db/multi-db/) | [Custom lookups](https://docs.djangoproject.com/en/2.0/howto/custom-lookups/) |[Query Expressions](https://docs.djangoproject.com/en/2.0/ref/models/expressions/) | [Conditional Expressions](https://docs.djangoproject.com/en/2.0/ref/models/conditional-expressions/) | [Database Functions](https://docs.djangoproject.com/en/2.0/ref/models/database-functions/)\n - **Other:** [Supported databases](https://docs.djangoproject.com/en/2.0/ref/databases/) | [Legacy databases](https://docs.djangoproject.com/en/2.0/howto/legacy-databases/) | [Providing initial data](https://docs.djangoproject.com/en/2.0/howto/initial-data/) | [Optimize database access](https://docs.djangoproject.com/en/2.0/topics/db/optimization/) | [PostgreSQL specific features](https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/)'

Its chapters are retrieved with:

In [9]: chapters = re.findall(r'## (.+)\[', cont) 
In [10]: chapters 
Out[10]: ['First steps', 'The model layer'] 

and its sections are obtained with:

In [21]: sections = re.findall(r'- \*\*(.+)\*\*',cont) 
In [23]: sections 
Out[23]: 
['From scratch:', 
'Tutorial:', 
'Advanced Tutorials:', 
'Models:', 
'QuerySets:', 
'Model instances:', 
'Migrations:', 
'Advanced:', 
'Other:'] 

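I would like to output the following data structure, where each chapter is followed by the list of its own sections: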

['First steps',['From scratch:', 
       'Tutorial:', 
       'Advanced Tutorials:'], 
'The model layer',['Models:', 
       'QuerySets:', 
       'Model instances:', 
       'Migrations:', 
       'Advanced:', 
       'Other:']] 

How can I accomplish this?

Answer


Find both the chapters and the sections simultaneously, with a single pattern that alternates between the two:

>>> results = re.findall(r'## (.+)\[|- \*\*(.+)\*\*', cont)
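To see what that combined pattern returns, here is a small sketch run on a made-up sample (the shortened lines and the `example.invalid` URLs are assumptions standing in for `text.md`). Because the pattern has two capture groups, `re.findall` returns one 2-tuple per match, and the group belonging to the alternative that did not match is an empty string. That is why the loop unpacks each result as `c, s` and checks which one is non-empty.

```python
import re

# Made-up sample in the same shape as text.md (assumption: the real file is similar)
sample = (
    "- ## First steps[¶](https://example.invalid/#first-steps)\n"
    " - **From scratch:** [Overview](https://example.invalid/overview/)\n"
    " - **Tutorial:** [Part 1](https://example.invalid/tutorial01/)\n"
    " ## The model layer[¶](https://example.invalid/#the-model-layer)\n"
    " - **Models:** [Introduction](https://example.invalid/models/)\n"
)

# Two groups -> each match is a (chapter, section) tuple; the unused group is ''
results = re.findall(r'## (.+)\[|- \*\*(.+)\*\*', sample)
print(results)
# → [('First steps', ''), ('', 'From scratch:'), ('', 'Tutorial:'), ('The model layer', ''), ('', 'Models:')]
```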

Then put them into the structure you want:

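>>> structure = []
>>> for c, s in results:
...     if c:                          # chapter: add its title followed by an empty section list
...         structure.extend([c, []])
...     elif s:                        # section: append it to the most recent chapter's list
...         structure[-1].append(s)
...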

This results in:

>>> structure 
['First steps', ['From scratch:', 'Tutorial:', 'Advanced Tutorials:'], 'The model layer', ['Models:', 'QuerySets:', 'Model instances:', 'Migrations:', 'Advanced:', 'Other:']]
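For reuse outside the REPL, the same idea can be wrapped in a function. This is a sketch, not part of the original answer: `parse_outline`, the group names `chapter`/`section` and the sample text are hypothetical, and it swaps `re.findall` for `re.finditer` with named groups so each match reports directly which level it belongs to. It also skips a section that appears before any chapter heading, where the loop above would raise `IndexError` on `structure[-1]`.

```python
import re

# Same combined pattern as in the answer, rewritten with named groups
# (the names "chapter" and "section" are illustrative choices).
OUTLINE_RE = re.compile(r'## (?P<chapter>.+)\[|- \*\*(?P<section>.+)\*\*')


def parse_outline(text):
    """Hypothetical helper: return [chapter, [sections...], chapter, [sections...], ...]."""
    structure = []
    for m in OUTLINE_RE.finditer(text):
        if m.group('chapter') is not None:
            # Chapter heading: add its title followed by an empty section list
            structure.extend([m.group('chapter'), []])
        elif structure:
            # Section line: append it to the most recent chapter's list
            structure[-1].append(m.group('section'))
        # Sections that appear before any chapter heading are skipped
    return structure


# Made-up sample in the same shape as text.md (assumption)
sample = (
    "- ## First steps[¶](https://example.invalid/#first-steps)\n"
    " - **From scratch:** [Overview](https://example.invalid/overview/)\n"
    " - **Tutorial:** [Part 1](https://example.invalid/tutorial01/)\n"
    " ## The model layer[¶](https://example.invalid/#the-model-layer)\n"
    " - **Models:** [Introduction](https://example.invalid/models/)\n"
)

structure = parse_outline(sample)
print(structure)
# → ['First steps', ['From scratch:', 'Tutorial:'], 'The model layer', ['Models:']]

# Optional: pair each chapter title with its section list
# (dicts preserve insertion order in Python 3.7+)
outline = dict(zip(structure[::2], structure[1::2]))
print(outline)
# → {'First steps': ['From scratch:', 'Tutorial:'], 'The model layer': ['Models:']}
```

The dict conversion relies on the flat list strictly alternating title, list, title, list, which is exactly what `parse_outline` produces.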