2016-11-22

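Python: transposing lists to write data to a CSV file

I'm working with some test data: I format the text, clean it, tokenize it and get the word counts.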

Here is some of the data:

data0 = unicode("Rainforests are forests characterized by high rainfall, with annual rainfall between 250 and 450 centimetres (98 and 177 in).[1] There are two types of rainforest: tropical rainforest and temperate rainforest. The monsoon trough, alternatively known as the intertropical convergence zone, plays a significant role in creating the climatic conditions necessary for the Earth's tropical rainforests. Around 40% to 75% of all biotic species are indigenous to the rainforests.[2] It has been estimated that there may be many millions of species of plants, insects and microorganisms still undiscovered in tropical rainforests. Tropical rainforests have been called the \"jewels of the Earth\" and the \"world's largest pharmacy\", because over one quarter of natural medicines have been discovered there.[3] Rainforests are also responsible for 28% of the world's oxygen turnover, sometimes misnamed oxygen production,[4] processing it through photosynthesis from carbon dioxide and consuming it through respiration. The undergrowth in some areas of a rainforest can be restricted by poor penetration of sunlight to ground level. If the leaf canopy is destroyed or thinned, the ground beneath is soon colonized by a dense, tangled growth of vines, shrubs and small trees, called a jungle. The term jungle is also sometimes applied to tropical rainforests generally.", "utf-8") 

data1 = unicode("Tropical rainforests are characterized by a warm and wet climate with no substantial dry season: typically found within 10 degrees north and south of the equator. Mean monthly temperatures exceed 18 °C (64 °F) during all months of the year.[5] Average annual rainfall is no less than 168 cm (66 in) and can exceed 1,000 cm (390 in) although it typically lies between 175 cm (69 in) and 200 cm (79 in).[6] Many of the world's tropical forests are associated with the location of the monsoon trough, also known as the intertropical convergence zone.[7] The broader category of tropical moist forests are located in the equatorial zone between the Tropic of Cancer and Tropic of Capricorn. Tropical rainforests exist in Southeast Asia (from Myanmar (Burma) to the Philippines, Malaysia, Indonesia, Papua New Guinea, Sri Lanka, Sub-Saharan Africa from Cameroon to the Congo (Congo Rainforest), South America (e.g. the Amazon Rainforest), Central America (e.g. Bosawás, southern Yucatán Peninsula-El Peten-Belize-Calakmul), Many Australia, and on many of the Pacific Islands (such as Hawaiʻi). Tropical forests have been called the \"Earth's lungs\", although it is now known that rainforests contribute little net oxygen addition to the atmosphere through photosynthesis", "utf-8") 

data2 = unicode("Tropical forests cover many a large part of the globe, but temperate rainforests only occur in few regions around the world. Temperate rainforests are rainforests in temperate regions. They occur in North America (in the Pacific Northwest in Alaska, British Columbia, Washington, Oregon and California), in Europe (parts of the British Isles such as the coastal areas of Ireland and Scotland, southern Norway, parts of the western Balkans along the Adriatic coast, as well as in Galicia and coastal areas of the eastern Black Sea, including Georgia and coastal Turkey), in East Asia (in southern China, Highlands of Taiwan, much of Japan and Korea, and on Sakhalin Island and the adjacent Russian Far East coast), in South America (southern Chile) and also in Australia and New Zealand.[10]", "utf-8") 
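The post doesn't show the processing code itself. Purely as an illustration, here is a minimal Python 3 sketch of one way to clean, tokenize and count phrases of n words. The helper names `tokenize` and `phrase_counts` are hypothetical. The cleaning rule (lowercase, keep runs of ASCII letters and digits) is an assumption and only approximates the output in this question; for example, it turns "°C" into "c" rather than "degreec".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters/digits (assumed cleaning rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_counts(text, n):
    """Count every phrase of n consecutive tokens, returned as (phrase, count) tuples."""
    tokens = tokenize(text)
    # zip over n shifted copies of the token list to build the n-word phrases
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return list(Counter(" ".join(gram) for gram in ngrams).items())


sample = "Tropical rainforests are rainforests in tropical regions."
# Index 0 left empty so that all_data[n] holds the n-word phrases, as in the question
all_data = [[]] + [phrase_counts(sample, n) for n in (1, 2)]
```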

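After processing the data with my code, I have an array of phrases for each phrase length, along with their counts. I then use this code to print the data to a file: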

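with open("rainforest.txt", "w") as of:
    # payload_data[i] is a list of (phrase, count) tuples for one phrase length
    for i in range(sample_data_items):
        for index, item in enumerate(payload_data[i]):
            _str = "[%04d]\t[%s]" % (index, item)
            of.write(_str.encode("utf-8") + "\n")  # Python 2: encode unicode to UTF-8 bytes
# no of.close() needed: the with block has already closed the file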
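For comparison, the same dump in Python 3 might look like the sketch below. This assumes a move to Python 3, where `str` is already Unicode, so you pass `encoding=` to `open()` instead of calling `.encode()`. `payload_data` is assumed to be a list of lists of `(phrase, count)` tuples, and `dump_phrases` is a hypothetical name. Python 3 writes `('10', 2)` rather than `(u'10', 2)`.

```python
def dump_phrases(payload_data, path="rainforest.txt"):
    """Write one tab-separated '[index]' / '[(phrase, count)]' line per tuple."""
    # An explicit encoding replaces Python 2's manual .encode("utf-8")
    with open(path, "w", encoding="utf-8") as out:
        for phrase_list in payload_data:
            for index, item in enumerate(phrase_list):
                # item is a (phrase, count) tuple; %s renders it like the output above
                out.write("[%04d]\t[%s]\n" % (index, item))
```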

This gives me a file that looks like this:

[0000] [(u'10', 2)] 
[0001] [(u'two', 1)] 
[0002] [(u'belize', 1)] 
[0003] [(u'eastern', 1)] 
[0004] [(u'98', 1)] 
[0005] [(u'exceed', 1)] 
[0006] [(u'destroyed', 1)] 
[0007] [(u'part', 2)] 
[0008] [(u'japan', 1)] 
[0009] [(u'south', 5)] 
[0010] [(u'g', 44)] 
[0011] [(u'new', 2)] 
[0012] [(u'washington', 1)] 
[0013] [(u'indigenous', 1)] 
[0014] [(u'production', 1)] 
[0015] [(u'necessary', 1)] 
[0016] [(u'equatorial', 1)] 
[0017] [(u'europe', 1)] 
[0018] [(u'some', 2)] 
[0019] [(u'equator', 2)] 
[0020] [(u'have', 2)] 
[0021] [(u'restricted', 1)] 
[0022] [(u'along', 1)] 
[0023] [(u'level', 1)] 
[0024] [(u'dense', 1)] 
[0025] [(u'lungs', 1)] 
[0026] [(u'korea', 1)] 
[0027] [(u'tangled', 1)] 
[0028] [(u'peninsula', 1)] 
[0029] [(u'zone', 2)] 
[0030] [(u'called', 2)] 
[0031] [(u'monsoon', 2)] 
[0032] [(u'jungle', 1)] 
[0033] [(u'such', 2)] 
[0034] [(u'significant', 1)] 
[0035] [(u'warm', 1)] 
[0036] [(u'around', 2)] 
[0037] [(u'only', 1)] 
[0038] [(u'largest', 1)] 
[0039] [(u'zealand', 1)] 
[0040] [(u'shrubs', 1)] 
[0041] [(u'still', 1)] 
[0042] [(u'peten', 1)] 
[0043] [(u'island', 2)] 
[0044] [(u'china', 1)] 
[0045] [(u'characterized', 2)] 
[0046] [(u'64', 1)] 
[0047] [(u'malaysia', 1)] 
[0048] [(u'species', 1)] 
[0049] [(u'high', 2)] 
[0050] [(u'growth', 2)] 
[0051] [(u'during', 1)] 
[0052] [(u'no', 11)] 
[0053] [(u'occur', 1)] 
[0054] [(u'creating', 1)] 
[0055] [(u'north', 3)] 
[0056] [(u'calakmul', 1)] 
[0057] [(u'adriatic', 1)] 
[0058] [(u'175', 1)] 
[0059] [(u'taiwan', 1)] 
[0060] [(u'may', 1)] 
[0061] [(u'western', 1)] 
[0062] [(u'annual', 2)] 
[0063] [(u'degrees', 1)] 
[0064] [(u'moist', 1)] 
[0065] [(u'climatic', 1)] 
[0066] [(u'net', 2)] 
[0067] [(u'trees', 1)] 
[0068] [(u'oxygen', 2)] 
[0069] [(u'typically', 1)] 
[0070] [(u'e', 150)] 
[0071] [(u'georgia', 1)] 
[0072] [(u'wet', 1)] 
[0073] [(u'5', 5)] 
[0074] [(u'66', 1)] 
[0075] [(u'390', 1)] 
[0076] [(u'be', 9)] 
[0077] [(u'turkey', 1)] 
[0078] [(u'applied', 1)] 
[0079] [(u'australia', 2)] 
[0080] [(u'alaska', 1)] 
[0081] [(u'black', 1)] 
[0082] [(u'coast', 2)] 
[0083] [(u'168', 1)] 
[0084] [(u'southeast', 1)] 
[0085] [(u'7', 5)] 
[0086] [(u'located', 1)] 
[0087] [(u'regions', 1)] 
[0088] [(u'months', 1)] 
[0089] [(u'parts', 1)] 
[0090] [(u'sub', 2)] 
[0091] [(u'east', 3)] 
[0092] [(u'ground', 1)] 
[0093] [(u'forests', 6)] 
[0094] [(u'much', 1)] 
[0095] [(u'a', 160)] 
[0096] [(u'soon', 3)] 
[0097] [(u'450', 1)] 
[0098] [(u'if', 5)] 
[0099] [(u'4', 4)] 
[0100] [(u'consuming', 1)] 
[0101] [(u'addition', 1)] 
[0102] [(u'jewels', 1)] 
[0103] [(u'within', 1)] 
[0104] [(u'1', 8)] 
[0105] [(u'areas', 2)] 
[0106] [(u'northwest', 1)] 
[0107] [(u'california', 1)] 
[0108] [(u'chile', 1)] 
[0109] [(u'balkans', 1)] 
[0110] [(u'plants', 1)] 
[0111] [(u'on', 34)] 
[0112] [(u'for', 10)] 
[0113] [(u'season', 1)] 
[0114] [(u'less', 1)] 
[0115] [(u'exist', 1)] 
[0116] [(u'sakhalin', 1)] 
[0117] [(u'trough', 2)] 
[0118] [(u'in', 30)] 
[0119] [(u'earth', 2)] 
[0120] [(u'from', 2)] 
[0121] [(u'also', 3)] 
[0122] [(u'america', 2)] 
[0123] [(u'75', 2)] 
[0124] [(u'coastal', 1)] 
[0125] [(u'guinea', 1)] 
[0126] [(u'rainfall', 2)] 
[0127] [(u'been', 2)] 
[0128] [(u'alternatively', 1)] 
[0129] [(u'undiscovered', 1)] 
[0130] [(u'role', 1)] 
[0131] [(u'large', 2)] 
[0132] [(u'russian', 1)] 
[0133] [(u'bosawas', 1)] 
[0134] [(u'islands', 1)] 
[0135] [(u'el', 7)] 
[0136] [(u'microorganisms', 1)] 
[0137] [(u'rainforest', 5)] 
[0138] [(u'asia', 2)] 
[0139] [(u'200', 1)] 
[0140] [(u'or', 27)] 
[0141] [(u'category', 1)] 
[0142] [(u'6', 5)] 
[0143] [(u'africa', 1)] 
[0144] [(u'capricorn', 1)] 
[0145] [(u'2', 4)] 
[0146] [(u'year', 1)] 
[0147] [(u'tropic', 6)] 
[0148] [(u'79', 1)] 
[0149] [(u'with', 3)] 
[0150] [(u'highlands', 1)] 
[0151] [(u'british', 1)] 
[0152] [(u'can', 5)] 
[0153] [(u'of', 3)] 
[0154] [(u'adjacent', 1)] 
[0155] [(u'world', 3)] 
[0156] [(u'indonesia', 1)] 
[0157] [(u'cover', 3)] 
[0158] [(u'poor', 1)] 
[0159] [(u'sea', 2)] 
[0160] [(u'28', 1)] 
[0161] [(u'to', 7)] 
[0162] [(u'many', 3)] 
[0163] [(u'yucatan', 1)] 
[0164] [(u'oregon', 1)] 
[0165] [(u'pharmacy', 1)] 
[0166] [(u'cm', 1)] 
[0167] [(u'contribute', 1)] 
[0168] [(u'conditions', 1)] 
[0169] [(u'thinned', 1)] 
[0170] [(u'medicines', 1)] 
[0171] [(u'temperatures', 1)] 
[0172] [(u'one', 4)] 
[0173] [(u'cameroon', 1)] 
[0174] [(u'although', 1)] 
[0175] [(u'associated', 1)] 
[0176] [(u'are', 5)] 
[0177] [(u'dry', 1)] 
[0178] [(u'hawaii', 1)] 
[0179] [(u'centimetres', 1)] 
[0180] [(u'philippines', 1)] 
[0181] [(u'canopy', 1)] 
[0182] [(u'including', 1)] 
[0183] [(u'177', 1)] 
[0184] [(u'responsible', 1)] 
[0185] [(u'misnamed', 1)] 
[0186] [(u'atmosphere', 1)] 
[0187] [(u'globe', 1)] 
[0188] [(u'the', 10)] 
[0189] [(u'leaf', 1)] 
[0190] [(u'turnover', 1)] 
[0191] [(u'plays', 1)] 
[0192] [(u'degreef', 1)] 
[0193] [(u'amazon', 1)] 
[0194] [(u'burma', 1)] 
[0195] [(u'southern', 2)] 
[0196] [(u'over', 5)] 
[0197] [(u'has', 1)] 
[0198] [(u'69', 1)] 
[0199] [(u'250', 1)] 
[0200] [(u'galicia', 1)] 
[0201] [(u'climate', 1)] 
[0202] [(u'isles', 1)] 
[0203] [(u's', 112)] 
[0204] [(u'processing', 1)] 
[0205] [(u'found', 1)] 
[0206] [(u'temperate', 2)] 
[0207] [(u'biotic', 1)] 
[0208] [(u'respiration', 1)] 
[0209] [(u'broader', 1)] 
[0210] [(u'substantial', 1)] 
[0211] [(u'columbia', 1)] 
[0212] [(u'cancer', 1)] 
[0213] [(u'types', 1)] 
[0214] [(u'tropical', 5)] 
[0215] [(u'rainforests', 3)] 
[0216] [(u'penetration', 1)] 
[0217] [(u'discovered', 2)] 
[0218] [(u'few', 1)] 
[0219] [(u'photosynthesis', 2)] 
[0220] [(u'sometimes', 1)] 
[0221] [(u'by', 2)] 
[0222] [(u'intertropical', 2)] 
[0223] [(u'small', 1)] 
[0224] [(u'ireland', 1)] 
[0225] [(u'but', 2)] 
[0226] [(u'central', 1)] 
[0227] [(u'millions', 1)] 
[0228] [(u'quarter', 1)] 
[0229] [(u'generally', 1)] 
[0230] [(u'all', 9)] 
[0231] [(u'it', 9)] 
[0232] [(u'insects', 1)] 
[0233] [(u'natural', 1)] 
[0234] [(u'colonized', 1)] 
[0235] [(u'than', 1)] 
[0236] [(u'is', 14)] 
[0237] [(u'norway', 1)] 
[0238] [(u'average', 1)] 
[0239] [(u'location', 1)] 
[0240] [(u'they', 1)] 
[0241] [(u'000', 1)] 
[0242] [(u'now', 3)] 
[0243] [(u'little', 1)] 
[0244] [(u'estimated', 1)] 
[0245] [(u'beneath', 1)] 
[0246] [(u'monthly', 1)] 
[0247] [(u'as', 18)] 
[0248] [(u'known', 2)] 
[0249] [(u'lies', 1)] 
[0250] [(u'between', 2)] 
[0251] [(u'papua', 1)] 
[0252] [(u'40', 1)] 
[0253] [(u'there', 1)] 
[0254] [(u'carbon', 1)] 
[0255] [(u'sri', 1)] 
[0256] [(u'scotland', 1)] 
[0257] [(u'pacific', 2)] 
[0258] [(u'degreec', 1)] 
[0259] [(u'that', 2)] 
[0260] [(u'saharan', 1)] 
[0261] [(u'vines', 1)] 
[0262] [(u'well', 1)] 
[0263] [(u'18', 1)] 
[0264] [(u'3', 2)] 
[0265] [(u'far', 1)] 
[0266] [(u'through', 2)] 
[0267] [(u'myanmar', 1)] 
[0268] [(u'because', 1)] 
[0269] [(u'sunlight', 1)] 
[0270] [(u'term', 1)] 
[0271] [(u'dioxide', 1)] 
[0272] [(u'and', 9)] 
[0273] [(u'lanka', 1)] 
[0274] [(u'congo', 1)] 
[0275] [(u'undergrowth', 1)] 
[0276] [(u'convergence', 2)] 
[0277] [(u'mean', 1)] 
[0000] [(u'as in', 1)] 
[0001] [(u'large part', 1)] 
[0002] [(u'sea including', 1)] 
[0003] [(u'a dense', 1)] 
[0004] [(u'40 to', 1)] 
[0005] [(u'undergrowth in', 1)] 
[0006] [(u'norway parts', 1)] 
[0007] [(u'98 and', 1)] 
[0008] [(u'areas of', 2)] 
[0009] [(u'10 degrees', 1)] 
[0010] [(u'america in', 1)] 
[0011] [(u'the monsoon', 2)] 
[0012] [(u'ground beneath', 1)] 
[0013] [(u'no substantial', 1)] 
[0014] [(u'including georgia', 1)] 
[0015] [(u'and scotland', 1)] 
[0016] [(u'the jewels', 1)] 
[0017] [(u'discovered there', 1)] 
[0018] [(u's oxygen', 1)] 
[0019] [(u'and wet', 1)] 
[0020] [(u'been estimated', 1)] 
[0021] [(u'rainforests generally', 1)] 
[0022] [(u'soon colonized', 1)] 
[0023] [(u'69 in', 1)] 
[0024] [(u'associated with', 1)] 
[0025] [(u'coastal areas', 1)] 
[0026] [(u'in 6', 1)] 
[0027] [(u'rainforests around', 1)] 
[0028] [(u'in east', 1)] 
[0029] [(u'the broader', 1)] 
[0030] [(u'indonesia papua', 1)] 
[0031] [(u'penetration of', 1)] 
[0032] [(u'colonized by', 1)] 
[0033] [(u'washington oregon', 1)] 
[0034] [(u'can be', 1)] 
[0035] [(u'one quarter', 1)] 
[0036] [(u'pacific northwest', 1)] 
[0037] [(u'year 5', 1)] 
[0038] [(u'species of', 1)] 
[0039] [(u'plays a', 1)] 
[0040] [(u'the adriatic', 1)] 
[0041] [(u'photosynthesis from', 1)] 
[0042] [(u'average annual', 1)] 
[0043] [(u'cm 69', 1)] 
[0044] [(u'consuming it', 1)] 
[0045] [(u'degrees north', 1)] 
[0046] [(u'role in', 1)] 
[0047] [(u'the ground', 1)] 
[0048] [(u'columbia washington', 1)] 
[0049] [(u'and the', 2)] 
[0050] [(u'ireland and', 1)] 
[0051] [(u'creating the', 1)] 
[0052] [(u'within 10', 1)] 
[0053] [(u'africa from', 1)] 
[0054] [(u'far east', 1)] 
[0055] [(u'north america', 1)] 
[0056] [(u'the climatic', 1)] 
[0057] [(u'alternatively known', 1)] 
[0058] [(u'is destroyed', 1)] 
[0059] [(u'carbon dioxide', 1)] 
[0060] [(u'eastern black', 1)] 
[0061] [(u'rainforests only', 1)] 
[0062] [(u'and temperate', 1)] 
[0063] [(u'the coastal', 1)] 
[0064] [(u'in few', 1)] 
[0065] [(u'during all', 1)] 
[0066] [(u'the tropic', 1)] 
[0067] [(u'79 in', 1)] 
[0068] [(u'3 rainforests', 1)] 
[0069] [(u'jewels of', 1)] 
[0070] [(u'are indigenous', 1)] 
[0071] [(u'capricorn tropical', 1)] 
[0072] [(u'dense tangled', 1)] 
[0073] [(u'as well', 1)] 
[0074] [(u'new zealand', 1)] 
[0075] [(u'saharan africa', 1)] 
[0076] [(u'the adjacent', 1)] 
[0077] [(u'yucatan peninsula', 1)] 
[0078] [(u'if the', 1)] 
[0079] [(u'natural medicines', 1)] 
[0080] [(u'shrubs and', 1)] 
[0081] [(u'lungs although', 1)] 
[0082] [(u'addition to', 1)] 
[0083] [(u'restricted by', 1)] 
[0084] [(u'the globe', 1)] 
[0085] [(u'the world', 3)] 
[0086] [(u'coast as', 1)] 
[0087] [(u'earth and', 1)] 
[0088] [(u'it typically', 1)] 
[0089] [(u'burma to', 1)] 
[0090] [(u'oxygen production', 1)] 
[0091] [(u'168 cm', 1)] 
[0092] [(u'temperate regions', 1)] 
[0093] [(u'through photosynthesis', 2)] 
[0094] [(u'alaska british', 1)] 
[0095] [(u'sunlight to', 1)] 
[0096] [(u'are also', 1)] 
[0097] [(u'all months', 1)] 
[0098] [(u'found within', 1)] 
[0099] [(u'substantial dry', 1)] 
[0100] [(u'sakhalin island', 1)] 
[0101] [(u'isles such', 1)] 
[0102] [(u'rainforest tropical', 1)] 
[0103] [(u'in galicia', 1)] 
[0104] [(u'part of', 1)] 
[0105] [(u'may be', 1)] 
[0106] [(u'rainforest and', 1)] 
[0107] [(u'the year', 1)] 
[0108] [(u'british columbia', 1)] 
[0109] [(u'small trees', 1)] 
[0110] [(u'that there', 1)] 
[0111] [(u'typically lies', 1)] 
[0112] [(u'western balkans', 1)] 
[0113] [(u'cm 66', 1)] 
[0114] [(u'is soon', 1)] 
[0115] [(u'peninsula el', 1)] 
[0116] [(u'as the', 3)] 
[0117] [(u'japan and', 1)] 
[0118] [(u'southern yucatan', 1)] 
[0119] [(u'turnover sometimes', 1)] 
[0120] [(u'in australia', 1)] 
[0121] [(u'earth s', 2)] 
[0122] [(u'of capricorn', 1)] 
[0123] [(u'zealand 10', 1)] 
[0124] [(u'growth of', 1)] 
[0125] [(u'and microorganisms', 1)] 
[0126] [(u'the pacific', 2)] 
[0127] [(u'and 450', 1)] 
[0128] [(u'and california', 1)] 
[0129] [(u'by high', 1)] 
[0130] [(u'trees called', 1)] 
[0131] [(u'tropic of', 1)] 
[0132] [(u'trough alternatively', 1)] 
[0133] [(u'in some', 1)] 
[0134] [(u'of rainforest', 1)] 
[0135] [(u'from myanmar', 1)] 
[0136] [(u'it through', 1)] 
[0137] [(u'world temperate', 1)] 
[0138] [(u'australia and', 2)] 
[0139] [(u'177 in', 1)] 
[0140] [(u'the equatorial', 1)] 
[0141] [(u'myanmar burma', 1)] 
[0142] [(u'asia from', 1)] 
[0143] [(u'amazon rainforest', 1)] 
[0144] [(u'scotland southern', 1)] 
[0145] [(u'200 cm', 1)] 
[0146] [(u'the rainforests', 1)] 
[0147] [(u'season typically', 1)] 
[0148] [(u'sometimes misnamed', 1)] 
[0149] [(u'the earth', 2)] 
[0150] [(u'175 cm', 1)] 
[0151] [(u'rainforest south', 1)] 
[0152] [(u'in north', 1)] 
[0153] [(u'characterized by', 2)] 
[0154] [(u'occur in', 1)] 
[0155] [(u'the equator', 2)] 
[0156] [(u'species are', 1)] 
[0157] [(u'sub saharan', 1)] 
[0158] [(u'can exceed', 1)] 
[0159] [(u'450 centimetres', 1)] 
[0160] [(u'island and', 1)] 
[0161] [(u'by a', 2)] 
[0162] [(u'around 40', 1)] 
[0163] [(u'beneath is', 1)] 
[0164] [(u'are characterized', 1)] 
[0165] [(u'in tropical', 1)] 
[0166] [(u'some areas', 1)] 
[0167] [(u'and on', 2)] 
[0168] [(u'g bosawas', 1)] 
[0169] [(u'misnamed oxygen', 1)] 
[0170] [(u'the amazon', 1)] 
[0171] [(u'of species', 1)] 
[0172] [(u'climatic conditions', 1)] 
[0173] [(u'plants insects', 1)] 
[0174] [(u'18 degreec', 1)] 
[0175] [(u'is also', 1)] 
[0176] [(u'necessary for', 1)] 
[0177] [(u'are two', 1)] 
[0178] [(u'rainforest can', 1)] 
[0179] [(u'rainforests are', 3)] 
[0180] [(u'g the', 3)] 
[0181] [(u'equator mean', 1)] 
[0182] [(u'because over', 1)] 
[0183] [(u'types of', 1)] 
[0184] [(u'the philippines', 1)] 
[0185] [(u'respiration the', 1)] 
[0186] [(u'malaysia indonesia', 1)] 
[0187] [(u'is now', 1)] 
[0188] [(u'southern china', 1)] 
[0189] [(u'of sunlight', 1)] 
[0190] [(u'annual rainfall', 2)] 
[0191] [(u'rainforest central', 1)] 
[0192] [(u'many a', 2)] 
[0193] [(u'rainforests exist', 1)] 
[0194] [(u'through respiration', 1)] 
[0195] [(u'forests characterized', 1)] 
[0196] [(u'the term', 1)] 
[0197] [(u'of plants', 1)] 
[0198] [(u'contribute little', 1)] 
[0199] [(u's tropical', 3)] 
[0200] [(u'to 75', 1)] 
[0201] [(u'75 of', 1)] 
[0202] [(u'6 many', 1)] 
[0203] [(u'rainfall between', 1)] 
[0204] [(u'east asia', 2)] 
[0205] [(u'black sea', 1)] 
[0206] [(u'high rainfall', 1)] 
[0207] [(u'pacific islands', 1)] 
[0208] [(u'two types', 1)] 
[0209] [(u'the intertropical', 2)] 
[0210] [(u'for 28', 1)] 
[0211] [(u'north and', 1)] 
[0212] [(u'jungle the', 1)] 
[0213] [(u'64 degreef', 1)] 
[0214] [(u'america e', 1)] 
[0215] [(u'located in', 1)] 
[0216] [(u'climate with', 1)] 
[0217] [(u'largest pharmacy', 1)] 
[0218] [(u'pharmacy because', 1)] 
[0219] [(u'poor penetration', 1)] 
[0220] [(u'of taiwan', 1)] 
[0221] [(u'atmosphere through', 1)] 
[0222] [(u'moist forests', 1)] 
[0223] [(u'coastal turkey', 1)] 
[0224] [(u's largest', 1)] 
[0225] [(u'of ireland', 1)] 
[0226] [(u'conditions necessary', 1)] 
[0227] [(u'tropical forests', 2)] 
[0228] [(u'cancer and', 1)] 
[0229] [(u'are associated', 1)] 
[0230] [(u'typically found', 1)] 
[0231] [(u'as hawaii', 1)] 
[0232] [(u'temperate rainforest', 2)] 
[0233] [(u'philippines malaysia', 1)] 
[0234] [(u'mean monthly', 1)] 
[0235] [(u'turkey in', 1)] 
[0236] [(u'few regions', 1)] 
[0237] [(u'dry season', 1)] 
[0238] [(u'trough also', 1)] 
[0239] [(u'of japan', 1)] 
[0240] [(u'and new', 1)] 
[0241] [(u'southern chile', 1)] 
[0242] [(u'islands such', 1)] 
[0243] [(u'papua new', 1)] 
[0244] [(u'congo rainforest', 1)] 
[0245] [(u'sometimes applied', 1)] 
[0246] [(u'globe but', 1)] 
[0247] [(u'congo congo', 1)] 
[0248] [(u'1 000', 1)] 
[0249] [(u'wet climate', 1)] 
[0250] [(u'it is', 1)] 
[0251] [(u'there 3', 1)] 
[0252] [(u'of cancer', 1)] 
[0253] [(u'the atmosphere', 1)] 
[0254] [(u'in 1', 2)] 
[0255] [(u'over one', 1)] 
[0256] [(u'many australia', 1)] 
[0257] [(u'significant role', 1)] 
[0258] [(u'months of', 1)] 
[0259] [(u'monsoon trough', 2)] 
[0260] [(u'rainfall with', 1)] 
[0261] [(u'peten belize', 1)] 
[0262] [(u'regions they', 1)] 
[0263] [(u'el peten', 1)] 
[0264] [(u'tropical rainforests', 2)] 
[0265] [(u'cover many', 1)] 
[0266] [(u'of all', 1)] 
[0267] [(u'a significant', 1)] 
[0268] [(u'that rainforests', 1)] 
[0269] [(u'they occur', 1)] 
[0270] [(u'to the', 2)] 
[0271] [(u'lanka sub', 1)] 
[0272] [(u'been called', 2)] 
[0273] [(u'cameroon to', 1)] 
[0274] [(u'world s', 2)] 
[0275] [(u'equatorial zone', 1)] 
[0276] [(u'medicines have', 1)] 
[0277] [(u'east coast', 1)] 
[0278] [(u'also responsible', 1)] 
[0279] [(u'term jungle', 1)] 
[0280] [(u'parts of', 1)] 
[0281] [(u'dioxide and', 1)] 
[0282] [(u'in creating', 1)] 
[0283] [(u'warm and', 1)] 
[0284] [(u'millions of', 1)] 
[0285] [(u'forests cover', 1)] 
[0286] [(u'georgia and', 1)] 
[0287] [(u'galicia and', 1)] 
[0288] [(u'vines shrubs', 1)] 
[0289] [(u'china highlands', 1)] 
[0290] [(u'between 250', 1)] 
[0291] [(u'rainfall is', 1)] 
[0292] [(u'exceed 18', 1)] 
[0293] [(u'production 4', 1)] 
[0294] [(u'and tropic', 1)] 
[0295] [(u'there are', 1)] 
[0296] [(u'the western', 1)] 
[0297] [(u'rainforests 2', 1)] 
[0298] [(u'southeast asia', 1)] 
[0299] [(u'rainforests contribute', 1)] 
[0300] [(u'responsible for', 1)] 
[0301] [(u'and also', 1)] 
[0302] [(u'to tropical', 1)] 
[0303] [(u'around the', 1)] 
[0304] [(u'less than', 1)] 
[0305] [(u'little net', 1)] 
[0306] [(u'bosawas southern', 1)] 
[0307] [(u'the congo', 1)] 
[0308] [(u'7 the', 1)] 
[0309] [(u'a warm', 1)] 
[0310] [(u'california in', 1)] 
[0311] [(u'many millions', 1)] 
[0312] [(u'4 processing', 1)] 
[0313] [(u'in although', 1)] 
[0314] [(u'28 of', 1)] 
[0315] [(u'jungle is', 1)] 
[0316] [(u'with no', 1)] 
[0317] [(u'location of', 1)] 
[0318] [(u'zone between', 1)] 
[0319] [(u'korea and', 1)] 
[0320] [(u'rainforests tropical', 1)] 
[0321] [(u'between 175', 1)] 
[0322] [(u'oregon and', 1)] 
[0323] [(u'with annual', 1)] 
[0324] [(u'the location', 1)] 
[0325] [(u'sri lanka', 1)] 
[0326] [(u'to ground', 1)] 
[0327] [(u'coast in', 1)] 
[0328] [(u'taiwan much', 1)] 
[0329] [(u'all biotic', 1)] 
[0330] [(u'also known', 1)] 
[0331] [(u'northwest in', 1)] 
[0332] [(u'although it', 1)] 
[0333] [(u'zone plays', 1)] 
[0334] [(u'in south', 3)] 
[0335] [(u'or thinned', 1)] 
[0336] [(u'monthly temperatures', 1)] 
[0337] [(u'calakmul many', 1)] 
[0338] [(u'the leaf', 1)] 
[0339] [(u'centimetres 98', 1)] 
[0340] [(u'from carbon', 1)] 
[0341] [(u'on sakhalin', 1)] 
[0342] [(u'a rainforest', 1)] 
[0343] [(u'also sometimes', 1)] 
[0344] [(u'be many', 1)] 
[0345] [(u'and consuming', 1)] 
[0346] [(u'tangled growth', 1)] 
[0347] [(u'be restricted', 1)] 
[0348] [(u'but temperate', 1)] 
[0349] [(u'america southern', 1)] 
[0350] [(u'of natural', 1)] 
[0351] [(u'many of', 1)] 
[0352] [(u'and 177', 1)] 
[0353] [(u'exceed 1', 2)] 
[0354] [(u'is no', 2)] 
[0355] [(u'along the', 1)] 
[0356] [(u'66 in', 1)] 
[0357] [(u'in the', 2)] 
[0358] [(u'by poor', 1)] 
[0359] [(u'only occur', 1)] 
[0360] [(u'000 cm', 1)] 
[0361] [(u'with the', 1)] 
[0362] [(u'south of', 1)] 
[0363] [(u'rainforest the', 1)] 
[0364] [(u'quarter of', 1)] 
[0365] [(u'cm 390', 1)] 
[0366] [(u'for the', 1)] 
[0367] [(u'1 there', 1)] 
[0368] [(u'are rainforests', 1)] 
[0369] [(u'zone 7', 1)] 
[0370] [(u'destroyed or', 1)] 
[0371] [(u'well as', 1)] 
[0372] [(u'of a', 2)] 
[0373] [(u'there may', 1)] 
[0374] [(u'are located', 1)] 
[0375] [(u'regions around', 1)] 
[0376] [(u'in and', 1)] 
[0377] [(u'5 average', 1)] 
[0378] [(u'highlands of', 1)] 
[0379] [(u'and small', 1)] 
[0380] [(u'a jungle', 1)] 
[0381] [(u'and korea', 1)] 
[0382] [(u'are forests', 1)] 
[0383] [(u'have been', 2)] 
[0384] [(u'cm 79', 1)] 
[0385] [(u'much of', 1)] 
[0386] [(u'insects and', 1)] 
[0387] [(u'e g', 3)] 
[0388] [(u'the british', 1)] 
[0389] [(u'oxygen turnover', 1)] 
[0390] [(u'broader category', 1)] 
[0391] [(u'new guinea', 1)] 
[0392] [(u'rainforests have', 1)] 
[0393] [(u'also in', 1)] 
[0394] [(u'has been', 1)] 
[0395] [(u'hawaii tropical', 1)] 
[0396] [(u'250 and', 1)] 
[0397] [(u'leaf canopy', 1)] 
[0398] [(u'known as', 2)] 
[0399] [(u'adriatic coast', 1)] 
[0400] [(u'asia in', 1)] 
[0401] [(u'of vines', 1)] 
[0402] [(u'and south', 2)] 
[0403] [(u'and 200', 1)] 
[0404] [(u'temperate rainforests', 1)] 
[0405] [(u'and can', 1)] 
[0406] [(u'level if', 1)] 
[0407] [(u'europe parts', 1)] 
[0408] [(u'of tropical', 1)] 
[0409] [(u'rainforests in', 1)] 
[0410] [(u'british isles', 1)] 
[0411] [(u'in europe', 1)] 
[0412] [(u'such as', 2)] 
[0413] [(u'undiscovered in', 1)] 
[0414] [(u'southern norway', 1)] 
[0415] [(u'intertropical convergence', 2)] 
[0416] [(u'indigenous to', 1)] 
[0417] [(u'central america', 1)] 
[0418] [(u'390 in', 1)] 
[0419] [(u'forests have', 2)] 
[0420] [(u'russian far', 1)] 
[0421] [(u'thinned the', 1)] 
[0422] [(u'belize calakmul', 1)] 
[0423] [(u'of the', 3)] 
[0424] [(u'adjacent russian', 1)] 
[0425] [(u'processing it', 1)] 
[0426] [(u'convergence zone', 2)] 
[0427] [(u's lungs', 1)] 
[0428] [(u'estimated that', 1)] 
[0429] [(u'biotic species', 1)] 
[0430] [(u'forests are', 4)] 
[0431] [(u'category of', 1)] 
[0432] [(u'and coastal', 1)] 
[0433] [(u'the eastern', 1)] 
[0434] [(u'in southeast', 1)] 
[0435] [(u'ground level', 1)] 
[0436] [(u'called a', 1)] 
[0437] [(u'south america', 2)] 
[0438] [(u'microorganisms still', 1)] 
[0439] [(u'chile and', 1)] 
[0440] [(u'balkans along', 1)] 
[0441] [(u'from cameroon', 1)] 
[0442] [(u'no less', 1)] 
[0443] [(u'lies between', 1)] 
[0444] [(u'still undiscovered', 1)] 
[0445] [(u'in southern', 1)] 
[0446] [(u'degreef during', 1)] 
[0447] [(u'2 it', 1)] 
[0448] [(u'temperatures exceed', 1)] 
[0449] [(u'known that', 1)] 
[0450] [(u'canopy is', 1)] 
[0451] [(u'a large', 1)] 
[0452] [(u'in temperate', 1)] 
[0453] [(u'called the', 2)] 
[0454] [(u'in alaska', 1)] 
[0455] [(u'between the', 1)] 
[0456] [(u'oxygen addition', 1)] 
[0457] [(u'degreec 64', 1)] 
[0458] [(u'now known', 1)] 
[0459] [(u'been discovered', 1)] 
[0460] [(u'tropical moist', 1)] 
[0461] [(u'the undergrowth', 1)] 
[0462] [(u'it has', 1)] 
[0463] [(u'than 168', 1)] 
[0464] [(u'tropical rainforest', 3)] 
[0465] [(u'guinea sri', 1)] 
[0466] [(u'on many', 1)] 
[0467] [(u'net oxygen', 1)] 
[0468] [(u'exist in', 1)] 
[0469] [(u'applied to', 1)] 

I'm trying to write the data out to a CSV file so that it can be sorted by phrase, by match count and by phrase length. For example:

| phrase        | match count | phrase length |
|---------------|-------------|---------------|
| sub           | 2           | 1             |
| east          | 3           | 1             |
| ground        | 1           | 1             |
| forests       | 6           | 1             |
| south america | 2           | 2             |
south america   2     2 

Laid out like that, I can then sort the data by match count and phrase length in a spreadsheet and see it more clearly.

How can I transpose the data I have now so that I can write it out to a CSV file as shown above?

The data is currently held in lists, like this:

phrase_len_1 = all_data[1] 
phrase_len_2 = all_data[2] 
phrase_len_3 = all_data[3] 
phrase_len_N = all_data[N] 
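If you're free to change how this is stored, one option is a single dict keyed by phrase length instead of separate phrase_len_1, phrase_len_2, ... variables. This is a hypothetical layout, not something the question shows, and the sample tuples are copied from the output above. Because the key already is the phrase length, flattening into table rows needs no recomputation.

```python
# Hypothetical layout: one dict keyed by phrase length (sample tuples from the output above)
phrases_by_length = {
    1: [("sub", 2), ("east", 3), ("ground", 1)],
    2: [("south america", 2), ("as in", 1)],
}

# One (phrase, count, length) row per phrase; the dict key supplies the length
rows = [(phrase, count, length)
        for length, pairs in sorted(phrases_by_length.items())
        for phrase, count in pairs]
```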

Answer


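You don't need a transpose as such. Instead, build a list of (phrase, match count, phrase length) tuples and sort it using the third column as the sort key (see tutorial). Since your phrase length is the number of words, compute it with len(phrase.split()); len(phrase) would count characters instead.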
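To illustrate sorting on a column with a key function, here is a small sketch using a few rows from the question's table. `operator.itemgetter(2)` does the same job as `lambda row: row[2]`, and a tuple key sorts on several columns at once. The "highest count first" ordering is just one possible choice.

```python
from operator import itemgetter

rows = [
    ("sub", 2, 1),
    ("east", 3, 1),
    ("south america", 2, 2),
    ("ground", 1, 1),
]

# Sort on the third column (phrase length); sorted() is stable, so ties keep their order
by_length = sorted(rows, key=itemgetter(2))

# Sort by match count (highest first), then by phrase length, then alphabetically
by_count = sorted(rows, key=lambda row: (-row[1], row[2], row[0]))
```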

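First, stop storing the aggregated data in separate variables; put them into a list, tuple, dict or other sequence. You want the contents to look like your original all_data. Now...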

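# one (phrase, match count, phrase length in words) tuple per phrase
csv_list = [(phrase, count, len(phrase.split()))
            for phrase_list in all_data for phrase, count in phrase_list]
csv_data = sorted(csv_list, key=lambda row: row[2])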
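Putting it together, here is a minimal Python 3 sketch that writes those rows to a CSV file with a header matching the question's table. It assumes `all_data` is a list whose elements are lists of `(phrase, count)` tuples. The function name `write_phrase_csv` and the file name `phrases.csv` are hypothetical. The `csv` module documentation recommends opening the file with `newline=""`.

```python
import csv


def write_phrase_csv(all_data, path="phrases.csv"):
    """Flatten all_data into (phrase, match count, phrase length) rows and write them as CSV."""
    rows = [(phrase, count, len(phrase.split()))
            for phrase_list in all_data
            for phrase, count in phrase_list]
    # Sort by phrase length, then most frequent first (one possible ordering)
    rows.sort(key=lambda row: (row[2], -row[1]))
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["phrase", "match count", "phrase length"])
        writer.writerows(rows)
    return rows
```

The spreadsheet can then re-sort the file on any column. The in-Python sort only fixes the initial order.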

Note that you can combine these into a single line if you like.
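Combined into one statement, it might look like the following. This uses the same assumption that `all_data` is a list of lists of `(phrase, count)` tuples, and the sample data is hypothetical. Passing a generator expression to `sorted()` avoids building the intermediate list.

```python
# Hypothetical sample in the assumed all_data layout
all_data = [[("sub", 2), ("east", 3)], [("south america", 2)]]

csv_data = sorted(((phrase, count, len(phrase.split()))
                   for phrase_list in all_data for phrase, count in phrase_list),
                  key=lambda row: row[2])
```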