I have downloaded the .txt file from Kenneth R. French's data library; it can be found at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/det_48_ind_port.html. How can I reshape it into a different data frame in R?
I need to use these SIC codes to sort my sample into industry portfolios. The downloaded file looks like this:
1 Food
0100-0199 Agric production - crops
0200-0299 Agric production - livestock
0700-0799 Agricultural services
0900-0999 Fishing, hunting & trapping
2000-2009 Food and kindred products
2010-2019 Meat products
2020-2029 Dairy products
2030-2039 Canned-preserved fruits-vegs
2040-2046 Flour and other grain mill products
2047-2047 Dog and cat food
2048-2048 Prepared feeds for animals
2050-2059 Bakery products
2060-2063 Sugar and confectionery products
2064-2068 Candy and other confectionery
2070-2079 Fats and oils
2080-2080 Beverages
2082-2082 Malt beverages
2083-2083 Malt
2084-2084 Wine
2085-2085 Distilled and blended liquors
2086-2086 Bottled-canned soft drinks
2087-2087 Flavoring syrup
2090-2092 Misc food preps
2095-2095 Roasted coffee
2096-2096 Potato chips
2097-2097 Manufactured ice
2098-2099 Misc food preparations
5140-5149 Wholesale - groceries & related prods
5150-5159 Wholesale - farm products
5180-5182 Wholesale - beer, wine
5191-5191 Wholesale - farm supplies
2 Mines
1000-1009 Metal mining
1010-1019 Iron ores
1020-1029 Copper ores
1030-1039 Lead and zinc ores
1040-1049 Gold & silver ores
1060-1069 Ferroalloy ores
1080-1089 Mining services
1090-1099 Misc metal ores
1200-1299 Bituminous coal
1400-1499 Mining and quarrying non-metalic minerals
5050-5052 Wholesale - metals and minerals
3 Oil
1300-1300 Oil and gas extraction
1310-1319 Crude petroleum & natural gas
1320-1329 Natural gas liquids
1380-1380 Oil and gas field services
1381-1381 Drilling oil & gas wells
1382-1382 Oil-gas field exploration
1389-1389 Oil and gas field services
2900-2912 Petroleum refining
5170-5172 Wholesale - petroleum and petro prods
4 Clths
2200-2269 Textile mill products
2270-2279 Floor covering mills
2280-2284 Yarn and thread mills
2290-2295 Misc textile goods
2296-2296 Tire cord and fabric
2297-2297 Nonwoven fabrics
2298-2298 Cordage and twine
2299-2299 Misc textile products
2300-2390 Apparel and other finished products
2391-2392 Curtains, home furnishings
2393-2395 Textile bags, canvas products
2396-2396 Auto trim
2397-2399 Misc textile products
3020-3021 Rubber and plastics footwear
3100-3111 Leather tanning and finishing
3130-3131 Boot, shoe cut stock, findings
3140-3149 Footware except rubber
3150-3151 Leather gloves and mittens
3963-3965 Fasteners, buttons, needles, pins
5130-5139 Wholesale - apparel
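The listing mixes two kinds of lines: industry headers such as "1 Food" and SIC ranges such as "0100-0199 Agric production - crops". Below is a minimal parsing sketch in Python, not R. The same regular expressions can be used in R with grepl and sub. It assumes every range line starts with two four-digit codes joined by a hyphen. It tolerates leading whitespace and silently skips lines that match neither pattern. The names parse_sic_file and sample are hypothetical.

```python
import re

# Range lines: "0100-0199 Agric production - crops" (assumed xxxx-xxxx format)
RANGE_RE = re.compile(r"^\s*(\d{4})-(\d{4})\s*(.*)$")
# Header lines: "1 Food" (industry number, then short name)
HEADER_RE = re.compile(r"^\s*(\d+)\s+(\S+)")

def parse_sic_file(lines):
    """Return (industry_no, industry, low, high, description) tuples."""
    rows = []
    current = None  # (industry_no, industry) of the most recent header
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        m = RANGE_RE.match(line)  # check ranges first; they also start with digits
        if m:
            if current is None:
                raise ValueError(f"SIC range before any industry header: {line!r}")
            low, high = int(m.group(1)), int(m.group(2))
            rows.append((current[0], current[1], low, high, m.group(3).strip()))
            continue
        h = HEADER_RE.match(line)
        if h:
            current = (int(h.group(1)), h.group(2))
        # lines matching neither pattern are ignored
    return rows

sample = """1 Food
0100-0199 Agric production - crops
2047-2047 Dog and cat food
2 Mines
1000-1009 Metal mining
"""
for row in parse_sic_file(sample.splitlines()):
    print(row)
# (1, 'Food', 100, 199, 'Agric production - crops')
# (1, 'Food', 2047, 2047, 'Dog and cat food')
# (2, 'Mines', 1000, 1009, 'Metal mining')
```

Converting the codes to integers drops leading zeros ("0100" becomes 100). That is harmless if your sample stores SIC codes as numbers.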
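What I want to do is create a data frame whose first column gives the industry name (e.g., Food, Mines, and so on). Its second column should list all the SIC codes (Standard Industrial Classification codes) that belong to that industry. Most SIC codes are given as ranges such as 5130-5139, which makes this somewhat harder.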
Such a data frame would make my analysis much easier to carry out.
Any suggestions would be greatly appreciated.
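Once each range has been parsed into an (industry, low, high) triple, the ranges can be expanded into individual codes. That mapping can then be inverted into a code-to-industry lookup for sorting firms into portfolios. The sketch below is in Python. In R, seq(low, high) performs the same inclusive expansion. The four ranges are taken from the listing above. The firm SIC codes and the fallback label "Other" are hypothetical illustrations. The names expand_ranges and sic_lookup are also hypothetical.

```python
from collections import defaultdict

# Excerpt of (industry, low, high) triples from the listing above
ranges = [
    ("Food", 2047, 2047),
    ("Food", 2048, 2048),
    ("Food", 5191, 5191),
    ("Clths", 3020, 3021),
]

def expand_ranges(ranges):
    """Map each industry to every SIC code in its inclusive ranges."""
    codes = defaultdict(list)
    for industry, low, high in ranges:
        codes[industry].extend(range(low, high + 1))  # range() excludes high, so add 1
    return dict(codes)

def sic_lookup(codes):
    """Invert industry -> codes into SIC code -> industry."""
    lookup = {}
    for industry, sics in codes.items():
        for sic in sics:
            if sic in lookup and lookup[sic] != industry:
                raise ValueError(f"SIC {sic} assigned to both {lookup[sic]} and {industry}")
            lookup[sic] = industry
    return lookup

codes = expand_ranges(ranges)
print(codes)
# {'Food': [2047, 2048, 5191], 'Clths': [3020, 3021]}

lookup = sic_lookup(codes)
firm_sics = [2048, 3021, 9999]  # hypothetical sample; 9999 is in none of these ranges
print([lookup.get(s, "Other") for s in firm_sics])
# ['Food', 'Clths', 'Other']
```

The industry-to-codes dictionary is the two-column structure described in the question. The flat lookup is usually more convenient for assigning each firm to a portfolio, because it turns the assignment into a simple join on the SIC code.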
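I would consider a real data-preprocessing tool such as Google Refine (offline and free). R is not well suited to this kind of task; you can do it in R, but it will be more painful. – ATN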
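I think it is better to handle this with another program, because your data does not look like a data frame (you have lines like "4 Clths"). It is not a very efficient approach, but you can do it by hand. All the SIC codes appear in the form xxxx-xxxx, followed by a space. So if you read the file with sep = " ", the first column should hold your SIC codes and the second your industry names. I'm not sure all the names are single words, though in your example they are. Would the rest then be what they sell? –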
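The comment suggests reading the file with sep = " ". Splitting on every space, however, spreads multi-word descriptions across several fields. Splitting only at the first run of whitespace keeps each description in one field. Header lines can then be told apart from range lines by whether the first field contains a hyphen. This is a Python sketch. It assumes, as the comment observes, that every range is written as xxxx-xxxx. The name classify_line is hypothetical.

```python
def classify_line(line):
    """Split a line only at its first run of whitespace.

    Returns ("header", number, name) for lines like "4 Clths",
    ("range", low, high, description) for lines like "5130-5139 Wholesale - apparel",
    or None for blank lines.
    """
    parts = line.split(None, 1)  # at most two fields
    if not parts:
        return None
    first = parts[0]
    rest = parts[1].strip() if len(parts) > 1 else ""
    if "-" in first:
        low, high = first.split("-")  # assumes exactly one hyphen (xxxx-xxxx)
        # int() drops leading zeros ("0100" -> 100); format with f"{code:04d}" to restore them
        return ("range", int(low), int(high), rest)
    return ("header", int(first), rest)

line = "0100-0199 Agric production - crops"
print(line.split(" "))
# ['0100-0199', 'Agric', 'production', '-', 'crops']
print(classify_line(line))
# ('range', 100, 199, 'Agric production - crops')
print(classify_line("4 Clths"))
# ('header', 4, 'Clths')
```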