Error reading a Stata file in R

I want to read a Stata dataset into R with the foreign package, but when I try to read the file with:

library(foreign) 
data <- read.dta("data.dta")

I get the following error:

Error in read.dta("data.dta") : a binary read error occurred

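The file works fine in Stata. This site suggests saving the file in Stata without labels and then reading it into R. With this workaround I can load the file into R, but then I lose the labels. Why am I getting this error, and how can I read the file into R with the labels intact? Another person found that they got this error when they had variables with no values. My data has at least one or two such variables, but I have no easy way to identify them in Stata. It is a very large file with thousands of variables.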

2013-08-24 Michael

There are several ways to test for missing values in Stata even if you have a large number of variables. See [here](http://www.ats.ucla.edu/stat/stata/faq/nummiss_stata.htm). – Metrics

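The version of Stata used to create the file may be the problem. Read the help page for read.dta carefully, then do whatever is needed to produce a file in a supported version. –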
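One rough way to check the version point from R is to look at the first byte of the file. In the older binary formats that byte is the format number, while files written by Stata 13 and later (format 117 and up) begin with the text `<stata_dta>`, which `read.dta` does not support (its help page lists Stata versions 5 to 12). The sketch below is only an illustration: `dta_format_byte` is a hypothetical helper, and the temporary file stands in for `data.dta`.

```r
library(foreign)

# Hypothetical helper: return the first byte of a .dta file as an integer.
dta_format_byte <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  as.integer(readBin(con, what = "raw", n = 1L))
}

# Stand-in file so the sketch runs on its own; replace tmp with your own path.
tmp <- tempfile(fileext = ".dta")
write.dta(data.frame(x = c(1, NA, 3)), tmp)

fmt <- dta_format_byte(tmp)

# Format 117+ files start with "<stata_dta>", so their first byte is "<" (ASCII 60).
if (fmt == 60L) {
  message("New-style file: resave it with saveold in Stata before using read.dta")
} else {
  message("Old-style file, format byte ", fmt)
  back <- read.dta(tmp)
}
```

If the check reports a new-style file, resaving with saveold in Stata is the usual route to a format that foreign can read.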

Before reading Stata data, you should call library(foreign).

library(foreign) 
data <- read.dta("data.dta")

Update: as mentioned here,

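"The error message means that the file was found, and that it starts with the right sequence of bytes to be a Stata .dta file, but that something (possibly the end of the file) prevented R from reading what it expected to read."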

However, without any further information we can only guess.
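To gather a bit more information, the call can be wrapped in `tryCatch` so the error message is captured rather than stopping the script. The sketch below is an assumption-laden illustration, not a diagnosis of the original file: it writes a small valid .dta file, keeps only the first half of its bytes to mimic a file that ends early, and then tries to read the result.

```r
library(foreign)

# Write a small valid .dta file to a temporary location.
full <- tempfile(fileext = ".dta")
write.dta(data.frame(x = 1:50, y = rnorm(50)), full)

# Keep only the first half of the bytes to mimic a truncated file.
bytes <- readBin(full, what = "raw", n = file.size(full))
cut <- tempfile(fileext = ".dta")
writeBin(bytes[seq_len(length(bytes) %/% 2)], cut)

# Catch the error instead of stopping, so the message can be inspected.
result <- tryCatch(
  read.dta(cut),
  error = function(e) {
    message("read.dta failed: ", conditionMessage(e))
    NULL
  }
)
```

Running the same `tryCatch` pattern on the real file at least confirms whether the failure is reproducible and what message accompanies it.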

Update in response to the OP's self-answer:

I tested whether this is the case using the auto data from Stata, but it is not. So there must be some other cause:

*Claims 1 and 2: if there are variables with missing values, or if the dataset has labels, R's read.dta will produce an error*

sysuse auto // this dataset has labels
replace mpg = . // sets mpg to missing for every observation
br in 1/10 
make            price  mpg  rep78  headroom  trunk  weight  length  turn  displacement  gear_ratio  foreign
AMC Concord      4099    .      3       2.5     11    2930     186    40           121        3.58  Domestic
AMC Pacer        4749    .      3       3.0     11    3350     173    40           258        2.53  Domestic
AMC Spirit       3799    .      .       3.0     12    2640     168    35           121        3.08  Domestic
Buick Century    4816    .      3       4.5     16    3250     196    40           196        2.93  Domestic
Buick Electra    7827    .      4       4.0     20    4080     222    43           350        2.41  Domestic
Buick LeSabre    5788    .      3       4.0     21    3670     218    43           231        2.73  Domestic
Buick Opel       4453    .      .       3.0     10    2230     170    34           304        2.87  Domestic
Buick Regal      5189    .      3       2.0     16    3280     200    42           196        2.93  Domestic
Buick Riviera   10372    .      3       3.5     17    3880     207    43           231        2.93  Domestic
Buick Skylark    4082    .      3       3.5     13    3400     200    42           231        3.08  Domestic

save "~myauto" 
de(myauto) 

Contains data from ~\myauto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          25 Aug 2013 11:32
 size:         3,478 (99.9% of memory free)   (_dta has notes)
--------------------------------------------------------------------------------
               storage  display  value
variable name  type     format   label    variable label
--------------------------------------------------------------------------------
make           str18    %-18s             Make and Model
price          int      %8.0gc            Price
mpg            int      %8.0g             Mileage (mpg)
rep78          int      %8.0g             Repair Record 1978
headroom       float    %6.1f             Headroom (in.)
trunk          int      %8.0g             Trunk space (cu. ft.)
weight         int      %8.0gc            Weight (lbs.)
length         int      %8.0g             Length (in.)
turn           int      %8.0g             Turn Circle (ft.)
displacement   int      %8.0g             Displacement (cu. in.)
gear_ratio     float    %6.2f             Gear Ratio
foreign        byte     %8.0g    origin   Car type
--------------------------------------------------------------------------------
Sorted by: foreign


library(foreign) 
myauto <- read.dta("myauto.dta")  # works fine
str(myauto)
'data.frame': 74 obs. of 12 variables: 
$ make  : chr "AMC Concord" "AMC Pacer" "AMC Spirit" "Buick Century" ... 
$ price  : int 4099 4749 3799 4816 7827 5788 4453 5189 10372 4082 ... 
$ mpg   : int NA NA NA NA NA NA NA NA NA NA ... 
$ rep78  : int 3 3 NA 3 4 3 NA 3 3 3 ... 
$ headroom : num 2.5 3 3 4.5 4 4 3 2 3.5 3.5 ... 
$ trunk  : int 11 11 12 16 20 21 10 16 17 13 ... 
$ weight  : int 2930 3350 2640 3250 4080 3670 2230 3280 3880 3400 ... 
$ length  : int 186 173 168 196 222 218 170 200 207 200 ... 
$ turn  : int 40 40 35 40 43 43 34 42 43 42 ... 
$ displacement: int 121 258 121 196 350 231 304 196 231 231 ... 
$ gear_ratio : num 3.58 2.53 3.08 2.93 2.41 ... 
$ foreign  : Factor w/ 2 levels "Domestic","Foreign": 1 1 1 1 1 1 1 1 1 1 ... 
- attr(*, "datalabel")= chr "1978 Automobile Data" 
- attr(*, "time.stamp")= chr "25 Aug 2013 11:23" 
- attr(*, "formats")= chr "%-18s" "%8.0gc" "%8.0g" "%8.0g" ... 
- attr(*, "types")= int 18 252 252 252 254 252 252 252 252 252 ... 
- attr(*, "val.labels")= chr "" "" "" "" ... 
- attr(*, "var.labels")= chr "Make and Model" "Price" "Mileage (mpg)" "Repair Record 1978" ... 
- attr(*, "expansion.fields")=List of 2 
    ..$ : chr "_dta" "note1" "from Consumer Reports with permission" 
    ..$ : chr "_dta" "note0" "1" 
- attr(*, "version")= int 12 
- attr(*, "label.table")=List of 1 
    ..$ origin: Named int 0 1 
    .. ..- attr(*, "names")= chr "Domestic" "Foreign"

2013-08-24 21:42:35 Metrics

I did do that; I just left it out of the question. – Michael

OK. In that case, you need to show us some sample data. – Metrics

I added more information above. The file is not corrupted, since I can read it into Stata. Unfortunately, the file is too large for me to post here. – Michael

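I don't know why this happens, and would be interested if anyone can explain it, but read.dta really does seem unable to handle variables that are entirely NA. One workaround is to drop these variables in Stata with the following code: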

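* Drop every variable whose values are all missing.
* missing() works for numeric and string variables alike, whereas
* summarize returns r(N) == 0 for any string variable.
foreach varname of varlist * {
    capture assert missing(`varname')
    if _rc == 0 {
        drop `varname'
        display "dropped `varname': all values missing"
    }
}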
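If a copy of the file can be loaded at all (for example, the version saved without labels), the same check can be run from R, which may be easier than hunting for the variables in Stata. This is a minimal sketch on a toy data frame; the column names are made up for illustration.

```r
# Toy data standing in for the real dataset; column names are hypothetical.
df <- data.frame(
  id    = 1:4,
  empty = NA_real_,
  part  = c(2.5, NA, 1.0, NA),
  blank = NA_character_,
  stringsAsFactors = FALSE
)

# A column is "all missing" when it has zero non-NA entries.
all_missing <- names(df)[colSums(!is.na(df)) == 0]
print(all_missing)

# Drop those columns, mirroring what the Stata loop does.
df_clean <- df[, !(names(df) %in% all_missing), drop = FALSE]
```

The names in `all_missing` can then be passed to `drop` in Stata, so only the offending variables are removed from the labelled file.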

2013-08-24 22:45:53 Michael

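The comments on the blog entry you cite (1) give shorter code than this and (2) mention 'dropmiss' as a user-written solution. Run 'findit dropmiss', install from the most recent site mentioned, and then 'dropmiss' is all you need. –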

See my update. At least for a small dataset in Stata, the claim in this answer is not correct. – Metrics

Here is a list of possible fixes. My guess is that the first item has a 75% chance of solving your problem.

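1. In Stata, resave a fresh copy of your dta file with saveold, then try again.
2. If that fails, provide a sample that shows what kind of values kill the read.dta function.
3. If missing values are to blame, run the loop from the other answer.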

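Beyond that point, a more thorough description of the dataset would be needed. The problem seems solvable, and I have never had much trouble using foreign with a large number of Stata files.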

You can also try the Stata.file function from the memisc package to see whether it fails as well.

2013-08-30 16:19:51

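It has been a long time, but I solved the same problem by exporting the .dta data to .csv. The problem was related to the labels of the factor variables, specifically because the labels were in Spanish and their character encoding was garbled. I hope this works for people who have the same problem and have access to Stata.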

In Stata:

export delimited using "/Users/data.csv", nolabel replace

In R:

df <- read.csv("/Users/data.csv")
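Because the nolabel option exports the underlying numeric codes, the value labels have to be reattached in R by hand. A minimal sketch, assuming the codes and labels are known: it borrows the `origin` label table from the auto example earlier in this thread (0 = Domestic, 1 = Foreign), and uses inline text in place of the exported CSV file.

```r
# Inline text standing in for a CSV exported from Stata with the nolabel option.
csv_text <- "make,foreign\nAMC Concord,0\nAudi 5000,1\nBuick Regal,0"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Reattach the value labels as factor levels.
df$foreign <- factor(df$foreign,
                     levels = c(0, 1),
                     labels = c("Domestic", "Foreign"))
str(df)
```

If string columns contain non-ASCII characters, the `fileEncoding` argument of `read.csv` (for example "latin1" or "UTF-8") may also be needed, depending on how Stata wrote the file.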

2016-03-01 03:16:27
