Data import delimiter problem in R

I am trying to import a text file into R and put it into a data frame together with other data.

My delimiter is "|" and a sample of my data is here:

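|Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light load and had 3 seats to myself. As always, a very warm and friendly crew on this Pacific route, which I fly several times a year. Arrived 20 minutes early. The high level of service expected from our flag carrier, Air Canada. Altitude Elite member. |We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toronto, our flight was very good. Because of the rush in Toronto, one of our carry-on bags was put in the hold. When we arrived in Winnipeg it had stayed in Toronto. The staff at the Winnipeg airport were most helpful and kind; we got 3 calls the next day about the misplaced bag, and it was delivered to our home. We are very grateful for the service we received; it was the perfect ending to a wonderful vacation. |Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage, not even any room under the seat. Ridiculous. The crew were poor and unfriendly. An older male crew member had quite an attitude, as if he were doing everyone a favour by serving them. A reasonable dinner, but breakfast was a piece of banana bread. That's it! Worst airline breakfast I've had.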

As you can see, there are lots of "|" characters, but when I import the data into R, it splits only once instead of about 152 times.

How can I get each separate piece of text into a different column of the data frame? I want a data frame of length 152, not 2.

Edit: the line of code is:

myData <- read.table("C:/Users/Norbert/Desktop/research/Important files/Airline Reviews/Reviews/air_can_Review.txt", sep="|",quote=NULL, comment='',fill = TRUE, header=FALSE) 

length(myData) 
[1] 2 
class(myData) 
[1] "data.frame" 
str(myData) 
'data.frame': 1244 obs. of 2 variables: 
$ V1: Factor w/ 1093 levels "","'delayed' on departure (I reference flights between March 2014 and January 2015 in this regard: Denver, SFO,",..: 210 367 698 853 1 344 483 87 757 52 ... 
$ V2: Factor w/ 154 levels ""," hotel","5/9/2014, LHR to Vancouver, AC855. 23/9/2014, Vancouver to LHR, AC854. For Economy the leg room was OK compared to",..: 1 1 1 1 78 1 1 1 1 1 ... 

myDataFrame <- data.frame(text = myData, otherVar2 = 1, otherVar2 = "blue", stringsAsFactors = FALSE) 
str(myDataFrame) 
'data.frame': 531 obs. of 3 variables: 
$ text  : chr "BRU-YUL, May 26th, A330-300. Departed on-time, landed 30 minutes late due to strong winds, nice flight, food" "excellent, cabin-crew smiling and attentive except for one old lady throwing meal trays like boomerangs. Seat-" "pitch was very generous, comfortable seat, IFE a bit outdated but selection was Okay. Air Canadas problem is\nthat the new pro"| __truncated__ "" ... 
$ otherVar2 : num 1 1 1 1 1 1 1 1 1 1 ... 
$ otherVar2.1: chr "blue" "blue" "blue" "blue" ... 

length(myDataFrame) 
[1] 3
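A likely reason for the two columns (hedged): the `read.table()` documentation says the number of columns is determined from the first five lines of input, and with `fill = TRUE` the extra fields on a longer line further down end up wrapped onto additional rows instead of creating new columns. The sketch below uses a small hypothetical file, not the actual review file, to show the diagnosis with `count.fields()` and one possible fix via `col.names`. It uses `quote = ""` to switch quoting off.

```r
# Hypothetical stand-in file: lines 1-5 have at most 2 "|"-separated fields,
# line 6 has 4, mimicking a long line further down the real file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a|b", "c", "d", "e", "f", "g|h|i|j"), tmp)

# Fields per physical line, with quoting and comments switched off
n_fields <- count.fields(tmp, sep = "|", quote = "", comment.char = "")
print(n_fields)

# Column count guessed from the first five lines: only 2 columns
guessed <- read.table(tmp, sep = "|", quote = "", comment.char = "",
                      fill = TRUE, header = FALSE, stringsAsFactors = FALSE)
print(dim(guessed))

# Passing col.names sized to the widest line gives one column per field
fixed <- read.table(tmp, sep = "|", quote = "", comment.char = "",
                    fill = TRUE, header = FALSE, stringsAsFactors = FALSE,
                    col.names = paste0("V", seq_len(max(n_fields))))
print(dim(fixed))
```

Each physical line still becomes its own row, so if single reviews span several lines in the real file, this alone will not give one review per column.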


2015-06-01 Uther Pendragon

Take a look [here](http://stackoverflow.com/questions/24679042/problems-with-reading-a-txt-file-eof-within-quoted-string). You may need to add two arguments to read.table(): `quote = NULL, comment = ''` – Parfait

@Parfait That got rid of the warning message, but the length of the data frame is still 2 when it should be 152. –

What does `str(myData)` output? – Parfait

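A better way is to read the text in using scan(), then put it into a data frame with your other variables (here I just made some up). Note that I pasted the text above into a file called sample.txt, after removing the leading "|".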

myData <- scan("sample.txt", what = character(), sep = "|") 
myDataFrame <- data.frame(text = myData, otherVar1 = 1, otherVar2 = "blue", 
          stringsAsFactors = FALSE) 
str(myDataFrame) 
## 'data.frame': 3 obs. of 3 variables: 
## $ text  : chr "Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light loa"| __truncated__ "We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toront"| __truncated__ "Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage "| __truncated__ 
## $ otherVar1: num 1 1 1 
## $ otherVar2: chr "blue" "blue" "blue"
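One caveat, hedged: `scan()` appears to treat the end of each line as a field boundary in addition to `sep`, which may explain the 531 observations in the question's edit if reviews in the real file span several lines. A minimal alternative sketch, assuming line breaks inside a review can simply become spaces: read all lines, join them into one string, and split on a literal `|`. The file contents below are a hypothetical stand-in.

```r
# Hypothetical stand-in for the review file: reviews separated by "|",
# with the second review broken across two lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("|Painless check-in. Arrived 20 minutes early.",
             "|We recently returned from Dublin to Toronto,",
             "then on to Winnipeg.|Flew Toronto to Heathrow."), tmp)

# Join all lines into one string (line breaks become spaces),
# then split on a literal "|" (fixed = TRUE, so it is not treated as a regex)
raw <- paste(readLines(tmp, warn = FALSE), collapse = " ")
pieces <- strsplit(raw, "|", fixed = TRUE)[[1]]

# Trim surrounding spaces and drop empty pieces, such as the one before the leading "|"
pieces <- trimws(pieces)
reviews <- pieces[nzchar(pieces)]
length(reviews)
```

On the real file this should give one element per review, which can then go into `data.frame(text = reviews, ...)` as above.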

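otherVar1 and otherVar2 are just placeholders for your own variables, since you said you wanted a data.frame with other variables. I chose a numeric variable and a text variable; because each is given a single value, that value is recycled for every observation in the dataset (three in this case).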
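The recycling can be checked directly. A small sketch with three made-up review strings: a length-1 value is repeated for every row, while a vector whose length does not divide evenly into the number of rows makes `data.frame()` stop with an error.

```r
# Three made-up review texts
reviews <- c("Painless check-in.", "We recently returned from Dublin.",
             "Flew Toronto to Heathrow.")

# Length-1 values are recycled to all 3 rows
df <- data.frame(text = reviews, otherVar1 = 1, otherVar2 = "blue",
                 stringsAsFactors = FALSE)
str(df)

# A length-2 vector cannot be recycled evenly over 3 rows, so this errors
recycled_ok <- tryCatch({
  data.frame(text = reviews, otherVar1 = 1:2, stringsAsFactors = FALSE)
  TRUE
}, error = function(e) FALSE)
recycled_ok
```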

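I realize your question asks how to get each text into a different column, but that is not a good way to use a data.frame, because data.frames are designed to hold variables in columns and observations in rows. (With one text per column, you cannot add other variables.)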
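To illustrate keeping one review per row: in that layout, any per-review variable is just another column. The texts are made up, and `review_id`, `n_chars` and `airline` are hypothetical example variables, not something from the question's data.

```r
reviews <- c("Painless check-in.", "We recently returned from Dublin.",
             "Flew Toronto to Heathrow.")

# One row per review; each column is a variable describing the reviews
reviews_df <- data.frame(review_id = seq_along(reviews), text = reviews,
                         stringsAsFactors = FALSE)

# Variables can be computed per review ...
reviews_df$n_chars <- nchar(reviews_df$text)
# ... or be a constant recycled across all reviews (hypothetical value)
reviews_df$airline <- "Air Canada"

str(reviews_df)
# A single review, with all its variables, is then one row
reviews_df[2, ]
```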

If you really want to do this, you have to transpose the data after coercing it, as follows:

myDataFrame <- as.data.frame(t(data.frame(text = myData, stringsAsFactors = FALSE)), stringsAsFactors = FALSE) 
str(myDataFrame) 
## 'data.frame': 1 obs. of 3 variables: 
## $ V1: chr "Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD, very light loa"| __truncated__ 
## $ V2: chr "We recently returned from Dublin to Toronto, then on to Winnipeg. Other than cutting it close due to limited staffing in Toront"| __truncated__ 
## $ V3: chr "Flew Toronto to Heathrow. Much worse flight than on the way out. We paid a hefty extra fee for exit seats which had no storage "| __truncated__ 
length(myDataFrame) 
## [1] 3
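If the one-review-per-column layout is still wanted, a slightly shorter variant (an alternative, not the code above) transposes the character vector directly: `t()` on a plain vector gives a 1-row matrix, which `as.data.frame()` turns into columns `V1`, `V2`, and so on. The column names `review1`, `review2`, ... are hypothetical.

```r
reviews <- c("Painless check-in.", "We recently returned from Dublin.",
             "Flew Toronto to Heathrow.")

# t() on a character vector yields a 1 x 3 matrix; each review becomes a column
wide <- as.data.frame(t(reviews), stringsAsFactors = FALSE)

# Optionally give the columns descriptive (hypothetical) names
names(wide) <- paste0("review", seq_along(reviews))

length(wide)   # number of columns, one per review
wide$review2
```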

"A measly piece of banana bread"? Definitely economy class.


2015-06-01 16:11:58

What does otherVar2 represent in the line you coded? What is it supposed to represent? @Ken Benoit –

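I think you misunderstood what I am really trying to accomplish. I am trying to put each review in a different column, but your code puts all the text in one column... I already know how to do that; what I want is to split the text at the delimiter and put the next review into a new column... I edited the code and output in the question. @Ken Benoit –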

No, I understood; I was trying to gently suggest that you shouldn't use a data.frame like this. See the edit. –
