2014-05-13

I'm writing a set of documents in R Markdown that I will compile into a website with Jekyll. In doing so, I've run into a problem: stripping YAML from knitr child documents.

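Some of the Rmd files I use call other Rmd files as child documents. When I render with knitr, the resulting document contains the YAML front matter from both the parent and the child documents. An example is given below.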

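So far, I don't see any way to specify that only part of a child document should be used when it is included from an Rmd parent. Does anyone know a way to strip the YAML front matter from a child document when it is read into the parent Rmd during `knit()`?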

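I'm happy to consider answers outside R, preferably something I could embed in a rake file. That said, I don't want to permanently modify the child documents, so the stripping can't be permanent. Finally, the length of the YAML varies from file to file, so I'd guess any solution must be able to find where the YAML starts and ends with regex/grep/sed/etc.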
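A minimal sketch in R of the non-permanent stripping described above, assuming the front matter opens with a line consisting only of `---` at the top of the file and closes with `---` or `...`. The helper names `strip_front_matter()` and `strip_yaml_to_tempfile()` are hypothetical; the child file on disk is never modified, only a temporary copy is written.

```r
# Hypothetical helper: drop a leading YAML block from a character vector of lines.
strip_front_matter <- function(lines) {
  # Front matter must start on the first non-blank line
  first <- which(nzchar(trimws(lines)))[1]
  if (is.na(first) || !grepl("^---\\s*$", lines[first])) {
    return(lines)  # no front matter: return the lines unchanged
  }
  # Find the closing delimiter (--- or ...) after the opening one
  rest <- lines[-seq_len(first)]
  close <- grep("^(---|\\.\\.\\.)\\s*$", rest)[1]
  if (is.na(close)) {
    return(lines)  # unterminated block: leave everything alone
  }
  rest[-seq_len(close)]  # everything after the closing delimiter
}

# Hypothetical helper: write a YAML-free copy of a child document to a
# temporary file and return its path; the original file is left untouched.
strip_yaml_to_tempfile <- function(path) {
  body <- strip_front_matter(readLines(path, warn = FALSE, encoding = "UTF-8"))
  out <- tempfile(fileext = ".Rmd")
  writeLines(body, out, useBytes = TRUE)
  out
}
```

In the parent, the helper could be defined in an earlier chunk (e.g. with `include = FALSE`) and then used in the chunk header as `child = strip_yaml_to_tempfile("./child_doc.rmd")`, since knitr evaluates chunk options as R expressions. Because the copy lives in a temporary directory, any relative paths inside the child are worth double-checking.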

Example:

%%%% Parent_Doc.rmd %%%%

--- 
title: parent doc 
layout: default 
etc: etc 
--- 
This is the parent... 

```{r child import, child="./child_doc."} 
``` 

%%%% child_doc.rmd %%%%

--- 
title: child doc 
layout: default 
etc: etc 
--- 

lorem ipsum etc 

%%%% output.md %%%%

--- 
title: parent doc 
layout: default 
etc: etc 
--- 
This is the parent... 
--- 
title: child doc 
layout: default 
etc: etc 
--- 

lorem ipsum etc 

%%%% Ideal_Output.md %%%%

--- 
title: parent doc 
layout: default 
etc: etc 
--- 
This is the parent... 

lorem ipsum etc 

I can consider this as a feature request for the next version of knitr if you file it at https://github.com/yihui/knitr/issues –


@Yihui: I'll file a feature request, but it may not be worth your while. My use case is probably fairly specific. Thanks for your response. – Tom


Not at all. I don't mind small feature requests :) –

Answer


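In the meantime, perhaps the following will work for you. It's an ugly and inefficient workaround (I'm new to programming, not a real programmer), but it accomplishes what I believe you're trying to do.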

I wrote a function for similar personal use that includes the following relevant bit; the original was in Spanish, so I've translated some of it below:

extraction <- function(matter, escape = FALSE, ruta = ".", patron) {

    # Gather the paths of the documents to be processed
    # (`patron` is a regular expression passed to list.files(), not a glob)

    doc_list <- list.files(
        path = ruta,
        pattern = patron,
        full.names = TRUE
    )

    # Extract the desired contents from each document

    lapply(
        X = doc_list,
        FUN = function(i) {
            raw_contents <- readLines(con = i, encoding = "UTF-8")

            switch(
                EXPR = matter,

                # No YAML (e.g., HTML): return the whole file as one string,
                # optionally wrapped in an XML text node so it gets escaped

                "no_yaml" = {
                    text <- paste(raw_contents, collapse = "\n")
                    if (isTRUE(escape)) {
                        XML::xmlTextNode(value = text)
                    } else {
                        text
                    }
                },

                # YAML header and Rmd contents

                "rmd" = {
                    # A delimiter is a line consisting only of --- or ...
                    # (anchored, so table rules or "..." in text don't match)
                    yaml_pattern <- "^(-{3}|\\.{3})\\s*$"
                    limits_yaml <- grep(pattern = yaml_pattern, x = raw_contents)[1:2]
                    if (anyNA(limits_yaml)) {
                        stop("No YAML header found in ", i)
                    }
                    # Parse the header as one YAML document so nested keys
                    # (e.g. `output:` followed by an indented value) survive
                    indices_yaml <- seq.int(
                        from = limits_yaml[1] + 1,
                        length.out = limits_yaml[2] - limits_yaml[1] - 1
                    )
                    yaml <- yaml::yaml.load(
                        string = paste(raw_contents[indices_yaml], collapse = "\n")
                    )
                    if (is.null(yaml)) yaml <- list()
                    # Everything after the closing delimiter is the Rmd body
                    rmd <- paste(raw_contents[-seq_len(limits_yaml[2])], collapse = "\n")
                    c(yaml, "contents" = rmd)
                },

                # Anything else (just in case)

                stop("Matter not extractable")
            )
        }
    )

}
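A quick check of why the delimiter pattern should be anchored. The unanchored pattern `[-]{3}|[.]{3}` matches any line that merely contains three dashes or three dots, so an ellipsis inside a YAML value or a Markdown table rule is taken as a delimiter. The sample lines below are made up for illustration.

```r
sample_lines <- c(
  "--- ",                     # opening delimiter (note the trailing space)
  "title: Results...",        # YAML value containing an ellipsis
  "layout: default",
  "---",                      # closing delimiter
  "",
  "|----------|----------|"   # Markdown table rule in the body
)

loose    <- "[-]{3}|[.]{3}"        # matches anywhere in a line
anchored <- "^(-{3}|\\.{3})\\s*$"  # the whole line must be the delimiter

loose_hits    <- grep(loose, sample_lines)
anchored_hits <- grep(anchored, sample_lines)

loose_hits[1:2]     # 1 2: the title's "..." is mistaken for the closing delimiter
anchored_hits[1:2]  # 1 4: the real opening and closing delimiters
```

With the loose pattern, taking the first two hits yields an empty header and leaves the real YAML in the body; the anchored pattern only stops at lines that are nothing but the delimiter.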

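Say my main Rmd file, `main.Rmd`, lives in `my_directory`, and my child documents, `01-abstract.Rmd`, `02-intro.Rmd`, ..., `06-conclusion.Rmd`, are kept in `./sections`. Note that, with my amateur function, it's best to name the child documents so that they sort in the order in which they will be passed into the main document (see below), because `list.files()` returns them in alphabetical order. I keep my function, `extraction.R`, in `./assets`. Here is my example directory structure: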

. 
+--assets 
| +--extraction.R 
+--sections 
| +--01-abstract.Rmd 
| +--02-intro.Rmd 
| +--03-methods.Rmd 
| +--04-results.Rmd 
| +--05-discussion.Rmd 
| +--06-conclusion.Rmd 
+--stats 
| +--analysis.R 
+--main.Rmd 

`main.Rmd` imports my child documents from `./sections`:

--- 
title: Main 
author: me 
date: Today 
output: 
    html_document 
--- 

```{r, 'setup', include = FALSE} 
knitr::opts_chunk$set(autodep = TRUE) 
knitr::dep_auto() 
``` 

```{r, 'import_children', cache = TRUE, include = FALSE} 
source('./assets/extraction.R') 
rmd <- extraction(
    matter = 'rmd', 
    ruta = './sections', 
    patron = "\\.Rmd$" 
) 
``` 

# Abstract 

```{r, 'abstract', echo = FALSE, results = 'asis'} 
cat(x = rmd[[1]][["contents"]], sep = "\n") 
``` 

# Introduction 

```{r, 'intro', echo = FALSE, results = 'asis'} 
cat(x = rmd[[2]][["contents"]], sep = "\n") 
``` 

# Methods 

```{r, 'methods', echo = FALSE, results = 'asis'} 
cat(x = rmd[[3]][["contents"]], sep = "\n") 
``` 

# Results 

```{r, 'results', echo = FALSE, results = 'asis'} 
cat(x = rmd[[4]][["contents"]], sep = "\n") 
``` 

# Discussion 

```{r, 'discussion', echo = FALSE, results = 'asis'} 
cat(x = rmd[[5]][["contents"]], sep = "\n") 
``` 

# Conclusion 

```{r, 'conclusion', echo = FALSE, results = 'asis'} 
cat(x = rmd[[6]][["contents"]], sep = "\n") 
``` 

# References 
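The six near-identical chunks above differ only in the heading and the list index. A sketch of a single loop that emits the same Markdown, assuming the child documents come back in the same order as a `headings` vector; `emit_sections()` and `headings` are hypothetical names introduced here, and the demo data mirrors the shape of `extraction()`'s result.

```r
# Emit one Markdown section per child document, in order.
# `rmd` is a list shaped like extraction()'s result (each element has a
# "contents" string); `headings` is a hypothetical vector of section titles.
emit_sections <- function(rmd, headings) {
  stopifnot(length(rmd) == length(headings))
  for (k in seq_along(rmd)) {
    cat("# ", headings[k], "\n\n", sep = "")
    cat(rmd[[k]][["contents"]], "\n\n", sep = "")
  }
  invisible(NULL)
}

# Demo input in the same shape as extraction()'s result
demo <- list(
  list(title = "child doc 1", contents = "\nThis is **Child Doc 1**, my abstract."),
  list(title = "child doc 2", contents = "\nThis is **Child Doc 2**, my introduction.")
)
emit_sections(demo, c("Abstract", "Introduction"))
```

In `main.Rmd`, one chunk with `echo = FALSE, results = 'asis'` calling `emit_sections(rmd, c("Abstract", "Introduction", "Methods", "Results", "Discussion", "Conclusion"))` would stand in for the per-section headings and chunks.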

I then knit this document, and only the contents of my child documents are incorporated into it, e.g.:

--- 
title: Main 
author: me 
date: Today 
output: 
    html_document 
--- 





# Abstract 


This is **Child Doc 1**, my abstract. 

# Introduction 


This is **Child Doc 2**, my introduction. 

- Point 1 
- Point 2 
- Point *n* 

# Methods 


This is **Child Doc 3**, my "Methods" section. 

| method 1 | method 2 | method *n* | 
|---------------|---------------|----------------| 
| fffffffffffff | fffffffffffff | fffffffffffff d| 
| fffffffffffff | fffffffffffff | fffffffffffff d| 
| fffffffffffff | fffffffffffff | fffffffffffff d| 

# Results 


This is **Child Doc 4**, my "Results" section. 

## Result 1 

```{r} 
library(knitr) 
``` 

```{r, 'analysis', cache = FALSE} 
source(file = '../stats/analysis.R') 
``` 

# Discussion 


This is **Child Doc 5**, where the results are discussed. 

# Conclusion 


This is **Child Doc 6**, where I state my conclusions. 

# References 

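The preceding document is the knitted version of `main.Rmd`, i.e., `main.md`. Note that under `## Result 1` in my child document `04-results.Rmd`, I source an external R script, `./stats/analysis.R`, which has now been incorporated into my knitted document as a new knitr chunk; consequently, I now need to knit the document again.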

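When the child documents also include chunks, instead of knitting into `.md`, I knit the main document into another `.Rmd` as many times as I have nested chunks; e.g., continuing the example above: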

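  1. Using `knit(input = './main.Rmd', output = './main_2.Rmd')`, instead of knitting `main.Rmd` into `main.md`, I knit it into another `.Rmd`, so that I can then knit the resulting file, which contains the newly imported chunks (e.g., the one that sources my R script `analysis.R` above).
  2. I can now knit my `main_2.Rmd` into `main.md`, or render it as `main.html` via `rmarkdown::render(input = './main_2.Rmd', output_file = './main.html')`.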
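An alternative that may avoid the repeated knitting: knit each extracted body as a child from inside the main document, so any chunks it contains are evaluated in the same pass instead of being copied out verbatim. This assumes `knitr::knit_child()` forwards a `text` argument to `knit()`, as in the `knit_child(text = ...)` idiom; `knit_contents()` is a hypothetical wrapper, and I have not checked it against every knitr version.

```r
# Hypothetical wrapper: knit an Rmd body held in a single string (for example
# rmd[[4]][["contents"]] from extraction()) as a child document and return
# the resulting Markdown. Chunks inside `contents` are evaluated here, in the
# same knit as the parent document.
knit_contents <- function(contents, envir = parent.frame()) {
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("knitr is required")
  }
  # knit() expects a character vector of lines
  lines <- unlist(strsplit(contents, "\n", fixed = TRUE))
  knitr::knit_child(text = lines, envir = envir, quiet = TRUE)
}
```

In `main.Rmd`, a chunk with `results = 'asis'` would then call `cat(knit_contents(rmd[[4]][["contents"]]), sep = "\n")` instead of `cat(x = rmd[[4]][["contents"]], sep = "\n")`. Relative paths inside the child are still resolved from the main document's working directory.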

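Note: in the `main.md` example above, the path to my R script is `../stats/analysis.R`. This path is relative to the child document it came from, `./sections/04-results.Rmd`. Once I import the child document into the main document at the root of `my_directory`, i.e., `./main.md` or `./main_2.Rmd`, the path is wrong; therefore, I have to manually correct it to `./stats/analysis.R` before the next knit.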
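The manual path fix could be automated. A sketch, assuming paths in the child documents are written relative to the child's own directory (as `../stats/analysis.R` is relative to `./sections`), that the child directory is itself given relative to the project root, and that paths use forward slashes: join the two and resolve `.` and `..` segments lexically, without touching the file system. `rebase_path()` is a hypothetical helper.

```r
# Hypothetical helper: turn a path written relative to `child_dir` into a path
# relative to the project root, resolving "." and ".." segments lexically.
rebase_path <- function(rel_path, child_dir) {
  # Absolute paths (Unix, home-relative, or Windows drive) are left alone
  if (grepl("^(/|~|[A-Za-z]:)", rel_path)) return(rel_path)
  parts <- strsplit(file.path(child_dir, rel_path), "/", fixed = TRUE)[[1]]
  out <- character(0)
  for (p in parts) {
    if (p == "" || p == ".") next
    if (p == ".." && length(out) > 0 && out[length(out)] != "..") {
      out <- out[-length(out)]  # ".." cancels the previous directory
    } else {
      out <- c(out, p)
    }
  }
  paste0("./", paste(out, collapse = "/"))
}

rebase_path("../stats/analysis.R", "./sections")  # "./stats/analysis.R"
```

The result could then be substituted into a child's body before it is inserted, e.g. `gsub("../stats/analysis.R", rebase_path("../stats/analysis.R", "./sections"), body, fixed = TRUE)`; reliably finding every path inside arbitrary chunk code is harder and is left out here.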

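I mentioned above that it's best to name the child documents in the same order in which they are imported into the main document. This is because my simple function `extraction()` just stores the contents of all the files given to it in an unnamed list, so in `main.Rmd` I have to access each file by number; i.e., `rmd[[5]][["contents"]]` refers to the child document `./sections/05-discussion.Rmd`. Consider: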

> str(rmd) 
List of 6 
$ :List of 4 
    ..$ title  : chr "child doc 1" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 1**, my abstract." 
$ :List of 4 
    ..$ title  : chr "child doc 2" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 2**, my introduction.\n\n- Point 1\n- Point 2\n- Point *n*" 
$ :List of 4 
    ..$ title  : chr "child doc 3" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 3**, my \"Methods\" section.\n\n| method 1 | method 2 | method *n* |\n|--------------|--------------|----"| __truncated__ 
$ :List of 4 
    ..$ title  : chr "child doc 4" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 4**, my \"Results\" section.\n\n## Result 1\n\n```{r}\nlibrary(knitr)\n```\n\n```{r, cache = FALSE}\nsour"| __truncated__ 
$ :List of 4 
    ..$ title  : chr "child doc 5" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 5**, where the results are discussed." 
$ :List of 4 
    ..$ title  : chr "child doc 6" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 6**, where I state my conclusions." 

So `extraction()` here actually stores both the R Markdown contents of the specified child documents and their YAML, in case you have a use for the latter as well (I do).
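To avoid depending on position, the unnamed list could be named after the child files. A sketch, assuming the same `ruta` and `patron` are passed to both `list.files()` and `extraction()` so that both see the files in the same order; `name_by_file()` is a hypothetical helper, and the input below is made up in the same shape as `extraction()`'s result.

```r
# Hypothetical helper: name each element of `x` after the matching file,
# without directory or extension ("./sections/05-discussion.Rmd" -> "05-discussion").
name_by_file <- function(x, paths) {
  stopifnot(length(x) == length(paths))
  stats::setNames(x, tools::file_path_sans_ext(basename(paths)))
}

# Made-up input in the same shape as extraction()'s result
paths <- c("./sections/01-abstract.Rmd", "./sections/05-discussion.Rmd")
rmd <- name_by_file(
  list(list(contents = "abstract"), list(contents = "discussion")),
  paths
)
rmd[["05-discussion"]][["contents"]]  # "discussion"
```

In `main.Rmd` this might look like `files <- list.files(path = './sections', pattern = '\\.Rmd$', full.names = TRUE)` followed by `rmd <- name_by_file(extraction(matter = 'rmd', ruta = './sections', patron = '\\.Rmd$'), files)`, after which sections can be looked up by file name rather than by index.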