
Convert files to markdown

Usage

read_as_markdown(x, ..., canonical = FALSE)

Arguments

x

A file path or URL.

...

These dots are for future extensions and must be empty.

canonical

Logical; whether to post-process the output from MarkItDown with commonmark::markdown_commonmark(). Defaults to FALSE.
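To get a feel for what `canonical = TRUE` changes, the sketch below runs a small hand-written Markdown string through `commonmark::markdown_commonmark()` directly. It assumes the commonmark package is installed. The input string is a stand-in for MarkItDown output and is not necessarily exactly what `read_as_markdown()` passes internally.

```r
# Sketch: approximates the canonical = TRUE post-processing step.
# raw_md is a hand-written stand-in for MarkItDown output (an assumption).
library(commonmark)

raw_md <- "* [Welcome](./index.html)\n* [Introduction](./intro.html)\n"

# Re-render the parsed document as normalized CommonMark
canonical_md <- markdown_commonmark(raw_md)

cat(canonical_md)
```

The result is still one string, but the list markers are rewritten in commonmark's own style. This matches the `*` to `-` difference visible between the two HTML examples in the Examples section.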

Value

A single string of Markdown.

Examples

# convert HTML
read_as_markdown("https://r4ds.hadley.nz/base-R.html") |>
  substr(1, 1000) |> cat()
#> # 27  A field guide to base R – R for Data Science (2e)
#> 
#> 1. [Program](./program.html)
#> 2. [27  A field guide to base R](./base-R.html)
#> 
#> [R for Data Science (2e)](./)
#> 
#> * [Welcome](./index.html)
#> * [Preface to the second edition](./preface-2e.html)
#> * [Introduction](./intro.html)
#> * [Whole game](./whole-game.html)
#> 
#>   + [1  Data visualization](./data-visualize.html)
#>   + [2  Workflow: basics](./workflow-basics.html)
#>   + [3  Data transformation](./data-transform.html)
#>   + [4  Workflow: code style](./workflow-style.html)
#>   + [5  Data tidying](./data-tidy.html)
#>   + [6  Workflow: scripts and projects](./workflow-scripts.html)
#>   + [7  Data import](./data-import.html)
#>   + [8  Workflow: getting help](./workflow-help.html)
#> * [Visualize](./visualize.html)
#> 
#>   + [9  Layers](./layers.html)
#>   + [10  Exploratory data analysis](./EDA.html)
#>   + [11  Communication](./communication.html)
#> * [Transform](./transform.html)
#> 
#>   + [12  Logical vectors](./logicals.html)
#>   + [13  Numbers](./numbers.html)
#>   + [14  String

read_as_markdown("https://r4ds.hadley.nz/base-R.html", canonical = TRUE) |>
  substr(1, 1000) |> cat()
#> # 27  A field guide to base R – R for Data Science (2e)
#> 
#> 1.  [Program](./program.html)
#> 2.  [27  A field guide to base R](./base-R.html)
#> 
#> [R for Data Science (2e)](./)
#> 
#>   - [Welcome](./index.html)
#> 
#>   - [Preface to the second edition](./preface-2e.html)
#> 
#>   - [Introduction](./intro.html)
#> 
#>   - [Whole game](./whole-game.html)
#>     
#>       - [1  Data visualization](./data-visualize.html)
#>       - [2  Workflow: basics](./workflow-basics.html)
#>       - [3  Data transformation](./data-transform.html)
#>       - [4  Workflow: code style](./workflow-style.html)
#>       - [5  Data tidying](./data-tidy.html)
#>       - [6  Workflow: scripts and projects](./workflow-scripts.html)
#>       - [7  Data import](./data-import.html)
#>       - [8  Workflow: getting help](./workflow-help.html)
#> 
#>   - [Visualize](./visualize.html)
#>     
#>       - [9  Layers](./layers.html)
#>       - [10  Exploratory data analysis](./EDA.html)
#>       - [11  Communication](./communication.html)
#> 
#>   - [Transform](./transform.html)
#>     
#>       - [12  Logi

# convert PDF
pdf <- file.path(R.home("doc"), "NEWS.pdf")
read_as_markdown(pdf) |> substr(1, 1000) |> cat()
#> NEWS for R version 4.4.3 (2025-02-28)
#> 
#> NEWS
#> 
#> R News
#> 
#> CHANGES IN R 4.4.3
#> 
#> INSTALLATION:
#> 
#> (cid:136) R can be installed using C23 (for example with ‘-std=gnu23’ or ‘-std=gnu2x’) with
#> recent compilers including gcc 12(cid:21)14, Apple clang 15(cid:21)16, LLVM clang 17(cid:21)20 and
#> Intel icx 2024.2.
#> It can be installed with the upcoming (at the time of writing) gcc 15, which defaults
#> to C23.
#> 
#> C-LEVEL FACILITIES:
#> 
#> (cid:136) The functions R_strtod and R_atof now allow hexadecimal constants without an
#> 
#> exponent, for compatibility with their C99 versions (PR#18805).
#> 
#> UTILITIES:
#> 
#> (cid:136) R CMD build and R CMD check now allow reference output
#> 
#> for demo scripts
#> (‘demo/demo.Rout.save’ (cid:28)les) to be shipped with the package, as proposed by
#> Torsten Hothorn in PR#18816.
#> 
#> BUG FIXES:
#> 
#> (cid:136) kappa(A, exact=TRUE) for singular A returns Inf more generally, (cid:28)xing PR#18817
#> 
#> reported by Mikael Jagan.
#> 
#> (cid:136) Fixed URLs of the sun spots (sunspot.month etc) data sets and mention future
#> 
#> ch

## alternative:
# pdftools::pdf_text(pdf) |> substr(1, 2000) |> cat()
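The commented-out alternative above uses pdftools. The following is a minimal sketch of that route. It assumes two things: that pdftools is installed, and that `NEWS.pdf` exists under `R.home("doc")`, which holds for many but not all R installations. `pdftools::pdf_text()` returns one string per page, so the pages are collapsed into a single string. The result then has the same shape as what `read_as_markdown()` returns, although it is plain text rather than Markdown.

```r
# Sketch: plain-text PDF extraction with pdftools as an alternative to
# read_as_markdown(); assumes pdftools is installed and NEWS.pdf exists.
pdf <- file.path(R.home("doc"), "NEWS.pdf")

# pdf_text() returns a character vector with one element per page
pages <- pdftools::pdf_text(pdf)

# Collapse the pages into one string, separating pages with a blank line
news_txt <- paste(pages, collapse = "\n\n")

substr(news_txt, 1, 1000) |> cat()
```

The blank-line page separator is an arbitrary choice. It keeps page breaks visible without adding any markup.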