2017-06-06

自作関数の高速化

r bytecompile 高速化

efficient R programingを読んでいて今後使えそうなことに関するメモ。

3.7 では自作関数の高速化について述べられている。

自作関数にcompiler::cmpfun()を適用すればよいだけらしい。

library(compiler)

mean_r = function(x) {
  m = 0
  n = length(x)
  for(i in seq_len(n))
    m = m + x[i] / n
  m
}

cmp_mean_r = cmpfun(mean_r)

平均を求める関数についての比較結果はこのようになる。結構差が出ているので、複雑な処理をする際には有効に使えそうである。

x = rnorm(1000)
microbenchmark(times = 10, unit = "ms", # milliseconds
          mean_r(x), cmp_mean_r(x), mean(x))
#> Unit: milliseconds
#>           expr   min    lq  mean median    uq  max neval cld
#>      mean_r(x) 0.358 0.361 0.370  0.363 0.367 0.43    10   c
#>  cmp_mean_r(x) 0.050 0.051 0.052  0.051 0.051 0.07    10  b 
#>        mean(x) 0.005 0.005 0.008  0.007 0.008 0.03    10 a

2017/7/23追記 R言語逆引きハンドブックにバイトコンパイルによる高速化として、しっかり書いてあったので、詳細はそちらを確認。

2017-06-03

tmapとggplot2の出力結果の比較

r sf sp ggplot2 tmap

空間データのプロット

sfやspクラスオブジェクトのデータをプロットする方法について tmapとggplot2パッケージによる出力を比較してみる。

library(tmap)
library(sf)
library(mapview)
library(ggplot2)

spクラスオブジェクトとsfクラスオブジェクトの2つを準備する。

nc <- st_read(system.file("shape/nc.shp", package="sf"), quiet = T)
data("atlStorms2005")

class(nc)

[1] "sf"         "data.frame"

class(atlStorms2005)

[1] "SpatialLinesDataFrame"
attr(,"package")
[1] "sp"

tmapを利用して書くと

tm_shape(nc) + tm_polygons(col = "AREA")

f:id:jerrarrdan:20170603015115p:plain

tm_shape(atlStorms2005) + tm_lines(col = "MaxWind") +
  tm_layout(frame = F)

f:id:jerrarrdan:20170603015130p:plain

ggplot2を利用して書くと

ggplot() +
  geom_sf(data = nc, aes(fill = AREA))

f:id:jerrarrdan:20170603015142p:plain

ggplot() +
  geom_sf(data = st_as_sf(atlStorms2005), aes(color = MaxWind))

f:id:jerrarrdan:20170603015153p:plain

tmapパッケージではsp, sfオブジェクトともに区別なく書くことができるが、 ggplot2ではsfオブジェクトのみ対応しており、変換が必要になる。

ggplot2パッケージのパレットが見づらいと感じるので、デフォルトのパレットを変更されることを期待している。

2017-05-31

行の結合にはrbind.fill()が便利

r plyr

各データフレームを行で結合したい場合がある。列で結合する場合はleft_join()などのjoin関数がある。

データがない部分はNAで保管される。

    library(plyr)
    rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")])

    mpg    wt cyl
    1 21.0 2.620  NA
    2 21.0 2.875  NA
    3 22.8 2.320  NA
    4 21.4 3.215  NA
    5 18.7 3.440  NA
    6 18.1 3.460  NA
    .................
    59  NA 2.140   4
    60  NA 1.513   4
    61  NA 3.170   8
    62  NA 2.770   6
    63  NA 3.570   8
    64  NA 2.780   4

2017/7/23追記 dplyrパッケージのbind_rows()、rbind_all()関数でも同様のことができるらしいので、わざわざplyrパッケージをロードして使用する必要もないらしい。ただし、bind_rows()はオブジェクトクラスによってはうまく機能しないらしいので注意。現時点ではsfオブジェクトに対してはエラーが発生する。

2017-05-21

leaflet、mapviewパッケージを用いた可視化のコードサンプル

r leaflet

いつも調べながら書いているので、よく利用するオプションを書いておく

# 必要とするパッケージ
library(mapview)
library(kokudosuuchi)
library(tidyverse)
library(sp)

# データの読込(岐阜県の行政区域)
getURL <- 
  kokudosuuchi::getKSJURL("N03", prefCode=21) %>% 
  filter(year == 2017) %>%
  .[,"zipFileUrl"]

adm_area <- kokudosuuchi::getKSJData(as.character(getURL))
adm_area <- adm_area[[1]]
proj4string(adm_area) <- CRS("+init=epsg:4612")

# mapview,leafletパッケージを用いて可視化
pal <- colorFactor(palette = "YlOrRd", domain = adm_area$市区町村名)
leaflet() %>%
  addProviderTiles("OpenStreetMap.BlackAndWhite") %>%
  addPolygons(data = adm_area, weight = 1, group = "adm area", 
              fillColor = ~ pal(adm_area$市区町村名),
              fillOpacity = 0.5,
              highlightOptions = highlightOptions(color = "green", weight = 7),
              popup = popupTable(adm_area@data)) %>%
  addLayersControl(overlayGroups = c("adm area"), position = "bottomright") %>%
  # mapview::mapshot(url = "test.html") # 静的な図として保存する場合はコメントアウト

f:id:jerrarrdan:20170521145920p:plain

2017-05-16

data.frameからsfクラスのデータを作成する方法

r sf

結論から言うとst_as_sf(data.frame, coords=c(x, y), crs=4612)というように行えばよい

library(sp)
library(sf)
data(meuse)

head(meuse)
##        x      y cadmium copper lead zinc  elev       dist   om ffreq soil lime landuse
## 1 181072 333611    11.7     85  299 1022 7.909 0.00135803 13.6     1    1    1      Ah
## 2 181025 333558     8.6     81  277 1141 6.983 0.01222430 14.0     1    1    1      Ah
## 3 181165 333537     6.5     68  199  640 7.800 0.10302900 13.0     1    1    1      Ah
## 4 181298 333484     2.6     81  116  257 7.655 0.19009400  8.0     1    2    0      Ga
## 5 181307 333330     2.8     48  117  269 7.480 0.27709000  8.7     1    2    0      Ah
## 6 181390 333260     3.0     61  137  281 7.791 0.36406700  7.8     1    2    0      Ga
##   dist.m
## 1     50
## 2     30
## 3    150
## 4    270
## 5    380
## 6    470

meuse_sf <-
  st_as_sf(meuse, coords = c("x", "y"), crs = 28992)

class(meuse_sf)

## [1] "sf"         "data.frame"

meuse_sf %>% st_geometry() %>% plot(axes=T)

f:id:jerrarrdan:20170516211900p:plain

2017-05-13

sfクラスのデータ抽出

r sf

ポリゴン内のポイントまたはポイントがあるポリゴンデータを抽出したい場合がある．

特に難しいことはないが，メモとして残しておく．

library(sf)
set.seed(1)

nc <- system.file("shape/nc.shp", package="sf") %>% st_read
nc1 <- nc[1:50, ]
nc_points <- st_sample(nc, size = 10)

# plot(nc)
plot(st_geometry(nc), border = "grey40")
# plot(nc1)
plot(st_geometry(nc1),border="grey40", col = "grey90", add = T)
# plot(nc_points)
plot(nc_points, add =T, pch = 16)
# plot(nc1_filtered)
nc1[nc_points, ] %>% plot(add=T, lwd = 2, border = "blue")
# plot(nc_points_filtered)
nc_points[nc1] %>% plot(add = T, col = "violet", cex = 3, lwd = 2)