
Earth mover's distance (EMD) becomes expensive to compute and store with large datasets. For example, in a lemnatech dataset filtered to images from every 5th day there are 6332^2 = 40,094,224 pairwise EMD values. In long format that is a dataframe with about 40 million rows, which is unwieldy. This function reduces the size of a dataset before comparing histograms and moving on to matrix methods or network analysis.
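As a rough illustration of that scaling, the base R sketch below reproduces the count above and shows how much smaller the pairwise table becomes once images are averaged within groups. The figure of 200 groups is hypothetical and chosen only for illustration.

```r
# Number of images in the example above (taken from the text).
n_images <- 6332

# Every image is compared with every image, so a long-format
# table of pairwise EMD values has n^2 rows.
n_pairs <- n_images^2
format(n_pairs, big.mark = ",")

# Hypothetical design: 200 groups, aggregated to 1 row per group.
n_groups <- 200
n_per_group <- 1
n_pairs_aggregated <- (n_groups * n_per_group)^2
format(n_pairs_aggregated, big.mark = ",")
```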

Usage

mv_ag(
  df,
  group,
  mvCols = "frequencies",
  n_per_group = 1,
  outRows = NULL,
  keep = NULL,
  parallel = getOption("mc.cores", 1),
  traitCol = "trait",
  labelCol = "label",
  valueCol = "value",
  id = "image"
)

Arguments

df

A dataframe with multi value traits, in either wide or long format. The data is assumed to be long if the traitCol, valueCol, and labelCol columns are all present.

group

Vector of column names for the variables that uniquely identify the groups to summarize over. Typically these are the design variables and a time variable.

mvCols

Either a vector of column names/positions representing multi value traits or a character string that identifies the multi value trait columns as a regex pattern. Defaults to "frequencies".

n_per_group

Number of rows to return for each group.

outRows

An optional, alternative way to specify how many rows to return. The result will often not match this number exactly, because every group is given the same number of observations.

keep

A vector of single value traits to also average over groups, if your data contains a mix of single and multi value traits.

parallel

Optionally, the groups can be processed in parallel using this number of cores. Defaults to 1 if the "mc.cores" option is not set globally.

traitCol

Column with phenotype names, defaults to "trait".

labelCol

Column with phenotype labels (units), defaults to "label".

valueCol

Column with phenotype values, defaults to "value".

id

Column that uniquely identifies images if the data is in long format. This is ignored when data is in wide format.
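The base R sketch below illustrates how several of these arguments could be interpreted. It is not the package's code: `is_long()` is a hypothetical helper, the toy data frames are invented, and the conversion from outRows to rows per group is an assumption based only on the description above.

```r
# Toy wide-format data: one row per image, one column per histogram bin.
wide <- data.frame(
  image = c("img1", "img2"),
  group = c("a", "a"),
  sim_1 = c(0.2, 0.4),
  sim_2 = c(0.8, 0.6)
)

# Toy long-format data: one row per image and bin.
long <- data.frame(
  image = rep(c("img1", "img2"), each = 2),
  group = "a",
  trait = "sim",
  label = rep(1:2, times = 2),
  value = c(0.2, 0.8, 0.4, 0.6)
)

# Hypothetical check: treat data as long when all three columns are present.
is_long <- function(df, traitCol = "trait", labelCol = "label",
                    valueCol = "value") {
  all(c(traitCol, labelCol, valueCol) %in% colnames(df))
}
is_long(wide)
is_long(long)

# A regex pattern given as mvCols can be resolved to column names with grepl().
mv_names <- colnames(wide)[grepl("sim_", colnames(wide))]
mv_names

# Assumed conversion: spread outRows evenly over the groups, so the
# total number of rows returned may differ from the number requested.
outRows <- 7
n_groups <- 3
rows_per_group <- max(1, round(outRows / n_groups))
total_rows <- rows_per_group * n_groups

# The default for `parallel`: the "mc.cores" option if set, otherwise 1.
getOption("mc.cores", 1)
```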

Value

Returns a dataframe summarized by the specified groups over the multi-value traits.
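To make the shape of the result concrete, here is a minimal base R sketch of group-wise averaging of histograms. The helper `aggregate_mv()` and the toy data are hypothetical, and the random assignment of rows to subsets is an assumption rather than a description of how mv_ag works internally.

```r
set.seed(123)

# Toy wide data: 6 images in 2 groups, 4 histogram bins of counts.
counts <- matrix(rpois(6 * 4, lambda = 20), nrow = 6,
                 dimnames = list(NULL, paste0("sim_", 1:4)))
df <- data.frame(group = rep(c("a", "b"), each = 3), counts)

# Hypothetical helper, not the pcvr implementation.
aggregate_mv <- function(df, group, mv_names, n_per_group = 1) {
  pieces <- lapply(split(df, df[[group]]), function(d) {
    mv <- as.matrix(d[, mv_names, drop = FALSE])
    mv <- mv / rowSums(mv)           # scale each histogram to sum to 1
    k <- min(n_per_group, nrow(mv))  # cannot return more rows than exist
    # Randomly assign the rows of this group to k roughly equal subsets.
    subset_id <- sample(rep(seq_len(k), length.out = nrow(mv)))
    # Average the histograms within each subset.
    means <- do.call(rbind, lapply(seq_len(k), function(i) {
      colMeans(mv[subset_id == i, , drop = FALSE])
    }))
    res <- data.frame(rep(d[[group]][1], k), means)
    colnames(res)[1] <- group
    res
  })
  do.call(rbind, pieces)
}

out <- aggregate_mv(df, group = "group", mv_names = paste0("sim_", 1:4),
                    n_per_group = 2)
out
```

Because each input histogram is scaled to sum to 1 before averaging, each returned row also sums to 1, and the output has at most n_per_group rows for each group.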

Examples


s1 <- mvSim(
  dists = list(runif = list(min = 15, max = 150)),
  n_samples = 10,
  counts = 1000,
  min_bin = 1,
  max_bin = 180,
  wide = TRUE
)
mv_ag(s1, group = "group", mvCols = "sim_", n_per_group = 2)
#>             group sim_1 sim_2 sim_3 sim_4 sim_5 sim_6 sim_7 sim_8 sim_9 sim_10
#> runif_1.1 runif_1     0     0     0     0     0     0     0     0     0      0
#> runif_1.2 runif_1     0     0     0     0     0     0     0     0     0      0
#>           sim_11 sim_12 sim_13 sim_14 sim_15 sim_16 sim_17 sim_18 sim_19 sim_20
#> runif_1.1      0      0      0      0 0.0078 0.0082 0.0090 0.0082 0.0076 0.0064
#> runif_1.2      0      0      0      0 0.0082 0.0054 0.0082 0.0082 0.0060 0.0068
#>           sim_21 sim_22 sim_23 sim_24 sim_25 sim_26 sim_27 sim_28 sim_29 sim_30
#> runif_1.1 0.0070 0.0080 0.0068 0.0088 0.0054 0.0056 0.0058 0.0086 0.0076 0.0066
#> runif_1.2 0.0062 0.0066 0.0062 0.0092 0.0052 0.0082 0.0090 0.0088 0.0078 0.0078
#>           sim_31 sim_32 sim_33 sim_34 sim_35 sim_36 sim_37 sim_38 sim_39 sim_40
#> runif_1.1 0.0086 0.0084 0.0060 0.0084 0.0092 0.0056 0.0062 0.0056 0.0056 0.0066
#> runif_1.2 0.0092 0.0078 0.0074 0.0058 0.0084 0.0054 0.0080 0.0074 0.0064 0.0072
#>           sim_41 sim_42 sim_43 sim_44 sim_45 sim_46 sim_47 sim_48 sim_49 sim_50
#> runif_1.1 0.0072 0.0096 0.0062 0.0084 0.0082 0.0072 0.0112 0.0070 0.0064 0.0092
#> runif_1.2 0.0076 0.0084 0.0062 0.0074 0.0060 0.0054 0.0086 0.0086 0.0084 0.0068
#>           sim_51 sim_52 sim_53 sim_54 sim_55 sim_56 sim_57 sim_58 sim_59 sim_60
#> runif_1.1 0.0074 0.0064 0.0082 0.0064 0.0062 0.0068 0.0074 0.0054 0.0076 0.0056
#> runif_1.2 0.0078 0.0080 0.0094 0.0068 0.0080 0.0052 0.0070 0.0082 0.0076 0.0090
#>           sim_61 sim_62 sim_63 sim_64 sim_65 sim_66 sim_67 sim_68 sim_69 sim_70
#> runif_1.1 0.0100 0.0078 0.0072 0.0100 0.0062 0.0078  0.009 0.0080 0.0078 0.0088
#> runif_1.2 0.0084 0.0082 0.0058 0.0058 0.0096 0.0068  0.009 0.0066 0.0072 0.0076
#>           sim_71 sim_72 sim_73 sim_74 sim_75 sim_76 sim_77 sim_78 sim_79 sim_80
#> runif_1.1 0.0070 0.0068 0.0082 0.0076 0.0066 0.0088 0.0070 0.0084 0.0086 0.0062
#> runif_1.2 0.0064 0.0070 0.0088 0.0062 0.0070 0.0096 0.0092 0.0066 0.0084 0.0056
#>           sim_81 sim_82 sim_83 sim_84 sim_85 sim_86 sim_87 sim_88 sim_89 sim_90
#> runif_1.1 0.0086 0.0086 0.0064  0.007 0.0060 0.0072 0.0070 0.0064 0.0086 0.0082
#> runif_1.2 0.0070 0.0068 0.0056  0.006 0.0062 0.0066 0.0066 0.0064 0.0082 0.0076
#>           sim_91 sim_92 sim_93 sim_94 sim_95 sim_96 sim_97 sim_98 sim_99
#> runif_1.1 0.0098 0.0058 0.0072  0.008 0.0086 0.0094 0.0064 0.0072 0.0086
#> runif_1.2 0.0064 0.0086 0.0076  0.007 0.0076 0.0082 0.0078 0.0084 0.0072
#>           sim_100 sim_101 sim_102 sim_103 sim_104 sim_105 sim_106 sim_107
#> runif_1.1  0.0054  0.0076  0.0096  0.0084  0.0070  0.0072  0.0064  0.0070
#> runif_1.2  0.0064  0.0062  0.0064  0.0066  0.0074  0.0084  0.0046  0.0066
#>           sim_108 sim_109 sim_110 sim_111 sim_112 sim_113 sim_114 sim_115
#> runif_1.1  0.0076  0.0066  0.0076  0.0076  0.0066  0.0074  0.0078  0.0086
#> runif_1.2  0.0082  0.0082  0.0092  0.0052  0.0058  0.0086  0.0084  0.0066
#>           sim_116 sim_117 sim_118 sim_119 sim_120 sim_121 sim_122 sim_123
#> runif_1.1  0.0064  0.0046  0.0080  0.0106  0.0068  0.0088  0.0068  0.0076
#> runif_1.2  0.0064  0.0068  0.0058  0.0084  0.0098  0.0066  0.0072  0.0082
#>           sim_124 sim_125 sim_126 sim_127 sim_128 sim_129 sim_130 sim_131
#> runif_1.1  0.0074  0.0094  0.0072   0.006  0.0068  0.0086  0.0076  0.0062
#> runif_1.2  0.0050  0.0080  0.0060   0.009  0.0084  0.0062  0.0066  0.0090
#>           sim_132 sim_133 sim_134 sim_135 sim_136 sim_137 sim_138 sim_139
#> runif_1.1  0.0094  0.0062  0.0058  0.0074  0.0068  0.0088  0.0074  0.0040
#> runif_1.2  0.0078  0.0066  0.0094  0.0084  0.0068  0.0064  0.0072  0.0078
#>           sim_140 sim_141 sim_142 sim_143 sim_144 sim_145 sim_146 sim_147
#> runif_1.1  0.0068  0.0070   0.007  0.0072  0.0040  0.0082  0.0054  0.0066
#> runif_1.2  0.0092  0.0084   0.009  0.0086  0.0082  0.0068  0.0088  0.0090
#>           sim_148 sim_149 sim_150 sim_151 sim_152 sim_153 sim_154 sim_155
#> runif_1.1  0.0058  0.0106       0       0       0       0       0       0
#> runif_1.2  0.0050  0.0096       0       0       0       0       0       0
#>           sim_156 sim_157 sim_158 sim_159 sim_160 sim_161 sim_162 sim_163
#> runif_1.1       0       0       0       0       0       0       0       0
#> runif_1.2       0       0       0       0       0       0       0       0
#>           sim_164 sim_165 sim_166 sim_167 sim_168 sim_169 sim_170 sim_171
#> runif_1.1       0       0       0       0       0       0       0       0
#> runif_1.2       0       0       0       0       0       0       0       0
#>           sim_172 sim_173 sim_174 sim_175 sim_176 sim_177 sim_178 sim_179
#> runif_1.1       0       0       0       0       0       0       0       0
#> runif_1.2       0       0       0       0       0       0       0       0
#>           sim_180
#> runif_1.1       0
#> runif_1.2       0