How to take samples using sample() in R
Published: 2019-05-11


Let's look at one of the most frequently used functions in R, sample(). Taking samples is one of the most common tasks in data analysis: working on a sample is often the most practical way to study and understand data, especially when the dataset is large.

R offers the standard function sample() to draw a random sample from a vector and, through row numbers, from a dataset. Many business and data analysis problems require sampling. The sample can be drawn with or without replacement, as illustrated in the sections below.



Let's get into the topic!

Syntax of sample() in R

sample(x, size, replace = FALSE, prob = NULL)
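  • x – a vector of elements to sample from, or a single positive integer n, in which case the sample is taken from 1:n.
  • size – the number of items to draw. Defaults to the size of the population.
  • replace – whether to sample with replacement (TRUE) or without (FALSE, the default).
  • prob – an optional vector of probability weights, one per element of x.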
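
As a rough illustration of how the four arguments fit together, the sketch below calls sample() in a few different ways. The vector colours and the other variable names are made up for this example, and the printed values will differ from run to run.

```r
# A small vector to sample from (hypothetical example data)
colours <- c("red", "green", "blue", "yellow")

# x only: returns all elements in random order (a permutation)
perm <- sample(colours)

# x and size: draw 2 distinct elements
two <- sample(colours, size = 2)

# x given as a single positive integer n: samples from 1:n
from_int <- sample(10, size = 3)

# replace = TRUE allows the same element to appear more than once
with_rep <- sample(colours, size = 6, replace = TRUE)

# prob gives one weight per element of x; "red" is favoured here
weighted <- sample(colours, size = 5, replace = TRUE,
                   prob = c(0.7, 0.1, 0.1, 0.1))

print(perm)
print(two)
print(from_int)
print(with_rep)
print(weighted)
```

Note the third call: when x is a single number, sample() treats it as the upper end of the range 1:n rather than as a one-element vector.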


Taking samples with replacement

What does it mean to take samples with replacement?

When you sample from a vector and specify replace = TRUE (or T), the same value may be drawn more than once. With the default, replace = FALSE, each element can be drawn at most once.

The following example illustrates the difference.

# sample from the range 1 to 5; with no size given, all 5 values are returned in random order
x <- sample(1:5)
# print the sample
x
# Output (varies between runs) -> 3 2 1 5 4

# sample from the range 1 to 5, taking 3 values
x <- sample(1:5, 3)
# print the 3 sampled values
x
# Output -> 2 4 5

# sample from the range 1 to 5, asking for 6 values
x <- sample(1:5, 6)
x
# Error in sample.int(length(x), size, replace, prob) :
#   cannot take a sample larger than the population when 'replace = FALSE'
# The call fails because only 5 values (1:5) are available without replacement.

# replace = TRUE (or T) allows repeated values, so 6 values can be drawn from 1 to 5.
# Here 2 and 4 are repeated.
x <- sample(1:5, 6, replace = TRUE)
x
# Output -> 2 4 2 2 4 3
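
One way to see the effect of replace = TRUE is to count how often each value was drawn. The sketch below uses the base functions table() and anyDuplicated(); the variable names are only illustrative.

```r
# Draw 6 values from 1:5 with replacement; since only 5 distinct
# values exist, at least one value has to appear twice
x <- sample(1:5, 6, replace = TRUE)

# Count how many times each value appears in the sample
counts <- table(x)
print(counts)

# anyDuplicated() returns the index of the first repeated element,
# or 0 when every element is unique
has_repeat <- anyDuplicated(x) > 0
print(has_repeat)
```

The counts change from run to run, but has_repeat is always TRUE here because six draws cannot all be different when there are only five values to choose from.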


Samples without replacement in R

In this case, we take samples without replacement.

Passing the argument replace = FALSE (or F), which is also the default, prevents any value from being drawn more than once. As a result, the sample size cannot exceed the number of elements being sampled from.

# samples without replacement
x <- sample(1:8, 7, replace = FALSE)
x
# Output -> 4 1 6 5 3 2 7

# asking for 9 values from a population of 8 fails
x <- sample(1:8, 9, replace = FALSE)
# Error in sample.int(length(x), size, replace, prob) :
#   cannot take a sample larger than the population when 'replace = FALSE'

# here the sample size equals the size of the range, so all of 1:5 is returned in random order
x <- sample(1:5, 5, replace = FALSE)
x
# Output -> 5 4 1 3 2
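
The two properties described in this section can be checked directly. The sketch below is only an illustration with made-up variable names: it confirms that a full-size sample without replacement is a reordering of the original values, and it uses tryCatch() to capture the error raised by an oversized request instead of stopping the script.

```r
population <- 1:5

# A full-size sample without replacement is a permutation of the population
perm <- sample(population, length(population), replace = FALSE)

# No value appears twice ...
no_repeats <- anyDuplicated(perm) == 0
# ... and sorting the sample gives back the original values
same_values <- identical(sort(perm), population)

# Requesting more values than exist raises an error; tryCatch() turns
# the error into its message so the script can continue
oversized <- tryCatch(
  sample(1:8, 9, replace = FALSE),
  error = function(e) conditionMessage(e)
)

print(no_repeats)
print(same_values)
print(oversized)
```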


Taking samples using the function set.seed()

As you may have noticed, samples are random and change every time you run the code. If you need the same sample each time, for example to make an analysis reproducible, you can use the set.seed() function.

set.seed() – sets the seed of R's random number generator, so that the random calls that follow it produce the same sequence on every run.

Execute the code below to get the same random sample each time.

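# set the seed
set.seed(5)
# take a random sample with replacement
sample(1:5, 4, replace = TRUE)
# Output -> 2 3 1 3

set.seed(5)
sample(1:5, 4, replace = TRUE)
# Output -> 2 3 1 3

set.seed(5)
sample(1:5, 4, replace = TRUE)
# Output -> 2 3 1 3
# (the exact values can depend on the R version; what matters is that they repeat)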
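
Rather than comparing printed output by eye, the repetition can be verified with identical(). This is a minimal sketch; the names first and second are just placeholders.

```r
# First draw with seed 5
set.seed(5)
first <- sample(1:5, 4, replace = TRUE)

# Reset the same seed and draw again
set.seed(5)
second <- sample(1:5, 4, replace = TRUE)

# The two samples match element by element
print(identical(first, second))  # TRUE
```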


Taking the sample from a dataset

In this section, we take a sample of rows from a dataset in RStudio.

The code below draws 10 random row numbers from the built-in 'ToothGrowth' dataset and uses them to display the matching rows. In the same way, you can take a sample of any required size from a dataset.

# draw 10 random row numbers from the 'ToothGrowth' dataset
# (df holds row numbers here, not a data frame)
df <- sample(1:nrow(ToothGrowth), 10)
df
# Output -> 53 12 16 26 37 27  9 22 28 10

# display the 10 sampled rows
ToothGrowth[df, ]
#     len supp dose
# 53 22.4   OJ  2.0
# 12 16.5   VC  1.0
# 16 17.3   VC  1.0
# 26 32.5   VC  2.0
# 37  8.2   OJ  0.5
# 27 26.7   VC  2.0
# 9   5.2   VC  0.5
# 22 18.5   VC  2.0
# 28 21.5   VC  2.0
# 10  7.0   VC  0.5
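
The same pattern works when the required size is expressed as a share of the dataset rather than a fixed count. The sketch below assumes a 10% sample is wanted; the fraction and the variable names are choices made for this example, not part of the original code.

```r
# Share of rows to keep (an assumption for this example)
fraction <- 0.10

# Convert the share into a whole number of rows
n_keep <- round(fraction * nrow(ToothGrowth))

# seq_len() is used instead of 1:nrow() because it also behaves
# sensibly for a dataset with zero rows
row_ids <- sample(seq_len(nrow(ToothGrowth)), n_keep)

# Extract the sampled rows, keeping all columns
tooth_sample <- ToothGrowth[row_ids, ]
print(tooth_sample)
```

ToothGrowth has 60 rows, so this keeps 6 of them.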


Taking the samples from the dataset using the set.seed() function

In this section, we use set.seed() together with sample() to take a reproducible sample of rows from a dataset.

Execute the code below to draw 10 rows from the iris dataset.

# set the seed
set.seed(10)
# take a sample of 10 row numbers from the iris dataset
x <- sample(1:nrow(iris), 10)
x
# Output -> 137  74 112  72  88  15 143 149  24  13

# display the 10 sampled rows
iris[x, ]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
# 137          6.3         3.4          5.6         2.4  virginica
# 74           6.1         2.8          4.7         1.2 versicolor
# 112          6.4         2.7          5.3         1.9  virginica
# 72           6.1         2.8          4.0         1.3 versicolor
# 88           6.3         2.3          4.4         1.3 versicolor
# 15           5.8         4.0          1.2         0.2     setosa
# 143          5.8         2.7          5.1         1.9  virginica
# 149          6.2         3.4          5.4         2.3  virginica
# 24           5.1         3.3          1.7         0.5     setosa
# 13           4.8         3.0          1.4         0.1     setosa

You will get the same rows every time you execute this code, because set.seed() is called with the same seed before sample().
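
One detail worth keeping in mind is that the seed fixes the whole sequence of random draws, not each individual call. The sketch below, with illustrative variable names, draws twice after a single set.seed() and then repeats the experiment.

```r
# First run: one seed, two consecutive draws
set.seed(10)
a1 <- sample(1:nrow(iris), 10)
a2 <- sample(1:nrow(iris), 10)  # continues the random sequence after a1

# Second run: same seed, same two draws
set.seed(10)
b1 <- sample(1:nrow(iris), 10)
b2 <- sample(1:nrow(iris), 10)

# Each draw matches its counterpart from the first run
print(identical(a1, b1))  # TRUE
print(identical(a2, b2))  # TRUE
```

The second draw in each run generally differs from the first one, so to get the same rows from a particular sample() call, set the seed immediately before that call.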



Generating a random sample using sample() in R

Let's understand this concept with the help of a problem.

Problem: A gift shop has decided to give a surprise gift to one of its customers. For this purpose, it has collected some names. The task is to choose one name at random from the list.

Hint: use the sample() function with a sample size of 1.

As you can see below, every time you run this code, it picks a name at random from the participants.

# create a vector of names and draw one name from it
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'), 1)
# Output -> "Rossie"
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'), 1)
# Output -> "Jolie"
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'), 1)
# Output -> "jack"
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'), 1)
# Output -> "Edwards"
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'), 1)
# Output -> "Kyle"
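
If the shop wanted several winners instead of one, the same call could be used with a larger size. The sketch below assumes three winners and stores the names in a variable called customers; both are choices made for this illustration. Because replace defaults to FALSE, no customer can win twice.

```r
# The collected customer names
customers <- c('jack', 'Rossie', 'Kyle', 'Edwards', 'Joseph',
               'Paloma', 'Kelly', 'Alok', 'Jolie')

# Draw 3 different winners (no replacement, so no name repeats)
winners <- sample(customers, 3)
print(winners)
```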


Taking samples by setting the probabilities

With the help of the above examples, you have seen how to generate random samples and how to extract randomly chosen rows from a dataset.

R also lets you set the probability of each outcome through the prob argument, which is useful when the outcomes are not equally likely. Let's see how it works with the help of a simple example.

Consider a company that manufactures 10 watches, where each watch has a 20% chance of being defective. The code below simulates such a batch. Because the draw is random, the number of defective watches in a single batch will not always be exactly 2.

# each watch is 'Good' with probability 80% and 'Defective' with probability 20%
sample(c('Good','Defective'), size = 10, replace = TRUE, prob = c(.80, .20))
# Output -> "Good" "Good" "Good" "Defective" "Good" "Good" "Good" "Good" "Defective" "Good"

You can also try different probabilities, as shown below.

sample(c('Good','Defective'), size = 10, replace = TRUE, prob = c(.60, .40))
# Output -> "Good" "Defective" "Good" "Defective" "Defective" "Good" "Good" "Good" "Defective" "Good"
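
With only 10 watches, the observed share of defective ones can be far from the probability that was set. A larger simulation shows the weights at work more clearly. The sketch below assumes a batch of 10000 watches and a seed of 42, both picked arbitrarily for this illustration.

```r
# Fix the seed so the simulation can be repeated
set.seed(42)

# Simulate a large batch with a 20% chance of a defect per watch
n <- 10000
watches <- sample(c('Good', 'Defective'), size = n, replace = TRUE,
                  prob = c(.80, .20))

# Count each outcome and compute the observed share of defects
print(table(watches))
defect_share <- mean(watches == 'Defective')
print(defect_share)
```

The observed share should land close to 0.20, though not exactly on it, since each watch is still drawn at random.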


Wrapping up

In this tutorial, you have learned how to take samples from a vector or a dataset, with or without replacement, and with custom probabilities. The set.seed() function is helpful when you need to reproduce the same sequence of samples.

Try taking samples from the various datasets available in R. You can also import some CSV files and take samples from them with probability adjustments as shown above.


Repost address: http://ogozd.baihongyu.com/
