Statistics Toolbox presentation

August 6, 2021

Contents

2. Decision Tree functions
3. The 'treefit' function
4. Cluster analysis functions
5. The kmeans function
6. The 'distance' parameter
7. The 'start' parameter
8. Classification
9. Linear and quadratic discriminant analysis
10. Visualizing regions of the plane
11. Decision trees
12. Iris classification tree
13. Testing classification quality
14. Choosing the pruning level
15. Optimal classification tree

Slide 2

Decision Tree functions

Slide 3

The 'treefit' function fits a tree-based model for classification or regression.

Syntax: t = treefit(X,y)

Example:
load fisheriris;
t = treefit(meas,species);
treedisp(t,'names',{'SL' 'SW' 'PL' 'PW'});

Slide 4

Cluster analysis functions

Slide 5

The kmeans function
IDX = kmeans(X,k)
[IDX,C] = kmeans(X,k)
[IDX,C,sumd] = kmeans(X,k)
[IDX,C,sumd,D] = kmeans(X,k)
[...] = kmeans(...,'param1',val1,'param2',val2,...)

IDX = kmeans(X,k) partitions the points in the n-by-p data matrix X into k clusters. This iterative partitioning minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. Rows of X correspond to points, columns correspond to variables. By default, kmeans uses squared Euclidean distances.
IDX - n-by-1 vector containing the cluster index of each point.
C - k-by-p matrix of cluster centroid locations.
sumd - 1-by-k vector of within-cluster sums of point-to-centroid distances.
D - n-by-k matrix of distances from each point to every centroid.
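
As a quick check of the four outputs described above, the following sketch runs kmeans on the fisheriris data used elsewhere in these slides. The choice of k = 3 (one cluster per species) is an assumption made for illustration, and the clustering can vary from run to run because the default starting centroids are picked at random.

```matlab
% Sketch only: k = 3 is an illustrative assumption, not taken from the slides.
load fisheriris;                      % provides meas (150-by-4) and species
k = 3;
[IDX, C, sumd, D] = kmeans(meas, k);  % default distance: squared Euclidean

n = size(meas, 1);
% Distance from each point to the centroid of its own cluster.
own = D(sub2ind(size(D), (1:n)', IDX));
% sumd holds these distances summed per cluster, so the two totals should agree.
total = sum(sumd);
fprintf('total within-cluster distance: %.4f\n', total);
fprintf('recomputed from D:             %.4f\n', sum(own));
```

Comparing the two printed totals is a simple way to confirm how sumd and D relate to each other.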

Slide 6

The 'distance' parameter
'sqEuclidean' - Squared Euclidean distance (default).
'cityblock' - Sum of absolute differences, i.e., L1.
'cosine' - One minus the cosine of the included angle between points (treated as vectors).
'correlation' - One minus the sample correlation between points (treated as sequences of values).
'Hamming' - Percentage of bits that differ (only suitable for binary data).
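
To make the five definitions concrete, the sketch below evaluates each one by hand for a pair of points, using only base MATLAB. The vectors a and b are hypothetical, and the Hamming value is expressed as a fraction rather than a percentage, which is an assumption about the scale.

```matlab
% Sketch only: a and b are hypothetical binary vectors chosen for illustration.
a = [1 0 1 1];
b = [1 1 0 1];

dSqEuclidean = sum((a - b).^2);                      % squared Euclidean
dCityblock   = sum(abs(a - b));                      % sum of absolute differences (L1)
dCosine      = 1 - (a * b') / (norm(a) * norm(b));   % one minus cosine of the angle
r            = corrcoef(a, b);                       % 2-by-2 sample correlation matrix
dCorrelation = 1 - r(1, 2);                          % one minus sample correlation
dHamming     = mean(a ~= b);                         % fraction of positions that differ
```

For these two vectors the values work out to 2, 2, 1/3, 4/3 and 0.5 respectively.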

Slide 7

The 'start' parameter
Method used to choose the initial cluster centroid positions, sometimes known as "seeds". Valid starting values are:
'sample' - Select k observations from X at random (default).
'uniform' - Select k points uniformly at random from the range of X. Not valid with Hamming distance.
'cluster' - Perform a preliminary clustering phase on a random 10% subsample of X. This preliminary phase is itself initialized using 'sample'.
Matrix - k-by-p matrix of centroid starting locations. In this case, you can pass in [] for k, and kmeans infers k from the first dimension of the matrix. You can also supply a 3-dimensional array, implying a value for the 'replicates' parameter from the array's third dimension.
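
The two most common ways of controlling the start can be sketched as follows. The particular seed rows and the replicate count of 5 are illustrative assumptions, not values from the slides.

```matlab
% Sketch only: the seed rows and the replicate count are illustrative assumptions.
load fisheriris;

% Explicit k-by-p starting centroids: k is passed as [] and inferred as 3.
seeds = meas([1 51 101], :);          % one observation from each species block
[IDX1, C1] = kmeans(meas, [], 'start', seeds);

% Random 'sample' starts repeated 5 times; the best of the 5 runs is returned.
[IDX2, C2, sumd2] = kmeans(meas, 3, 'start', 'sample', 'replicates', 5);
```

Supplying a matrix ties the run to one chosen set of seeds, while 'replicates' trades extra computation for a lower chance of ending in a poor local minimum.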

Slide 8

Classification
load fisheriris;
gscatter(meas(:,1), meas(:,2), species,'rgb','osd');
xlabel('Sepal length');
ylabel('Sepal width');

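Slide 9

Linear and quadratic discriminant analysis
% Classify the training points themselves using the first two measurements
linclass = classify(meas(:,1:2), meas(:,1:2), species);
bad = ~strcmp(linclass,species);
numobs = size(meas,1);
pbad = sum(bad) / numobs;
% Mark the misclassified points on the existing scatter plot
hold on;
plot(meas(bad,1), meas(bad,2), 'kx');
hold off;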
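
The slide title also mentions quadratic discriminant analysis, but the code shows only the default linear rule. A minimal sketch of the quadratic variant, assuming the same two features and the 'quadratic' type argument of classify, could be:

```matlab
% Sketch only: mirrors the linear example with the type argument set to 'quadratic'.
load fisheriris;
quadclass = classify(meas(:,1:2), meas(:,1:2), species, 'quadratic');
badq  = ~strcmp(quadclass, species);   % true where the prediction is wrong
pbadq = sum(badq) / size(meas, 1);     % resubstitution error rate
```

The variable names quadclass, badq and pbadq are chosen here for illustration so that they do not overwrite the results of the linear fit.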

Slide 10

Visualizing regions of the plane
% Classify every point of a fine grid and color the grid by predicted class
[x,y] = meshgrid(4:.1:8,2:.1:4.5);
x = x(:);
y = y(:);
j = classify([x y],meas(:,1:2), species);
gscatter(x,y,j,'grb','sod')

Slide 11

Decision trees
tree = treefit(meas(:,1:2), species);
[dtnum,dtnode,dtclass] = treeval(tree, meas(:,1:2));
bad = ~strcmp(dtclass,species);
sum(bad) / numobs
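
treefit and treeval belong to older releases of the toolbox. Assuming a release that provides fitctree and predict (an assumption; the slides do not use them), roughly the same resubstitution check might be sketched like this:

```matlab
% Sketch only: fitctree/predict stand in for treefit/treeval and are not
% part of the original slides.
load fisheriris;
tree2    = fitctree(meas(:,1:2), species);    % classification tree on two features
dtclass2 = predict(tree2, meas(:,1:2));       % predicted labels for the training data
bad2     = ~strcmp(dtclass2, species);
errRate  = sum(bad2) / size(meas, 1);         % resubstitution error rate
```

The names tree2, dtclass2, bad2 and errRate are illustrative and kept distinct from the variables on the slide.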

Slide 12

Iris classification tree

Slide 13

Testing classification quality
resubcost = treetest(tree,'resub');
[cost,secost,ntermnodes,bestlevel] = treetest(tree,'cross',meas(:,1:2),species);
plot(ntermnodes,cost,'b-', ntermnodes,resubcost,'r--')
figure(gcf);
xlabel('Number of terminal nodes');
ylabel('Cost (misclassification error)')
legend('Cross-validation','Resubstitution')

Slide 14

Choosing the pruning level
[mincost,minloc] = min(cost);
cutoff = mincost + secost(minloc);
hold on
plot([0 20], [cutoff cutoff], 'k:')
plot(ntermnodes(bestlevel+1), cost(bestlevel+1), 'mo')
legend('Cross-validation', 'Resubstitution', 'Min + 1 std. err.','Best choice')
hold off
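
The "Min + 1 std. err." line can be illustrated with a small worked example in base MATLAB. All numbers below are hypothetical, and the selection rule shown (the most heavily pruned level whose cost stays at or below the cutoff) is one reading of how the "Best choice" point is picked; treetest's own selection may differ in detail.

```matlab
% Sketch only: hypothetical numbers, indexed by pruning level 0..5
% (level 0 is the full tree; higher levels are more heavily pruned).
cost       = [0.27 0.25 0.21 0.22 0.23 0.45];
secost     = [0.03 0.03 0.03 0.03 0.03 0.03];
ntermnodes = [12 9 6 4 2 1];

[mincost, minloc] = min(cost);        % lowest cost and its position
cutoff = mincost + secost(minloc);    % minimum plus one standard error

% Most heavily pruned level whose cost does not exceed the cutoff.
% Levels start at 0, hence the "- 1" on the index.
bestlevel = find(cost <= cutoff, 1, 'last') - 1;
fprintf('level %d, %d terminal nodes\n', bestlevel, ntermnodes(bestlevel + 1));
```

With these numbers the minimum cost is 0.21, the cutoff is about 0.24, and the rule selects level 4 with 2 terminal nodes: a much smaller tree whose cost is statistically close to the minimum.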

Slide 15

Optimal classification tree
prunedtree = treeprune(tree,bestlevel);
treedisp(prunedtree)
cost(bestlevel+1)
ans = 0.22
