Research Question 3

RQ3: Do code clones in deep learning projects cause co-changes, and if so, to what extent?

Glossary

  • POC: Percentage of clones in a project.
  • PCC: Percentage of co-changed clones in a project.
  • Size: Size of a project, measured in PLOC.
  • LC: LOC of a clone, i.e., the number of lines of code in the clone.
  • PCTC: Percentage of co-changed clones among all clones.
  • PLOC: Number of lines of Python code.
  • NormPCTC: LOC-normalized PCTC, i.e., the gross LOC of co-changed clones as a percentage of the gross LOC of all clones.

Association between POC and PCC

Overview

POC range  0-5%  5-10%  10-15%  15-20%  20-25%  25-30%  >30%
AvgPCC     7.3%  9.2%   16.7%   11.1%   9.0%    14.8%   49.0%

Parts

  • Part 1: POC 0-5%
Project            POC    PCC
deepvariant        4.5%   31.3%
pytorch-lightning  4.0%   0
GPflow             3.4%   0
OpenNMT-py         3.4%   0
torchio            3.2%   0
tensorpack         1.1%   12.5%
Avg                3.2%   7.3%
  • Part 2: POC 5-10%
Project            POC    PCC
DeepLabCut         9.5%   16.7%
DeepPavlov         8.7%   2.0%
PySyft             8.4%   0
raster-vision      8.2%   0
MONAI              7.0%   32.3%
spaCy              6.9%   0
ray                6.4%   5.9%
Hub                5.2%   16.7%
Avg                7.6%   9.2%
  • Part 3: POC 10-15%
Project            POC    PCC
ludwig             15.0%  38.7%
catalyst           13.5%  39.5%
tensor2tensor      13.4%  0.7%
allennlp           12.2%  7.5%
luminoth           11.9%  4.5%
coach              10.8%  17.9%
clearml            10.5%  17.3%
TTS                10.3%  7.0%
Avg                12.2%  16.7%
  • Part 4: POC 15-20%
Project            POC    PCC
tfx                20.0%  3.7%
chainer            19.2%  2.4%
autokeras          16.7%  18.8%
torch-points3d     16.0%  10.2%
stellargraph       15.1%  20.6%
Avg                17.4%  11.1%
  • Part 5: POC 20-25%
Project            POC    PCC
nni                23.8%  3.1%
texar              23.1%  17.5%
horovod            22.5%  2.8%
sonnet             22.4%  0
transformers       21.6%  6.6%
keras              21.2%  16.0%
DIG                20.6%  23.7%
addons             20.1%  2.6%
Avg                21.9%  9.0%
  • Part 6: POC 25-30%
Project            POC    PCC
TensorLayer        29.7%  14.2%
imgaug             28.8%  15.9%
deepchem           28.2%  4.5%
pyod               25.7%  24.7%
Avg                28.1%  14.8%
  • Part 7: POC > 30%
Project            POC    PCC
tianshou           51.1%  77.9%
ignite             38.9%  7.0%
tflearn            35.4%  62.1%
Avg                41.8%  49.0%
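The grouping used in the parts above (bin each project by POC, then average PCC within each bin) can be sketched as follows. This is a minimal illustration, not the authors' script: the helper names `poc_range` and `avg_pcc_by_range` are hypothetical, and the assumption that each range is closed on the right is inferred from ludwig (15.0%) appearing in Part 3 and tfx (20.0%) in Part 4.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper bounds (in %) of the first six POC ranges; anything above 30% is ">30%".
# Assumption: ranges are closed on the right, so 15.0% lands in "10-15%".
POC_EDGES = [5, 10, 15, 20, 25, 30]
POC_LABELS = ["0-5%", "5-10%", "10-15%", "15-20%", "20-25%", "25-30%", ">30%"]


def poc_range(poc: float) -> str:
    """Return the POC range label for a project's POC (given in percent)."""
    # bisect_left counts the edges strictly below poc, which is the bin index.
    return POC_LABELS[bisect_left(POC_EDGES, poc)]


def avg_pcc_by_range(projects: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Group projects by POC range and return the mean PCC of each range.

    `projects` maps a project name to a (POC, PCC) pair, both in percent.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for poc, pcc in projects.values():
        groups[poc_range(poc)].append(pcc)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


if __name__ == "__main__":
    # A few rows copied from the Part 4 and Part 7 tables above.
    sample = {
        "tianshou": (51.1, 77.9),
        "ignite": (38.9, 7.0),
        "tflearn": (35.4, 62.1),
        "tfx": (20.0, 3.7),
    }
    for label, avg in avg_pcc_by_range(sample).items():
        print(f"{label}: {avg:.1f}%")
```

Using `bisect_left` on a sorted list of edges keeps the binning logic in one place, so changing the range boundaries only requires editing `POC_EDGES` and `POC_LABELS`.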

Association between Size and PCC

Subject PLOC PCC
tianshou 21,257 75.3%
tflearn 10,297 62.1%
catalyst 30,581 39.3%
ludwig 42,097 38.7%
MONAI 72,288 32.3%
deepvariant 35,254 30.6%
pyod 10,769 24.7%
DIG 22,257 23.1%
stellargraph 27,816 20.6%
autokeras 10,417 18.8%
coach 24,709 17.9%
texar 31,757 17.1%
clearml 84,867 17.3%
DeepLabCut 30,419 16.7%
Hub 22,636 16.7%
keras 146,799 15.9%
imgaug 89,115 15.9%
TensorLayer 29,952 14.2%
tensorpack 24,811 12.5%
torch-points3d 25,560 10.2%
allennlp 56,320 7.5%
TTS 25,247 7.0%
ignite 41,000 6.9%
transformers 449,759 6.6%
ray 260,916 5.9%
luminoth 11,467 4.5%
deepchem 60,320 4.5%
tfx 80,696 3.7%
nni 80,385 3.1%
horovod 36,344 2.8%
addons 28,214 2.6%
chainer 132,991 2.4%
DeepPavlov 27,778 2.0%
tensor2tensor 86,231 0.7%
GPflow 20,484 0
OpenNMT 14,224 0
PySyft 58,479 0
pytorch-lightning 60,251 0
raster-vision 16,718 0
sonnet 13,172 0
spaCy 76,679 0
torchio 12,408 0
DeepFaceLab 12,831 0
faceswap 28,756 0
pytorchvideo 19,058 0

We calculated the Pearson correlation coefficient between project size (PLOC) and PCC across all projects and obtained a value of 0.05, which indicates no meaningful linear association between project size and PCC.
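For reference, the Pearson coefficient described above is the covariance of the two samples divided by the product of their standard deviations. The sketch below implements that formula directly in plain Python; the function name `pearson` is our own, and the sample call uses only the first five rows of the table for brevity, so its output differs from the value computed over all projects.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations (covariance numerator) and the two sums of squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # First five rows of the Size/PCC table above: (PLOC, PCC in %).
    rows = [(21257, 75.3), (10297, 62.1), (30581, 39.3), (42097, 38.7), (72288, 32.3)]
    ploc = [float(p) for p, _ in rows]
    pcc = [c for _, c in rows]
    print(f"r = {pearson(ploc, pcc):.2f}")
```

A value close to 0, such as the 0.05 reported here, means PLOC explains almost none of the linear variation in PCC; it does not rule out a non-linear relationship.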

Association between LC and co-changed clones

Association between LC and PCTC

Overview

LC (lines)  5-20  21-40  41-60  61-80  81-100  101-∞
PCTC        5.7%  11.9%  8.9%   33.5%  48.5%   60.4%
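The LC groups in the table above partition clones by their length in lines. A minimal sketch of that assignment, assuming integer LC values and that clones shorter than 5 lines are excluded (the smallest group starts at 5); the function name `lc_group` is hypothetical:

```python
# Closed LC ranges used in the tables; anything above 100 lines is "101-∞".
LC_GROUPS = [(5, 20), (21, 40), (41, 60), (61, 80), (81, 100)]


def lc_group(loc: int) -> str:
    """Map a clone's LC (lines of code) to its LC group label."""
    # Assumption: clones under 5 lines are not part of the data set.
    if loc < 5:
        raise ValueError(f"clone of {loc} lines is below the 5-line minimum")
    for lo, hi in LC_GROUPS:
        if lo <= loc <= hi:
            return f"{lo}-{hi}"
    return "101-∞"


if __name__ == "__main__":
    for loc in (5, 20, 21, 100, 101, 250):
        print(loc, "->", lc_group(loc))
```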

Number of co-changed clones of each type in each LC group.

LC (lines)  5-20  21-40  41-60  61-80  81-100  101-∞
Type 1      41    159    38     24     58      78
Type 2      313   123    30     32     12      82
Type 3      1512  1753   582    538    253     598
Total       1866  2035   650    594    323     758

Figure RQ3-1: Distribution of the number of co-changed clones of each type in every project.

Number of clones (co-changed or not) of each type in each LC group.

LC (lines)  5-20   21-40  41-60  61-80  81-100  101-∞
Type 1      2665   1387   208    54     92      164
Type 2      5830   1341   357    97     28      117
Type 3      24055  14429  6746   1621   546     973
Total       32550  17157  7311   1772   666     1254
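The PCTC values in the overview table are consistent with dividing the Total row of the co-changed counts by the Total row of all clone counts in each LC group. The sketch below recomputes them from the per-type counts in the two tables above; the names `column_totals` and `pctc` are illustrative, not part of the original study.

```python
LC_LABELS = ["5-20", "21-40", "41-60", "61-80", "81-100", "101-∞"]

# Per-type clone counts per LC group, copied from the two tables above.
CO_CHANGED = {
    "Type 1": [41, 159, 38, 24, 58, 78],
    "Type 2": [313, 123, 30, 32, 12, 82],
    "Type 3": [1512, 1753, 582, 538, 253, 598],
}
ALL_CLONES = {
    "Type 1": [2665, 1387, 208, 54, 92, 164],
    "Type 2": [5830, 1341, 357, 97, 28, 117],
    "Type 3": [24055, 14429, 6746, 1621, 546, 973],
}


def column_totals(by_type: dict[str, list[int]]) -> list[int]:
    """Sum the per-type rows column by column to get the Total row."""
    return [sum(col) for col in zip(*by_type.values())]


def pctc(co_changed: dict[str, list[int]], all_clones: dict[str, list[int]]) -> dict[str, float]:
    """PCTC per LC group: co-changed clones / all clones, in percent."""
    co = column_totals(co_changed)
    total = column_totals(all_clones)
    return {label: 100 * c / t for label, c, t in zip(LC_LABELS, co, total)}


if __name__ == "__main__":
    for label, value in pctc(CO_CHANGED, ALL_CLONES).items():
        print(f"{label}: {value:.1f}%")
```

Passing a single-type dictionary (for example only `"Type 3"`) to `pctc` gives the co-change rate of that clone type alone.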

Figure RQ3-2: Distribution of the number of clones of each type in every project.

Association between LC and NormPCTC.

Overview

LC (lines)  5-20  21-40  41-60  61-80  81-100  101-∞
NormPCTC    8.7%  13.5%  18.8%  33.4%  44.4%   49.3%

Gross LOC of co-changed clones of each type in each LC group.

LC (lines)  5-20  21-40  41-60  61-80  81-100  101-∞
Type 1      636   1209   1007   1048   1999    1363
Type 2      2269  2503   1359   1329   377     2473
Type 3      6145  14358  9792   8801   5978    19218
Total       9050  18070  12158  11178  8354    23054

Figure RQ3-3: Distribution of the gross LOC of co-changed clones of each type in every LC group and project.

Gross LOC of all clones of each type in each LC group.

LC (lines)  5-20    21-40   41-60  61-80  81-100  101-∞
Type 1      12696   13866   5226   2777   2969    2040
Type 2      25450   22074   8937   4372   1630    6575
Type 3      65484   98066   50401  26344  14200   38155
Total       103630  134006  64564  33493  18799   46770
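The NormPCTC values in the overview table match the gross LOC of co-changed clones divided by the gross LOC of all clones in each LC group, so longer clones weigh more than short ones. A sketch that recomputes them from the per-type LOC in the two tables above; the helper names are illustrative only.

```python
LC_LABELS = ["5-20", "21-40", "41-60", "61-80", "81-100", "101-∞"]

# Gross LOC per clone type and LC group, copied from the two tables above.
CO_CHANGED_LOC = {
    "Type 1": [636, 1209, 1007, 1048, 1999, 1363],
    "Type 2": [2269, 2503, 1359, 1329, 377, 2473],
    "Type 3": [6145, 14358, 9792, 8801, 5978, 19218],
}
ALL_LOC = {
    "Type 1": [12696, 13866, 5226, 2777, 2969, 2040],
    "Type 2": [25450, 22074, 8937, 4372, 1630, 6575],
    "Type 3": [65484, 98066, 50401, 26344, 14200, 38155],
}


def column_totals(by_type: dict[str, list[int]]) -> list[int]:
    """Sum the per-type rows column by column to get the Total row."""
    return [sum(col) for col in zip(*by_type.values())]


def norm_pctc(co_loc: dict[str, list[int]], all_loc: dict[str, list[int]]) -> dict[str, float]:
    """NormPCTC per LC group: co-changed clone LOC / all clone LOC, in percent."""
    co = column_totals(co_loc)
    total = column_totals(all_loc)
    return {label: 100 * c / t for label, c, t in zip(LC_LABELS, co, total)}


if __name__ == "__main__":
    for label, value in norm_pctc(CO_CHANGED_LOC, ALL_LOC).items():
        print(f"{label}: {value:.1f}%")
```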

Figure RQ3-4: Distribution of the gross LOC of clones of each type in every LC group and project.
