Table 2 Description of the 33 datasets used

From: MISCORE: a new scoring function for characterizing DNA regulatory motifs in promoter sequences

| TF | L seq (bp) | Res | L bs (min, max, round(avg)) | N seq | N bs |
|---|---|---|---|---|---|
| data group 1 (dg1): 8 real datasets [32] | | | | | |
| CREB | 200 | H | (05, 30, 12) | 17 | 19 |
| SRF | 200 | H | (09, 22, 12) | 20 | 35 |
| TBP | 200 | H | (05, 24, 07) | 95 | 95 |
| MEF2 | 200 | H | (07, 15, 10) | 17 | 17 |
| MYOD | 200 | H | (06, 06, 06) | 17 | 21 |
| ERE | 200 | M | (13, 13, 13) | 25 | 25 |
| E2F | 200 | M | (11, 11, 11) | 25 | 27 |
| CRP | 105 | E | (22, 22, 22) | 18 | 24 |
| data group 2 (dg2): 20 artificial datasets [10] | | | | | |
| dm01g | 1500 | D | (13, 28, 20) | 04 | 07 |
| dm04m | 2000 | D | (10, 26, 15) | 04 | 09 |
| hm02r | 1000 | H | (10, 36, 23) | 09 | 11 |
| hm03r | 1500 | H | (14, 46, 27) | 10 | 15 |
| hm06g | 500 | H | (06, 14, 08) | 09 | 09 |
| hm08m | 500 | H | (05, 34, 15) | 15 | 13 |
| hm09g | 1500 | H | (07, 26, 16) | 10 | 10 |
| hm10m | 500 | H | (07, 09, 08) | 06 | 11 |
| hm11g | 1000 | H | (06, 42, 14) | 08 | 19 |
| hm16g | 3000 | H | (09, 54, 23) | 07 | 07 |
| hm17g | 500 | H | (10, 18, 15) | 11 | 10 |
| hm20r | 2000 | H | (06, 71, 17) | 35 | 76 |
| hm21g | 1000 | H | (10, 23, 13) | 05 | 07 |
| hm24m | 500 | H | (08, 18, 12) | 08 | 08 |
| hm26m | 1000 | H | (11, 36, 25) | 09 | 10 |
| mus02r | 1000 | M | (10, 33, 19) | 09 | 12 |
| mus10g | 1000 | M | (05, 28, 15) | 13 | 15 |
| mus11m | 500 | M | (06, 27, 15) | 12 | 15 |
| yst08r | 1000 | M | (12, 49, 21) | 11 | 14 |
| yst09g | 1000 | Y | (09, 19, 17) | 16 | 13 |
| data group 3 (dg3): 5 real datasets [33] | | | | | |
| CREB | 500 | H | (05, 30, 12) | 17 | 19 |
| SRF | 500 | H | (09, 22, 12) | 20 | 36 |
| TBP | 500 | H | (05, 24, 07) | 95 | 95 |
| MEF2 | 500 | H | (07, 15, 10) | 17 | 17 |
| MYOD | 500 | H | (06, 06, 06) | 17 | 21 |

  1. Notations: L seq denotes the average length of the sequences in base pairs (bp). Res is the resource: (D, H, M, Y, E) refer to (Drosophila melanogaster, (human, mouse, rat), Saccharomyces cerevisiae, E. coli), respectively. L bs denotes the length of the binding sites in bp, given as (minimum, maximum, rounded average). N seq is the number of sequences in the dataset, and N bs is the number of binding sites in the dataset.
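
To make the column notation concrete, one row of the table can be held as a small record. The following is only an illustrative Python sketch: the class name `DatasetRow`, its field names, and the two derived quantities (binding sites per sequence and the approximate fraction of bases covered by binding sites) are assumptions introduced here and are not defined in the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetRow:
    """One row of Table 2 (hypothetical field names, not from the paper)."""
    tf: str                     # TF column: dataset / transcription factor name
    l_seq: int                  # L seq: average sequence length in bp
    res: str                    # Res: resource code (D, H, M, Y or E)
    l_bs: Tuple[int, int, int]  # L bs: (min, max, rounded average) in bp
    n_seq: int                  # N seq: number of sequences
    n_bs: int                   # N bs: number of binding sites

    def sites_per_sequence(self) -> float:
        # Average number of binding sites per sequence.
        return self.n_bs / self.n_seq

    def approx_site_coverage(self) -> float:
        # Rough fraction of bases occupied by binding sites, using the
        # rounded average site length and the average sequence length.
        avg_len = self.l_bs[2]
        return (self.n_bs * avg_len) / (self.n_seq * self.l_seq)


# The CREB row of data group 1, copied from the table.
creb_dg1 = DatasetRow("CREB", 200, "H", (5, 30, 12), 17, 19)

print(f"{creb_dg1.sites_per_sequence():.2f} sites per sequence")
print(f"{creb_dg1.approx_site_coverage():.1%} of bases in binding sites")
```

Under these assumptions, the same CREB sites embedded in the 500 bp sequences of data group 3 give a lower coverage than in the 200 bp sequences of data group 1, since only L seq changes between the two rows.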