Translational Robustness of Neural Networks Trained for Transcription Factor Binding Site Classification

Gergely Pap and István Megyeri

University of Szeged, Hungary

Keywords:

Transcription Factor Binding Site, Convolutional Neural Networks, Adversarial Training, Sequence Motifs.

Abstract:

Classifying DNA sequences based on their protein binding profiles using Deep Learning has enjoyed considerable success in recent years. Although these models can recognize binding sites with high accuracy, their underlying behaviour is unknown. Meanwhile, adversarial attacks against deep learning models have revealed serious issues in the fields of image and natural language processing related to their black box nature. Analysing the robustness of Transcription Factor Binding Site classifiers requires adversarial attacks tailored to them. In this work, we introduce shifting as an adversarial data augmentation that quantifies translational robustness. Our results show that, despite its simplicity, our attack can significantly affect performance. We evaluate two architectures using two data sets with three shifting strategies and train robust models with adversarial data augmentation.

1 INTRODUCTION

1.1 Brief Biological Overview

Transcription Factors (TFs) are among the most important regulators in a cell's biology (Stormo, 2000). TFs are responsible for key processes regarding gene expression, so understanding the nature of their workings is of paramount importance in microbiology and related fields. TFs are proteins which can bind to DNA strands to facilitate transcription: the process of turning DNA nucleotide sequence data into RNA. TFs generally have binding sites associated with them called Transcription Factor Binding Sites (TFBSs). These are identifiable regions where a TF usually binds the DNA strands. A TFBS is around 10 nucleotide base pairs in length. The nucleotides (basic building blocks of DNA; A: adenine, C: cytosine, G: guanine, T: thymine) inside the TFBSs form conserved sequences: the sequence pattern that they form is repeated several times in the genome of biological organisms. This pattern of the order of nucleotides is also called the TF's motif. Locating and detecting these motifs and TFBSs are important steps to better understand TFs' biological mechanisms and to examine the effects of these key control points on gene regulation.

https://orcid.org/0000-0002-6641-5845

https://orcid.org/0000-0002-7918-6295

1.2 Connection to Deep Learning

Through Next Generation Sequencing techniques, the number of available data sets increased rapidly (Bernstein et al., 2012), making it feasible to use deep learning to examine nucleotide sequence data. Deep Learning models have achieved considerable success in the field of TFBS classification (Zhou and Troyanskaya, 2015). At first, Convolutional Neural Networks (CNNs) (Alipanahi et al., 2015; Zeng et al., 2016) were applied to nucleotide sequence data, then Recurrent Neural Networks (RNNs) (Lanchantin et al., 2017) such as Long Short-Term Memory (LSTM) networks were used, and in recent years hybrid architectures containing both convolutional and recurrent layers have made their impact on this task (Hassanzadeh and Wang, 2016; Quang and Xie, 2019; Park et al., 2020). The success of the attention mechanism improved performance further and opened a new way to interpret the decisions of TFBS classifier networks.

1.3 Relation to Interpretability and Adversarial Robustness

Recent studies analyse the behaviour of trained TFBS classifiers while searching for interpretable features (Koo and Ploenzke, 2020; Lanchantin et al., 2017; Zhou and Troyanskaya, 2015; Alipanahi et al., 2015). Such features might help biologists to better characterize specific TF binding events. Although neural networks achieve remarkable performance on TFBS classification, they are black box models. That is, the underlying behaviour is unknown. Thus, examining TFs using Deep Neural Networks (DNNs) might require alternative approaches.

[Figure 1 panels: Expectation and Observation. Labels: Sequences, TFBS, Context window, Context window - shifted by 10, Model Prediction (Binding ✓ / Not Binding ✗).]

Figure 1: Visualization of the expected and observed transcription factor binding site classification. The motifs (made of consensus nucleotides) are generally located near the center of the sequences (e.g., starting at position 45 with a length of 10 bases in a sequence of length 100). Translating the sequence (i.e., shifting the position horizontally) should not influence performance. However, we observe a significant decrease in accuracy when applying transformations that move the context window in either direction by several nucleotides. Blue lines denote the large neighborhood of the TFBS (most of which is not used for training). Red lines show the 100 nucleotides passed through the CNN. Blue arrows mark the position shift, and the red lines show the new 100 nucleotides with the TFBS. (The original and the shifted sequence differ in only 20 nucleotides: 10 disappear from one end and 10 new ones appear at the other due to the shift of the context window, i.e., the sequence input of the model. The TFBS and the nucleotides close to it are unaltered.)

Pap, G. and Megyeri, I.
Translational Robustness of Neural Networks Trained for Transcription Factor Binding Site Classification.
DOI: 10.5220/0010769100003116
In Proceedings of the 14th International Conference on Agents and Artificial Intelligence (ICAART 2022) - Volume 3, pages 39-45
ISBN: 978-989-758-547-0; ISSN: 2184-433X

In other domains such as image classification or natural language processing (NLP), adversarial examples revealed that state-of-the-art models are prone to learn non-robust features. These examples are generated from natural samples using semantics-preserving transformations in such a way that the model will mislabel the modified input. For images, a commonly used transformation is applying tiny norm-bounded additive noise (Brendel et al., 2019). In NLP, defining the modification is more challenging, but it is still feasible to find adversarial examples. A recent approach replaces input words by their synonyms to mislead the model (Morris et al., 2020). This sensitivity to adversarial examples raises concerns regarding the interpretability of such models.

These two results seem contradictory: the high sensitivity of the models in other domains and the interpretability of the TFBS classifiers. In this work, we aim to investigate this problem more deeply and examine TFBS classifiers from a robustness perspective.

1.4 Examining and Evaluating Translations

To the best of our knowledge, no experiments or studies have been published on the vulnerability of TFBS classifier networks. Our contributions in this work are as follows:

• We apply input shifting to find adversarial examples for state-of-the-art TFBS classifiers.

• We show that these models are sensitive to input shifting despite their excellent performance on unmodified data.

• We propose a training method inspired by adversarial training which improves the models' robustness against these kinds of attacks.

Our experiments conducted on two datasets imply that the features considered important by the original networks are not necessarily the ones humans would want. Our code is available from https://github.com/istvanmegyeri/tf_translational_robustness.

2 TRANSCRIPTION FACTOR BINDING SITE CLASSIFICATION

In this section we give an introductory overview of TFBS classification with CNNs. The main concepts necessary to understand the adversarial robustness of TFBS classifier models are explained below. A reader well-versed in the literature of TFBS classification might wish to skip to Section 3.

2.1 Default Approach

The success of DeepBind (Alipanahi et al., 2015) prompted many follow-up works to use CNNs for TFBS detection. The TFBS data sets usually contain two classes, one of which consists of sequences with TFBSs (three positive example sequences belonging to Sp1 are shown in Figure 2). The learner is expected to use convolutional filters over the four channels (here the nucleotides A, C, G and T, instead of an image's colour channels R, G and B). One hypothesis for the success of convolutional neurons in TFBS classification is the motif scanner idea: the weights inside the neurons are able to learn a representation very similar to the sequence logo of a TF. That is, for some neurons the weights over the channels can be transformed into a format closely resembling a Position Weight Matrix (PWM). PWMs are a way to store and present information about the nucleotides of a binding site. In a PWM, each row corresponds to one symbol of the alphabet, e.g., a nucleotide, and each column corresponds to one position in the pattern (Zhang, 2013). For a binding site of length 15, the PWM has 4 rows (one per nucleotide) and 15 columns, where the value at a given position (column) and nucleotide (row) is the probability (or log-likelihood) of that nucleotide's occurrence at that index in the binding site. A simple visual explanation is given in Figure 3. To summarize, the learned weights of a convolutional neuron can be extracted and numerically transformed to be similar to the 2D matrix of a TF's PWM.
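The motif scanner idea can be illustrated with a minimal sketch that slides a PWM over a sequence and reports the best-scoring position. The matrix values are those printed in Figure 3 for MA0079.3; the log-odds scoring, the pseudocount, the uniform background and the toy sequence are assumptions made for this illustration and are not taken from the experiments.

```python
import math

# PWM for MA0079.3 as printed in Figure 3 (rows A, C, G, T; one column per position).
PWM = {
    "A": [0.098, 0.000, 0.011, 0.000, 0.000, 0.156, 0.000, 0.000, 0.000, 0.121, 0.075],
    "C": [0.204, 0.712, 0.767, 1.000, 0.992, 0.000, 1.000, 1.000, 0.765, 0.728, 0.569],
    "G": [0.489, 0.074, 0.000, 0.000, 0.000, 0.529, 0.000, 0.000, 0.000, 0.000, 0.084],
    "T": [0.208, 0.215, 0.221, 0.000, 0.008, 0.315, 0.000, 0.000, 0.235, 0.151, 0.272],
}
MOTIF_LEN = len(PWM["A"])
PSEUDOCOUNT = 1e-3  # assumption: avoids log(0) for zero-probability entries
BACKGROUND = 0.25   # assumption: uniform nucleotide background


def window_score(window):
    """Log-odds score of one window of MOTIF_LEN nucleotides against the PWM."""
    return sum(
        math.log((PWM[base][i] + PSEUDOCOUNT) / BACKGROUND)
        for i, base in enumerate(window)
    )


def scan(sequence):
    """Score every window of the sequence, like a single convolutional filter."""
    seq = sequence.upper()
    return [
        window_score(seq[start:start + MOTIF_LEN])
        for start in range(len(seq) - MOTIF_LEN + 1)
    ]


def best_position(sequence):
    """Index of the highest-scoring window (the global max pooling location)."""
    scores = scan(sequence)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy sequence (hypothetical): the SP1 site from Figure 2 between AT-only flanks.
toy = "ATATAT" + "GCCCCGCCCCT" + "ATATAT"
print(best_position(toy))
```

A convolutional filter whose weights resemble this matrix would behave in the same way: its activation peaks where the window aligns with the binding site.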

2.2 Issue of Performance Regarding the Motif Scanners

Given that some of the neurons learn representations similar to PWMs, when such a neuron convolves over a sequence containing the corresponding TFBS, it should produce a high activation, and the network would be expected to classify the instance correctly due to the motif scanner observation. Furthermore, most TFBSs are located near the middle of a sequence (assuming a preprocessed dataset) and are recognised relatively well. Since convolutions are translation-invariant, moving the TFBS along the sequence should not result in a harder task or lower model performance. The above-described shift of a TFBS is similar to a pixel-wise location transformation of an MNIST digit (Kauderer-Abrams, 2017): if the digit is moved several pixels in either direction, it should still be recognised and classified correctly by the CNN model. However, when the input sequences are shifted, the model's performance decreases (Figure 1).
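The expectation stated above can be made concrete with a small sketch. A single hand-written "filter" counts matches with the SP1 site from Figure 2, and a global max pooling is applied over all window positions. The flanking sequence, the lengths (100 bp sequence, 80 bp input, motif starting at position 45, following the examples in Figure 1 and Section 3) and the match-counting filter are assumptions for illustration only; a trained network is not guaranteed to behave like this idealised filter, which is precisely the issue studied here.

```python
MOTIF = "GCCCCGCCCCT"  # SP1 site as printed in Figure 2
SEQ_LEN, INPUT_LEN, MOTIF_START = 100, 80, 45

# Hypothetical AT-only flanks so that the motif occurs exactly once.
left = ("AT" * 23)[:MOTIF_START]
right = ("TA" * 22)[:SEQ_LEN - MOTIF_START - len(MOTIF)]
sequence = left + MOTIF + right


def motif_activations(window, motif=MOTIF):
    """Activation at each offset: number of positions matching the motif."""
    k = len(motif)
    return [
        sum(a == b for a, b in zip(window[i:i + k], motif))
        for i in range(len(window) - k + 1)
    ]


results = []
for start in range(SEQ_LEN - INPUT_LEN + 1):
    acts = motif_activations(sequence[start:start + INPUT_LEN])
    pooled = max(acts)        # global max pooling: unchanged by the shift
    peak = acts.index(pooled)  # peak location: moves together with the shift
    results.append((start, peak, pooled))

for start, peak, pooled in results[:3]:
    print(start, peak, pooled)
```

The peak location moves with the context window (equivariance), while the pooled value stays constant (invariance), as long as the binding site remains inside the window.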

3 ATTACKS AND DEFENCES

We selected shifting (lengthwise translation) as the input modification for our attacks. Our reasoning is as follows. Firstly, TFBS classifiers are commonly trained using varying input lengths: 101 base pairs (bp) are used in (Alipanahi et al., 2015; Zeng et al., 2016), while 1000 bp are used in (Zhou and Troyanskaya, 2015; Park et al., 2020). Usually, the longer the input, the better the model's performance. Secondly, because the convolution operation is translation invariant, it seems reasonable to expect CNNs to be resilient to small shifts as well. Above all, we can arbitrarily control the preservation or destruction of the semantics by defining a bound on the shifted positions.

For each shifting strategy, we assume that the input contains the binding site in the middle of the sequence and is longer than what the model expects (e.g., the sequences in the data all have a length of 100 bp, and the models were trained using only 80 bp, so that we have a window of 20 bp for translation). We exclude slices that would interfere with the motif (i.e., remove nucleotides from the middle part of the sequence). Based on this, we defined three shifting strategies:

• No Shift: an equal number of bases is removed from both sides.

• Rnd: the starting index is randomly selected.

• Worst: the shift producing the highest loss value is selected.

Three example sequences containing a binding site of SP1:

aggcggcccgccgaggcgcaggtgcacagcccggccggcccgcccgcgccGCCCCGCCCCTggcggtagcacacgccccgccttcccatccggagcgtgacgtcaaggggg

cgggaaagggactggggcgtggaggcgatctgggggcgggtcaaacgctgGCCCCGCCCCTctgcaggccccgtccctcagtgtccatacgtgcttgtgctagggcgcctg

agaggcgagtgatcggacaacttctgcacctcccggaagcggctgcgccgGCCCCGCCCCTccagaggcaggcgtggcctcatgaataatgaatcgtgcgggggaggggcc

Figure 2: A transcription factor binding site is a sequence pattern of nucleotides that can be found in several places in a biological entity's DNA and is bound by a specific transcription factor protein to regulate gene expression.

A 0.098 0.000 0.011 0.000 0.000 0.156 0.000 0.000 0.000 0.121 0.075

C 0.204 0.712 0.767 1.000 0.992 0.000 1.000 1.000 0.765 0.728 0.569

G 0.489 0.074 0.000 0.000 0.000 0.529 0.000 0.000 0.000 0.000 0.084

T 0.208 0.215 0.221 0.000 0.008 0.315 0.000 0.000 0.235 0.151 0.272

Figure 3: Position Weight Matrix (PWM) and sequence logo for MA0079.3 (http://jaspar.genereg.net/matrix/MA0079.3/) (Fornes et al., 2019).

By definition, the Worst method leads to the largest increase in network loss. For larger input sequences, we might relax the worst criterion and simply use the shift producing the highest loss among n random shifts. We denote this by W-of-n. From the adversary's point of view, Worst is a black box attack which uses only the model's output probability to search for adversarial inputs.
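The shifting strategies described above can be sketched as functions that choose the starting index of the model's input window. This is a minimal sketch under stated assumptions, not the implementation from the repository: `loss_fn` is a hypothetical callable returning the model's loss for one cropped window, and the toy loss below only stands in for a real classifier.

```python
import random


def no_shift_start(seq_len, input_len):
    """No Shift: crop (almost) equally from both ends.

    If the difference is odd, one more base is removed on the right.
    """
    return (seq_len - input_len) // 2


def rnd_start(seq_len, input_len, rng):
    """Rnd: uniformly random starting index among all valid shifts."""
    return rng.randint(0, seq_len - input_len)


def worst_start(sequence, input_len, loss_fn):
    """Worst: exhaustive search for the start with the highest loss."""
    starts = range(len(sequence) - input_len + 1)
    return max(starts, key=lambda s: loss_fn(sequence[s:s + input_len]))


def worst_of_n_start(sequence, input_len, loss_fn, n, rng):
    """W-of-n: highest loss among n randomly drawn starts."""
    starts = [rng.randint(0, len(sequence) - input_len) for _ in range(n)]
    return max(starts, key=lambda s: loss_fn(sequence[s:s + input_len]))


# Toy stand-in for a model loss (hypothetical): counts 'A' in the window.
def toy_loss(window):
    return window.count("A")


rng = random.Random(0)
seq = "A" * 5 + "C" * 95  # 100 bp, cropped to an 80 bp model input
print(no_shift_start(len(seq), 80))
print(worst_start(seq, 80, toy_loss))
print(worst_of_n_start(seq, 80, toy_loss, n=5, rng=rng))
```

Since W-of-n evaluates only n candidate windows, its loss can never exceed that of the exhaustive Worst search, but it needs far fewer model evaluations on long inputs.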

After evaluating the models with the above-mentioned attacks, we incorporated all shifting strategies into the training process in order to make the models robust, or at least less sensitive, to these changes. During training we used one of the following strategies to help the networks learn more robust features: shortening the sequences from both ends by the same amount [No Shift], shifting each sequence to a random index [Rnd], or shifting the sequence to the position which gave the highest loss with respect to the current model [Worst]. The maximum amount by which the sequences could be shifted was calculated by subtracting the model's input length from the original sequence length.
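One way to picture the training procedure is as a per-batch augmentation step applied before every gradient update. The sketch below is an assumption about how such a step could look, not the authors' code: `loss_fn(window, label)` is a hypothetical per-example loss of the current model, and the toy loss merely replaces it for demonstration.

```python
import random


def shifted_window(sequence, label, input_len, strategy, loss_fn=None, rng=random):
    """Crop one training sequence according to the chosen strategy."""
    max_shift = len(sequence) - input_len  # original length minus input length
    if strategy == "no_shift":
        start = max_shift // 2
    elif strategy == "rnd":
        start = rng.randint(0, max_shift)
    elif strategy == "worst":
        # Highest loss with respect to the current model (hypothetical loss_fn).
        start = max(
            range(max_shift + 1),
            key=lambda s: loss_fn(sequence[s:s + input_len], label),
        )
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sequence[start:start + input_len]


def augment_batch(batch, input_len, strategy, loss_fn=None, rng=random):
    """Return the batch of (window, label) pairs the model is trained on."""
    return [
        (shifted_window(seq, label, input_len, strategy, loss_fn, rng), label)
        for seq, label in batch
    ]


# Toy per-example loss (hypothetical stand-in for the current model).
def toy_loss(window, label):
    return window.count("A")


batch = [("A" * 3 + "C" * 97, 1), ("C" * 97 + "A" * 3, 0)]
augmented = augment_batch(batch, 80, "worst", loss_fn=toy_loss)
print([len(window) for window, _ in augmented])
```

With the Worst strategy, the windows must be recomputed in every step, because the position with the highest loss depends on the current model weights.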

4 EXPERIMENTAL SETUP

In this section, we detail our experiments. First, we describe the datasets, then the network architectures and finally the evaluation metrics.

We used two datasets. The smaller dataset D is from the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge; we acquired the sequences from http://cnn.csail.mit.edu/ (Zeng et al., 2016). The set contains two machine learning tasks, which differ in the entities belonging to the negative class. The 'motif discovery' data set uses shuffled versions of the binding examples as the other class, while the 'motif occupancy' data set contains sequences that were not bound in the TF binding experiments (ChIP-seq). From the smaller dataset D, 3 TFs were selected; each of them has a discovery and an occupancy task, and their learning profiles show a significant difference in accuracy. The original length of the sequences in the database is 101 bp. We used the data sets from the discovery and the occupancy tasks for the following TFs: SydhImr90MafkIggrab (M), SydhK562Znf143Iggrab (Z) and HaibH1hescSp1Pcr1x (S), with lengths of 75, 90, 95 and 101. The amount of shifting was limited to preserve the information from the central region.

The larger machine learning task (the larger dataset D) is from (Zhou and Troyanskaya, 2015). The data was downloaded from http://deepsea.princeton.edu/help/. This dataset has 690 labels associated with 4.4 million training entities. Chromosomes 8 and 9 were separated as the test set (amounting to 0.45 million examples). Here we also experiment with the input lengths, but instead of only reducing them, we appended 100 nucleotides to each side of the original training entities to have the opportunity to examine a larger scope. We used Genome Reference Consortium Human Build 37 (GRCh37) to obtain the extra sequences. Besides the 1000 bp length, we also used a 900 bp version of this dataset.

Building upon the architectural choices of DeepBind and the CNN from (Zeng et al., 2016), we conducted experiments on a two-layer convolutional network as our baseline for the task of binary classification. The network has two convolutional layers with 256 and 64 filters and with kernel sizes 24 and 12 respectively, each followed by a ReLU activation. Then a global max pooling and two Dense layers are applied, with 500 and 2 neurons and with ReLU and Softmax activation respectively. The interpretability of convolutional kernels was established in the DeepBind study, and the fact that (some) convolutional neurons learn TFBS nucleotide patterns has enjoyed considerable acceptance from the community. In theory, the first-layer kernels of a properly trained model (or a subset of them) would encode the 4 × L_bindingsite (or motif) information of a given TF. This would enable CNNs to learn new motifs and recognise unknown nucleotide sequence patterns from binding sites.

Following the machine learning task for the DeepSEA data base (Zhou and Troyanskaya, 2015) and utilizing the state-of-the-art TBiNet (Park et al., 2020) architecture, we also tested this CNN+LSTM hybrid network, which employs an attention mechanism. The interpretability of such an architecture could be established through experiments, gradient propagation and (layer) activation visualizations. We tried to preserve the original parameters and hyper-parameters of the learner as much as possible. As this architecture contains a CNN feature extractor, an attention module and a recurrent (bidirectional LSTM) layer to learn regulatory grammar, we hoped to gather supporting evidence about the workings of the model. Due to the attention layer, the model is expected (at least to some degree) to be able to recognise the parts of the input that are important for successful classification. As the experimental results in Tables 1 and 2 show, relying on the motif scanner hypothesis might not be the most prudent way to unravel these TFBS models. However, for both interpretability cases, we hypothesize that the networks learn a lot of "noise" or otherwise humanly uninterpretable features. These might just be useful for the separation of the training sets and generalise poorly to other unseen examples.

The evaluation metrics for the larger dataset D are the Area Under the Receiver Operating Characteristic curve (AUROC) and the area under the precision-recall curve (AUPR). For the smaller dataset D, we use simple accuracy.
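As a rough size estimate for the two-layer CNN described above, the number of trainable parameters can be counted layer by layer. The sketch assumes standard 1D convolutions over a one-hot input with 4 channels and a bias term in every layer; these details are assumptions and are not stated in the text.

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter for a standard 1D convolution."""
    return in_channels * kernel_size * filters + filters


def dense_params(in_features, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return in_features * units + units


layers = [
    ("conv1 (256 filters, kernel 24)", conv1d_params(4, 256, 24)),
    ("conv2 (64 filters, kernel 12)", conv1d_params(256, 64, 12)),
    # Global max pooling has no parameters and outputs 64 features.
    ("dense1 (500 units)", dense_params(64, 500)),
    ("dense2 (2 units)", dense_params(500, 2)),
]

total_params = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name}: {count}")
print(f"total: {total_params}")
```

Because the global max pooling collapses the length dimension, this count does not depend on the input length, so the same architecture can be trained on 75, 90, 95 or 101 bp inputs without modification.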

5 RESULTS

Following the experimental setup, we trained the corresponding architecture on each dataset with each shifting strategy. We then evaluated the obtained models on the test set, again using the three shifting strategies. That is, for each task, we have three models and three evaluation modes.

The results for the smaller dataset D are in Table 1. For comparison, we present the results for the vanilla model that is trained on the unmodified sequence length (101 bp). If we use the No Shift strategy, the performance remains almost the same even for the smallest sequence length, 75. The largest drop is 0.042 for S-75, while the smallest is 0.0102. On average the reduction is 0.0195. This confirms that the semantics of the input are preserved even for the smallest sequence length. However, the worst-case performance of these models is different. Even at the largest examined sequence length, a more noticeable performance degradation is present, though the relative input length difference is modest: 95 vs 101.
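The drops quoted above can be reproduced from Table 1 by comparing the models trained and evaluated with No Shift at length 75 against the vanilla models at length 101. The short sketch below only repeats this arithmetic; the dictionary layout is an illustrative choice.

```python
# Accuracies from Table 1 (trained and evaluated with No Shift).
# Key: (TF, task); value: (accuracy at length 75, accuracy at length 101).
ACCURACY = {
    ("S", "discovery"): (0.7158, 0.7578),
    ("S", "occupancy"): (0.7435, 0.7537),
    ("M", "discovery"): (0.9292, 0.9394),
    ("M", "occupancy"): (0.7302, 0.7478),
    ("Z", "discovery"): (0.8655, 0.8822),
    ("Z", "occupancy"): (0.6484, 0.6686),
}

# Drop in accuracy when the input is shortened from 101 bp to 75 bp.
drops = {task: full - short for task, (short, full) in ACCURACY.items()}

largest_task = max(drops, key=drops.get)
largest_drop = drops[largest_task]
smallest_drop = min(drops.values())
mean_drop = sum(drops.values()) / len(drops)

print(largest_task, round(largest_drop, 4))
print(round(smallest_drop, 4))
print(round(mean_drop, 4))
```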

With the other training strategies, we see better worst-case performance. It improves somewhat when we apply random shifting during training. However, worst-case performance is highest when the worst shift is used at training time. This confirms that the model can learn translation-invariant features, but that these are used only when adversarial training is applied. In some cases during our training runs, we noticed that the models were unable to minimize the loss on the worst-shifted training data. Removing the regularization solved the problem except for S-75. We hypothesize that learning translation-invariant features requires more capacity from the models.

In Table 2, we can see similar tendencies on the larger dataset D using the attention-based TBiNet model. Removing 100 bp has negligible impact under No Shift evaluation. In contrast, the performance according to both metrics drops significantly when the worst shifting strategy is applied. Even simple random shifting causes considerable degradation.

6 CONCLUSIONS

After examining the effect of adversarial input shifting on a Transcription Factor Binding Site classification task, we observe a steady drop in performance both for a simpler CNN and for a more complex CNN+LSTM model with an attention mechanism. Both learners are supposed to be able to handle simple translation or a randomly picked starting position for shorter entities; however, they do not perform well under these conditions. We show that incorporating these modified examples into the training process results in models that are more robust and can better cope with the challenges that the augmented sequences pose.

Table 1: Results on the smaller dataset D using the CNN and three shifting methods for evaluation and training. S, M and Z mean HaibH1hescSp1Pcr1x, SydhImr90MafkIggrab and SydhK562Znf143Iggrab respectively. Each 3 by 3 block represents one task from the dataset for a specific sequence length. * means that the model was trained without regularization. The performance drops if worst shift is applied, which implies that the model uses features which are not position-invariant. Although these models would be able to learn invariant features, they only seem to do so when worst shifting is involved at training.

| TF | Train strategy | Discovery: No Shift | Discovery: Rnd | Discovery: Worst | Occupancy: No Shift | Occupancy: Rnd | Occupancy: Worst |
|---|---|---|---|---|---|---|---|
| S-75 | No Shift | 0.7158 | 0.7124 | 0.5029 | 0.7435 | 0.7399 | 0.5449 |
| S-75 | Rnd | 0.7240 | 0.7240 | 0.5235 | 0.7749 | 0.7739 | 0.5677 |
| S-75 | Worst | 0.5015 | 0.5015 | 0.5015 | 0.7301 | 0.7290 | 0.6423 |
| S-90 | No Shift | 0.7321 | 0.7316 | 0.6394 | 0.7582 | 0.7582 | 0.6600 |
| S-90 | Rnd | 0.7542 | 0.7558 | 0.6497 | 0.7653 | 0.7679 | 0.6765 |
| S-90 | Worst | 0.7285* | 0.7324* | 0.6600* | 0.7489 | 0.7485 | 0.6958 |
| S-95 | No Shift | 0.7466 | 0.7463 | 0.6834 | 0.7542 | 0.7544 | 0.7017 |
| S-95 | Rnd | 0.7468 | 0.7456 | 0.6894 | 0.7505 | 0.7507 | 0.6891 |
| S-95 | Worst | 0.7607 | 0.7620 | 0.7212 | 0.7563 | 0.7571 | 0.7265 |
| S-101 | No Shift | 0.7578 | n/a | n/a | 0.7537 | n/a | n/a |
| M-75 | No Shift | 0.9292 | 0.9229 | 0.8493 | 0.7302 | 0.7302 | 0.5972 |
| M-75 | Rnd | 0.9302 | 0.9270 | 0.8549 | 0.7336 | 0.7326 | 0.6209 |
| M-75 | Worst | 0.9268 | 0.9213 | 0.8658 | 0.7239 | 0.7242 | 0.6848 |
| M-90 | No Shift | 0.9303 | 0.9296 | 0.9061 | 0.7368 | 0.7349 | 0.6485 |
| M-90 | Rnd | 0.9337 | 0.9337 | 0.9141 | 0.7382 | 0.7373 | 0.6734 |
| M-90 | Worst | 0.9327 | 0.9322 | 0.9190 | 0.7372 | 0.7377 | 0.7193 |
| M-95 | No Shift | 0.9352 | 0.9342 | 0.9218 | 0.7416 | 0.7416 | 0.6902 |
| M-95 | Rnd | 0.9378 | 0.9382 | 0.9255 | 0.7433 | 0.7442 | 0.7107 |
| M-95 | Worst | 0.9363 | 0.9358 | 0.9272 | 0.7385 | 0.7388 | 0.7226 |
| M-101 | No Shift | 0.9394 | n/a | n/a | 0.7478 | n/a | n/a |
| Z-75 | No Shift | 0.8655 | 0.8618 | 0.7505 | 0.6484 | 0.6484 | 0.4825 |
| Z-75 | Rnd | 0.8760 | 0.8703 | 0.7656 | 0.6590 | 0.6546 | 0.4759 |
| Z-75 | Worst | 0.8726 | 0.8679 | 0.7969 | 0.6177 | 0.6177 | 0.5701 |
| Z-90 | No Shift | 0.8777 | 0.8790 | 0.8327 | 0.6615 | 0.6645 | 0.5649 |
| Z-90 | Rnd | 0.8748 | 0.8739 | 0.8264 | 0.6646 | 0.6653 | 0.5805 |
| Z-90 | Worst | 0.8791 | 0.8775 | 0.8540 | 0.6606 | 0.6599 | 0.6043 |
| Z-95 | No Shift | 0.8821 | 0.8802 | 0.8519 | 0.6671 | 0.6675 | 0.6064 |
| Z-95 | Rnd | 0.8854 | 0.8885 | 0.8654 | 0.6676 | 0.6664 | 0.6125 |
| Z-95 | Worst | 0.8818 | 0.8811 | 0.8614 | 0.6635 | 0.6648 | 0.6327 |
| Z-101 | No Shift | 0.8822 | n/a | n/a | 0.6686 | n/a | n/a |

Table 2: Results on the larger dataset D using TBiNet with 900 and 1000 as sequence lengths. The performance drops according to both metrics if worst shift is applied, which implies that the model uses features which are not position-invariant. The worst performance improves on the test set, if the model was trained on augmented data.

| Length | Train strategy | AUROC: No Shift | AUROC: Rnd | AUROC: W-of-20 | AUPR: No Shift | AUPR: Rnd | AUPR: W-of-20 |
|---|---|---|---|---|---|---|---|
| 900 | No Shift | 0.9423 | 0.9133 | 0.7702 | 0.3168 | 0.2305 | 0.0693 |
| 900 | Rnd | 0.9426 | 0.9364 | 0.8881 | 0.3088 | 0.2530 | 0.1086 |
| 900 | W-of-20 | 0.9319 | 0.9303 | 0.9055 | 0.2258 | 0.2178 | 0.1431 |
| 1000 | No Shift | 0.9428 | 0.9302 | 0.8605 | 0.3185 | 0.2694 | 0.1268 |
| 1000 | Rnd | 0.9453 | 0.9409 | 0.9105 | 0.3209 | 0.2860 | 0.1644 |
| 1000 | W-of-20 | 0.9380 | 0.9367 | 0.9194 | 0.2680 | 0.2564 | 0.1838 |

ACKNOWLEDGEMENTS

This research was supported by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program and the Artificial Intelligence National Excellence Program (grant 2018-1.2.1-NKP-2018-00008), as well as grant NKFIH-1279-2/2020. Both authors contributed equally.

REFERENCES

Alipanahi, B., Delong, A., Weirauch, M. T., and Frey, B. J. (2015). Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology, 33(8):831–838.

Bernstein, B., Birney, E., Dunham, I., Green, E., Gunter, C., and Snyder, M. (ENCODE Project Consortium) (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489(7414):57–74.

Brendel, W., Rauber, J., Kümmerer, M., Ustyuzhaninov, I., and Bethge, M. (2019). Accurate, reliable and fast robustness evaluation. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.

Fornes, O., Castro-Mondragon, J. A., Khan, A., van der Lee, R., Zhang, X., Richmond, P. A., Modi, B. P., Correard, S., Gheorghe, M., Baranašić, D., Santana-Garcia, W., Tan, G., Chèneby, J., Ballester, B., Parcy, F., Sandelin, A., Lenhard, B., Wasserman, W. W., and Mathelier, A. (2019). JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Research, 48(D1):D87–D92.

Hassanzadeh, H. R. and Wang, M. D. (2016). DeeperBind: Enhancing prediction of sequence specificities of DNA binding proteins. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 178–183. IEEE.

Kauderer-Abrams, E. (2017). Quantifying translation-invariance in convolutional neural networks. arXiv preprint arXiv:1801.01450.

Koo, P. K. and Ploenzke, M. (2020). Deep learning for inferring transcription factor binding sites. Current Opinion in Systems Biology, 19:16–23.

Lanchantin, J., Singh, R., Wang, B., and Qi, Y. (2017). Deep motif dashboard: Visualizing and understanding genomic sequences using deep neural networks. In Pacific Symposium on Biocomputing 2017, pages 254–265. World Scientific.

Morris, J., Lifland, E., Yoo, J. Y., Grigsby, J., Jin, D., and Qi, Y. (2020). TextAttack: A framework for adversarial attacks, data augmentation, and adversarial training in NLP. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 119–126.

Park, S., Koh, Y., Jeon, H., Kim, H., Yeo, Y., and Kang, J. (2020). Enhancing the interpretability of transcription factor binding site prediction using attention mechanism. Scientific Reports, 10(1):13413.

Quang, D. and Xie, X. (2019). FactorNet: A deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. Methods, 166:40–47. Deep Learning in Bioinformatics.

Stormo, G. D. (2000). DNA binding sites: representation and discovery. Bioinformatics, 16(1):16–23.

Zeng, H., Edwards, M., Liu, G., and Gifford, D. (2016). Convolutional neural network architectures for predicting DNA-protein binding. Bioinformatics, 32:i121–i127.

Zhang, X. (2013). Position weight matrices. In Dubitzky, W., Wolkenhauer, O., Cho, K.-H., and Yokota, H., editors, Encyclopedia of Systems Biology, pages 1721–1722. Springer New York, New York, NY.

Zhou, J. and Troyanskaya, O. (2015). Predicting effects of noncoding variants with deep learning-based sequence model. Nature Methods, 12.
