Dynamic page for System – Amsterdam Algoritmeregister

City management

Overview

Reporting issues in public space

Contact information

Department
Research, Information & Statistics (OIS)
Contact team for inquiries
Adviser R&D (Adviseur Onderzoek en ontwikkeling)
External suppliers
Developed in-house

Contact email
CIO-office@amsterdam.nl
Contact phone
+31 20 624 1111

More detailed information on the system

Here you can get acquainted with the information used by the system, the operating logic, and its governance in the areas that interest you.

Datasets

Key data sources utilised in the development and use of the system, their content and utilisation methods. The different data sources are separated by subheadings.

Name

Reports

Dataset description

The dataset consists of previously submitted reports. The initial training set contained 300,000 reports. The model is periodically retrained with new reports and with corrections to earlier reports. If the Action Service Center or another department receives a report that has been categorised incorrectly (see Human Oversight), they can manually correct the mistake in the system. We are investigating whether additional training of the algorithm can be automated in the future.

The dataset cannot be published automatically via this register. Because the data is collected in a free text field, it is possible that the reporter enters personal data, even though this is explicitly not requested.

Personal data

No personal data

Name

Contact information for follow-up questions

Dataset description

The person reporting can choose to provide their phone number and/or email address if they want to be notified of any updates or for follow-up questions. The information is kept no longer than is necessary for this specific purpose. This information is not used in the algorithm.

For our privacy policy, see: https://www.amsterdam.nl/privacy/specifieke/privacyverklaringen-wonen/meldingen-overlast-privacy/

Personal data

Identified

Human oversight

Human oversight during the use of the service.

All reports that the system assigns to a category with less than 40% certainty are forwarded to the Action Service Center. An employee of the Action Service Center then assesses which category best suits the report. In addition, if a report is assigned to the wrong category, the department that receives it manually recategorises it.
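The routing rule described above can be sketched as follows. This is a minimal illustration, not the register's actual code: only the 40% threshold comes from the text, while the function name, the queue label, and the category names are hypothetical.

```python
from typing import Dict, Tuple

# Threshold taken from the text: below 40% certainty a person decides.
CONFIDENCE_THRESHOLD = 0.40

# Hypothetical label for the manual-review queue (not from the source).
MANUAL_REVIEW = "action_service_center"


def route_report(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Return (destination, confidence) for one report.

    `probabilities` maps each category to the model's certainty for it.
    """
    category, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence < CONFIDENCE_THRESHOLD:
        # Too uncertain: forward to a human instead of auto-assigning.
        return MANUAL_REVIEW, confidence
    return category, confidence
```

For example, a report scored as `{"waste": 0.35, "roads": 0.33, "noise": 0.32}` would go to manual review under this sketch, because its best category falls below the threshold.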

Data processing

The operational logic of the automatic data processing and reasoning performed by the system and the models used.

System architecture description

The text of the report is broken down into single words. The model has been trained to recognize the weight of each word by using ‘TF-IDF’ (‘term frequency-inverse document frequency’). This representation gives each word a weight that shows how distinctive it is for the specific citizen report compared with the overall collection of reports. A word such as ‘the’ gets a low weight, and a word such as ‘garbage’ gets a higher weight. This makes the approach well suited to categories that are described by very specific words. It also keeps unigrams and bigrams that occur in almost all reports (such as “please” or “thank you”) from affecting the classification too much.
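The weighting idea can be illustrated with a small pure-Python sketch. It assumes the textbook formula tf × ln(N / df) and whitespace tokenisation; the production system may use a different TF-IDF variant, and the sample reports are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace; a real tokenizer would do more."""
    return text.lower().split()


def tfidf_weights(reports: List[str]) -> List[Dict[str, float]]:
    """Compute TF-IDF weights per report, using idf = ln(N / df)."""
    tokenized = [tokenize(report) for report in reports]
    n_reports = len(tokenized)
    # Document frequency: number of reports that contain each word.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    weights = []
    for words in tokenized:
        counts = Counter(words)
        total = len(words)
        weights.append({
            word: (count / total) * math.log(n_reports / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Invented sample reports, for illustration only.
reports = [
    "the garbage bin is full",
    "the street light is broken",
    "the bike is blocking the road",
]
w = tfidf_weights(reports)
```

In this toy collection ‘the’ occurs in every report, so its weight is zero, while ‘garbage’ occurs in only one report and gets a positive weight.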

A logistic regression model (a machine-learning technique) is then applied to these word weights to determine which category is the most likely fit, and therefore which department within the municipality needs to act on the report.
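The prediction step of such a model can be sketched as below. This assumes the multinomial (softmax) form of logistic regression, which the source does not specify, and the categories, word weights, and biases are hypothetical values chosen for illustration rather than trained parameters.

```python
import math
from typing import Dict

# Hypothetical per-category word weights and biases (illustrative only;
# a trained model would learn these from the labelled reports).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "waste": {"garbage": 2.0, "bin": 1.5},
    "lighting": {"light": 2.0, "lamp": 1.8},
}
BIAS: Dict[str, float] = {"waste": 0.0, "lighting": 0.0}


def predict_proba(features: Dict[str, float]) -> Dict[str, float]:
    """Turn word weights (e.g. TF-IDF values) into category probabilities."""
    # Linear score per category: bias + sum of weight * feature value.
    scores = {
        category: BIAS[category]
        + sum(word_weights.get(word, 0.0) * value
              for word, value in features.items())
        for category, word_weights in WEIGHTS.items()
    }
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores.values())
    exps = {category: math.exp(score - top) for category, score in scores.items()}
    total = sum(exps.values())
    return {category: value / total for category, value in exps.items()}


probs = predict_proba({"garbage": 0.5, "bin": 0.3})
best = max(probs, key=probs.get)
```

The category with the highest probability is the suggested destination, and that probability is the certainty that the human-oversight threshold is compared against.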

System architecture image

reporting_issues_in_public_space.png

Output data repository

http://ec2-54-171-141-211.eu-west-1.compute.amazonaws.com

Performance

The algorithm scores 0.88 (macro-weighted F1 score) when determining which category a combination of words belongs to. Other methods were also tried (Word2Vec, CNN + LSTM, BERT) but were found to perform worse. More information: https://medium.com/maarten-sukel/how-to-use-machine-learning-for-the-classification-of-citizen-service-requests-b71159a85f36
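For readers unfamiliar with averaged F1 scores, the sketch below shows the two common averaging schemes. The term “macro-weighted” in the text is ambiguous between them, so both are shown; the labels and predictions are invented and do not reproduce the 0.88 figure.

```python
from collections import Counter
from typing import Dict, List


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 per label, computed as 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Mean of per-class F1 scores, weighted by each class's true count."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Invented example data, for illustration only.
y_true = ["waste", "waste", "roads", "noise"]
y_pred = ["waste", "roads", "roads", "noise"]
```

The macro average treats every category equally, so rare categories count as much as common ones; the weighted average lets frequent categories dominate the score.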

Source code repository

https://github.com/maartensukel/example-textual-classification-citizen-reports

Non-discrimination

Promotion and realisation of equality in the use of the service.

The algorithm is language-based. If someone does not speak Dutch or uses ‘unusual’ words, the algorithm may not recognize those words. In that case, the Action Service Center will assess the report, and the algorithm will be retrained if needed.

Language

References

Legal basis description

https://www.government.nl/topics/municipalities/municipalities-tasks

Live service address

https://meldingen.amsterdam.nl/incident/beschrijf

https://www.amsterdam.nl/privacy/specifieke/privacyverklaringen-wonen/meldingen-overlast-privacy

Risk management

Risks related to the system and its use and their management methods.

This is a low-risk algorithm. The algorithm is meant to speed up the process of allocating reports to the relevant department. That means that if the algorithm is incorrect in allocating a report, it will take a bit more time for the report to reach the right destination, but no longer than it would have before the use of the algorithm.

Providing personal information is optional: the person reporting only needs to do so if they want to receive updates. This information is not used in the algorithm. Encryption is used to store this information securely.

Risk name

Personal data (by accident)

Risk description

The person reporting does so in a free text field. There is a possibility that they enter personal data.

Frequency

Low

completed

Risk mitigation description

There is an explicit warning above the text field not to enter any personal data.

Probability

Low

Scale

Low

Severity

High

Risk type

Privacy loss

Risk name

Personal data (voluntary)

Risk description

Personal data (email address and phone number) are registered if the person reporting chooses to provide them in a specific field.

Frequency

Low

completed

Risk mitigation description

The data is stored securely in accordance with the Baseline Information Security Government (BIO) standard.

Probability

Low

Scale

Low

Severity

High

Risk type

Privacy loss
