Holiday rental housing fraud risk
Amsterdam has limited living space, both for citizens and for visitors. A citizen who wants to rent out their home or houseboat to tourists must meet certain requirements. For example, they may do so for a maximum of 30 nights per year and to a maximum of 4 people at a time. They must also report the rental to the municipality.
Not everyone adheres to those conditions. The municipality sometimes receives reports, for instance from neighbours or rental platforms, who suspect that a home has been rented out without meeting those requirements. If such a report is filed, employees of the department of Surveillance & Enforcement can start an investigation.
From 1 July 2020, a pilot will be carried out for six months with an algorithm that supports the employees of the department of Surveillance & Enforcement in their investigation of reports concerning possible illegal holiday rentals. The algorithm helps prioritize the reports so that the limited enforcement capacity can be used efficiently and effectively. By analyzing the data of related housing fraud cases from the past 5 years, it calculates the probability of an illegal holiday rental situation at the reported address.
- Housing Department, Surveillance & Enforcement
Contact person for inquiries
- Team Holiday Rentals
- Developed in-house
- +31 20 624 1111
More detailed information on the system
Here you can get acquainted with the information used by the system, the operating logic, and its governance in the areas that interest you.
- Datasets
Key data sources utilised in the development and use of the system, their content and utilisation methods. The different data sources are separated by subheadings.
Identity and housing rights data
Minimized dataset from the Personal Records Database (BRP), showing information about the identity and housing rights of the residents; specifically:
– date of birth;
– date of residence in Amsterdam;
– date of residence at the address;
– family composition;
– date of death.
Minimized dataset from the Registry of Addresses and Buildings (BAG), showing information about the building; specifically:
– address, street code, postal code;
– description of the property;
– Amsterdam BAG-code, national BAG-code;
– the type of home (rent, social rent / free sector, owner-occupied);
– number of rooms;
– floor surface area;
– floor number on which the front door of the apartment is located;
– number of building layers;
– description of the floor of the residential property.
Prior housing fraud cases
Data from any related housing fraud cases; specifically:
– starting date of investigation / report;
– stage of investigation;
– report code number;
– violation code number;
– investigator code number;
– anonymous reporter yes/no;
– situation sketch;
– user that created the report (including date), or edited the report (including date);
– handling code number (type of case, allocation to team);
– date when case closed;
– reason why case closed.
- Data processing
The operational logic of the automatic data processing and reasoning performed by the system and the models used.
An algorithm has been developed that can find relationships and patterns in a large amount of information about housing fraud. The algorithm calculates which information can be associated with housing fraud and to what degree, and which information cannot. The algorithm does this by performing mathematical calculations according to the probability tree principle. A large number of probability calculations are performed by the algorithm, and an average is then taken. This average is used to generate the mathematical expectation of illegal holiday rental at an address. This expectation of illegal holiday rental at an address is only calculated by the algorithm when a new report is received for suspicion of illegal holiday rental at an address.
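The averaging step described above can be illustrated with a minimal sketch. It is not the municipality's implementation: the trees here are one-split "stumps", and the feature columns (number of rooms, number of prior reports), the example rows and the helper names are all hypothetical. Each tree is fitted on a random resample of the data, and the forest's output is the average of the individual tree outputs.

```python
import random
from statistics import mean

def fit_stump(rows, labels, feature):
    """Fit a one-split regression tree (a 'stump') on a single feature."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        left_mean, right_mean = mean(left), mean(right)
        # Squared error when each side predicts its own mean.
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # no valid split: always predict the overall mean
        overall = mean(labels)
        return (feature, float("inf"), overall, overall)
    return (feature, best[1], best[2], best[3])

def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Each tree looks at one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """The forest's expectation is the average over all trees."""
    return mean(predict_stump(stump, row) for stump in forest)

# Hypothetical, made-up data: [number_of_rooms, prior_reports] per address,
# label 1 = illegal holiday rental was established, 0 = it was not.
rows = [[2, 0], [3, 0], [4, 1], [5, 3], [2, 2], [6, 4]]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
score = predict_forest(forest, [5, 3])
```

Because every tree outputs an average of 0/1 labels, the forest's score always lies between 0 and 1 and can be read as an expectation of illegal holiday rental for the new report.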
This type of algorithm is called a “random forest regression”. To make sure employees understand how the algorithm arrives at its assessment, the “SHAP” method is used (SHapley Additive exPlanations: https://github.com/slundberg/shap). SHAP calculates which features in the data have resulted in a high or low suspicion of housing fraud. This ensures that an employee can always understand what the algorithm based its risk assessment on, so they can make a well-considered decision.
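The idea behind SHAP can be shown with a small sketch that computes exact Shapley values by brute force. It does not use the SHAP library, and the toy risk model, the feature names and the baseline address are hypothetical. A feature's contribution is its average effect on the score when it is switched from the baseline value to the reported address's value, taken over all combinations of the other features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    contributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        contributions[name] = total
    return contributions

def toy_risk_model(features):
    """Hypothetical scoring rule, used only for this illustration."""
    score = 0.1
    if features["prior_reports"] > 0:
        score += 0.3
    if features["rooms"] >= 4:
        score += 0.2
    if features["prior_reports"] > 0 and features["rooms"] >= 4:
        score += 0.1
    return score

address = {"prior_reports": 3, "rooms": 5, "anonymous_reporter": 1}
baseline = {"prior_reports": 0, "rooms": 2, "anonymous_reporter": 0}
phi = shapley_values(toy_risk_model, address, baseline)
```

The contributions add up to the difference between the score for the address and the score for the baseline, which is what lets a visualization attribute the risk assessment to individual features. A feature the model ignores, such as the hypothetical `anonymous_reporter` here, receives a contribution of zero.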
The advantage of a “random forest regression” is that it is a fairly complex algorithm that can approximate reality quite well. However, there is a risk of overfitting: a “tree” with many layers fits the training data so closely that its answers become too specific to that data. Research was carried out into how many layers the model can have while remaining generic and therefore not overfitting. In addition, continuous data points are categorized (grouped), so that the model works with a limited number of options instead of the infinite number of continuous values. This makes the model better suited to reach a conclusion.
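Grouping a continuous value into categories can be sketched as follows. The bin edges and labels for floor surface area are assumptions chosen for illustration; the register does not state which boundaries are actually used.

```python
from bisect import bisect_right

def categorize(value, edges, labels):
    """Map a continuous value to the label of the bin it falls into.

    `edges` are ascending lower bounds of every bin except the first,
    so there must be exactly one more label than there are edges.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]

# Hypothetical bins for floor surface area in square metres.
AREA_EDGES = [40, 70, 100]
AREA_LABELS = ["under 40", "40 to 69", "70 to 99", "100 or more"]

areas = [38.5, 40.0, 69.9, 112.0]
categories = [categorize(a, AREA_EDGES, AREA_LABELS) for a in areas]
```

After this step the model sees four possible values for the feature instead of every distinct measurement, which reduces the number of splits a tree can make on it.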
- Non-discrimination
Promotion and realisation of equality in the use of the service.
During the development of the algorithm, the available datasets were critically examined, using a privacy impact assessment. It was decided that only a minimal selection should be used for data processing. Only information that is critical to determine if the Housing Act is violated is included in the dataset on which the algorithm was developed. Information such as place of birth, nationality, marital status, and country of birth is not included in the algorithm. This is intended to prevent direct prejudice towards groups of people.
The data used for the algorithm comes from previous illegal holiday rental cases. Good-quality data must be used to substantiate an enforcement decision and to make it legally sustainable. It is therefore assumed that the underlying data does not contain such material biases that it is necessary to doubt the reliability of the data and the probability calculation.
However, an algorithm can be so good at finding patterns that excluding sensitive data is not enough. We therefore also investigated whether the non-sensitive data processed by the algorithm indirectly leads to undesirable differences in treatment between cases. For example, it could be that in certain neighborhoods many of the people living there are of a certain nationality; or that certain groups on average have larger families. If the algorithm then uses data such as the postal code or family size, it can still indirectly distinguish between certain groups, simply by distinguishing between neighborhoods or family size. In this case, a group can still be disadvantaged by the algorithm, even if the group is not explicitly known to the algorithm. We have therefore chosen to conduct further research into this form of algorithmic bias during the pilot. For this we use the “AI Fairness 360 toolkit” (https://aif360.mybluemix.net).
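One common way to check for this kind of indirect difference in treatment is to compare how often cases from different groups are flagged as high priority. The sketch below computes two such group fairness measures in plain Python. It does not use the AI Fairness 360 toolkit itself, and the groups, the flags and the function names are made up for illustration; the group attribute would be used only for auditing, not as model input.

```python
def selection_rate(flags, groups, group):
    """Share of cases in `group` that were flagged as high priority."""
    selected = [f for f, g in zip(flags, groups) if g == group]
    return sum(selected) / len(selected)

def statistical_parity_difference(flags, groups, group_a, group_b):
    """Difference in flag rates; 0 means both groups are flagged equally often."""
    return selection_rate(flags, groups, group_a) - selection_rate(flags, groups, group_b)

def disparate_impact(flags, groups, group_a, group_b):
    """Ratio of flag rates; 1 means both groups are flagged equally often."""
    return selection_rate(flags, groups, group_a) / selection_rate(flags, groups, group_b)

# Hypothetical audit data: 1 = report prioritized for field investigation.
flags = [1, 0, 0, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

spd = statistical_parity_difference(flags, groups, "A", "B")
di = disparate_impact(flags, groups, "A", "B")
```

In this made-up example group B is flagged twice as often as group A. A gap like that is a signal to investigate which features drive the difference; on its own it does not establish that the model is unfair.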
- Human oversight
Human oversight during the use of the service.
There is no automated decision-making. An investigation into a suspected illegal holiday rental is always the result of a report. This report is, for instance, submitted by a citizen or rental platform. The algorithm helps the employee of the department of Surveillance & Enforcement to prioritize the most probable cases from the workload so that they can select them for a field investigation. The algorithm facilitates a planner’s specific consideration of starting a field investigation at an address. The employee is provided with a visualization that shows which data features play a key role in the “risk assessment” of the algorithm, and which don’t. With this visualization, they can assess if they should follow the risk assessment of the algorithm or not.
The responsible supervisor and the project enforcer are the ones to determine if there is actually a case of housing fraud. They determine this by conducting preliminary research and field investigations. The case is then discussed intensively in a debriefing with the employees who take part in the decision-making process. The algorithm, therefore, has a significant influence on the planner, but it does not independently decide whether illegal holiday rental has occurred.
A work instruction has been drawn up to prevent employees from having excessive confidence in the algorithm. In addition, the employees undergo training to recognize the opportunities and risks of using algorithms.
- Risks
Risks related to the system and its use and their management methods.
The system naturally has an impact on the alleged offender, as the report on their offence might get more (or less) priority than it would have without the system. Several mitigations have been put in place to make sure that all probability calculations are based on causality, not on correlations. The primary risk mitigation for this algorithm is that its use is in a pilot phase, and its trustworthiness will be evaluated extensively and continuously during that pilot phase.